The idea is to have thousands of virtual processors on each chip. The virtual processors all run simultaneously. Each one has access to all memory and to physical processors such as multipliers, is programmed normally, and consists of 512 bits holding the values that would be in a small microprocessor's most commonly used registers. The virtual processors travel to, and use, processor parts and blocks of memory.
This idea should revolutionize computing. The idea is to use many small, simple, pipelined microprocessors (thousands on a chip, each with a block of memory) and to keep the most commonly used registers (512 bits) on a many-latch bus that interconnects them. Not all the physical processors are identical. The interconnecting bus runs over the processors on the chip. This allows for much better parallel processing than the unmodified von Neumann architecture. Each 512-bit state represents a simulated virtual microprocessor that travels from physical processor to physical processor. The less commonly used registers remain in the physical pipelined processors.
The modified von Neumann architecture (also known as the 'registers on bus' architecture) is a parallel processor architecture that differs slightly from the normal von Neumann architecture in order to make parallel processing easier.
The only difference between a regular von Neumann architecture processor and a modified von Neumann architecture processor is that some registers are read from a bus and the updated register values are then written back to that bus. How could such a small change greatly improve the parallel processing capabilities of the von Neumann architecture? That is what this article tries to explain and demonstrate.
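To make that one difference concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the 512 bits are sixteen 32-bit words (a program counter plus fifteen general registers); the names BusState, add_unit, and state_bits are hypothetical and not part of any real design. The adder keeps no register state of its own: it reads the registers from the bus packet and hands back an updated packet.

```python
from dataclasses import dataclass, replace

WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit registers


@dataclass(frozen=True)
class BusState:
    """Hypothetical layout of the registers that ride on the bus."""
    pc: int
    regs: tuple  # assumption: 15 general registers; with pc that is 16 words


def add_unit(state: BusState, dst: int, src: int) -> BusState:
    """A physical adder: read registers from the bus, write updated ones back."""
    regs = list(state.regs)
    regs[dst] = (regs[dst] + regs[src]) & WORD_MASK
    # The incoming packet is left untouched; a new packet goes back on the bus.
    return replace(state, pc=state.pc + 1, regs=tuple(regs))


def state_bits(state: BusState) -> int:
    """Size of the packet under the 32-bit-word assumption."""
    return 32 * (1 + len(state.regs))
```

Because the unit holds none of the virtual processor's registers, the next packet to arrive can belong to a completely different virtual processor.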
The goals of the architecture are a thousand processors (with memory) on each chip, simplicity, support for special-purpose processors and circuitry, easy interprocessor communication, and easy, normal programming. This is accomplished by using virtual processors that move among the physical processors, using the physical processors as needed. Programs are written for the virtual processors rather than for the physical processors.
Some parallel processing designs with the normal von Neumann architecture have multiple processors using shared memory. The modified von Neumann architecture has multiple virtual processors using shared memory and shared physical processors. Each von Neumann processor can be broken up into a number of smaller, simpler processors (for example, a multiplier) that can be used simultaneously. Nevertheless, a modified von Neumann architecture processor is programmed almost exactly the same way as a normal von Neumann architecture processor. The physical processors, often including small blocks of memory, are small, allowing thousands of processors on a chip with roughly half of the chip being used for memory. A program is written for a virtual processor using many small simple physical processors almost exactly as if there were large physical processors as in the normal von Neumann architecture. There can be thousands of virtual processors on a chip being processed and processing data simultaneously.
The architecture consists of a many-branched bus with processors at the leaves. A processor might be a multiplier (with a little extra logic and a little memory). States (sets of register values representing virtual processors) travel along the bus to the processors and use them as needed. This contrasts with a regular von Neumann architecture processor, which stores the state in registers and routes data from registers to a circuit (like a multiplier) and from the circuit back to a register, or over the data bus between a register and a latch in memory, in either direction.
This architecture solves the two main problems with many-processor von Neumann designs: programming and communication.
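As a sketch of how a state might travel between two leaves of such a bus, the Python function below computes the path through a binary tree. The numbering is an assumption made only for this example: the root is node 1 and the children of node n are 2n and 2n+1, so the processors are whichever nodes sit at the leaves. The name route is hypothetical.

```python
def route(src: int, dst: int) -> list[int]:
    """Return the bus nodes a state passes through going from src to dst.

    Assumes heap-style numbering: root is 1, children of n are 2n and 2n+1.
    """
    up, down = [src], [dst]
    a, b = src, dst
    # Repeatedly move the deeper (larger-numbered) end toward the root
    # until both ends meet at the common branch point.
    while a != b:
        if a > b:
            a //= 2
            up.append(a)
        else:
            b //= 2
            down.append(b)
    # Both lists now end at the branch point; keep it only once.
    return up + down[-2::-1]
```

For example, under this numbering route(4, 5) passes through the shared parent only, while route(4, 7) has to climb to the root and back down. Neighbouring processors are therefore cheap to reach, which is one reason placement of code and data near each other would matter.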
Consider a processor that has just a multiplier and a little memory and logic. The memory and logic would allow you to send the processor a "fetch instruction" instruction. When the fetch was done, a multiply would also be done. Of course, instead of a multiplier, one could have an adder or a programmable logic array (PLA). If it were just an adder, there would be room for other circuits and/or more memory. If there were also Boolean operator circuits (like AND, OR, and EOR), then an opcode would have to specify which operation is performed.
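The two cases described here can be sketched in Python as follows. This is only an illustration under stated assumptions: the state is modelled as a dictionary with hypothetical fields pc, ir, a, and b; registers are assumed to be 32 bits wide; and pc is assumed to index the processor's own small memory directly. None of these details come from an actual specification.

```python
import operator

MASK = 0xFFFFFFFF  # assumption: 32-bit registers


class MultiplierUnit:
    """A multiplier with a little local memory and logic (hypothetical model)."""

    def __init__(self, local_memory):
        self.local_memory = list(local_memory)

    def visit(self, state):
        """Fetch the next instruction word and multiply in the same visit."""
        new = dict(state)
        new["ir"] = self.local_memory[state["pc"]]  # the fetch
        new["a"] = (state["a"] * state["b"]) & MASK  # the multiply
        new["pc"] = state["pc"] + 1
        return new


# With several Boolean circuits in one unit, an opcode must pick the operation.
BOOLEAN_OPS = {"AND": operator.and_, "OR": operator.or_, "EOR": operator.xor}


def boolean_unit(state, opcode):
    """Apply the Boolean operation named by opcode to registers a and b."""
    new = dict(state)
    new["a"] = BOOLEAN_OPS[opcode](state["a"], state["b"]) & MASK
    return new
```

The multiplier needs no opcode because it can only do one thing; the Boolean unit needs one because it can do three.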
One can put an array of processing circuits (like multipliers, adders, memory, test, AND, OR, and PLA circuits) on a chip and interconnect them with a bus. For serial programs/algorithms, the best way to interconnect them is with the von Neumann architecture, where data flows between them over a data bus, one number at a time. However, for parallel programs/algorithms, the best way to interconnect them is with the registers on bus architecture, where data packets flow between them on a wide bus, for instance a binary tree bus with latches on each branch in each direction. Each data packet holds the state of a simulated microprocessor (the register values). There can be about as many data packets on the bus as there are processors. It can be programmed almost the same way as a von Neumann architecture processor. The processors are pipelined, so there can be more than one data packet (that is, more than one state, or simulated microprocessor) in each processor at a time. In some ways, it's similar to programmable logic chips, but programmed like a von Neumann processor.
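The point about pipelining can be illustrated with a small Python simulation. It is a sketch only: the class PipelinedUnit, the function run, the pipeline depth, and the one-packet-per-clock timing are all assumptions chosen for the example, not measurements of any real circuit.

```python
from collections import deque


class PipelinedUnit:
    """A physical processor that can hold one state per pipeline stage."""

    def __init__(self, depth, op):
        self.stages = deque([None] * depth)  # None marks an empty stage
        self.op = op

    def clock(self, incoming=None):
        """Advance one clock: accept a state (or None), emit the oldest one."""
        outgoing = self.stages.pop()
        self.stages.appendleft(incoming)
        return None if outgoing is None else self.op(outgoing)


def run(unit, states):
    """Feed one state per clock and count clocks until all have emerged."""
    pending = deque(states)
    results, ticks = [], 0
    while len(results) < len(states):
        incoming = pending.popleft() if pending else None
        out = unit.clock(incoming)
        ticks += 1
        if out is not None:
            results.append(out)
    return results, ticks
```

In this model, a unit of depth d finishes n states in n + d clocks, because a new state enters every clock while earlier ones are still in flight. Several simulated microprocessors therefore share one multiplier at the same time instead of waiting for each other.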
Following is a very simple example of the registers on bus architecture, or modified von Neumann architecture. It shows how the new architecture uses the exact same circuits and exact same registers, and executes the exact same machine code with the exact same results, as the von Neumann architecture, in perhaps twice the time for the case of a serial program/algorithm. It also makes clear that the new architecture is faster for parallel algorithms. However, I'm not sure that the main advantage of the von Neumann architecture, dense memory, can justify using it at all in the long run. Everything already developed for it, especially software, certainly justifies it in the short run, though, with the new architecture serving as a coprocessor.
Just as the one-instruction processor in the book 'How Computers Work' is a very simple example of a von Neumann architecture computer, so this one-instruction computer is a very simple example of a registers on bus architecture computer.