Consider the case where there is a state (set of register values) in latch 1. Each cycle it will be clocked to the next latch and some part of doing the step will be done. Each time around the whole circuit, one step of an instruction will be done. In four times around the circuit, an instruction will be done.
It might seem that an instruction will take a long time, but because so little is done in each stage, a stage doen't take many picoseconds (a picosecond is a trillionth of a second (a thousandth of a billioth of a second)). Nevertheless, more speed is always desired. Therefore, to get more computation done per second, an instruction can have the 'C' bit have value 1 instead of the normal 0. At some early latch through the processing of the state in step 4 (fetch (copy from memory to the state) the next instruction), this 'C' bit of 1 will be detected and another state created. 'C' stands for 'Create.' The new state will be in step 4 as well and will be right behind the original state in the pipeline. The new state will simplly fetch a new instuction from memory to the state (register values).
The address it reads will be that of the next 256 bits of memory after the 256 bits fetched by the original instruction. All fetch addresses end in '000' because this is the address of a boundary between 256-bit (the instruction length) blocks of memory. At the address ending in '000' is a 32-bit word. The following address ending in '001' holds the next 32 bits of the instruction. This continues until the last address ending in '111' holds the last 32 bits of the 256-bit (8 times 32 equals 256) instruction. The original instruction's 'next instuction address must end in '0000' so that it is on a 512 bit boundary so that the new state's 'next instrction address ends in '1000.
Now there are two states going through the processor so that twice as much can be done per second. The new state can do some part of the calculation and leave the rusult in memory for the original state to use.
When the new state is no longer needed, the last instruction done by the new state can have the bit 'T' in the instuction have value 1. 'T' stands for 'Terminate.' At the end of step 3 of this instuction, the state will be deleted.
There are 256 loops in each latch to hold a state. However, there is also another loop that holds the 'valid state bit.' When the valid state bit is 1, the valid state bit indicates that there is a (valid) state in the latch. When the valid state bit is 0, the valid state bit indicates that there is NOT a (valid) state in the latch. The timing in the circuit is pretty simple. When a latch has a valid state bit with value 1 and the next latch (that it's trying to get to) has a valid state bit with value 0, then, during this cycle, the state is clocked from the first latch, through any following logic, to the second latch. The 'valid state bit' of the first latch is cleared to 0 to indicate that there is no longer a (valid) state in that latch. The second latch has a valid state bit with value 1 indicating that the second latch has a (valid) state in it. When a state is 'terminated,' the valid state bit of the state is cleared to 0, which indicates that there is no longer a valid state in the latch. The state is not processed to the next latch either.
There is a possible problem with all this. The first state might try to read the results calculated by the second state before the second state is done calculating them. To prevent this, there are flags. The instuction that creates the second state has bit 'C' (for 'Create') with value 1. It can also have bit FF1=0 and bit FF0=1. 'FF1' stands for 'Flag Function bit 1' and 'FF0' stands for 'Flag Function bit 0.' These two bits have the functions indicated in the following table.
FF1 FF0 Flag Function 0 0 Nothing 0 1 Clear Flag 1 0 Set Flag 1 1 Wait for Flag
There are 16 flags. Which flag is affected is determined by bits FL3, FL2, FL1, and FL0 as indicated in the following table.
FL3 FL2 FL1 FL0 Flag Affected 0 0 0 0 0 0 0 0 1 1 0 0 1 0 2 0 0 1 1 3 0 1 0 0 4 0 1 0 1 5 0 1 1 0 6 0 1 1 1 7 1 0 0 0 8 1 0 0 1 9 1 0 1 0 10 1 0 1 1 11 1 1 0 0 12 1 1 0 1 13 1 1 1 0 14 1 1 1 1 15
Each flag is a bit in the processor that is NOT part of the memory.
If the state creating instuction has C=1, FF1=0, FF0=1, FL3=0, FL2=0, FL1=0, and FL0=0, then a state is created and flag 0 is cleared to value 0.
When the first state is ready to read the results of the second state, then it should have an instruction with FF1=1, FF0=1, FL3=0, FL2=0, FL1=0, and FL0=0 so that it will wait for flag 0. That is, the state will check to see if flag 0 is set (to 1). If flag 0 is set (to 1), then it will simply continue processing and read the results of the second state from memory when it's ready. If flag 0 is 0, however, then it will be stored in a latch (not part of memory) and not come back out onto the bus until the flag is set (to 1).
When the second state has the data ready for the first state then the instruction should have FF1=1, FF0=0, FL3=0, FL2=0, FL1=0, and FL0=0 which will set flag 0 to value 1. Optionally, it can also have T=0 to terminate the second state.
This could all be done with software (instructions) and flag bits in memory, but it is done so often that it pays to have special hardware to implement this function.
Of course, the first state could create more than one state to help it and/or the second state could create a state to help it.