speedups and parallelism in programs
there are many approaches to increasing the number of instructions
a machine can execute per clock tick:
- pipelining (described already)
- multiple pipelines = superscalar = multiple issue =
  multiple execution units
  - one "thread" of code
  - single instruction stream
  - the programmer doesn't need to worry about, or even know about, the
    existence of multiple pipes
  - exploits instruction level parallelism
  - pentium is a crippled dual issue design, crippled by unequal pipelines,
    due to the need for downward compatibility.
  - 2nd flr. DEC alpha = chaos@eecg.toronto.edu has 4 equal pipelines
    (4 execution units of depth 5)
  - 4th flr. DEC alpha = eyetap.org has 6 equal pipelines
    (6 execution units of depth 5)
  - microprocessors with multiple execution units can actually execute
    more than one instruction per clock cycle, e.g. the DEC alpha does up to
    6 instructions per clock cycle. therefore, although it's only a
    600MHz alpha, its peak rate is 600M x 6 = 3.6 billion instructions per
    second, the same as a single issue processor clocked at 3.6GHz.
    (e.g. like a P1800 only better because of other architectural
    factors)
- multiple microprocessors
  - exploit loop level parallelism
  - multiple "threads"
  - forking
  - there are many examples of loops that don't have data dependencies
    between iterations
  - useful when there are no data dependencies between instructions
  - example: inner product (can give a copy of the arrays to many
    processes, each of which computes part of the sum)
  - beowulf: the parallelism is written explicitly into the code
  - SMP can also exploit loop level parallelism
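the inner product example can be sketched with forking. this is only a minimal illustration, assuming a unix-like system where os.fork() is available; the names partial_dot and forked_inner_product are made up for these notes. each forked child inherits a copy of both arrays, sums its own slice, and sends the partial result back to the parent through a pipe.

```python
import os
import struct


def partial_dot(a, b):
    """Inner product of two equal-length sequences (serial)."""
    return sum(x * y for x, y in zip(a, b))


def forked_inner_product(a, b, workers=4):
    """Split the loop into `workers` slices, one forked child per slice.

    Hypothetical helper for illustration; unix only (uses os.fork).
    """
    n = len(a)
    chunk = (n + workers - 1) // workers  # slice size, rounded up
    children = []
    for w in range(workers):
        lo, hi = w * chunk, min((w + 1) * chunk, n)
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # child: it has its own copy of a and b; no iteration depends
            # on another, so each slice can be summed independently
            try:
                os.close(read_fd)
                s = partial_dot(a[lo:hi], b[lo:hi])
                os.write(write_fd, struct.pack("d", float(s)))
            finally:
                os._exit(0)  # never fall through into the parent's code
        # parent: keep the read end, remember the child
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0.0
    for pid, read_fd in children:
        total += struct.unpack("d", os.read(read_fd, 8))[0]
        os.close(read_fd)
        os.waitpid(pid, 0)
    return total
```

the only serial part left is the final addition of the partial sums in the parent. the same decomposition would apply on an SMP machine or, with message passing in place of pipes, on a cluster.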
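problem: the instruction fetch unit is always trying to fetch instructions, so
it contends for the bus with data reads and writes.
solution: introduce a prefetch queue, which is filled whenever the bus is
otherwise idle, so that instructions are already waiting on chip when needed.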
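a toy cycle-by-cycle model can show why the queue helps. this is a sketch under heavy assumptions, not a model of any real processor: one shared bus, at most one instruction fetched per cycle, each instruction described by a made-up pair (exec_cycles, uses_bus), and an instruction that uses the bus holds it for its whole execution. the function name simulate is hypothetical.

```python
from collections import deque


def simulate(program, queue_capacity):
    """Run the toy machine; return (total_cycles, stall_cycles).

    program: list of (exec_cycles, uses_bus) tuples with exec_cycles >= 1.
    queue_capacity: number of instructions the prefetch queue can hold (>= 1).
    """
    queue = deque()     # prefetched instructions waiting to execute
    next_fetch = 0      # index of the next instruction to fetch
    remaining = 0       # cycles left on the instruction being executed
    uses_bus = False    # does the executing instruction hold the bus?
    finished = 0
    cycles = 0
    stalls = 0
    while finished < len(program):
        cycles += 1
        # execution unit: if idle, start the next queued instruction
        if remaining == 0:
            if queue:
                remaining, uses_bus = queue.popleft()
            else:
                stalls += 1  # nothing prefetched: wasted cycle
        bus_busy = False
        if remaining > 0:
            bus_busy = uses_bus
            remaining -= 1
            if remaining == 0:
                finished += 1
        # fetch unit: only gets the bus when execution is not using it
        if (not bus_busy and next_fetch < len(program)
                and len(queue) < queue_capacity):
            queue.append(program[next_fetch])
            next_fetch += 1
    return cycles, stalls


# one slow instruction that leaves the bus free, then three that need the bus
program = [(4, False), (1, True), (1, True), (1, True)]
print(simulate(program, 1))  # (10, 3)
print(simulate(program, 4))  # (8, 1)
```

with a one-entry buffer the machine stalls after every bus-using instruction, because the fetch could not overlap with it. with a four-entry queue the three later instructions are fetched while the slow one executes, leaving only the cold-start stall.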
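64bit intel (IA-64 / itanium; not an extension of x86)
it's somewhere between CISC and VLIW (intel calls it EPIC)
128 64bit general registers (r0 always reads as zero, leaving 127 usable)
instructions are issued in 128 bit bundles, each holding three 41 bit
instructions plus a 5 bit template
HP+intel joint venture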
gnux (gnu linux) is currently the only operating system that will run on it
natively.
nobody could get win64 to run on it.