PORTLAND, Ore. -- Today, direct-write cache memories are the mainstay of microprocessors because they lower memory latency in a manner transparent to application programs. However, designers of advanced processors have advocated a switch to software-managed scratchpads and message-passing techniques for next-generation multi-core processors, such as the Cell Broadband Engine Architecture developed by IBM, Toshiba and Sony, which is used in the PlayStation 3.
Unfortunately, software-managed scratchpads and message-passing techniques put an additional burden on application programmers and in that sense mark a step backwards in microprocessor evolution. Now Semiconductor Research Corp. (SRC) claims to have solved the scaling problem for next-generation processors with up to 512 cores by using hierarchical hardware coherence, which remains transparent to application programs and is the natural evolution of today's multi-level caches.
"Designers are worrying about latency for future multi-core microprocessors, advocating a move to software coherence using scratchpad memories and message passing," said professor Dan Sorin at Duke University, principal researcher on the project. "But that would require the programmer to manage data movement, which is not the way the industry should go."
Instead, Sorin's SRC-funded study, performed in cooperation with professor Milo Martin from the University of Pennsylvania and professor Mark Hill from the University of Wisconsin, proposes a hierarchical hardware coherence technique that the researchers claim scales as the square root of the number of cores, adding as little as 2% latency for processors with as many as 512 cores. Likewise, traffic, storage and energy consumption all grow very slowly as cores are added, allowing future processors to continue using direct-write caches with hardware coherence that is transparent to application programs.
"These results will change the direction of computer architecture, by assuring designers that cache coherence will not hit the wall," said David Yeh, director of integrated circuit and systems sciences at SRC (Research Triangle, N.C.). "We now know there are ways around the wall. Designers can stop worrying. All the right techniques are available today -- you don't need new tricks to be invented, you just need to wisely use the technologies that are already available."
In particular, current direct-write hardware coherence schemes can be evolved to keep traffic, storage, latency and energy under control as processors scale to more and more cores by using a synergistic combination of shared caches augmented with hierarchical directories and explicit cache eviction notifications. Thus, according to SRC, the roadmap to future massively parallel multi-core processors is clear and unobstructed. Details will be shared in an upcoming issue of the Transactions of the Association for Computing Machinery (ACM).
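A rough back-of-the-envelope sketch can show why a hierarchy of directories helps keep tracking storage in check. It is not taken from the researchers' study: it assumes, purely for illustration, that every directory entry keeps one presence bit per child it tracks, and the function names are hypothetical.

```python
def flat_sharer_bits(cores: int) -> int:
    """Assumed flat directory: one presence bit per core in every entry."""
    return cores


def hierarchical_sharer_bits(cores: int, levels: int) -> int:
    """Assumed hierarchy: each entry tracks only its direct children.

    Returns the smallest fan-out f such that f**levels >= cores, which is
    the number of presence bits a single entry needs at any one level.
    """
    fanout = 1
    while fanout ** levels < cores:
        fanout += 1
    return fanout


if __name__ == "__main__":
    cores = 512
    print("flat:", flat_sharer_bits(cores))                        # flat: 512
    print("two-level:", hierarchical_sharer_bits(cores, 2))        # two-level: 23
    print("three-level:", hierarchical_sharer_bits(cores, 3))      # three-level: 8
```

Under these assumptions, a two-level hierarchy for 512 cores needs about 23 bits per entry, close to the square root of 512, while a flat directory needs 512. The sketch covers only per-entry tracking storage and says nothing about the latency, traffic or energy figures reported by the researchers.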
Single-level flat-directory caches (blue) incur unacceptable latency when scaling past 32 cores, but two-level (red) and three-level (green) caches with hierarchical directories can scale to 512 cores with only two to four percent added latency.
This story was originally posted by EETimes.