Ad hoc cache hierarchies make chips more efficient

  

The researchers tested their system, called Jenga, on a simulation of a chip with 36 cores. They claim that, compared with its best-performing predecessors, Jenga increased processing speed by 20 to 30% while reducing energy consumption by 30 to 85%.

According to the researchers, each core in a multicore chip usually has two levels of private cache. All the cores share a third cache, which is broken up into discrete memory banks scattered around the chip. Some new chips also include a DRAM cache, which is etched into a second chip that is mounted on top of the first.

For a given core, accessing the nearest memory bank of the shared cache is more efficient than accessing more distant banks. Unlike today's cache management systems, Jenga is said to distinguish between the physical locations of the separate memory banks that make up the shared cache.

For each core, the research team claims, Jenga knows how long it would take to retrieve information from any on-chip memory bank. Jenga can therefore evaluate the trade-off between latency and space for two layers of cache simultaneously.
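The latency-versus-space trade-off can be illustrated with a small Python sketch. It is not Jenga's algorithm: it models a single cache level rather than two, and the 6 x 6 mesh layout, the cycle counts and the miss curve are all assumptions chosen for illustration. The function names (`bank_latency`, `nearest_banks`, `best_bank_count`) are hypothetical.

```python
from itertools import product

GRID = 6            # assumed 6 x 6 mesh: 36 tiles, one core and one bank per tile
BANK_LATENCY = 9    # assumed cycles to read a bank once it is reached
HOP_LATENCY = 2     # assumed cycles per mesh hop, in each direction
MISS_PENALTY = 120  # assumed cycles to fetch from main memory on a miss


def bank_latency(core, bank):
    """Round-trip latency from a core's tile to a bank's tile (Manhattan distance)."""
    hops = abs(core[0] - bank[0]) + abs(core[1] - bank[1])
    return BANK_LATENCY + 2 * HOP_LATENCY * hops


def nearest_banks(core, count):
    """Return the `count` banks closest to `core`, nearest first."""
    banks = sorted(product(range(GRID), repeat=2),
                   key=lambda b: (bank_latency(core, b), b))
    return banks[:count]


def expected_access_time(core, count, miss_rate):
    """Average hit latency over the chosen banks plus the expected miss cost."""
    banks = nearest_banks(core, count)
    avg_hit = sum(bank_latency(core, b) for b in banks) / count
    return avg_hit + miss_rate(count) * MISS_PENALTY


def best_bank_count(core, miss_rate):
    """Bank count that minimises the expected access time for this core."""
    return min(range(1, GRID * GRID + 1),
               key=lambda n: expected_access_time(core, n, miss_rate))


def toy_miss_rate(n_banks):
    """Hypothetical miss curve: misses fall as capacity grows."""
    return 1.0 / (1 + n_banks)


if __name__ == "__main__":
    print(best_bank_count((0, 0), toy_miss_rate))
```

Adding banks lowers the miss rate but pulls in more distant banks, which raises the average hit latency. The sketch simply picks the size where the sum of the two costs is smallest.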

Adopting a computational shortcut enables Jenga to update its memory allocations every 100ms, to accommodate changes in programs' memory-access patterns.

If multiple cores are retrieving data from the same DRAM cache, this can cause bottlenecks that introduce new latencies. So after Jenga has come up with a set of cache assignments, cores don't simply ‘dump’ all their data into the nearest available memory bank.

Instead, Jenga parcels out the data a little at a time, then estimates the effect on bandwidth consumption and latency. Thus, within the 100ms intervals between chip-wide cache reallocations, Jenga adjusts the priorities that each core gives to the memory banks allocated to it.
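The idea of placing data a little at a time and re-estimating the cost after each step can be sketched as a simple greedy loop. This is an illustrative assumption, not the researchers' method: the linear congestion model, the bank names and the latency figures are all hypothetical.

```python
def place_incrementally(base_latency, total_units, chunk=1, congestion=1.0):
    """Spread `total_units` of data across banks one chunk at a time.

    base_latency: dict mapping bank name -> uncongested latency (assumed cycles)
    congestion:   assumed extra latency per unit of data already placed in a bank
    """
    load = {bank: 0 for bank in base_latency}
    remaining = total_units
    while remaining > 0:
        step = min(chunk, remaining)
        # Re-estimate each bank's latency given the data already sent to it,
        # then send the next chunk to the bank that currently looks cheapest.
        target = min(load, key=lambda b: (base_latency[b] + congestion * load[b], b))
        load[target] += step
        remaining -= step
    return load


if __name__ == "__main__":
    # Two hypothetical banks: one close to the core, one farther away.
    print(place_incrementally({"near": 10, "far": 12.5}, total_units=6))
```

Under this model the nearest bank absorbs the first few chunks. Once its estimated latency rises past that of the farther bank, later chunks start to spill over rather than piling into one place.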