Yesterday, AMD held a press presentation in Munich, Germany to update journalists about its upcoming K10 processor. AMD's Giuseppe Amato, Technical Director Sales and Marketing EMEA, had a few minutes to talk about the architecture at length.
The integrated memory controller (IMC) will get a few new features in the K10 core. When utilizing multiple memory modules, along with proper BIOS implementation and mainboard routing, the IMC can access memory in 64-bit channels (72-bit if you use ECC). This way it is possible to read and write data simultaneously, or improve efficiency for irregular access patterns which increasingly occur in a quad-core environment. This feature is available on AM2+ and F+ boards; on "old“ socket AM2 and F boards the usual 128-bit dual-channel mode is available.
Due to split power planes, the IMC can be clocked down independently of the CPU cores, along with reduced voltage. This also enables CPU overclocking without touching the memory frequency, something that may appeal to enthusiasts. These features are again dependent on Socket AM2+ and F+ platforms.
Amato explained how the quad-core design benefits from the internal crossbar switch the backbone of communication inside the K10 CPU. With Intel's current quad-core design there are cases where data needs to travel over the FSB -- in AMDs case all inter-CPU communication takes place on-die.
The crossbar switch of the K10 core is already prepared for up to 8 cores, Amato boasted. Amato wouldn't give even a vague timeframe for market availability of such a CPU, though he indicated the company is prepared for whatever the market demands. Amato made clear that octo-core is far away in the future – Shanghai will not get 8 cores.
K10 will introduce a shared L3 cache while the individual cores have dedicated L1 and L2 caches. As long as requested data lies in L1, it can be directly loaded. This also works if the data lies in the L1 cache of another core, in which case the communication works via the crossbar switch. In case requested data resides in the L2 cache, it will be loaded to L1 and then invalidated in L2 as AMD has an exclusive cache design. The L3 Cache, however, is not exclusive, but allows for a shared bit to be set. If a core loads data marked as shared, it will reside in the L3 cache and can be fetched by other cores as well.
Amato also mentioned an array of power saving measures which, in sum, allow AMD to deliver a quad-core CPU in the same thermal envelope as today’s dual-core CPUs.
K10 adds the capability of independently clocking all the CPU cores. In current K8 processors (and Intel's Core 2 generation), all cores are clocked at the same level all the time -- the P-state can only be changed synchronously. In case of a compute-intensive single-threaded process, all cores must run on the highest level P-state. On K10-based CPUs, the idle cores could be switched to the lowest P-state, while others are in different states, depending on load.
AMD, K10, Processor