Intel Demos 8-Core, 528-Thread PUMA Chip with 1 TB/s Silicon Photonics

fry · 2 years ago

AutoTL;DR · 2 years ago

This is the best summary I could come up with:

However, the eight-core 528-thread chip that Intel used for the demonstration stole the spotlight due to its unique architecture that sports 66 threads per core to enable up to 1TB/s of data throughput.

Intel’s PUMA (Programmable Unified Memory Architecture) chip is part of the DARPA HIVE program that focuses on improving performance in petabyte-scale graph analytics work to unlock a 1000X improvement in performance-per-watt in hyper-sparse workloads.

After characterizing the target workloads, Intel concluded that it needed to craft an architecture that solved the challenges associated with extreme stress on the memory subsystem, deep pipelines, branch predictors, and out-of-order logic created by the workload.

Intel fabbed the chip on TSMC’s 7nm process with 27.6 billion transistors spanning a 316mm^2 die.

The eight cores, which consume 1.2 billion transistors, run down the center of the die, flanked by eight custom memory controllers with an 8-byte access granularity.

The promise of optical interconnects has fueled an intensifying amount of research as the industry looks to future data transport methods that offer superior bandwidth, latency, and power consumption characteristics compared to traditional chip-to-chip communication techniques.

The original article contains 655 words, the summary contains 182 words. Saved 72%. I’m a bot and I’m open source!

@zoe@lemm.ee · 2 years ago

Intel unveiled its first direct mesh-to-mesh photonic fabric at the Hot Chips 2023 chip conference, highlighting its progress towards a future of optical chip-to-chip interconnects that are also championed by the likes of Nvidia and Ayar Labs. However, the eight-core 528-thread chip that Intel used for the demonstration stole the spotlight due to its unique architecture that sports 66 threads per core to enable up to 1TB/s of data throughput. Surprisingly, the chip consumes only 75W of power, with 60% of the power being used by the optical interconnects, but the design could eventually enable systems with two million cores to be directly connected with under 400ns latency. Intel's PUMA (Programmable Unified Memory Architecture) chip is part of the DARPA HIVE program that focuses on improving performance in petabyte-scale graph analytics work to unlock a 1000X improvement in performance-per-watt in hyper-sparse workloads. Surprisingly for an x86-centric company like Intel, the test chip utilizes a custom RISC architecture for streamlined performance in graph analytics workloads, delivering an 8X improvement in single-threaded performance. The chip is also created using TSMC's 7nm process, not Intel's own internal nodes. After characterizing the target workloads, Intel concluded that it needed to craft an architecture that solved the challenges associated with extreme stress on the memory subsystem, deep pipelines, branch predictors, and out-of-order logic created by the workload. Intel's custom core employs extreme parallelism to the tune of 66 hardware threads for each of the eight cores, large L1 instruction and data caches, and 4MB of scratchpad SRAM per core. The eight-core chip features 32 optical I/O ports that operate at 32 GB/s/dir apiece, thus totaling 1TB/s of total bandwidth. The chips drop into an eight-socket OCP server sled, offering up to 16 TB/s of total optical throughput for the system, and each chip is fed by 32GB of custom DDR5-4000 DRAM.
Intel fabbed the chip on TSMC's 7nm process with 27.6 billion transistors spanning a 316mm^2 die. The eight cores, which consume 1.2 billion transistors, run down the center of the die, flanked by eight custom memory controllers with an 8-byte access granularity.
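
For anyone who wants to sanity-check the headline numbers, here is a rough back-of-the-envelope sketch in Python. It only uses figures quoted above. The function names are made up for illustration, and two things are assumptions on my part rather than anything Intel stated: that "1TB/s" means 1024 GB/s per direction, and that the 16 TB/s sled figure counts both directions across all eight sockets.

```python
# Figures quoted in the article text above.
CORES = 8
THREADS_PER_CORE = 66
OPTICAL_PORTS = 32
GB_S_PER_PORT_PER_DIR = 32
SOCKETS_PER_SLED = 8
CHIP_POWER_W = 75
OPTICS_POWER_PERCENT = 60
TRANSISTORS = 27.6e9
DIE_AREA_MM2 = 316


def total_threads():
    """Hardware threads per chip: 8 cores x 66 threads."""
    return CORES * THREADS_PER_CORE


def chip_optical_gb_s_per_dir():
    """Per-chip optical bandwidth in one direction, in GB/s."""
    return OPTICAL_PORTS * GB_S_PER_PORT_PER_DIR


def sled_optical_gb_s_both_dirs():
    """ASSUMPTION: the sled total counts both directions for all sockets."""
    return SOCKETS_PER_SLED * chip_optical_gb_s_per_dir() * 2


def optics_power_w():
    """Watts spent on the optical interconnects (60% of 75W)."""
    return CHIP_POWER_W * OPTICS_POWER_PERCENT / 100


def transistor_density_m_per_mm2():
    """Average density in millions of transistors per mm^2."""
    return TRANSISTORS / DIE_AREA_MM2 / 1e6


if __name__ == "__main__":
    print("threads per chip:", total_threads())
    print("chip optical GB/s per direction:", chip_optical_gb_s_per_dir())
    print("sled optical GB/s, both directions:", sled_optical_gb_s_both_dirs())
    print("optics power (W):", optics_power_w())
    print("density (MTr/mm^2):", round(transistor_density_m_per_mm2(), 1))
```

Under those assumptions the thread count and the 1TB/s and 16 TB/s figures all line up with the article, and the optics would account for roughly 45W of the 75W budget.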

@A_A@lemmy.world · 2 years ago

Are we on track with Moore’s law with this?