Advance Program
Tutorials: Sunday, August 24, 2025
| Time (PDT) | Title | Presenters |
|---|---|---|
| 7:45AM-8:30AM | Breakfast/Registration | |
| 8:30AM-10:30AM | Tutorial 1: Datacenter Racks | |
| | How AI workloads shape rack system architecture | Frank Helms, AMD |
| | Scaling fabric technologies | Darrin Vallis, AMD |
| | Liquid cooling characteristics | Jorge Padilla, Google |
| | Rearchitected power systems | Harsha Bojja, Microsoft |
| 10:30AM-10:55AM | Coffee Break (25 min) | |
| 10:55AM-12:25PM | Tutorial 1: Datacenter Racks (cont.) | |
| | Case study: NVIDIA GB200 NVL72 | John Norton, NVIDIA |
| | Case study: Meta’s Catalina (NVL72) | William Arnold and Matt Bowman, Meta |
| | Case study: Google TPU Rack | Pankaj Makhija, Google |
| 12:25PM-1:30PM | Lunch (1 hr 5 min) | |
| 1:30PM-3:30PM | Tutorial 2: AI Kernel Programming | |
| | Introduction | Fredrik Kjolstad, Stanford |
| | Decoupling Performance from Correctness with User-Schedulable Kernel Languages | Andrew Adams, Adobe Research |
| | Pallas: Using JAX to write custom kernels for GPUs and TPUs | Sharad Vikram, Google |
| 3:30PM-4:00PM | Coffee Break (1/2 hr) | |
| 4:00PM-5:30PM | Tutorial 2: AI Kernel Programming (cont.) | |
| | Domain-specific languages for GPU kernels and automatic kernel authoring with LLMs | Tri Dao, Princeton & Together AI |
| | Programming techniques for implementing ML models on GPUs | Zhihao Jia, CMU |
| 5:30PM-7:30PM | Reception | |
Conference Day 1: Monday, August 25, 2025
| Time (PDT) | Title | Presenters |
|---|---|---|
| 7:45AM-9:15AM | Breakfast/Registration | |
| 9:15AM-9:30AM | Welcome | |
| | General Chairs Welcome | Jan-Willem van de Waerdt & Larry Yang, General Chairs |
| | Program Co-Chairs Welcome | Ian Bratt & Nhon Quach, PC Co-Chairs |
| 9:30AM-11:00AM | CPU 1a (Chair: Gabriel Southern) | |
| | Cuzco: A High-Performance RISC-V RVA23 Compatible CPU IP | Ty Garibay & Shashank Nemawarkar, Condor Computing |
| | PEZY-SC4s: The Fourth Generation MIMD Many-core Processor with High Energy Efficiency and Flexibility for HPC and AI Applications | Naoya Hatta, PEZY Computing |
| | IBM’s Next Generation Power Microprocessor | William Starke, IBM |
| 11:00AM-11:30AM | Coffee Break (1/2 hr) | |
| 11:30AM-12:00PM | CPU 1b (Chair: Gabriel Southern) | |
| | Introducing the Next Generation Intel® Xeon® Processor with Efficiency Cores | Don Soltis, Intel |
| 12:00PM-1:00PM | Security (Chair: Greg Papadopoulos) | |
| | Presto: A RISC-V-Compatible SoC for Unified Multi-Scheme FHE Acceleration over Module Lattice | Luchang Lei & Hongyang Jia, Tsinghua University |
| | Azure Secure Hardware Architecture: Establishing a Robust Security Foundation for Cloud Workloads | Bryan Kelly, Microsoft |
| 1:00PM-2:15PM | Lunch (1 hr 15 min) | |
| 2:15PM-3:15PM | Keynote #1 (Chair: Cliff Young) | |
| | Predictions for the Next Phase of AI | Noam Shazeer, VP Engineering, Google DeepMind |
| 3:15PM-4:45PM | Graphics (Chair: Lavanya Subramanian) | |
| | AMD RDNA 4 and Radeon RX 9000 Series GPU | Andy Pomianowski & Laks Pappu, AMD |
| | RTX 5090: Designed for the Age of Neural Rendering | Marc Blackstein, NVIDIA |
| | Specialized SoC enabling low-power ‘World Lock Rendering’ in Augmented and Mixed Reality Devices | Ohad Meitav & Jay Tsao, Meta |
| 4:45PM-5:15PM | Coffee Break (1/2 hr) | |
| 5:15PM-7:15PM | Networking (Chair: Sherry Xu) | |
| | Intel Mount Morgan Infrastructure Processing Unit (IPU) | Patrick Fleming, Intel |
| | AMD Pensando™ Pollara 400 AI NIC Architecture and Application | Kevin Chu, AMD |
| | NVIDIA ConnectX-8 SuperNIC: A Programmable RoCE Architecture for AI Data Centers | Idan Burstein, NVIDIA |
| | Tomahawk Ultra: Ultra-Low-Latency, High-Bandwidth Ethernet Switch Chip for HPC and AI/ML Applications | Mohan Kalkunte & Asad Khamisy, Broadcom |
| 7:15PM-9:00PM | Reception | |
Conference Day 2: Tuesday, August 26, 2025
| Time (PDT) | Title | Presenters |
|---|---|---|
| 7:45AM-8:30AM | Breakfast/Registration | |
| 8:30AM-10:30AM | Optical (Chair: Borivoje Nikolic) | |
| | Celestial AI Photonic Fabric Module (PF Module): The world’s first SoC with in-die optical I/O | Phil Winterbottom, Celestial AI |
| | A UCIe Optical I/O Retimer Chiplet for AI Scale-up Fabrics | Vladimir Stojanovic, Ayar Labs |
| | Passage M1000: 3D photonic interposer for AI | Darius Bunandar, Lightmatter |
| | Co-Packaged Silicon Photonics Switches for Gigawatt AI Factories | Gilad Shainer, NVIDIA |
| 10:30AM-11:00AM | Coffee Break (1/2 hr) | |
| 11:00AM-12:30PM | Power / Methodology (Chair: Jae W. Lee) | |
| | ECAM-Enabled Advanced Thermal Management Solutions for the AI Data Center | Michael Matthews, Fabric8Labs |
| | Everactive Self-Powered SoC with Energy Harvesting, Wakeup Receiver, and Energy-Aware Subsystem | Ben Calhoun, Everactive |
| | Taping Out Three Class Chips per Semester in Intel 16 Technology | Lucy Revina, UC Berkeley |
| 12:30PM-1:40PM | Lunch (1 hr 10 min) | |
| 1:40PM-1:45PM | TCMM Awards Presentation | |
| 1:45PM-2:45PM | Keynote #2 (Chair: Yasuo Ishii) | |
| | Up and Running with Rapidus: How Japan and Cutting-Edge Technologies are Transforming Semiconductor Manufacturing | Dr. Atsuyoshi Koike, Rapidus |
| 2:45PM-4:15PM | Machine Learning 1 (Chair: Ronny Krashinsky) | |
| | Memory: (Almost) the Only Thing That Matters | Mark Kuemerle, Marvell |
| | Corsair: An In-memory Computing Chiplet Architecture for Inference-time Compute Acceleration | Sudeep Bhoja, d-Matrix |
| | UB-Mesh: Huawei’s Next-Gen AI SuperComputer with a Unified-Bus Interconnect and nD-FullMesh Architecture | Liao Heng, Huawei |
| 4:15PM-4:45PM | Coffee Break (1/2 hr) | |
| 4:45PM-6:15PM | Machine Learning 2 (Chair: Pradeep Dubey) | |
| | NVIDIA’s GB10 SoC: AI Supercomputer On Your Desk | Andi Skende, NVIDIA |
| | 4th Gen AMD CDNA™ Generative AI Architecture Powering AMD Instinct™ MI350 Series GPUs and Platforms | Michael Floyd & Michael Steffen, AMD |
| | Ironwood: Delivering best-in-class perf, perf/TCO, and perf/Watt for reasoning model training and serving | Norm Jouppi & Sridhar Lakshmanamurthy, Google |
| 6:15PM-6:30PM | Closing Remarks | |
| | Closing Remarks | Larry Yang, Vice Chair |
Posters
| Title | Authors & Affiliation |
|---|---|
| HyperAccel Adelia: A 4nm LLM Processor for Efficient Generative AI Inference | Seungjae Moon; HyperAccel |
| Basilisk: A 34 mm² End-to-End Open-Source 64-bit Linux-Capable RISC-V SoC in 130nm BiCMOS | Philippe Sauter; ETH Zurich |
| Bit-Separable Transformer Accelerator Leveraging Output Activation Sparsity for Efficient DRAM Access | Seunghyun Park; Kyungpook National University |
| Multi-modal Few-step Diffusion Model Accelerator with Mixed-Precision and Reordered Group-Quantization for On-device Generative AI | Sangjin Kim; KAIST |
| An Energy-Efficient Spatial Computing SoC for Real-time Interactable-Rendering and Modeling with Surface-aware 3D Gaussian Splatting | Seokchan Song; KAIST |
| BROCA: A Low-power and Low-latency Conversational Agent RISC-V System-on-Chip for Voice-interactive Mobile Devices | Wooyoung Jo; KAIST |
| MEGA.mini: A NPU with Novel Heterogeneous AI Processing Architecture Balancing Efficiency, Performance, and Intelligence for the Era of Generative AI | Donghyeon Han; Chung-Ang University |
| High-Density Si-IPD Technologies as an Enabler for High-Performance and Low-Power Processor Chips | Mohamed Mehdi Jatlaoui; Murata |
| Clo-HDnn: Continual On-Device Learning Accelerator with Hyperdimensional Computing via Progressive Search | Chang Eun Song; UC San Diego |
| A 4.69mW LLM Processor with Binary/Ternary Weights for Billion-Parameter Llama Model | Sangyeob Kim; Yonsei University |
| KLIMA: Low-latency mixed-signal In-Memory Computing accelerator for solving arbitrary-order Boolean Satisfiability | Tinish Bhattacharya; UC Santa Barbara |