Advance Program

Tutorials: Sunday, August 24, 2025

Time (PDT) Title Presenters
7:45AM-8:30AM Breakfast/Registration
 
8:30AM-10:30AM Tutorial 1: Datacenter Racks
 
  How AI workloads shape rack system architecture
Frank Helms, AMD
  Scaling fabric technologies
Darrin Vallis, AMD
  Liquid cooling characteristics
Jorge Padilla, Google
  Rearchitected power systems
Harsha Bojja, Microsoft
10:30AM-10:55AM Coffee Break (25 min)
 
10:55AM-12:25PM Tutorial 1: Datacenter Racks (cont)
 
  Case study: NVIDIA GB200 NVL72
John Norton, NVIDIA
  Case study: Meta’s Catalina (NVL72)
William Arnold and Matt Bowman, Meta
  Case study: Google TPU Rack
Pankaj Makhija, Google
12:25PM-1:30PM Lunch (1 hr 5 min)
 
1:30PM-3:30PM Tutorial 2: AI Kernel Programming
 
  Introduction
Fredrik Kjolstad, Stanford
  Decoupling Performance from Correctness with User-Schedulable Kernel Languages
Andrew Adams, Adobe Research
  Pallas: Using JAX to write custom kernels for GPUs and TPUs
Sharad Vikram, Google
3:30PM-4:00PM Coffee Break (1/2 hr)
 
4:00PM-5:30PM Tutorial 2: AI Kernel Programming (cont)
 
  Domain-specific languages for GPU kernels and automatic kernel authoring with LLMs
Tri Dao, Princeton, Together AI
  Programming techniques for implementing ML models on GPUs
Zhihao Jia, CMU
5:30PM-7:30PM Reception
 

Conference Day 1: Monday, August 25, 2025

Time (PDT) Title Presenters
7:45AM-9:15AM Breakfast/Registration
 
9:15AM-9:30AM Welcome
 
  General Chair Welcome
Jan-Willem van de Waerdt, General Chair & Larry Yang, Vice Chair
  Program Co-Chairs Welcome
Ian Bratt & Nhon Quach, PC Co-Chairs
9:30AM-11:00AM CPU 1a

Chair: Gabriel Southern
 
  Cuzco: A High-Performance RISC-V RVA23 Compatible CPU IP
Ty Garibay & Shashank Nemawarkar, Condor Computing
  PEZY-SC4s: The Fourth Generation MIMD Many-core Processor with High Energy Efficiency and Flexibility for HPC and AI Applications
Naoya Hatta, PEZY Computing
  IBM’s Next Generation Power Microprocessor
William Starke, IBM
11:00AM-11:30AM Coffee Break (1/2 hr)
 
11:30AM-12:00PM CPU 1b

Chair: Gabriel Southern
 
  Introducing the Next Generation Intel® Xeon® Processor with Efficiency Cores
Don Soltis, Intel
12:00PM-1:00PM Security

Chair: Greg Papadopoulos
 
  Presto: A RISC-V-Compatible SoC for Unified Multi-Scheme FHE Acceleration over Module Lattice
Luchang Lei & Hongyang Jia, Tsinghua University
  Azure Secure Hardware Architecture: Establishing a Robust Security Foundation for Cloud Workloads
Bryan Kelly, Microsoft
1:00PM-2:15PM Lunch (1 hr 15 min)
 
2:15PM-3:15PM Keynote #1

Chair: Cliff Young
 
  Predictions for the Next Phase of AI
Noam Shazeer, VP Engineering, Google DeepMind
3:15PM-4:45PM Graphics

Chair: Lavanya Subramanian
 
  AMD RDNA 4 and Radeon RX 9000 Series GPU
Andy Pomianowski & Laks Pappu, AMD
  RTX 5090: Designed for the Age of Neural Rendering
Marc Blackstein, NVIDIA
  Specialized SoC enabling low-power ‘World Lock Rendering’ in Augmented and Mixed Reality Devices
Ohad Meitav & Jay Tsao, Meta
4:45PM-5:15PM Coffee Break (1/2 hr)
 
5:15PM-7:15PM Networking

Chair: Sherry Xu
 
  Intel Mount Morgan Infrastructure Processing Unit (IPU)
Patrick Fleming, Intel
  AMD Pensando™ Pollara 400 AI NIC Architecture and Application
Kevin Chu, AMD
  NVIDIA ConnectX-8 SuperNIC: A Programmable RoCE Architecture for AI Data Centers
Idan Burstein, NVIDIA
  Tomahawk Ultra - Ultra Low Latency, High Bandwidth Ethernet Switch chip for HPC and AI/ML applications
Mohan Kalkunte & Asad Khamisy, Broadcom
7:15PM-9:00PM Reception
 

Conference Day 2: Tuesday, August 26, 2025

Time (PDT) Title Presenters
7:45AM-8:30AM Breakfast/Registration
 
8:30AM-10:30AM Optical

Chair: Borivoje Nikolic
 
  Celestial AI Photonic Fabric Module (PF Module) - The world’s first SoC with in-die Optical IO
Phil Winterbottom, Celestial AI
  A UCIe Optical I/O Retimer Chiplet for AI Scale-up Fabrics
Vladimir Stojanovic, Ayar Labs
  Passage M1000: 3D photonic interposer for AI
Darius Bunandar, Lightmatter
  Co-Packaged Silicon Photonics Switches for Gigawatt AI Factories
Gilad Shainer, NVIDIA
10:30AM-11:00AM Coffee Break (1/2 hr)
 
11:00AM-12:30PM Power / Methodology

Chair: Jae W. Lee
 
  ECAM Enabled Advanced Thermal Management Solutions for the AI Data Center
Michael Matthews, Fabric8Labs
  Everactive Self-Powered SoC with Energy Harvesting, Wakeup Receiver, and Energy-Aware Subsystem
Ben Calhoun, Everactive
  Taping Out Three Class Chips per Semester in Intel 16 Technology
Lucy Revina, UC Berkeley
12:30PM-1:40PM Lunch (1 hr 10 min)
 
1:40PM-1:45PM TCMM Awards Presentation
 
1:45PM-2:45PM Keynote #2

Chair: Yasuo Ishii
 
  Up and Running with Rapidus: How Japan and Cutting-Edge Technologies are Transforming Semiconductor Manufacturing
Dr. Atsuyoshi Koike, Rapidus
2:45PM-4:15PM Machine Learning 1

Chair: Ronny Krashinsky
 
  Memory: (Almost) the Only Thing That Matters
Mark Kuemerle, Marvell
  Corsair - An In-memory Computing Chiplet Architecture for Inference-time Compute Acceleration
Sudeep Bhoja, d-Matrix
  UB-Mesh: Huawei’s Next-Gen AI SuperComputer with A Unified-Bus Interconnect and nD-FullMesh Architecture
Liao Heng, Huawei
4:15PM-4:45PM Coffee Break (1/2 hr)
 
4:45PM-6:15PM Machine Learning 2

Chair: Pradeep Dubey
 
  NVIDIA’s GB10 SoC: AI Supercomputer On Your Desk
Andi Skende, NVIDIA
  4th Gen AMD CDNA™ Generative AI Architecture Powering AMD Instinct™ MI350 Series GPUs and Platforms
Michael Floyd & Michael Steffen, AMD
  Ironwood: Delivering best-in-class perf, perf/TCO and perf/Watt for reasoning model training and serving
Norm Jouppi & Sridhar Lakshmanamurthy, Google
6:15PM-6:30PM Closing Remarks
 
  Closing Remarks
Larry Yang, Vice Chair

Posters

Title Authors & Affiliation
  HyperAccel Adelia: A 4nm LLM Processor for Efficient Generative AI Inference
Seungjae Moon, HyperAccel
  Basilisk: A 34 mm² End-to-End Open-Source 64-bit Linux-Capable RISC-V SoC in 130nm BiCMOS
Philippe Sauter, ETH Zurich
  Bit-Separable Transformer Accelerator Leveraging Output Activation Sparsity for Efficient DRAM Access
Seunghyun Park, Kyungpook National University
  Multi-modal Few-step Diffusion Model Accelerator with Mixed-Precision and Reordered Group-Quantization for On-device Generative AI
Sangjin Kim, KAIST
  An Energy-Efficient Spatial Computing SoC for Real-time Interactable-Rendering and Modeling with Surface-aware 3D Gaussian Splatting
Seokchan Song, KAIST
  BROCA: A Low-power and Low-latency Conversational Agent RISC-V System-on-Chip for Voice-interactive Mobile Devices
Wooyoung Jo, KAIST
  MEGA.mini: An NPU with Novel Heterogeneous AI Processing Architecture Balancing Efficiency, Performance, and Intelligence for the Era of Generative AI
Donghyeon Han, Chung-Ang University
  High-Density Si-IPD Technologies as an Enabler for High-Performance, Low-Power Processor Chips
Mohamed Mehdi Jatlaoui, Murata
  Clo-HDnn: Continual On-Device Learning Accelerator with Hyperdimensional Computing via Progressive Search
Chang Eun Song, UC San Diego
  A 4.69mW LLM Processor with Binary/Ternary Weights for Billion-Parameter Llama Model
Sangyeob Kim, Yonsei University
  KLIMA: Low-latency Mixed-signal In-Memory Computing Accelerator for Solving Arbitrary-order Boolean Satisfiability
Tinish Bhattacharya, UC Santa Barbara