Advance Program
Tutorials: Sunday, August 24, 2025
| Time (PDT) | Title | Presenters |
|---|---|---|
| 7:45AM-8:30AM | Breakfast/Registration | |
| 8:30AM-10:30AM | Tutorial 1: Datacenter Racks | |
| | How AI workloads shape rack system architecture | Frank Helms, AMD |
| | Scaling fabric technologies | Darrin Vallis, AMD |
| | Liquid cooling characteristics | Jorge Padilla, Google |
| | Rearchitected power systems | Harsha Bojja, Microsoft |
| 10:30AM-10:55AM | Coffee Break (25 min) | |
| 10:55AM-12:25PM | Tutorial 1: Datacenter Racks (cont.) | |
| | Case study: NVIDIA GB200 NVL72 | John Norton, NVIDIA |
| | Case study: Meta’s Catalina (NVL72) | William Arnold and Matt Bowman, Meta |
| | Case study: Google TPU Rack | Pankaj Makhija, Google |
| 12:25PM-1:30PM | Lunch (1 hr 5 min) | |
| 1:30PM-3:30PM | Tutorial 2: AI Kernel Programming | |
| | Introduction | Fredrik Kjolstad, Stanford |
| | Decoupling Performance from Correctness with User-Schedulable Kernel Languages | Andrew Adams, Adobe Research |
| | Pallas: Using JAX to write custom kernels for GPUs and TPUs | Sharad Vikram, Google |
| 3:30PM-4:00PM | Coffee Break (1/2 hr) | |
| 4:00PM-5:30PM | Tutorial 2: AI Kernel Programming (cont.) | |
| | Domain-specific languages for GPU kernels and automatic kernel authoring with LLMs | Tri Dao, Princeton & Together AI |
| | Programming techniques for implementing ML models on GPUs | Zhihao Jia, CMU |
| 5:30PM-7:30PM | Reception | |
Conference Day 1: Monday, August 25, 2025
| Time (PDT) | Title | Presenters |
|---|---|---|
| 7:45AM-9:15AM | Breakfast/Registration | |
| 9:15AM-9:30AM | Welcome | |
| | General Chairs Welcome | Jan-Willem van de Waerdt & Larry Yang, General Chairs |
| | Program Co-Chairs Welcome | Ian Bratt & Nhon Quach, PC Co-Chairs |
| 9:30AM-11:00AM | CPU 1a (Chair: Gabriel Southern) | |
| | Cuzco: A High-Performance RISC-V RVA23 Compatible CPU IP | Ty Garibay & Shashank Nemawarkar, Condor Computing |
| | PEZY-SC4s: The Fourth Generation MIMD Many-core Processor with High Energy Efficiency and Flexibility for HPC and AI Applications | Naoya Hatta, PEZY Computing |
| | IBM’s Next Generation Power Microprocessor | William Starke, IBM |
| 11:00AM-11:30AM | Coffee Break (1/2 hr) | |
| 11:30AM-12:00PM | CPU 1b (Chair: Gabriel Southern) | |
| | Introducing the Next Generation Intel® Xeon® Processor with Efficiency Cores | Don Soltis, Intel |
| 12:00PM-1:00PM | Security (Chair: Greg Papadopoulos) | |
| | Presto: A RISC-V-Compatible SoC for Unified Multi-Scheme FHE Acceleration over Module Lattice | Luchang Lei & Hongyang Jia, Tsinghua University |
| | Azure Secure Hardware Architecture: Establishing a Robust Security Foundation for Cloud Workloads | Bryan Kelly, Microsoft |
| 1:00PM-2:15PM | Lunch (1 hr 15 min) | |
| 2:15PM-3:15PM | Keynote #1 (Chair: Cliff Young) | |
| | Predictions for the Next Phase of AI | Noam Shazeer, VP Engineering, Google DeepMind |
| 3:15PM-4:45PM | Graphics (Chair: Lavanya Subramanian) | |
| | AMD RDNA 4 and Radeon RX 9000 Series GPU | Andy Pomianowski & Laks Pappu, AMD |
| | RTX 5090: Designed for the Age of Neural Rendering | Marc Blackstein, NVIDIA |
| | Specialized SoC enabling low-power ‘World Lock Rendering’ in Augmented and Mixed Reality Devices | Ohad Meitav & Jay Tsao, Meta |
| 4:45PM-5:15PM | Coffee Break (1/2 hr) | |
| 5:15PM-7:15PM | Networking (Chair: Sherry Xu) | |
| | Intel Mount Morgan Infrastructure Processing Unit (IPU) | Patrick Fleming, Intel |
| | AMD Pensando™ Pollara 400 AI NIC Architecture and Application | Kevin Chu, AMD |
| | NVIDIA ConnectX-8 SuperNIC: A Programmable RoCE Architecture for AI Data Centers | Idan Burstein, NVIDIA |
| | Tomahawk Ultra: Ultra-Low-Latency, High-Bandwidth Ethernet Switch Chip for HPC and AI/ML Applications | Mohan Kalkunte & Asad Khamisy, Broadcom |
| 7:15PM-9:00PM | Reception | |
Conference Day 2: Tuesday, August 26, 2025
| Time (PDT) | Title | Presenters |
|---|---|---|
| 7:45AM-8:30AM | Breakfast/Registration | |
| 8:30AM-10:30AM | Optical (Chair: Borivoje Nikolic) | |
| | Celestial AI Photonic Fabric Module (PF Module): The world’s first SoC with in-die optical I/O | Phil Winterbottom, Celestial AI |
| | A UCIe Optical I/O Retimer Chiplet for AI Scale-up Fabrics | Vladimir Stojanovic, Ayar Labs |
| | Passage M1000: 3D photonic interposer for AI | Darius Bunandar, Lightmatter |
| | Co-Packaged Silicon Photonics Switches for Gigawatt AI Factories | Gilad Shainer, NVIDIA |
| 10:30AM-11:00AM | Coffee Break (1/2 hr) | |
| 11:00AM-12:30PM | Power / Methodology (Chair: Jae W. Lee) | |
| | ECAM-Enabled Advanced Thermal Management Solutions for the AI Data Center | Michael Matthews, Fabric8Labs |
| | Everactive Self-Powered SoC with Energy Harvesting, Wakeup Receiver, and Energy-Aware Subsystem | Ben Calhoun, Everactive |
| | Taping Out Three Class Chips per Semester in Intel 16 Technology | Lucy Revina, UC Berkeley |
| 12:30PM-1:40PM | Lunch (1 hr 10 min) | |
| 1:40PM-1:45PM | TCMM Awards Presentation | |
| 1:45PM-2:45PM | Keynote #2 (Chair: Yasuo Ishii) | |
| | Up and Running with Rapidus: How Japan and Cutting-Edge Technologies are Transforming Semiconductor Manufacturing | Dr. Atsuyoshi Koike, Rapidus |
| 2:45PM-4:15PM | Machine Learning 1 (Chair: Ronny Krashinsky) | |
| | Memory: (Almost) the Only Thing That Matters | Mark Kuemerle, Marvell |
| | Corsair: An In-memory Computing Chiplet Architecture for Inference-time Compute Acceleration | Sudeep Bhoja, d-Matrix |
| | UB-Mesh: Huawei’s Next-Gen AI SuperComputer with a Unified-Bus Interconnect and nD-FullMesh Architecture | Liao Heng, Huawei |
| 4:15PM-4:45PM | Coffee Break (1/2 hr) | |
| 4:45PM-6:15PM | Machine Learning 2 (Chair: Pradeep Dubey) | |
| | NVIDIA’s GB10 SoC: AI Supercomputer On Your Desk | Andi Skende, NVIDIA |
| | 4th Gen AMD CDNA™ Generative AI Architecture Powering AMD Instinct™ MI350 Series GPUs and Platforms | Michael Floyd & Michael Steffen, AMD |
| | Ironwood: Delivering best-in-class perf, perf/TCO, and perf/Watt for reasoning model training and serving | Norm Jouppi & Sridhar Lakshmanamurthy, Google |
| 6:15PM-6:30PM | Closing Remarks | |
| | Closing Remarks | Larry Yang, Vice Chair |
Posters
| Title | Authors & Affiliation |
|---|---|
| HyperAccel Adelia: A 4nm LLM Processor for Efficient Generative AI Inference | Seungjae Moon; HyperAccel |
| Basilisk: A 34 mm² End-to-End Open-Source 64-bit Linux-Capable RISC-V SoC in 130nm BiCMOS | Philippe Sauter; ETH Zurich |
| Bit-Separable Transformer Accelerator Leveraging Output Activation Sparsity for Efficient DRAM Access | Seunghyun Park; Kyungpook National University |
| Multi-modal Few-step Diffusion Model Accelerator with Mixed-Precision and Reordered Group-Quantization for On-device Generative AI | Sangjin Kim; KAIST |
| An Energy-Efficient Spatial Computing SoC for Real-time Interactable-Rendering and Modeling with Surface-aware 3D Gaussian Splatting | Seokchan Song; KAIST |
| BROCA: A Low-power and Low-latency Conversational Agent RISC-V System-on-Chip for Voice-interactive Mobile Devices | Wooyoung Jo; KAIST |
| MEGA.mini: A NPU with Novel Heterogeneous AI Processing Architecture Balancing Efficiency, Performance, and Intelligence for the Era of Generative AI | Donghyeon Han; Chung-Ang University |
| High-Density Si-IPD Technologies as an Enabler for High-Performance and Low-Power Processor Chips | Mohamed Mehdi Jatlaoui; Murata |
| Clo-HDnn: Continual On-Device Learning Accelerator with Hyperdimensional Computing via Progressive Search | Chang Eun Song; UC San Diego |
| A 4.69mW LLM Processor with Binary/Ternary Weights for Billion-Parameter Llama Model | Sangyeob Kim; Yonsei University |
| KLIMA: Low-latency mixed-signal In-Memory Computing accelerator for solving arbitrary-order Boolean Satisfiability | Tinish Bhattacharya; UC Santa Barbara |