[2021-10-12] ISCA 2021 Section 3A: Machine Learning 1

This reading blog is about three papers in Section 3A: Machine Learning 1 of ISCA 2021.

RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference

This accelerator supports mixed precisions for both training and inference: 16- and 8-bit floating point and 4- and 2-bit fixed point. It improves both performance (TOPS) and energy efficiency (TOPS/W) at ultra-low precision. In my opinion, this work contributes more on the engineering (architecture) side.

MPE Array: Mixed-Precision PE Array

RaPiD MPE block design

Here are the challenges and solutions for scaled precisions. 1) Challenge: support both INT and FP pipelines. Solution: separation of the integer and floating-point pipelines. 2) Challenge: FP8 for training has two formats, one for the forward pass (1,5,3) and another for the backward pass (1,4,3). Solution: on-the-fly conversion; both convert to a 9-bit (1,5,3) format (see the sketch below). 3) Challenge: performance scaling from FP16 to FP8. Solution: sub-SIMD partitioning. 4) Challenge: circuit-level optimizations for INT4/INT2 inference. Solution: doubled INT4/INT2 engines; operand reuse via sub-SIMD.
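To make the on-the-fly conversion concrete, here is a minimal Python sketch of re-encoding the two FP8 training formats into a common 9-bit (1,5,3) value. This is my own illustration, not the paper's circuit: the bias values (2^(e-1)-1) are assumptions, and subnormals/specials are ignored. Since the 5-bit exponent range covers the 4-bit one, the conversion is just a field widen plus rebias.

```python
def fp8_to_fp9(bits, fmt):
    """Re-encode an FP8 value as 9-bit (1,5,3).

    fmt='fwd' means the (1,5,3) forward format; fmt='bwd' means the
    (1,4,3) backward format. Biases assumed to be 2^(e-1)-1.
    """
    exp_bits = 5 if fmt == 'fwd' else 4
    man_bits = 3
    sign = (bits >> (exp_bits + man_bits)) & 1
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias_in = (1 << (exp_bits - 1)) - 1   # 15 for (1,5,3), 7 for (1,4,3)
    bias_out = 15                         # assumed bias of the 9-bit (1,5,3)
    exp9 = exp - bias_in + bias_out       # rebias into the wider 5-bit field
    return (sign << 8) | (exp9 << 3) | man

# Example: +1.5 in (1,4,3) is sign=0, exp=7 (bias 7, so 2^0), man=0b100.
assert fp8_to_fp9(0b0_0111_100, 'bwd') == 0b0_01111_100  # same value, bias 15
```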

Sparsity-aware Zero-gating and Frequency Throttling

1) Zero-gating for zero operands (a software analogue is sketched below). 2) Sparsity-aware frequency throttling: use clock throttling rather than DVFS (although I don't know exactly what clock throttling is); it is not on the critical path.
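As a rough software analogue of zero-gating (my illustration, not the paper's circuit), the multiply-accumulate is simply skipped when either operand is zero; in hardware the datapath inputs are held so no switching energy is spent.

```python
def gated_dot(a, b):
    """Dot product that skips MACs when either operand is zero."""
    acc = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue  # "gate": inputs hold their value, no switching
        acc += x * y
    return acc

assert gated_dot([1, 0, 3], [4, 5, 0]) == 4
```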

Data Communication Among Cores and Memory

A bi-directional ring interconnect communicates data between cores and memory. Data-fetch latency can be hidden by double-buffering data in L1, overlapped with computation (see the sketch below). The interconnect also assigns unique identification tags, supports multi-cast communication, and supports "request aggregation".
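Here is a minimal sketch of the double-buffering idea, using a Python thread as a stand-in for the DMA engine and caller-supplied fetch/compute functions (both hypothetical, not from the paper): while the core computes on one L1 buffer, the next tile's fetch proceeds in the background.

```python
import threading

def process_tiles(tiles, fetch, compute):
    """Run compute(fetch(tile)) over tiles with double buffering:
    the fetch of tile i+1 overlaps the computation on tile i."""
    buffers = [None, None]
    buffers[0] = fetch(tiles[0])          # prime buffer 0
    for i in range(len(tiles)):
        prefetch = None
        if i + 1 < len(tiles):
            def do_fetch(dst=(i + 1) % 2, t=tiles[i + 1]):
                buffers[dst] = fetch(t)   # fill the other buffer
            prefetch = threading.Thread(target=do_fetch)
            prefetch.start()              # overlapped with compute below
        compute(buffers[i % 2])           # consume the current buffer
        if prefetch:
            prefetch.join()               # next buffer is now ready
```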