BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have demonstrated remarkable performance across a wide range of machine learning tasks. However, this high accuracy usually comes at the cost of substantial computation and energy consumption, making CNNs difficult to deploy on mobile and embedded devices. In CNNs, the compute-intensive convolutional layers are usually followed by a ReLU activation layer, which clamps negative outputs to zero, resulting in high activation sparsity. By exploiting this sparsity, we propose BitSET, a software-hardware co-design that aggressively saves energy during CNN inference. The bit-serial BitSET accelerator adopts a prediction-based bit-level early termination technique that terminates the ineffectual computation of negative outputs early. To assist the algorithm, we propose a novel weight encoding that enables more accurate predictions with fewer bits. BitSET leverages bit-level computation reduction both in the predictive early termination algorithm and in the non-predictive, energy-efficient bit-serial architecture. Compared to UNPU, an energy-efficient bit-serial CNN accelerator, BitSET yields an average 1.5× speedup and a 1.4× energy efficiency improvement with no accuracy loss, due to a 48% reduction in bit-level computations. Relaxing the allowed accuracy loss to 1% increases the gains to an average 1.6× speedup and 1.4× energy efficiency improvement.

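As a rough illustration of the idea (a minimal sketch, not the paper's exact algorithm or its proposed weight encoding), the Python snippet below computes a dot product bit-serially over the weight bit-planes, MSB first, and terminates once an early partial sum predicts a negative output that ReLU would clamp to zero anyway. The function name `bit_serial_relu_dot` and the parameters `weight_bits` and `check_after` are hypothetical, introduced here only for illustration.

```python
import numpy as np

def bit_serial_relu_dot(acts, weights, weight_bits=8, check_after=4):
    """Bit-serial dot product followed by ReLU, with predictive early
    termination (illustrative sketch). Weights are split into sign and
    magnitude, and the magnitude bit-planes are processed MSB-first.
    After `check_after` bit-planes, a negative running partial sum is
    taken as a prediction that ReLU will clamp the output, so the
    remaining planes are skipped.
    """
    acts = np.asarray(acts, dtype=np.int64)      # activations (non-negative, post-ReLU)
    weights = np.asarray(weights, dtype=np.int64)
    sign = np.where(weights < 0, -1, 1)
    mag = np.abs(weights)

    acc = 0
    for n, b in enumerate(range(weight_bits - 1, -1, -1), start=1):
        plane = (mag >> b) & 1                   # b-th magnitude bit of every weight
        acc += (1 << b) * int(np.dot(acts, sign * plane))
        if n == check_after and acc < 0:
            return 0                             # predicted-negative output: stop early
    return max(acc, 0)                           # ReLU on the fully computed sum
```

Because an MSB-only partial sum can mispredict the final sign, the accuracy of such a scheme hinges on how the weights are encoded; this is precisely what BitSET's proposed weight encoding targets by making accurate predictions possible with fewer bits.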
[1] Yunjie Pan, Jiecao Yu, Andrew Lukefahr, Reetuparna Das, and Scott Mahlke. 2023. BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks. ACM Trans. Embed. Comput. Syst. 22, 5s, Article 98 (October 2023), 24 pages. https://doi.org/10.1145/3609093