Table of Contents
Abstract
I. Introduction
II. Categorization of DNN Compression Techniques
III. Network Pruning
A. Channel Pruning
B. Filter Pruning
C. Connection Pruning
D. Layer Pruning
IV. Sparse Representation
A. Quantization
B. Multiplexing
C. Weight Sharing
V. Bits Precision
A. Estimation Using Integer
B. Low Bits Representation
C. Binarization
VI. Knowledge Distillation
A. Logits Transfer
B. Teacher Assistant
C. Domain Adaptation
VII. Miscellaneous
Summary
Deep Neural Networks (DNNs) have achieved unprecedented performance owing to their automated feature extraction capability, leading to their widespread incorporation in IoT applications. However, the resource requirements of DNN models make deployment on resource-constrained IoT devices prohibitive. This paper presents a comprehensive review of the existing literature on compressing DNN models to reduce their storage and computation requirements. Techniques such as network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous approaches are discussed. Various methods within each technique are explored, with a focus on minimizing the loss of accuracy. The paper concludes with a discussion of future directions in DNN compression techniques.
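To make two of the surveyed families concrete, the sketch below (not taken from the paper) shows magnitude-based connection pruning followed by 8-bit dynamic quantization using PyTorch's torch.nn.utils.prune and torch.quantization utilities. The toy model, the 50% sparsity level, and the choice of qint8 are illustrative assumptions, not settings from the reviewed work.

# Illustrative sketch only: connection pruning + bits-precision reduction.
# The toy architecture and 50% sparsity are assumptions for demonstration.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small classifier standing in for a DNN targeted at an IoT device.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Connection pruning: zero out the 50% of weights with the smallest L1
# magnitude in each Linear layer, then make the pruning mask permanent.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")

# Bits precision: replace float32 Linear weights with 8-bit integers,
# quantizing activations dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check: roughly half the first layer's weights are now zero,
# and the quantized model still produces 10-way logits.
sparsity = (model[0].weight == 0).float().mean().item()
print(f"Layer-0 sparsity after pruning: {sparsity:.2%}")
print(quantized(torch.randn(1, 784)).shape)  # torch.Size([1, 10])

In practice, as the survey notes, such techniques are combined and tuned per model to trade storage and compute savings against accuracy loss; the fixed amounts above are for illustration only.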