Pruning vs. Quantization: Which Is Better?

By Andrey K. et al.

Table of Contents

1. Abstract
2. Introduction
3. Assumptions
4. Comparison on Statistical Distributions
5. Experiments on Real Weight Tensors
6. Per-layer Comparison

Summary

This paper compares neural network pruning and quantization to determine which method compresses deep neural networks more effectively. The study analyzes the error that pruning and quantization each introduce on statistical distributions, such as the standard normal distribution and heavy-tailed distributions. Experiments on real weight tensors from a range of models show that pruning becomes the more beneficial option at lower bit-widths and correspondingly higher sparsity ratios. The paper also provides a per-layer comparison of post-training quantization and pruning, using theoretical bounds to evaluate the output error of each method.
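To make the kind of comparison described above concrete, here is a minimal Python sketch (not the authors' code) that contrasts the mean squared error of symmetric uniform quantization with that of magnitude pruning on samples from a standard normal distribution. The rule matching b-bit quantization against keeping b/16 of 16-bit weights is an illustrative equal-size assumption, not the paper's exact protocol.

```python
# Minimal sketch: quantization error vs. pruning error on N(0, 1) "weights".
# Assumption: b-bit quantization is compared against keeping b/16 of the
# weights, so both variants occupy roughly the same storage for 16-bit weights.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000)  # synthetic weight tensor ~ N(0, 1)

def quantization_mse(w, bits):
    """Symmetric uniform quantization to 2**bits levels over [-max|w|, max|w|]."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    w_q = np.clip(np.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return np.mean((w - w_q) ** 2)

def pruning_mse(w, keep_fraction):
    """Magnitude pruning: zero out the smallest (1 - keep_fraction) of the weights."""
    k = int(round(keep_fraction * w.size))
    threshold = np.sort(np.abs(w))[-k] if k > 0 else np.inf
    w_p = np.where(np.abs(w) >= threshold, w, 0.0)
    return np.mean((w - w_p) ** 2)

for bits in (8, 6, 4, 3, 2):
    keep = bits / 16.0  # equal-size assumption (see lead-in above)
    print(f"{bits}-bit quant MSE: {quantization_mse(w, bits):.2e} | "
          f"prune keep {keep:.3f} MSE: {pruning_mse(w, keep):.2e}")
```

Running this sketch illustrates the general trend the summary refers to: at generous budgets quantization introduces very little error relative to pruning, and the gap narrows as the bit-width (and the matched fraction of kept weights) shrinks.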