Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
Type: Paper
Introduction
- What is the name of the unified quantization and distillation paper?
  - Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks
  - Rajiv Movva, Jinhao Lei, Shayne Longpre, Ajay Gupta, and Chris DuBois
- What are the main contributions of the Combining Compressions paper?
  - Compares knowledge distillation (KD), quantization-aware training (QAT), and magnitude pruning (MP)
  - QAT has the best accuracy-compression tradeoff
  - Few diminishing returns when combining methods
  - QAT and KD amplify accuracy losses when used together
Method
- Magnitude pruning masks the weights with the lowest magnitudes to achieve a target sparsity; weights are pruned iteratively on a linear schedule during training, after some warmup steps (see the pruning sketch after this list)
  - How does this affect training/inference speed?
- Combining Compressions does QAT with int8 weights (see the fake-quantization sketch after this list)
- Pruning embedding weights seems to significantly harm accuracy
- The Combining Compressions experiments apply data augmentation along with knowledge distillation; the student is trained for 3 epochs (see the distillation sketch after this list)
  - synonym replacement augmentation
- What sizes of BERT models does Combining Compressions evaluate?
  - LARGE (367 million params)
  - BASE (134M)
  - MEDIUM (57M)
  - SMALL (45M)
  - MINI (19M)
  - TINY (8M)
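
A minimal sketch of the iterative magnitude-pruning idea from the Method notes, assuming a PyTorch setting; the function names and the exact schedule shape are my own illustration, not the paper's code.

```python
import torch

def sparsity_at_step(step, warmup_steps, total_steps, target_sparsity):
    """Linear ramp from zero sparsity (after warmup) up to the target sparsity."""
    if step < warmup_steps:
        return 0.0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min(1.0, progress) * target_sparsity

def magnitude_prune_(weight, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitudes (in place)."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    weight.data.mul_(mask)  # pruned weights are re-zeroed with this mask after each optimizer step
    return mask
```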
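A rough sketch of int8 QAT via "fake quantization" with a straight-through estimator; the symmetric per-tensor scheme here is an assumption, since the notes only record that int8 weights are used.

```python
import torch

def fake_quantize_int8(weight, eps=1e-8):
    """Quantize-dequantize weights to symmetric int8 in the forward pass (QAT-style)."""
    scale = weight.abs().max().clamp(min=eps) / 127.0   # per-tensor symmetric scale
    q = torch.clamp(torch.round(weight / scale), -127, 127)
    dequantized = q * scale
    # Straight-through estimator: the forward pass sees int8 rounding,
    # the backward pass treats the rounding as the identity function.
    return weight + (dequantized - weight).detach()
```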
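A hedged sketch of the distillation-plus-augmentation recipe: a temperature-softened KL distillation loss and a toy synonym-replacement helper. The `synonyms` lookup table, the replacement probability, and the temperature are illustrative assumptions, not the paper's exact hyperparameters.

```python
import random
import torch.nn.functional as F

def synonym_replace(tokens, synonyms, p=0.1):
    """Randomly swap tokens for a synonym with probability p (hypothetical helper)."""
    return [random.choice(synonyms[t]) if t in synonyms and random.random() < p else t
            for t in tokens]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```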
Results
- Combining Compressions results: using all three techniques together yields the best results
- Combining Compressions results: maximum decrease in size while maintaining accuracy
Conclusions
It would have been interesting to include some additional baseline experiments, especially benchmarking the inference speed of models trained with each method. Given that inference speed is generally a larger concern than model size for models at this scale, I would have started with this experiment. It would also be interesting to compare int8 QAT against int16 variants and against post-hoc (post-training) quantization methods.
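
As one point of comparison for the post-hoc quantization baseline suggested above, PyTorch's dynamic quantization produces an int8 model without any retraining; the `model` below is a placeholder standing in for a fine-tuned network.

```python
import torch
import torch.nn as nn

# Placeholder for a trained model; swap in a fine-tuned BERT encoder in practice.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

# Dynamic post-training quantization: weights of nn.Linear layers become int8,
# activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```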
Reference
@misc{movva2022combining,
title={Combining Compressions for Multiplicative Size Scaling on Natural Language Tasks},
author={Rajiv Movva and Jinhao Lei and Shayne Longpre and Ajay Gupta and Chris DuBois},
year={2022},
eprint={2208.09684},
archivePrefix={arXiv},
primaryClass={cs.CL}
}