NoisyTune: A Little Noise Can Help You Finetune


Type: Paper

Intro

  • What is the name of the NoisyTune paper?

    NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better

    Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang, and Xing Xie (Tsinghua University and Microsoft Research Asia)

  • NoisyTune shows that adding a small amount of random noise to the parameters of a pretrained language model before finetuning can improve downstream performance

  • The motivation for adding noise in NoisyTune is to reduce overfitting

    • The idea is that the pretrained parameters may be overfitted to the self-supervised pretraining tasks

Method

  • The NoisyTune noise is matrix-wise: it depends on the parameters of each weight matrix
    • The noise is uniform, with a range scaled by the standard deviation of that weight matrix (a code sketch appears at the end of this section)

    $\tilde{W}_i = W_i + U(-\tfrac{\lambda}{2}, \tfrac{\lambda}{2}) \cdot \text{std}(W_i)$, where $W_i$ is a parameter matrix and $U$ is elementwise uniform noise

    • $\lambda$ is a hyperparameter controlling the amount of noise

That is the entire idea of this very simple method
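
A minimal PyTorch sketch of this kind of matrix-wise perturbation, assuming a standard `torch.nn.Module`; the function name `noisytune_perturb` and the rule for skipping single-element parameters are my own choices rather than details from the paper:

```python
import torch

def noisytune_perturb(model, lam=0.15):
    """Add uniform noise, scaled by each matrix's standard deviation,
    to every parameter of a pretrained model (NoisyTune-style)."""
    with torch.no_grad():
        for param in model.parameters():
            if param.numel() < 2:
                continue  # std is undefined for single-element parameters
            # Uniform noise in [-lam/2, lam/2], scaled by this matrix's std.
            noise = (torch.rand_like(param) - 0.5) * lam * param.std()
            param.add_(noise)
```

The perturbation would be applied once to the pretrained weights, and finetuning then proceeds exactly as usual.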

Results

  • The NoisyTune method leads to consistent gains across different models and GLUE tasks

    [Table: GLUE results for different pretrained models finetuned with and without NoisyTune]

    • They report similar small gains on the XTREME multilingual benchmark
  • With NoisyTune, parameters make smaller L1 updates during finetuning, suggesting that standard finetuning has to overcome more overfitting from pretraining (sketched in code below)

    • Note: This reasoning is more of a theory than a proven mechanism

    [Figure: L1 parameter changes during finetuning, with and without NoisyTune]
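
A rough sketch of how such a parameter-change measurement could be computed, assuming two PyTorch models with matching parameter names; the exact aggregation used in the paper may differ (e.g., per-matrix averaging):

```python
import torch

def l1_parameter_change(pretrained_model, finetuned_model):
    """Total L1 distance between corresponding parameters of two models."""
    finetuned = dict(finetuned_model.named_parameters())
    total = 0.0
    for name, before in pretrained_model.named_parameters():
        after = finetuned[name]
        total += (after.detach() - before.detach()).abs().sum().item()
    return total
```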

Conclusions

The NoisyTune method is a simple extension that appears to provide robust performance gains when transferring to downstream tasks. Adding noise could easily be applied to any language model of interest, although I would like to see experiments on larger models. The theoretical motivation might be similar to dropout, but it needs further study. They find that $\lambda = 0.15$ performs well across datasets, suggesting a standard level of noise to apply. As a regularization method for finetuning, it is much simpler to implement than Microsoft’s SMART and thus has more potential to augment the standard finetuning paradigm.

Reference

@article{wu2022noisytune,
  title={NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better},
  author={Chuhan Wu and Fangzhao Wu and Tao Qi and Yongfeng Huang and Xing Xie},
  year={2022},
  journal={arXiv preprint arXiv:2202.12024}
}
