Evaluating Distributional Distortion in Neural Language Modeling

1 minute read

Introduction

> **Evaluating Distributional Distortion in Neural Language Modeling**

Benjamin LeBrun, Alessandro Sordoni, Timothy J. O’Donnell
  • What are the main contributions of the Distributional Distortion paper?

    Evaluates LM performance on the heavy tail of the sequence distribution

    • an instance-level evaluation scheme (sketched in code after this list)
    • uses a transformer generative model to define artificial languages
      • the distributional properties of the language can be controlled
    • Language model performance:
      • systematically underestimates the probabilities of rare sequences
      • does so more severely the rarer the sequence is
      • overestimates the probability of perturbed (ill-formed) rare sequences
      • greater amounts of training data alleviate the underestimation
      • underestimation is worse for distributions with lower entropy
  • Productivity in natural language refers to the phenomenon that a speaker can produce an infinite number of grammatically valid utterances
    • each of which should be assigned nonzero probability
  • The LNRE (Large Number of Rare Events) zone refers to the range of sample sizes for which there is nonzero probability of generating a previously unseen sequence
    • natural language remains in the LNRE zone at current language-modeling sample sizes and is expected to stay there for many more orders of magnitude of data
  • The Kleene closure Σ* of a vocabulary Σ is the set of all finite-length strings over that vocabulary
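
The instance-level evaluation above amounts to scoring the same sequence under two models and comparing. Below is a minimal sketch of that scoring step using HuggingFace Transformers; the local checkpoint path and the helper `sequence_log_prob` are my own illustration, not the paper's released code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def sequence_log_prob(model, tokenizer, text):
    """Log probability of `text` under a causal LM (sum of next-token log-probs)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t score the token at position t + 1,
    # so the first token's probability is not counted here.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# "True" generative model that defines the artificial language.
true_model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
# Language model trained on samples from that artificial language
# ("./trained-lm" is a hypothetical local checkpoint, not from the paper).
trained_model = GPT2LMHeadModel.from_pretrained("./trained-lm").eval()

x = "a sequence sampled from the artificial language"
error = sequence_log_prob(true_model, tokenizer, x) - sequence_log_prob(trained_model, tokenizer, x)
# error > 0 means the trained model underestimates the sequence's probability.
```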

Method

  • The Distributional Distortion paper generates artificial languages with GPT2-medium
    • training and evaluation sequences are sampled from this generative model’s distribution
  • Estimation error is measured per sequence as the difference between the log probability of the sequence under the true (generative) process and under a trained language model (see the expression below)
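
In symbols (notation mine, reconstructed from the bullet above): writing $p^{\ast}$ for the true generative process and $p_{\theta}$ for the trained language model, the per-sequence estimation error is

$$
\mathrm{err}(x) = \log p^{\ast}(x) - \log p_{\theta}(x),
$$

so $\mathrm{err}(x) > 0$ means the model underestimates $x$ and $\mathrm{err}(x) < 0$ means it overestimates $x$.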

Results

  • Language model estimation error is minimized when the model reaches its lowest cross entropy (see the relation below)
    • note how GPT2 is able to perfectly replicate the training distribution after 5 epochs (left panel of the figure in the paper)
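
A one-line justification for the bullet above (my gloss, not an equation from the paper): averaging the per-sequence error over the true distribution gives the KL divergence, which differs from the cross entropy only by the fixed entropy of the true process,

$$
\mathbb{E}_{x \sim p^{\ast}}\!\left[\log p^{\ast}(x) - \log p_{\theta}(x)\right]
= \mathrm{KL}\!\left(p^{\ast} \,\Vert\, p_{\theta}\right)
= H(p^{\ast}, p_{\theta}) - H(p^{\ast}),
$$

so lower cross entropy means lower average estimation error; the paper's point is that the tail can still be distorted even when this average looks good.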

  • Results show that low-probability sequences are consistently underestimated
    • real rare events are underestimated in favor of perturbed (ill-formed) rare events, which are overestimated (a toy perturbation is sketched below)
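
Purely for illustration (not necessarily the paper's perturbation scheme), an ill-formed variant of a rare sequence can be produced by swapping a few adjacent tokens:

```python
import random

def perturb(tokens, n_swaps=2, seed=0):
    """Toy ill-formed variant of a token sequence via adjacent swaps.
    Illustrative only; the paper's perturbation scheme may differ."""
    rng = random.Random(seed)
    tokens = list(tokens)
    for _ in range(n_swaps):
        i = rng.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
    return tokens

print(perturb(["the", "cat", "sat", "on", "the", "mat"]))
```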

  • Artificial languages generated at higher temperature (higher entropy) show less estimation error (see the note on temperature below)
    • language models better model distributions with less heavy tails
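
Temperature here presumably refers to the softmax temperature used when sampling the artificial language from the generative model (notation below is mine): with next-token logits $z$,

$$
p_{\tau}(x_t \mid x_{<t}) = \frac{\exp(z_{x_t} / \tau)}{\sum_{v} \exp(z_{v} / \tau)},
$$

so a larger $\tau$ flattens the next-token distribution and raises its entropy, which the notes above associate with a less heavy-tailed sequence distribution and smaller estimation error.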

Reference

@article{lebrun2022distortion,
  doi       = {10.48550/arXiv.2203.12788},
  url       = {https://arxiv.org/abs/2203.12788},
  author    = {LeBrun, Benjamin and Sordoni, Alessandro and O'Donnell, Timothy J.},
  keywords  = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title     = {Evaluating Distributional Distortion in Neural Language Modeling},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
