Adaptable Adapters
Introduction
- What are the main contributions of the Adaptable Adapters paper?
- Adaptable Adapters:
  - learn activation functions for different inputs and layers
  - use a learnable switch to select only the beneficial adapter layers
  - adapters require less training time and storage space than full fine-tuning
  - using fewer adapters with a learnable activation improves performance in the low-data regime
- Background and related work:
  - Padé Activation Units (Molina et al., 2020) are learnable activation functions that can approximate common activation functions as well as learn new ones (see the PAU sketch after this list).
  - Rational activation functions are ratios of two polynomials whose orders m and n are hyperparameters; the polynomial coefficients a and b are the learnable parameters: R(x) = (sum_{j=0}^{m} a_j x^j) / (1 + |sum_{k=1}^{n} b_k x^k|)
  - Schwartz et al. (2020) propose to add an output layer to each transformer layer, enabling confidence-based early exiting for efficiency (sketch below).
  - AdapterDrop (Rücklé et al., 2021) trains adapters with layer dropping and then drops the adapters from the first n layers at inference time (sketch below).
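
A minimal PyTorch sketch of a rational (Padé) activation unit, assuming the "safe" formulation from Molina et al. (2020) with default orders m = 5, n = 4; the random coefficient init here is an illustrative simplification (the paper initializes PAUs to approximate known activations):

```python
import torch
import torch.nn as nn


class PadeActivationUnit(nn.Module):
    """Safe rational activation R(x) = P(x) / (1 + |Q(x)|) with learnable a, b."""

    def __init__(self, m: int = 5, n: int = 4):
        super().__init__()
        self.a = nn.Parameter(torch.randn(m + 1) * 0.1)  # numerator coeffs a_0..a_m
        self.b = nn.Parameter(torch.randn(n) * 0.1)      # denominator coeffs b_1..b_n

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # P(x) = sum_{j=0}^{m} a_j x^j
        p = sum(a_j * x**j for j, a_j in enumerate(self.a))
        # Q(x) = 1 + |sum_{k=1}^{n} b_k x^k| keeps the denominator nonzero
        q = 1.0 + sum(b_k * x**k for k, b_k in enumerate(self.b, start=1)).abs()
        return p / q
```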
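
A hedged, single-example sketch of confidence-based early exiting in the spirit of Schwartz et al. (2020); `layers`, `classifiers`, and the threshold are illustrative placeholders, not the paper's exact API:

```python
import torch


def early_exit_predict(x, layers, classifiers, threshold=0.9):
    """Stop at the first layer whose classifier is confident enough (batch size 1)."""
    pred = None
    for layer, clf in zip(layers, classifiers):
        x = layer(x)
        probs = torch.softmax(clf(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:  # confident enough: exit early
            break
    return pred
```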
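
An illustrative sketch of AdapterDrop-style inference (Rücklé et al., 2021); adapters are modeled here as residual modules applied after each transformer layer, a simplification of where adapters actually sit inside a layer:

```python
def forward_with_adapter_drop(x, transformer_layers, adapters, n_drop=5):
    """Skip the adapters of the first n_drop layers at inference to save compute."""
    for i, (layer, adapter) in enumerate(zip(transformer_layers, adapters)):
        x = layer(x)
        if i >= n_drop:            # adapters in the first n_drop layers are dropped
            x = x + adapter(x)     # residual adapter everywhere else
    return x
```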
Method
- The adaptable adapter layer uses a rational activation function instead of a fixed one.
- Adaptable adapters use a Gumbel-Softmax switch to learn when to skip an adapter layer; a skipped layer is replaced by its skip connection (identity), as shown in the sketch after this list.
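
A minimal sketch of how such a layer could look in PyTorch, reusing the PadeActivationUnit sketch above; the bottleneck size and the hard straight-through Gumbel-Softmax gate are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptableAdapterLayer(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.activation = PadeActivationUnit()   # learnable rational activation
        self.up = nn.Linear(bottleneck, hidden_size)
        # Two logits: index 0 = use the adapter, index 1 = skip it.
        self.switch_logits = nn.Parameter(torch.zeros(2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        adapter_out = x + self.up(self.activation(self.down(x)))  # residual adapter
        # Hard Gumbel-Softmax: a discrete choice in the forward pass that stays
        # differentiable via the straight-through estimator in the backward pass.
        gate = F.gumbel_softmax(self.switch_logits, tau=1.0, hard=True)
        return gate[0] * adapter_out + gate[1] * x  # skipped => identity path
```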
Results
- Adaptable Adapter results: better performance in the low-data regime
- adapters are added on top of BERT-large
- AA learns rational activation functions
Reference
@inproceedings{Moosavi2022AdaptableA,
  title     = {Adaptable Adapters},
  author    = {Nafise Sadat Moosavi and Quentin Delfosse and Kristian Kersting and Iryna Gurevych},
  booktitle = {Proceedings of NAACL 2022},
  year      = {2022}
}