Capturing Failures of Large Language Models via Human Cognitive Biases
Intro
- What is the name of the UCB human cognitive biases paper?
- Capturing Failures of Large Language Models via Human Cognitive Biases
- Erik Jones, Jacob Steinhardt (UC Berkeley)
The main idea of this paper is to evaluate failures of large language models using methods inspired by human cognitive biases. Human cognitive biases are well-documented phenomena in which people make erroneous or irrational judgements.
- What are the main contributions of the UCB cognitive biases paper?
- Use four cognitive biases to identify errors made by GPT-Codex
- errors based on framing of prompt
- errors based on anchors
- bias based on training set frequencies
- reversion to simpler problems
- Human cognitive biases were famously identified by Tversky and Kahneman
There are many examples of human cognitive biases. For example, pareidolia is the tendency for perception to impose a meaningful interpretation on a nebulous stimulus, usually visual, so that one sees an object, pattern, or meaning where there is none.
- Why do the authors choose to focus on code generation as an evaluation task?
- objectivity: the generated code is either correct or incorrect
- open-endedness: there are many ways to solve the same problem
Method
The authors use a simple setup to evaluate the GPT-Codex model (a sketch of such a generate-and-check evaluation loop follows the list below).
- The authors probe 4 main cognitive biases in the GPT-Codex model
- Framing Effect
- Anchoring Effect
- Availability Heuristic
- Attribute substitution
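The evaluation setup can be pictured as a simple generate-and-check loop: generate a completion for each prompt and test whether the resulting function behaves correctly. The sketch below only illustrates that idea; `generate_completion`, the `solution` function-name convention, and the test-case format are hypothetical placeholders, not the authors' actual harness.

```python
def generate_completion(prompt: str) -> str:
    """Hypothetical stand-in for a call to a code-generation model such as Codex."""
    raise NotImplementedError


def passes_tests(prompt: str, completion: str, test_cases) -> bool:
    """Run the prompt plus completion and check it against input/output test cases.

    Assumes the prompt defines a single function named `solution` (a made-up convention).
    """
    namespace = {}
    try:
        exec(prompt + completion, namespace)
        fn = namespace["solution"]
        return all(fn(*args) == expected for args, expected in test_cases)
    except Exception:
        return False


def evaluate(prompts_with_tests) -> float:
    """Fraction of prompts whose generated completion passes all of its test cases."""
    correct = sum(
        passes_tests(prompt, generate_completion(prompt), tests)
        for prompt, tests in prompts_with_tests
    )
    return correct / len(prompts_with_tests)
```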
- The authors probe the impact of the framing effect on code generation by prepending an irrelevant function prompt and a framing line to the prompt (a prompt sketch appears after these bullets)
- They use `raise NotImplementedError`, `pass`, `assert False`, `return False`, and `print("Hello world!")` as framing lines
- In the paper's example, the model erroneously generates the framing line instead of the correct answer
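As a rough illustration of how such a framing prompt might be assembled (the helper and target functions below are invented for this note, not the paper's actual prompts):

```python
# Irrelevant preceding function whose body is the framing line (here `pass`).
irrelevant_function = '''
def unrelated_helper(x):
    """An unrelated function that precedes the real task."""
    pass
'''

# The actual task the model is asked to complete.
target_prompt = '''
def sum_of_list(numbers):
    """Return the sum of all numbers in the list."""
'''

# The model sees the framing line before the target prompt and may erroneously
# reproduce it (e.g. generate `pass`) instead of a correct function body.
framed_prompt = irrelevant_function + target_prompt
```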
- For the anchoring effect, the authors test the model by prepending a related but incorrect anchor function to the prompt
- In the example, the model fails to adjust and gets distracted by the anchor function
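A similarly hedged sketch of the anchoring setup; the anchor and target functions below are made up for illustration:

```python
# Related but incorrect "anchor" function prepended to the prompt.
anchor_function = '''
def add_one_to_elements(values):
    """Add 1 to each element of the list."""
    return [v + 1 for v in values]
'''

# Target task: the correct solution multiplies by 2 rather than adding 1.
target_prompt = '''
def double_elements(values):
    """Multiply each element of the list by 2."""
'''

# With the anchor present, the model may fail to fully adjust and instead
# reproduce (parts of) the anchor function's add-one logic.
anchored_prompt = anchor_function + target_prompt
```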
- The authors test the availability heuristic by checking when Codex fails to apply the correct order of operations
- The function needs to sum the inputs first, then square the result
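For concreteness, a tiny illustration of the order-of-operations failure described above (function names invented for this note):

```python
def square_of_sum(x, y):
    """What the docstring asks for: sum the inputs first, then square the result."""
    return (x + y) ** 2


def sum_of_squares(x, y):
    """The more common pattern the availability heuristic predicts the model produces instead."""
    return x ** 2 + y ** 2
```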
- In GPT-Codex, one can test the attribute substitution bias by giving a prompt along with a misleading function name
- the model tends to follow the misleading function name rather than the prompt
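A hedged illustration of a misleading-name prompt; the example below is invented, not taken from the paper:

```python
# The function name says "sum" but the docstring asks for a product.
# Attribute substitution predicts the model follows the easier cue (the name)
# and returns a + b instead of a * b.
misleading_prompt = '''
def sum_values(a, b):
    """Return the product of a and b."""
'''
```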
Results
The results show that the model is highly susceptible to the cognitive biases probed by the tasks the authors construct.
- GPT-Codex framing results: adding a framing line leads to a large drop in accuracy
- large tendency to copy the `pass` statement
- GPT-Codex: anchoring with irrelevant statements leads to a drop in performance
- the model also tends to erroneously copy these anchors
- The availability heuristic
- 24 combinations and orderings of the binary operations sum, difference, and product, with the unary operations square, cube, quadruple, and square root
- Codex only generates correct output for 9 of the 24
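The count of 24 follows from 3 binary operations × 4 unary operations × 2 orderings. A small enumeration sketch (the prompt wording is illustrative, not the paper's exact phrasing):

```python
from itertools import product

binary_ops = ["sum", "difference", "product"]
unary_ops = ["square", "cube", "quadruple", "square root"]
orderings = ["binary-then-unary", "unary-then-binary"]  # e.g. "square of the sum" vs "sum of the squares"

prompts = []
for b, u, order in product(binary_ops, unary_ops, orderings):
    if order == "binary-then-unary":
        prompts.append(f"Return the {u} of the {b} of the inputs.")
    else:
        prompts.append(f"Return the {b} of the {u}s of the inputs.")

assert len(prompts) == 24
```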
- Attribute substitution bias: adding misleading function signatures leads to a large drop in accuracy
They also evaluate how GPT-3 performs with respect to the anchoring bias.
- Probing GPT-3 for anchoring bias
- Can anchor the model to a higher or lower value
- Results: the generated output is very often affected by the anchoring bias
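A hypothetical sketch of how a high or low numeric anchor might be prepended to a GPT-3 prompt; the question and anchor values below are invented for illustration:

```python
# A made-up numeric question; only the prepended anchor sentence changes.
question = "Q: What is 17 * 23?\nA:"

low_anchor = "The answer might be around 100.\n"
high_anchor = "The answer might be around 2000.\n"

# Per the results above, the generated number tends to be pulled toward
# whichever anchor is prepended, even though the question is unchanged.
low_prompt = low_anchor + question
high_prompt = high_anchor + question
```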
Conclusions
Overall, there is some mismatch between the probing tasks and the precise definitions of the corresponding cognitive biases. For example, the attribute substitution task could really be called an "ignore the misleading function name" or "deal with a conflicting prompt" task, and it does not actually probe any fundamental tendency of the model to answer a simpler question in place of the one asked. Additionally, many other probing tasks could be designed for the availability heuristic. Overall, the paper could be framed more as an exploration of how GPT-Codex deals with different prompts. However, the approach of drawing inspiration from human cognitive biases is very interesting and can hopefully be applied to other language models.
Reference
@article{jones2022capturing,
  title={Capturing Failures of Large Language Models via Human Cognitive Biases},
  author={Erik Jones and Jacob Steinhardt},
  year={2022},
  journal={arXiv preprint arXiv:2202.12299}
}