Official implementation for "In-context Example Selection with Influences".
We introduce in-context influences as a way to select examples for few-shot in-context learning.
Authors: Tai Nguyen and Eric Wong.
News
Todo - Release influence scores for all tasks and code for baselines
04/18/2023 - Repository release
04/06/2023 - Blog post release
Getting started
Create a new conda environment from environment.yml; the environment is named "icl-influences" by default.
Alternatively, use the provided Dockerfile to build your own Docker image.
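For example, with conda installed:
conda env create -f environment.yml
conda activate icl-influences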
Usage
Download data
Directory data-train400-dev200 holds the subsampled data from our paper.
We conducted experiments on 9 SuperGLUE tasks.
To re-download these datasets from HuggingFace, run the following command.
python data_download.py
In addition to downloading, the script automatically subsamples a specified number of examples for the train/dev/test splits.
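For a rough sense of what this step does, here is a minimal sketch of downloading and subsampling a SuperGLUE task with the HuggingFace datasets library (the task name, split sizes, and function name are illustrative, not the exact code in data_download.py):
from datasets import load_dataset

def download_and_subsample(subset="rte", n_train=400, n_dev=200, seed=0):
    # Load the full SuperGLUE subset from the HuggingFace Hub
    ds = load_dataset("super_glue", subset)
    # Shuffle and keep a fixed number of train/dev examples
    train = ds["train"].shuffle(seed=seed).select(range(n_train))
    dev = ds["validation"].shuffle(seed=seed).select(range(n_dev))
    return train, dev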
Compute in-context influence scores
To compute in-context influences for a specific task and model, we first need to obtain a number of "training runs".
The following script 1) obtains the training runs, and 2) computes influence scores for both influence-based methods discussed in Section 3.1.
By default, we write training run results to out/ and influence scores to influence_scores.jsonl.
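For example (the script name and the model/task arguments below are illustrative; use the run script provided in this repository):
python run_influences.py --task rte --model facebook/opt-1.3b --shot 16 --iterations 100 --cache_dir hf_cache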
--shot: The number of examples used in each few-shot prompt
--iterations: The number of training runs evaluated on the Dev set
--cache_dir: (Optional) Directory for caching all models downloaded from HuggingFace
We recommend setting --shot to the maximum number of examples that fits in the model's context window; this way, fewer iterations are needed to cover all training examples well.
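To illustrate the idea behind the influence computation, here is a simplified difference-in-means sketch over training-run records (the record format and field names are assumptions for illustration; see Section 3.1 of the paper and the repository code for the two methods we actually implement):
from collections import defaultdict

def influence_scores(runs):
    # runs: list of dicts such as {"examples": [train example ids in the prompt], "dev_acc": float}
    with_ex, without_ex = defaultdict(list), defaultdict(list)
    all_ids = {i for run in runs for i in run["examples"]}
    for run in runs:
        included = set(run["examples"])
        for i in all_ids:
            (with_ex if i in included else without_ex)[i].append(run["dev_acc"])
    # Influence of example i: mean dev accuracy of runs that include i
    # minus mean dev accuracy of runs that exclude it.
    return {
        i: sum(with_ex[i]) / len(with_ex[i]) - sum(without_ex[i]) / len(without_ex[i])
        for i in all_ids
        if with_ex[i] and without_ex[i]
    }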
Evaluate
After influence scores are computed, run evaluation as follows.
The script picks a pre-defined number of examples k for each task, defined in evaluate.SHOT_MAP (the same settings used when computing in-context influences).
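For example (the flags shown here are illustrative; evaluate.py in this repository defines the exact interface):
python evaluate.py --task rte --model facebook/opt-1.3b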
How to add your own data?
Add a method to data_download.py for downloading your own data. Keep the data fields similar to the current datasets.
Add the task type of your newly added task to task_config.json.
If the task type falls outside multiple-choice and binary classification (i.e., free-form text generation), you should also modify the inference and encode methods in utils.py.
As an alternative to accuracy, you can define your own evaluation metric by modifying icl_datamodel.py.
Add a new prompt template to templates.py (a sketch follows below).
Rerun the same pipeline.
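For the prompt template step, a template is typically just a function that maps an example's fields to a string. A minimal illustrative sketch (the field names and format here are assumptions, not the exact interface in templates.py):
def my_task_template(example, include_answer=True):
    # Illustrative: build a few-shot demonstration from the example's fields.
    prompt = f"Passage: {example['passage']}\nQuestion: {example['question']}\nAnswer:"
    if include_answer:
        prompt += " " + str(example["label"])
    return prompt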
Models available
We currently include working pipelines for 4 autoregressive model families: GPT-2, OPT, GPT-J/NeoX, and LLaMA.
To save memory, we load all models in half precision (fp16) wherever possible.
For LLaMA, include the path to your converted weights, following HuggingFace's official conversion guide.
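For reference, loading a model in half precision with HuggingFace transformers looks roughly like this (the model name and cache directory are illustrative):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # illustrative; any supported autoregressive model
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="hf_cache")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # fp16 to save memory
    cache_dir="hf_cache",
)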
Citation
If you find our work helpful, please cite:
@article{nguyen2023incontextinfluences,
  author  = {Nguyen, Tai and Wong, Eric},
  title   = {In-context Example Selection with Influences},
  journal = {arXiv},
  year    = {2023}
}