The data folder contains two files: core_data.jsonl, containing the Unnatural Instructions core dataset of 68,478 instruction-input-output triplets, and full_data.jsonl, containing the full 240,670 Unnatural Instructions examples. The full data was constructed by expanding the core data with automatically generated instruction paraphrases.
📄 Format
Core data
Each line in core_data.jsonl is a JSON object with two fields: instruction, a natural language instruction describing a task, and instances, an array of JSON objects, each containing the following fields:
input: An input for the task described by the instruction
instruction_with_input: The instruction concatenated with the input
constraints: The task's output space constraints
output: The output of executing instruction with the given input
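As a minimal sketch of how the core file might be read, assuming the structure described above: the helper name iter_core_examples is hypothetical and not part of this repository. It takes any iterable of JSON lines (such as an open file) and yields one flat record per instance.

```python
import json


def iter_core_examples(lines):
    """Yield one flat dict per instance from core_data.jsonl-style lines.

    Hypothetical helper; assumes each line has the fields
    `instruction` and `instances` as described in this README.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        for instance in record["instances"]:
            yield {
                "instruction": record["instruction"],
                "input": instance["input"],
                "instruction_with_input": instance["instruction_with_input"],
                # .get() in case a constraint is absent for some instance
                "constraints": instance.get("constraints"),
                "output": instance["output"],
            }
```

Passing an open file handle for core_data.jsonl (opened with encoding="utf-8") to iter_core_examples streams the examples without loading the whole file into memory.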
Full data
full_data.jsonl has the same structure as core_data.jsonl, but with one additional field: reformulations. reformulations is an array of JSON objects, each corresponding to an automatically generated paraphrase of the given instruction. Each reformulation contains the fields:
instruction: A paraphrase of the original instruction
input: An input for the task described by the instruction
instruction_with_input: The paraphrased instruction concatenated with the input
output: The output of executing instruction with the given input
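The sketch below shows one way to flatten a full-data record into training examples, covering both the original instruction and its paraphrases. It assumes that reformulations sits at the top level of each record, next to instruction and instances; the function name iter_full_examples and the is_paraphrase flag are hypothetical and used only for illustration.

```python
import json


def iter_full_examples(lines):
    """Yield flat examples from full_data.jsonl-style lines.

    Hypothetical helper. Assumes `reformulations` is a top-level
    field of each record, alongside `instruction` and `instances`.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Examples that use the original instruction
        for instance in record.get("instances", []):
            yield {
                "instruction": record["instruction"],
                "input": instance["input"],
                "output": instance["output"],
                "is_paraphrase": False,
            }
        # Examples that use an automatically generated paraphrase
        for reformulation in record.get("reformulations", []):
            yield {
                "instruction": reformulation["instruction"],
                "input": reformulation["input"],
                "output": reformulation["output"],
                "is_paraphrase": True,
            }
```

Keeping the is_paraphrase flag makes it easy to filter back down to the original instructions if the paraphrases are not wanted.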
📘 Citation
If you make use of Unnatural Instructions, please cite the following paper:
@misc{honovich2022unnatural,
title = {Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor},
author = {Honovich, Or and Scialom, Thomas and Levy, Omer and Schick, Timo},
url = {https://arxiv.org/abs/2212.09689},
publisher = {arXiv},
year={2022}
}