# StrategyQA evaluator

This evaluator scores prediction files supplied in JSON format and writes the
resulting scores to a JSON file.
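For orientation, here is a minimal sketch of that input/output contract. The function name `write_scores` is hypothetical, the zeros are stand-ins for real scoring, and the metric keys are assumptions based on the dummy scores listed below; the actual schema is defined by `evaluator/eval.py`.

```python
import json

def write_scores(predictions_path: str, scores_path: str) -> None:
    """Illustrative only: read predictions as JSON, write scores as JSON."""
    with open(predictions_path) as f:
        predictions = json.load(f)  # the predictions to be scored

    # The real scoring logic lives in evaluator/eval.py; zeros are placeholders.
    scores = {
        "Accuracy": 0.0,   # answer accuracy
        "SARI": 0.0,       # decomposition quality
        "Recall@10": 0.0,  # paragraph-retrieval recall
    }

    with open(scores_path, "w") as f:
        json.dump(scores, f, indent=2)
```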
## Testing the evaluator
Run `test.sh` to build and test the evaluator. The test scores the prediction
files `answers_file_small.json`, `decomps_file_small.json`, and
`paras_file_small.json` against the gold annotations in `gold_small.json`; if
everything is in order, the test passes. (These gold and prediction JSON files
are representative of the real gold and prediction files, but they contain only
10 examples each, hence the name "small".)
## Running the evaluator locally
You can follow the steps in `test.sh` to build and run the evaluator yourself
using Docker. To run the evaluator outside of Docker, first install the
dependencies listed in `evaluator/requirements.txt`, then run `eval.py` with
the arguments shown in the `test.sh` script.
## Submitting to the Leaderboard
The file `predictions_dummy.json` is a valid dummy submission file for the
StrategyQA leaderboard. It contains predictions for 490 questions. If you
submit it, you'll get these dummy scores:

```
Accuracy: 0.46122448979591835
SARI: 0.42750331591054463
Recall@10: 0.0
```
To submit your own predictions to the StrategyQA leaderboard, produce a JSON
file in the same format as `predictions_dummy.json` and submit it.
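Before submitting, it can help to sanity-check that your file covers the same 490 questions as the dummy file. The sketch below is hypothetical, not the leaderboard's actual validator: it assumes the top level of `predictions_dummy.json` is a JSON object keyed by question id, and the names `check_submission` and `my_predictions.json` are placeholders.

```python
import json

def check_submission(path: str, dummy_path: str = "predictions_dummy.json") -> None:
    """Hypothetical sanity check: compare your file's keys to the dummy's."""
    with open(path) as f:
        mine = json.load(f)
    with open(dummy_path) as f:
        dummy = json.load(f)

    # Assumes both files are JSON objects keyed by question id; adjust the
    # comparison if the dummy file turns out to be structured differently.
    missing = set(dummy) - set(mine)
    if missing:
        raise ValueError(f"Missing predictions for {len(missing)} questions")
    print(f"OK: {len(mine)} predictions, covering the dummy file's questions")

check_submission("my_predictions.json")  # hypothetical file name
```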
## About
Evaluator for the StrategyQA dataset (AI2 Israel, Aristo)