Graphs and analysis outputs are generated automatically.
Repository Structure
.
├── configs
│ ├── benchmarks # Benchmark configurations
│ ├── experiments # Experiment-specific benchmark_model pairings
│ └── models # Model configurations
├── data
│ ├── images # Images and graphs
│ ├── raw # Raw input data
│ └── tasks # Prepared tasks for benchmarks
├── scripts
│ ├── analyze.py # Analysis entry point
│ ├── run.py # Experiment execution entry point
│ ├── prep.py # Data preparation entry point
│ ├── models # Model initialization and clients
│ │ ├── base_llm.py # Abstract model class
│ │ ├── ... # Client-specific LLM classes (Azure AI, Azure OpenAI, Google, and Hugging Face)
│ │ └── llm_client.py # LLM factory
│ └── utils
│ ├── cbb_run.py # Benchmark-specific run utils
│ ├── nq_run.py
│ ├── nm_run.py
│ ├── cbb_analyze.py # Benchmark-specific analysis utils
│ ├── nq_analyze.py
│ ├── nm_analyze.py
│ ├── metrics.py # Metric utilities
│ ├── graph_utils.py # Visualization utilities
│ └── utils.py # Helper utilities
├── slurm
│ ├── run_gem2lite.sh # Example SLURM scripts for HPC execution
│ └── ...
├── .gitignore # Gitignore file
├── environment.yaml # Conda environment specification
├── example.env # Template for API keys
└── README.md # This document
HPC Execution (SLURM)
The slurm/ directory contains scripts configured for batch execution on HPC clusters using SLURM:
sbatch slurm/run_gem2lite.sh
Ensure paths and environment settings are correct for your HPC environment.
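A minimal batch script might look like the sketch below. The partition, resource values, environment name, and config path are placeholders, not the contents of run_gem2lite.sh; adapt them to your cluster:

```shell
#!/bin/bash
#SBATCH --job-name=gem2lite          # job name shown in the queue
#SBATCH --time=04:00:00              # walltime limit (placeholder)
#SBATCH --mem=32G                    # memory request (placeholder)
#SBATCH --output=logs/%x_%j.out      # stdout/stderr log location

# Activate the conda environment built from environment.yaml
# (environment name is a placeholder)
source activate haystack-env

# Launch an experiment; the config path is illustrative
python scripts/run.py --config configs/experiments/example.yaml
```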
Adding New Models
To add a new LLM:
1. Create a new YAML config file under configs/models/.
2. Extend the abstract class in scripts/models/base_llm.py with a client-specific subclass.
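The subclassing step can be sketched as follows. The class and method names here (BaseLLM, generate) are illustrative assumptions; check scripts/models/base_llm.py for the actual abstract interface the repository defines:

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Sketch of an abstract LLM client (the real one lives in
    scripts/models/base_llm.py and may expose different methods)."""

    def __init__(self, model_name: str, temperature: float = 0.0):
        self.model_name = model_name
        self.temperature = temperature

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for a prompt."""


class EchoLLM(BaseLLM):
    """Toy client that echoes the prompt -- a real subclass would call
    its provider's API here (e.g. Azure OpenAI, Google)."""

    def generate(self, prompt: str) -> str:
        return f"[{self.model_name}] {prompt}"


llm = EchoLLM("my-new-model")
print(llm.generate("hello"))  # → [my-new-model] hello
```

A real subclass would read its credentials from the environment (see example.env) and its parameters from the matching YAML file under configs/models/.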
Relevant citation:
@article{bianchi2025SmallerNeedles,
  title         = {Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find},
  author        = {Owen Bianchi and Mathew J. Koretsky and Maya Willey and Chelsea X. Alvarado and Tanay Nayak and Adi Asija and Nicole Kuznetsov and Mike A. Nalls and Faraz Faghri and Daniel Khashabi},
  year          = {2025},
  journal       = {arXiv preprint arXiv:2505.18148},
  volume        = {abs/2505.18148},
  url           = {https://arxiv.org/abs/2505.18148},
  eprint        = {2505.18148},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CL},
  code          = {https://github.com/NIH-CARD/LostInTheHaystack},
}
Enjoy exploring how LLMs handle varying gold context sizes!