QueryGym
Open-source toolkit for reproducible LLM-based query reformulation.

Reproducibility, by design.

Every QueryGym run produces a single JSON file conforming to a public, versioned schema. Submissions to the leaderboard carry that JSON file, a TREC-format run file, and the reformulated queries. Together, these three artifacts reconstruct the experiment from a fresh clone.
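
To make the run-file part of a submission concrete, here is a minimal sketch of reading the standard six-column TREC run format (`query_id Q0 doc_id rank score run_tag`) and checking that every query has results. The names `RunLine`, `parse_trec_run_line`, and `missing_queries` are hypothetical helpers written for illustration, not part of QueryGym, and the ids in the usage note are made up.

```python
from typing import Iterable, List, NamedTuple


class RunLine(NamedTuple):
    query_id: str
    doc_id: str
    rank: int
    score: float
    tag: str


def parse_trec_run_line(line: str) -> RunLine:
    """Parse one line of a TREC-format run file.

    Standard layout: <query_id> Q0 <doc_id> <rank> <score> <run_tag>
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    query_id, _q0, doc_id, rank, score, tag = fields
    return RunLine(query_id, doc_id, int(rank), float(score), tag)


def missing_queries(run_lines: Iterable[str], query_ids: Iterable[str]) -> List[str]:
    """Return the query ids that have no ranked documents in the run."""
    # Blank lines are skipped; malformed lines raise ValueError.
    seen = {parse_trec_run_line(line).query_id for line in run_lines if line.strip()}
    return sorted(set(query_ids) - seen)
```

For example, `missing_queries(["19335 Q0 7267248 1 14.25 demo"], ["19335", "47923"])` reports `"47923"` as uncovered, which is the kind of mismatch worth catching before opening a PR.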

Submitting a result

Run the example pipeline, copy its output into the canonical layout with submit_run.py, regenerate the aggregate, and open a PR. CI validates the JSON against the schema; a maintainer verifies the numbers locally before merge.

bash
# 1. Run the example pipeline
python examples/querygym_pyserini/pipeline.py \
    --dataset msmarco-v1-passage.trecdl2019 \
    --method query2e --model gpt-4.1-mini \
    --output-dir outputs/dl19_query2e

# 2. Copy into the canonical layout
python -m reproducibility.scripts.submit_run \
    --from-dir outputs/dl19_query2e

# 3. Regenerate the aggregate
make repro-aggregate

# 4. Open a PR
git add reproducibility/data/ && git commit && git push
gh pr create
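
Before pushing, it can help to sanity-check the run JSON locally. The sketch below is a simplified stand-in for what CI does: it checks only for required fields and their types, not the full schema. The field names in `REQUIRED_FIELDS` and the function `check_run_record` are assumptions for illustration; the published schema defines the real names.

```python
import json

# Hypothetical required fields and types; the real schema is authoritative.
REQUIRED_FIELDS = {
    "dataset": str,
    "method": str,
    "model": str,
    "metrics": dict,
}


def check_run_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the check passed."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    return errors


# Example record mirroring the CLI flags used above, with no metrics yet.
record = json.loads(
    '{"dataset": "msmarco-v1-passage.trecdl2019",'
    ' "method": "query2e", "model": "gpt-4.1-mini"}'
)
problems = check_run_record(record)
```

Here `problems` contains a single entry for the missing `metrics` field. A full validator would load the versioned schema itself rather than hard-coding field names as this sketch does.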

Papers

QueryGym is backed by two papers: the toolkit demo and a multi-LLM reproduction study. Both link directly to the committed corpus.