QueryGym
Open-source toolkit for reproducible LLM-based query reformulation.

Install

QueryGym runs on Python 3.9+. The default install is dependency-light; add extras for HuggingFace datasets, BEIR, Pyserini, or the reproducibility tooling.

Option 1 — pip

bash
pip install querygym

# Or with extras:
pip install "querygym[hf]"          # HuggingFace datasets
pip install "querygym[beir]"        # BEIR benchmarks
pip install "querygym[pyserini]"    # Pyserini retrieval
pip install "querygym[repro]"       # reproducibility aggregator + validator
pip install "querygym[all]"         # everything

Option 2 — Docker

bash
# GPU image
docker pull ghcr.io/ls3-lab/querygym:latest
docker run -it --gpus all ghcr.io/ls3-lab/querygym:latest

# CPU image (lightweight)
docker pull ghcr.io/ls3-lab/querygym:cpu
docker run -it ghcr.io/ls3-lab/querygym:cpu

Quickstart

Reformulate a single query with the GenQR Ensemble method and any OpenAI-compatible endpoint.

python
import querygym as qg

# Pick a method and a model
reformulator = qg.create_reformulator(
    "genqr_ensemble",
    model="gpt-4.1-mini",
)

# Run it
result = reformulator.reformulate(
    qg.QueryItem("q1", "what causes diabetes?")
)

print(result.reformulated)

CLI

bash
export OPENAI_API_KEY=sk-...

querygym run --method genqr_ensemble \
  --queries-tsv queries.tsv \
  --output-tsv reformulated.tsv \
  --cfg-path querygym/config/defaults.yaml
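
The command above expects a `queries.tsv` file to exist. As a minimal sketch, the file could be generated with Python's standard `csv` module. The two-column, headerless layout (query ID, then query text) is an assumption here, not something this README specifies, and the second query is a made-up example; check the full guide for the exact format QueryGym expects.

```python
import csv
from pathlib import Path

# Hypothetical example queries; the IDs and text are illustrative only.
queries = [
    ("q1", "what causes diabetes?"),
    ("q2", "symptoms of vitamin d deficiency"),
]


def write_queries_tsv(path, rows):
    """Write (query_id, query_text) pairs as tab-separated lines."""
    # ASSUMPTION: two columns and no header row. Confirm the expected
    # layout against the QueryGym documentation before relying on it.
    with Path(path).open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


write_queries_tsv("queries.tsv", queries)
```

Using `csv.writer` rather than manual string joins means a query that happens to contain a tab or a quote character is escaped instead of silently corrupting the column layout.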

Full guide: querygym.readthedocs.io