Reproducible query reformulation, powered by LLMs.

QueryGym is a toolkit for benchmarking and reproducing LLM-based query rewriting methods across IR datasets. Open prompt bank, pluggable searchers, frozen schemas, citable runs.

Get Started View Leaderboard

Live reformulation preview

USER QUERY

what causes diabetes?

querygym.create_reformulator("genqr_ensemble")

REFORMULATED

what causes diabetes?

what causes diabetes type 1 type 2 insulin resistance pancreas autoimmune glucose metabolism risk factors genetic predisposition lifestyle obesity

What you get

QueryGym pairs a small, opinionated library with a contract-driven reproducibility pipeline. The toolkit and the leaderboard share the same data shape.

📚

Single Prompt Bank

One YAML registry of every prompt with version, license, and authorship metadata. Cite the exact text used in any run.
🔌

Pluggable searchers

Drop-in adapters for Pyserini, PyTerrier, BEIR, MS MARCO, and any custom retriever. Bring your own index.
🧬

Stable run schema

Every run emits a versioned JSON conforming to a public JSON Schema — same shape across the toolkit and third-party submitters.
🧠

OpenAI-compatible LLMs

Works with any OpenAI-compatible endpoint. Gpt-4.1, Qwen, Mistral, vLLM, Ollama — switch with a config change.
🔁

Reproducible by design

Every leaderboard row links a JSON, a TREC run file, and the reformulated queries. Re-evaluate from a fresh clone.
📄

Citable artifacts

Backed by two papers (WWW 2026 Demos, SIGIR 2026 Reproducibility) and a tagged reproducibility corpus on GitHub.

The ecosystem

QueryGym is split into three surfaces. They share the same data contract, so a run from the toolkit or a third-party submitter lands in the same leaderboard.

querygym (pip)

Library

The toolkit itself. Methods, prompt bank, searchers, CLI.

pip install querygym

leaderboard.querygym.com

Leaderboard

SIGIR 2026 reproducibility results across IR benchmarks. Every row backed by a citable JSON.

View Leaderboard →

querygym.readthedocs.io

Docs

API reference, methods reference, contributor guide, schema docs.

Read the Docs →

Try it in 30 seconds

Reformulate a query against any OpenAI-compatible endpoint. Pyserini and BEIR are optional extras.

pip

pip install querygym

python

import querygym as qg

reformulator = qg.create_reformulator("genqr_ensemble", model="gpt-4.1-mini")
result = reformulator.reformulate(qg.QueryItem("q1", "what causes diabetes?"))
print(result.reformulated)

Full quickstart →

Supported methods

Nine reformulation methods, each with a registered prompt, a paper reference, and a reproducibility entry on the leaderboard.

GenQR
genqr

Generic LLM-driven keyword expansion.
GenQR Ensemble
genqr_ensemble

10 instruction variants for diverse keyword expansion.
Query2Doc
query2doc

Generates pseudo-documents from LLM knowledge.
QA Expand
qa_expand

Question-answer expansion with sub-questions.
MuGI
mugi

Multi-granularity expansion with adaptive concatenation.
LameR
lamer

Context-based passage synthesis from retrieved docs.
CSQE
csqe

Sentence-level context expansion (KEQE + CSQE).
ThinkQE
thinkqe

Multi-round reasoning with corpus feedback.
Query2E
query2e

Query-to-entity / keyword expansion.

Method reference →

Reproducible query reformulation, powered by LLMs.

What you get

Single Prompt Bank

Pluggable searchers

Stable run schema

OpenAI-compatible LLMs

Reproducible by design

Citable artifacts

The ecosystem

Try it in 30 seconds

Supported methods

GenQR

GenQR Ensemble

Query2Doc

QA Expand

MuGI

LameR

CSQE

ThinkQE

Query2E