This repository contains code and data used for experiments on cluster-based search and intent-aware reranking for the MSRD dataset.
- Goal: Evaluate cluster-based search strategies (including Trustpilot Cluster-Based Search — TP-CBS) and intent-aware reranking on the MSRD movie recommendation / retrieval dataset.
- Approach: Precompute movie and cluster embeddings, run multiple retrieval methods (BM25, embedding cosine, hybrid, cluster-based reranking, LLM-based relevance scoring), compute metrics (NDCG, latency), and compare methods per intent.
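The metric step above (NDCG, compared per intent) can be illustrated with a small standard-library sketch. It assumes linear gains and the usual log2 rank discount, with graded (for example soft LLM) relevance values. This README does not say whether `scripts/03_benchmark_MSRD.py` uses linear or exponential gains, so treat that choice as an assumption. The function names are hypothetical.

```python
import math
from collections import defaultdict


def dcg_at_k(gains, k):
    """DCG@k with linear gains; the item at rank r (1-based) is discounted by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query.

    ranked_ids: movie ids in the order a retrieval method returned them.
    relevance: dict of movie id -> graded relevance; unjudged ids count as 0.
    """
    ideal = dcg_at_k(sorted(relevance.values(), reverse=True), k)
    if ideal == 0:
        return 0.0
    gains = [relevance.get(movie_id, 0.0) for movie_id in ranked_ids]
    return dcg_at_k(gains, k) / ideal


def mean_by_intent(per_query_scores, query_intent):
    """Average per-query metric values (query id -> score) within each intent label."""
    buckets = defaultdict(list)
    for query_id, score in per_query_scores.items():
        buckets[query_intent[query_id]].append(score)
    return {intent: sum(vals) / len(vals) for intent, vals in buckets.items()}
```

Computing `ndcg_at_k` per query and then passing the results to `mean_by_intent` with the labels from `query_intent.csv` gives one average per intent for each method.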
Repository structure:
- `README.md`: this file.
- `requirements.txt`: Python dependencies required to run the scripts.
- `data/MSRD/`: dataset and derived files used by the experiments:
  - `movies.jsonl`: movie records (JSON Lines).
  - `movies_with_clusters.csv`: movie metadata joined with cluster assignments.
  - `movie_clusters.csv`: cluster assignments for movies.
  - `queries.csv`: query set used in evaluation.
  - `query_intent.csv`: ground-truth query intents.
  - `genres.csv`: genre metadata and descriptions.
  - `msrd_relevance_predictions.csv`: LLM-based relevance scores for (query, movie) pairs, computed with the Gemini model using the prompts in the `prompts/` directory.
- `prompts/`: LLM prompts used to obtain relevance scores from language models (system instructions and prompt templates for the Gemini-based relevance scorer).
- `scripts/`: command-line scripts used in the experiments:
  - `01_query_intent_classifier.py`: training and inference routines for the query intent classifier used for intent-aware routing.
  - `02_gemini_rel_scores_MSRD.py`: computes LLM-based relevance scores via direct Gemini API calls and saves the results to CSV.
  - `03_benchmark_MSRD.py`: runs the full end-to-end benchmark (data loading, retrieval, reranking, and metric computation).
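The data files above are JSON Lines and CSV. The sketch below shows one way to read them with the standard library and to group movies by cluster. The column names `movie_id` and `cluster_id` are placeholders; check the actual header of `movie_clusters.csv` before relying on them. All helper names are hypothetical.

```python
import csv
import json
from collections import defaultdict


def parse_jsonl(lines):
    """Parse JSON Lines: one JSON object per line, blank lines ignored."""
    return [json.loads(line) for line in lines if line.strip()]


def read_jsonl(path):
    """Read a .jsonl file such as data/MSRD/movies.jsonl."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)


def read_csv_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def group_by_cluster(rows, id_field="movie_id", cluster_field="cluster_id"):
    """Map cluster id -> list of movie ids (field names are assumptions)."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[cluster_field]].append(row[id_field])
    return dict(clusters)


# Usage (paths from the layout above; column names are assumptions):
# movies = read_jsonl("data/MSRD/movies.jsonl")
# clusters = group_by_cluster(read_csv_rows("data/MSRD/movie_clusters.csv"))
```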
`scripts/01_query_intent_classifier.py`
- Purpose: Infer the query intent labels used to select intent-aware reranking policies in the experiments.
- Authentication: controlled by the `USE_VERTEX` flag at the top of the script (default: `False`).
  - `USE_VERTEX = False` (default): set the `GOOGLE_API_KEY` environment variable.
  - `USE_VERTEX = True`: set `PROJECT` and `LOCATION` in the script and authenticate via `gcloud`.
- Example: `export GOOGLE_API_KEY="your_key_here" && python scripts/01_query_intent_classifier.py --dataset msrd`
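Both Gemini scripts switch authentication on `USE_VERTEX`. As a sketch only, the helper below shows how such a switch could map onto the `google-genai` SDK, whose `genai.Client` accepts either `api_key=...` or `vertexai=True, project=..., location=...`. Whether the scripts actually use this SDK is an assumption. `gemini_client_kwargs` and the placeholder project and region values are hypothetical.

```python
import os

USE_VERTEX = False
PROJECT = "your-gcp-project"  # placeholder, only read when USE_VERTEX is True
LOCATION = "us-central1"      # placeholder region


def gemini_client_kwargs(use_vertex, project=None, location=None, env=None):
    """Return keyword arguments for google-genai's genai.Client(...) (hypothetical helper)."""
    env = os.environ if env is None else env
    if use_vertex:
        if not project or not location:
            raise ValueError("Vertex AI mode needs both a project and a location")
        # Credentials come from gcloud (e.g. `gcloud auth application-default login`), not an API key.
        return {"vertexai": True, "project": project, "location": location}
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("Set the GOOGLE_API_KEY environment variable")
    return {"api_key": api_key}


# Usage (requires `pip install google-genai`):
# from google import genai
# client = genai.Client(**gemini_client_kwargs(USE_VERTEX, PROJECT, LOCATION))
```

Keeping the decision in one function means the API-key and Vertex AI paths fail early with a clear message instead of at the first model call.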
`scripts/02_gemini_rel_scores_MSRD.py`
- Purpose: Compute soft relevance scores for (query, movie) pairs using the Gemini API. Results are saved directly to `data/MSRD/msrd_relevance_predictions.csv` for use with script 03.
- Authentication: controlled by the `USE_VERTEX` flag at the top of the script (default: `False`).
  - `USE_VERTEX = False` (default): set the `GOOGLE_API_KEY` environment variable.
  - `USE_VERTEX = True`: set `VERTEX_PROJECT` and `VERTEX_LOCATION` in the script and authenticate via `gcloud`.
- Example (Gemini API key): `export GOOGLE_API_KEY="your_key_here" && python scripts/02_gemini_rel_scores_MSRD.py`
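The prompt format and reply parsing used by script 02 are not described in this README. Assuming the prompt asks Gemini to answer with a single score in [0, 1], a minimal parser and CSV writer could look like the sketch below. The columns `query_id`, `movie_id`, `score` are placeholders and not necessarily the header of `msrd_relevance_predictions.csv`. Taking the first number in the reply is a naive rule that only suits replies containing a bare score.

```python
import csv
import io
import re

_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def parse_relevance(text, lo=0.0, hi=1.0):
    """Extract the first number in an LLM reply and clamp it to [lo, hi].

    Returns None when the reply contains no number, so the caller can retry or skip.
    """
    match = _NUMBER.search(text)
    if match is None:
        return None
    return min(hi, max(lo, float(match.group())))


def write_predictions(rows, fh):
    """Write (query_id, movie_id, score) tuples as CSV to an open text handle."""
    writer = csv.writer(fh)
    writer.writerow(["query_id", "movie_id", "score"])  # hypothetical column names
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_predictions([("q1", "m42", parse_relevance("Relevance: 0.8"))], buf)
    print(buf.getvalue())
```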
`scripts/03_benchmark_MSRD.py`
- Purpose: End-to-end experiment driver. Loads the MSRD data, computes or loads embeddings, runs the retrieval methods (BM25, embedding cosine, hybrid, TP-CBS), and reports evaluation metrics (NDCG@k, latency, and the summary tables used in the paper).
- Example: `python scripts/03_benchmark_MSRD.py --data-dir data/MSRD --output results/benchmark.json`
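The exact fusion rule behind the "hybrid" method is not documented here. As an illustration only, the sketch below shows one common approach: min-max normalize BM25 and embedding-cosine scores, then rank by a weighted sum. It also times a call for latency reporting. `hybrid_rank`, `timed`, and the weight `alpha` are hypothetical names, not the benchmark's API.

```python
import math
import time


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def min_max(scores):
    """Rescale a dict of scores to [0, 1]; a constant dict maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (s - lo) / span if span else 0.0 for k, s in scores.items()}


def hybrid_rank(bm25_scores, dense_scores, alpha=0.5):
    """Rank ids by alpha * norm(BM25) + (1 - alpha) * norm(cosine).

    An id missing from one score dict gets 0 for that component; ties break by id.
    """
    b, d = min_max(bm25_scores), min_max(dense_scores)
    ids = set(b) | set(d)
    fused = {i: alpha * b.get(i, 0.0) + (1 - alpha) * d.get(i, 0.0) for i in ids}
    return sorted(ids, key=lambda i: (-fused[i], i))


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

Normalizing first matters because raw BM25 scores are unbounded while cosine similarities lie in [-1, 1]. Without rescaling, the weight `alpha` would not control the balance between the two signals.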
Setup:
1. Create and activate a virtual environment, which isolates the project dependencies from your system Python installation: `python3 -m venv .venv && source .venv/bin/activate`
2. Install the dependencies listed in `requirements.txt`: `pip install -r requirements.txt`
3. Prepare the data: ensure the MSRD files are present under `data/MSRD/`; the scripts assume the filenames listed above. Important: this repository already contains all data necessary to reproduce the benchmarks, so you can skip directly to step 4. To recreate the intermediate data (query intents and relevance scores), see Optional Data Recreation below.
4. Run the main benchmark: `python scripts/03_benchmark_MSRD.py --data-dir data/MSRD --output results/benchmark.json`
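To confirm step 3 before running the benchmark, a standard-library check like the one below can list any missing files. The file list is copied from the repository layout. The helper `missing_msrd_files` is hypothetical and not part of the scripts.

```python
from pathlib import Path

# Filenames taken from the data/MSRD/ layout listed in this README.
EXPECTED_MSRD_FILES = [
    "movies.jsonl",
    "movies_with_clusters.csv",
    "movie_clusters.csv",
    "queries.csv",
    "query_intent.csv",
    "genres.csv",
    "msrd_relevance_predictions.csv",
]


def missing_msrd_files(data_dir="data/MSRD"):
    """Return the expected MSRD files that are not regular files under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_MSRD_FILES if not (root / name).is_file()]


# Usage, from the repository root:
# missing = missing_msrd_files()
# if missing:
#     raise SystemExit("Missing files in data/MSRD/: " + ", ".join(missing))
```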
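Optional Data Recreation: if you wish to recreate the query intents and relevance predictions from scratch:
1. Predict query intents: `export GOOGLE_API_KEY="your_key_here" && python scripts/01_query_intent_classifier.py --dataset msrd`
2. Compute LLM-based relevance scores: `export GOOGLE_API_KEY="your_key_here" && python scripts/02_gemini_rel_scores_MSRD.py`. This script saves its results directly to `data/MSRD/msrd_relevance_predictions.csv`, so back up the provided file first if you want to keep it.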
Reproducibility notes:
- Hardware: experiments were conducted on an Apple M3 Pro with 36 GB RAM.
- Set the random seeds in the scripts to reproduce results.
- Use the package versions in `requirements.txt` to ensure a consistent environment.
- For GPU runs, install an appropriate CUDA-enabled PyTorch build and select devices via script options or environment variables.
- For full reproducibility, use `scripts/03_benchmark_MSRD.py` as the primary entry point.
- To swap in a different reranker or retrieval model, adapt the corresponding section in `scripts/03_benchmark_MSRD.py`.
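The seeding and device-selection notes above can be sketched as follows. This is a generic pattern and does not show how the scripts seed or pick devices. NumPy and PyTorch are optional here, and each is seeded only if installed. `set_seeds` and `pick_device` are hypothetical helpers. On a CUDA machine, the standard `CUDA_VISIBLE_DEVICES` environment variable can additionally restrict which GPUs PyTorch sees. On Apple silicon such as the M3 Pro, PyTorch exposes the `mps` backend.

```python
import os
import random


def set_seeds(seed=42):
    """Seed Python's, NumPy's and PyTorch's RNGs (each library only if installed)."""
    # PYTHONHASHSEED only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds the RNG on all devices, including CUDA
    except ImportError:
        pass


def pick_device(preferred=None):
    """Return a torch device string: an explicit choice, else CUDA, else Apple MPS, else CPU."""
    if preferred:
        return preferred
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

Seeding alone does not guarantee bit-identical results on GPUs, because some kernels are nondeterministic. Comparing runs on the same hardware and package versions remains the safer check.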
If you use this code in your research, please cite the accompanying ECML PKDD 2026 submission and include a link to this repository.