Reforge

A modular pipeline for controlled reverse engineering experiments. Reforge compiles synthetic C programs into ELF binaries under a defined build matrix, extracts DWARF-based ground truth, decompiles stripped variants, and evaluates LLM-generated source recovery against that ground truth.
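The build matrix mentioned above can be pictured as the cross product of optimization levels and build variants. The sketch below is illustrative only: the level list `O0` to `O3`, the helper names `build_matrix` and `gcc_command`, and the output naming scheme are assumptions, not the builder worker's actual configuration.

```python
from itertools import product

# Assumed values for illustration; the real matrix is defined by the builder worker.
OPT_LEVELS = ["O0", "O1", "O2", "O3"]
VARIANTS = ["debug", "release", "stripped"]


def build_matrix(opt_levels=OPT_LEVELS, variants=VARIANTS):
    """Return every (optimization level, variant) cell of the matrix."""
    return list(product(opt_levels, variants))


def gcc_command(source, opt, variant):
    """Build a hypothetical gcc invocation for one matrix cell."""
    cmd = ["gcc", f"-{opt}"]
    if variant == "debug":
        cmd.append("-g")  # keep DWARF info so ground truth can be extracted
    # A "stripped" variant would additionally need a separate `strip` step,
    # which this sketch does not model.
    cmd += [source, "-o", f"{source.removesuffix('.c')}_{opt}_{variant}"]
    return cmd


if __name__ == "__main__":
    for opt, variant in build_matrix():
        print(" ".join(gcc_command("example.c", opt, variant)))
```

Enumerating the cells up front makes it easy to check that every binary in an experiment maps back to exactly one (level, variant) pair.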

Overview

The system is composed of domain-specific workers coordinated through a FastAPI orchestration layer:

Component	Role
API (`app/`)	HTTP interface; routes requests to workers
Builder (`workers/builder/`)	Compiles C source to ELF across optimization levels and variants (debug, release, stripped)
Oracle TS (`workers/oracle_ts/`)	Parses preprocessed C translation units with tree-sitter; indexes functions and structural nodes
Oracle DWARF (`workers/oracle_dwarf/`)	Extracts function boundaries, line mappings, and per-function verdicts from debug DWARF info
Join DWARF-TS (`workers/join_dwarf_ts/`)	Aligns DWARF and tree-sitter function inventories using GCC line directives
Analyzer Ghidra Decompile (`workers/analyzer_ghidra_decompile/`)	Headless decompilation of stripped binaries; extracts CFG, calls, variables, and function metadata
Join Oracles to Ghidra (`workers/join_oracles_to_ghidra_decompile/`)	Merges oracle outputs with Ghidra decompilation to construct experiment substrate
LLM (`workers/llm/`)	Async experiment runner for LLM-assisted function naming and source recovery
Data Module (`data/`)	Evaluation schemas, scoring functions, and reproducible analysis notebooks
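The DWARF/tree-sitter join relies on GCC line directives, which appear in preprocessed output as linemarkers of the form `# linenum "filename" flags`. As a rough sketch of the idea (not the worker's actual code), the hypothetical helper `map_preprocessed_lines` below maps each line of a preprocessed translation unit back to its original file and line, which is the information needed to match a tree-sitter node to a DWARF line entry.

```python
import re

# GCC linemarker: '# <linenum> "<filename>" <flags...>'
LINEMARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"((?:\s+\d+)*)\s*$')


def map_preprocessed_lines(text):
    """Map 1-based line numbers of preprocessed text to (file, line) origins."""
    origins = {}
    cur_file, cur_line = None, 0
    for idx, line in enumerate(text.splitlines(), start=1):
        m = LINEMARKER.match(line)
        if m:
            # The marker states where the NEXT line originally came from.
            cur_line, cur_file = int(m.group(1)), m.group(2)
            continue
        if cur_file is not None:
            origins[idx] = (cur_file, cur_line)
            cur_line += 1
    return origins


if __name__ == "__main__":
    sample = '# 1 "main.c"\nint x;\n# 10 "util.h" 1\nint y;\nint z;\n'
    for pp_line, origin in map_preprocessed_lines(sample).items():
        print(pp_line, origin)
```

This sketch ignores filenames containing escaped quotes and does not interpret the trailing flags; a production join would likely need both.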

Quick Start

Local Development Setup

Requires Python 3.11+.

# Create and activate virtual environment
python -m venv .venv

# Linux / macOS
source .venv/bin/activate

# Windows (PowerShell)
Set-ExecutionPolicy -ExecutionPolicy Bypass -Scope Process
.venv\Scripts\Activate.ps1

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest data/tests/ workers/ -q

A pinned lockfile (requirements-lock.txt) is available for reproducible installs.

Pre-built Data (no LLM calls needed)

Download the data-v1.0 release, extract into docker/local-files/, and run the notebooks directly.

Docker (full pipeline)

cp docker/.env.example docker/.env   # edit with your credentials
cd docker
docker compose up -d
curl http://localhost:8080/health

API documentation is available at http://localhost:8080/docs.
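Containers may take a moment to come up after `docker compose up -d`, so a single `curl` can fail even when the stack is healthy. A minimal sketch of a polling check against the health endpoint shown above follows; the function name `wait_for_health` and the timeout values are assumptions, and the check only looks at the HTTP status, not the response body.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8080/health", timeout=60.0, interval=2.0):
    """Poll url until it answers HTTP 200 or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not reachable yet; retry after a short pause
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("API is up" if wait_for_health() else "API did not become healthy in time")
```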

Structure

app/ — FastAPI orchestration layer (README)
workers/ — Domain-specific processing workers (each with README and LOCK.md)
data/ — Evaluation module with notebooks (README)
C-Programs/ — Synthetic test programs (README)
scripts/ — Helper notebooks for running the pipeline, plus experimental notebooks (README)
docker/ — Docker Compose configuration

Evaluation Notebooks

The data/results/ directory contains reproducible pipeline evaluation notebooks:

PIPELINE_candidate_ambiguity — Alignment ambiguity
PIPELINE_counts — Function counts and verdict distributions
PIPELINE_data_quality — Data completeness and schema validation
PIPELINE_decompiler_quality — Ghidra output quality metrics
PIPELINE_ghidra_yield — Decompiler yield analysis
PIPELINE_opt_induced_uncertainty — Optimization effects on ambiguity and stability
PIPELINE_transitions — Verdict transitions across build variants

Test notebooks for investigating join criteria: data/results/join_health

Notebook with selected thesis figures: data/results/THESIS_FINAL

See data/README.md for details.

Scope and Contracts

Each worker defines its own scope lock (LOCK.md) specifying supported inputs, outputs, non-goals, and extension points. Refer to those files and per-worker READMEs for implementation details.

License

TBD

Acknowledgements

Part of thesis research on LLM-assisted binary analysis.
