DP-GEN is a Python scientific computing package for generating deep learning-based interatomic potential energy models. It integrates with HPC systems, molecular dynamics software (LAMMPS, Gromacs), and ab-initio calculation software (VASP, PWSCF, etc.).
Always reference these instructions first, and fall back to search or bash commands only when you encounter unexpected information that does not match the info here.
- Environment Setup:
  - Ensure Python 3.9+ is available: `python --version`
  - Create virtual environment: `python -m venv dpgen_env && source dpgen_env/bin/activate`
  - Preferred: Install with uv: `uv pip install -e .` -- takes 1-5 minutes due to scientific dependencies. Set timeout to 10+ minutes.
  - Alternative: `pip install -e .` (fallback if uv is not available)
  - Released version: `uv pip install dpgen` or `pip install dpgen`
  - Test installation: `dpgen -h`
- Dependencies: Include large scientific packages (numpy, pymatgen, ASE, etc.)
- Installation Time: Full installation takes 1-5 minutes with uv, 3-10 minutes with pip. Use timeout of 10+ minutes.
- If network timeouts occur, retry with: `uv pip install --timeout 600 -e .` or `pip install --timeout 600 -e .`
- For faster testing: `uv pip install --no-deps -e .` or `pip install --no-deps -e .` (installs without dependencies, will fail at runtime)
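The installer preference and timeout guidance above can be sketched as a small helper. This is a minimal sketch, not part of DP-GEN: the names `pick_installer` and `run_with_timeout` are hypothetical, and it assumes the command is run from the repository root.

```python
import shutil
import subprocess
import sys


def pick_installer(uv_path=None):
    """Return the editable-install command, preferring uv when it is found."""
    # uv_path is a hypothetical override; by default uv is looked up on PATH.
    uv = uv_path if uv_path is not None else shutil.which("uv")
    if uv:
        return [uv, "pip", "install", "-e", "."]
    # Fallback: pip from the interpreter that is running this script.
    return [sys.executable, "-m", "pip", "install", "-e", "."]


def run_with_timeout(cmd, timeout_s=600, retries=1):
    """Run cmd with a wall-clock timeout; retry on timeout or non-zero exit."""
    for _ in range(retries + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # the child is killed by subprocess.run; try again
        if result.returncode == 0:
            return True
    return False
```

Calling `run_with_timeout(pick_installer(), timeout_s=600)` mirrors the 10+ minute timeout recommended above; add a `sys.version_info >= (3, 9)` check beforehand if the interpreter version is in doubt.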
- Unit Tests: `python -m unittest discover tests -v` -- takes 2-5 minutes. Set timeout to 10+ minutes.
- Coverage: `coverage run --source=./dpgen -m unittest -v && coverage report` -- takes 3-8 minutes. Set timeout to 15+ minutes.
- Quick Test: `python -m unittest tests.test_load_file -v` -- takes <1 second (will fail without dependencies but exercises the test framework)
- CLI Test: `python -m unittest tests.test_cli -v` -- tests all dpgen subcommands
- No Traditional Build: This is a pure Python package using setuptools
- Documentation Build: `cd doc && make html` -- takes 2-5 minutes. Set timeout to 10+ minutes.
- Package Build: `python -m build` -- takes 1-3 minutes
- Use Semantic Commit Messages: Follow conventional commit format for all commits and PR titles
  - `feat:` for new features
  - `fix:` for bug fixes
  - `docs:` for documentation changes
  - `test:` for test additions/modifications
  - `refactor:` for code refactoring
  - `chore:` for maintenance tasks
  - Examples: `feat: add comprehensive GitHub Copilot instructions`, `fix: resolve timeout in dependency installation`
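A commit or PR title can be checked against the prefixes listed above before pushing. The following is a minimal sketch: `is_semantic` is a hypothetical helper, not part of the repository. The optional `(scope)` and `!` parts are an assumption taken from the general Conventional Commits format, not something this document specifies.

```python
import re

# Types listed in this document; "(scope)" and "!" are assumed optional extras.
COMMIT_RE = re.compile(
    r"^(feat|fix|docs|test|refactor|chore)(\([\w.-]+\))?!?: \S.*"
)


def is_semantic(title: str) -> bool:
    """Return True if the first line of the message uses an accepted prefix."""
    first_line = title.splitlines()[0] if title else ""
    return COMMIT_RE.match(first_line) is not None
```

Both example titles from the list above pass this check, while a title such as `Update README` does not.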
- Prefer uv: Use `uv` for Python dependency management when available
  - Installation: `uv pip install -e .`
  - Adding dependencies: `uv add package-name`
  - Fallback to `pip` only when `uv` is not available in the environment
- Purpose: Main deep potential generation with iterative training/exploration/labeling
- Usage: `dpgen run param.json machine.json`
- Key Files:
  - `param.json`: System parameters, training config, exploration settings
  - `machine.json`: HPC/computation resource configuration
- Process: Creates iterations (iter.000001, iter.000002, etc.) with 00.train, 01.model_devi, 02.fp subdirectories
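The iteration layout described above can be inspected with a few lines of Python. This is a minimal sketch under the naming stated in this document (`iter.NNNNNN` directories containing `00.train`, `01.model_devi`, `02.fp`); `summarize_iterations` is a hypothetical helper, not a DP-GEN function.

```python
import re
from pathlib import Path

ITER_RE = re.compile(r"^iter\.\d{6}$")
STAGES = ("00.train", "01.model_devi", "02.fp")


def summarize_iterations(work_dir):
    """Map each iter.NNNNNN directory to the stage subdirectories it contains."""
    summary = {}
    for path in sorted(Path(work_dir).iterdir()):
        if path.is_dir() and ITER_RE.match(path.name):
            summary[path.name] = [s for s in STAGES if (path / s).is_dir()]
    return summary
```

An iteration whose list stops short of `02.fp` has not yet created its later stage directories, which gives a quick view of how far a run has progressed.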
- init_bulk: `dpgen init_bulk param.json [machine.json]` -- bulk systems
- init_surf: `dpgen init_surf param.json [machine.json]` -- surface systems
- init_reaction: `dpgen init_reaction param.json [machine.json]` -- reactive systems
- Purpose: Generate initial training data for deep potential models
- Usage: `dpgen autotest make|run|post param.json [machine.json]`
- Three Phases:
  - `make`: Set up calculation tasks
  - `run`: Execute calculations
  - `post`: Analyze results
- Supports: VASP, ABACUS, DeepMD, MEAM, EAM force fields
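The three phases run in a fixed order, which can be scripted. The sketch below only assembles the command lines shown in the usage line above; `autotest_commands` and `run_autotest` are hypothetical wrappers, and the `runner` parameter exists purely so the sequence can be exercised without DP-GEN installed.

```python
import subprocess

PHASES = ("make", "run", "post")


def autotest_commands(param="param.json", machine=None):
    """Build the three `dpgen autotest` invocations in execution order."""
    extra = [machine] if machine else []
    return [["dpgen", "autotest", phase, param, *extra] for phase in PHASES]


def run_autotest(param="param.json", machine=None, runner=subprocess.run):
    """Run make, run, post in order, stopping at the first failing phase."""
    for cmd in autotest_commands(param, machine):
        # With the default runner, check=True raises CalledProcessError
        # on a non-zero exit, so later phases are skipped.
        runner(cmd, check=True)
```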
- Usage: `dpgen simplify param.json machine.json`
- Purpose: Reduce dataset size while maintaining quality
- collect: `dpgen collect JOB_DIR OUTPUT` -- gather generated data
- db: `dpgen db param.json` -- database operations
- gui: `dpgen gui` -- web interface (requires dpgui package)
- Location: `examples/` directory contains templates for different scenarios
- Key Examples:
  - `examples/run/deprecated/dp2.x-lammps-vasp/param.json` -- VASP with LAMMPS
  - `examples/run/dp2.x-lammps-gaussian/param.json` -- Gaussian calculations
  - `examples/run/dp-calypso-vasp/param.json` -- CALYPSO with VASP
  - `examples/run/ch4/param.json` -- CH4 system example
- Format: JSON with extensive nested parameters for system, training, exploration settings
- Location: `examples/machine/` directory
- Purpose: Define computational resources (HPC systems, cloud platforms)
- Examples:
  - `examples/machine/slurm/` -- Slurm cluster configurations
  - `examples/machine/pbs/` -- PBS system setup
  - `examples/machine/lebesgue/` -- Cloud platform integration
- Full Dependencies Required: All validation requires `uv pip install -e .` or `pip install -e .` (1-5 min install)
  - `python -m unittest discover tests -v` -- full test suite (requires dependencies)
  - `dpgen -h && dpgen run -h && dpgen autotest -h` -- CLI validation (requires installation)
- Code Exploration: All source code can be read and edited without installation
- Local Import Test: `python -c "import sys; sys.path.insert(0, '.'); import dpgen"` (0.03s)
- JSON Examples: Configuration files in `examples/` can be examined without running
- Test Framework: `time python -m unittest tests.test_load_file -v` shows test structure (0.15s, will fail without deps)
- File Structure: Use `find examples -name "*.json"` to explore configuration templates
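The `find` command above has a direct Python equivalent that also confirms each template parses as JSON, which needs nothing beyond the standard library. This is a minimal sketch; `check_json_templates` is a hypothetical helper and it checks syntax only, not DP-GEN's parameter schema.

```python
import json
from pathlib import Path


def check_json_templates(root="examples"):
    """Return (valid, invalid) lists of *.json paths found under root."""
    valid, invalid = [], []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(path)  # not well-formed JSON (or not UTF-8)
        else:
            valid.append(path)
    return valid, invalid
```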
- `dpgen/` -- Main source code
  - `generator/` -- Core DP-GEN functionality (`dpgen run`)
  - `auto_test/` -- Testing framework (`dpgen autotest`)
  - `data/` -- Data preparation (`dpgen init_*`)
  - `simplify/` -- Dataset optimization
  - `main.py` -- CLI entry point
- `tests/` -- Unit tests organized by module
- `examples/` -- Configuration templates and examples
- `doc/` -- Sphinx documentation

- `pyproject.toml` -- Modern Python packaging configuration
- `.github/workflows/test.yml` -- CI pipeline (Python 3.9, 3.12)
- `dpgen/main.py` -- CLI command definitions and entry points
- `examples/init/surf.json` -- Surface initialization example
- `examples/run/ch4/param.json` -- Complete run parameter example
- Main CLI: Edit `dpgen/main.py` to add new subcommands
- Core Logic: Extend modules in `dpgen/generator/`, `dpgen/auto_test/`, etc.
- Tests: Add corresponding tests in `tests/` directory
- Examples: Provide configuration examples in `examples/`
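As a rough illustration of what adding a subcommand involves, the sketch below wires one handler into an `argparse` sub-parser. It assumes the CLI follows the common `argparse` sub-parser pattern; the `hello` subcommand, its handler, and the `dpgen-sketch` program name are all hypothetical, so read `dpgen/main.py` for the actual structure before editing it.

```python
import argparse


def hello(args):
    """Hypothetical handler; a real one would call into a dpgen module."""
    return f"hello {args.name}"


def build_parser():
    parser = argparse.ArgumentParser(prog="dpgen-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Each subcommand gets its own parser and a handler stored via set_defaults.
    p_hello = subparsers.add_parser("hello", help="hypothetical demo subcommand")
    p_hello.add_argument("name", help="name to greet")
    p_hello.set_defaults(func=hello)
    return parser


def main(argv=None):
    """Parse argv and dispatch to the handler of the selected subcommand."""
    args = build_parser().parse_args(argv)
    return args.func(args)
```

A matching test under `tests/` can then call `main(["hello", "world"])` directly, without spawning a subprocess.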
- Linting: Uses ruff configuration in pyproject.toml
- Import Style: isort with black profile
- Documentation: Numpy-style docstrings, Sphinx for docs
- Heavy Dependencies: Package integrates with VASP, LAMMPS, Gaussian, etc.
- File I/O: Uses dpdata for structure file handling
- HPC Integration: dpdispatcher for job submission/management
- Configuration: Complex JSON schemas for scientific computing workflows
- Installation: 1-5 minutes with uv, 3-10 minutes with pip (timeout 10+ minutes)
- Full Test Suite: 2-5 minutes (timeout 10+ minutes)
- Documentation Build: 2-5 minutes (timeout 10+ minutes)
- Quick Import Test: <1 second
- CI Pipeline: Runs on Python 3.9 and 3.12, full cycle ~10-15 minutes
- Network Timeouts: Scientific packages are large, increase pip timeout
- Missing System Dependencies: Some packages require system libraries
- Version Conflicts: Pin to specific Python versions (3.9-3.12)
- Missing Dependencies: Many features require specific scientific software
- Configuration Errors: JSON files have complex nested schemas
- HPC Connectivity: Machine files require proper authentication setup
Always test with minimal examples from examples/ directory before using custom configurations.