The predictor project is a comprehensive tool for timeseries prediction built around a robust plugin architecture. It supports both local and remote configuration handling, as well as replicable experimental results. The system can be extended with custom plugins for various types of neural networks, including artificial neural networks (ANN), convolutional neural networks (CNN), long short-term memory networks (LSTM), and transformer-based models. Examples of these models are included, alongside historical EURUSD and other training data, in the examples directory.
To install and set up the predictor application, follow these steps:
1. Clone the repository:

   ```bash
   git clone https://github.com/harveybc/predictor.git
   cd predictor
   ```

2. Add the cloned directory to the Windows or Linux PYTHONPATH environment variable. On Windows, you may need to close and reopen the command prompt for the PYTHONPATH variable to take effect. Confirm the directory was added with the following commands:

   - On Windows, run:

     ```bash
     echo %PYTHONPATH%
     ```

   - On Linux, run:

     ```bash
     echo $PYTHONPATH
     ```

   If the cloned repo directory appears in the PYTHONPATH, continue to the next step.
3. Create and activate a virtual environment (Anaconda is required):

   ```bash
   conda create --name predictor-env python=3.9
   conda activate predictor-env
   ```
4. Install dependencies:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```
5. Build the package:

   ```bash
   python -m build
   ```
6. Install the package:

   ```bash
   pip install .
   ```
7. (Optional) Run the predictor to verify the installation (the commands below use all default values; use `predictor.bat --help` for a complete description of the command-line arguments):

   - On Windows, run:

     ```bash
     predictor.bat --load_config examples\config\phase_1\phase_1_ann_6300_1h_config.json
     ```

   - On Linux, run:

     ```bash
     sh predictor.sh --load_config examples/config/phase_1/phase_1_ann_6300_1h_config.json
     ```
8. (Optional) Run tests (passing the remote tests requires a running instance of harveybc/data-logger):

   - On Windows, run:

     ```bash
     set_env.bat
     pytest
     ```

   - On Linux, run:

     ```bash
     pytest
     ```
9. (Optional) Generate documentation. Run the following command to generate code documentation in HTML format in the docs directory:

   ```bash
   pdoc --html -o docs app
   ```
10. (Optional) Install Nvidia CUDA GPU support. Please read: Readme - CUDA
Example config JSON files are located in examples\config. For a list of the individual parameters that can be set via the CLI or in a config JSON file, use: `predictor.bat --help`
After executing the prediction pipeline, the predictor will generate four files:
- output_file: CSV file with the predictions for the selected time_horizon (see defaults in app\config.py)
- results_file: CSV file with aggregated results for the configured number of training iterations with the selected number of training epochs
- loss_plot_file: PNG image plotting error vs. epoch for training and validation in the last iteration
- model_plot_file: PNG image plotting the Keras model used
The application supports several command-line arguments to control its behavior, for example:

```bash
predictor.bat --load_config examples\config\phase_1\phase_1_ann_6300_1h_config.json --epochs 100 --iterations 5
```
There are many example config files in the examples\config directory, as well as EURUSD and other timeseries training data in examples\data. The results of the example config files are written to examples\results, and examples\scripts contains scripts to automate running sequential predictions.
The predictor integrates with doin-node for distributed NEAT hyperparameter optimization using an island-model approach. Multiple GPU nodes collaboratively optimize TCN model parameters, sharing champions via blockchain.
Input CSVs must contain only two columns: DATE_TIME and the target column (e.g., typical_price). All additional features (temporal encodings, window statistics) are generated online by the stl_preprocessor plugin during training, controlled by NEAT-optimizable parameters.
```
DATE_TIME,typical_price
2024-01-01 00:00:00,1.10234
2024-01-01 04:00:00,1.10156
...
```
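The two-column requirement above can be checked before training with a short script. This is only an illustrative sketch using the standard library; `validate_input_csv` is not part of the predictor API:

```python
import csv
import io

def validate_input_csv(text):
    """Check that a CSV has exactly the two expected columns:
    DATE_TIME plus one target column (e.g. typical_price).
    Returns the target column name and the number of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if len(header) != 2 or header[0] != "DATE_TIME":
        raise ValueError(f"expected [DATE_TIME, <target>], got {header}")
    rows = [row for row in reader if row]
    if any(len(row) != 2 for row in rows):
        raise ValueError("every row must have exactly two fields")
    return header[1], len(rows)

sample = ("DATE_TIME,typical_price\n"
          "2024-01-01 00:00:00,1.10234\n"
          "2024-01-01 04:00:00,1.10156\n")
print(validate_input_csv(sample))  # ('typical_price', 2)
```

In a real run you would pass the file contents instead of the inline sample; any extra feature columns should be removed, since the stl_preprocessor plugin generates them online.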
The NEAT optimizer can evolve these parameters (defined in hyperparameter_bounds):
| Parameter | Range | Description |
|---|---|---|
| window_size | [48, 160] | Input sliding window length |
| tcn_filters | [16, 128] | TCN convolutional filters |
| tcn_kernel_size | [2, 7] | TCN kernel size |
| tcn_stack_layers | [1, 4] | TCN residual stacks |
| tcn_dilations_per_stack | [2, 6] | Dilations per stack |
| tcn_head_layers | [1, 3] | Dense head layers per horizon |
| tcn_head_units | [16, 64] | Units per head layer |
| use_temporal_features | [0, 1] | Enable sincos temporal features (hod/dow/moy) |
| hod_encoding | [0, 2] | Hour-of-day encoding: 0=none, 1=onehot, 2=sincos |
| dow_encoding | [0, 2] | Day-of-week encoding |
| moy_encoding | [0, 2] | Month-of-year encoding |
| add_window_stats | [0, 1] | Enable rolling std/ema/price-minus-ema features |
| add_multi_scale_returns | [0, 1] | Enable multi-scale return features |
| loss_type | [0, 4] | Loss: 0=mae, 1=huber, 2=trend_sigma, 3=pearson, 4=soft_dtw |
| use_log1p_features | [0, 1] | Apply log1p transform to target column |
| positional_encoding | [0, 1] | Sinusoidal positional encoding on input |
| learning_rate | [1e-5, 1e-2] | AdamW learning rate |
| batch_size | [16, 64] | Training batch size |
| tcn_dropout | [0.0, 0.3] | Dropout rate |
| l2_reg | [1e-7, 1e-3] | L2 regularization |
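To make the role of the bounds concrete, here is a minimal sketch of how a candidate could be sampled within them. The dict below mirrors a few entries from the table; the actual `hyperparameter_bounds` structure and the NEAT optimizer's sampling/mutation logic may differ:

```python
import random

# Illustrative subset of the bounds table above, as (low, high) pairs.
hyperparameter_bounds = {
    "window_size": (48, 160),
    "tcn_filters": (16, 128),
    "tcn_kernel_size": (2, 7),
    "learning_rate": (1e-5, 1e-2),
    "tcn_dropout": (0.0, 0.3),
}

def sample_candidate(bounds, rng=random):
    """Draw one candidate: integer ranges via randint, float ranges via uniform."""
    candidate = {}
    for name, (lo, hi) in bounds.items():
        if isinstance(lo, int) and isinstance(hi, int):
            candidate[name] = rng.randint(lo, hi)
        else:
            candidate[name] = rng.uniform(lo, hi)
    return candidate

c = sample_candidate(hyperparameter_bounds)
assert all(lo <= c[k] <= hi for k, (lo, hi) in hyperparameter_bounds.items())
```

Integer-coded parameters such as `loss_type` or `hod_encoding` would be decoded back to their categorical meaning (e.g. 2=sincos) before building the model.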
The base model starts with 7 input features: 1 price + 6 temporal sincos features (when use_temporal_features=1 with sincos encodings). NEAT can optionally add 6 more window-stats features (rolling_std, rolling_ema, and price_minus_ema for 2 periods) by evolving add_window_stats=1.
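The six temporal sincos features can be computed from a timestamp as below. This is a sketch of the encoding idea only; the stl_preprocessor plugin's exact implementation may differ:

```python
import math
from datetime import datetime

def sincos_features(ts: datetime):
    """Six cyclical features: sin/cos of hour-of-day, day-of-week, month-of-year."""
    def enc(value, period):
        angle = 2 * math.pi * value / period
        return math.sin(angle), math.cos(angle)
    hod = enc(ts.hour, 24)        # hour-of-day
    dow = enc(ts.weekday(), 7)    # day-of-week
    moy = enc(ts.month - 1, 12)   # month-of-year
    return [*hod, *dow, *moy]

feats = sincos_features(datetime(2024, 1, 1, 0, 0))
print(len(feats))  # 6 temporal features; with the price column, 7 base inputs
```

The sin/cos pair makes each cycle continuous at its wrap-around point (hour 23 to hour 0), which a raw integer encoding would not be.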
For NVIDIA GPUs, set these environment variables before launching to prevent GPU memory pre-allocation:
```bash
export TF_FORCE_GPU_ALLOW_GROWTH=true   # MUST be "true", NOT "1" (TF rejects "1" silently)
export TF_GPU_ALLOCATOR=cuda_malloc_async
```

Without these, the parent process allocates all GPU memory, leaving none for subprocess candidates.
If CUDA was installed via pip install tensorflow[and-cuda] (no system /usr/local/cuda), you also need:
```bash
NB=$CONDA_PREFIX/lib/python3.12/site-packages/nvidia
export LD_LIBRARY_PATH="${NB}/cudnn/lib:${NB}/cublas/lib:${NB}/cuda_runtime/lib:${NB}/cufft/lib:${NB}/curand/lib:${NB}/cusolver/lib:${NB}/cusparse/lib:${NB}/cuda_cupti/lib:${NB}/nvjitlink/lib:${NB}/cuda_nvrtc/lib:${NB}/nccl/lib"
```

Without LD_LIBRARY_PATH, TensorFlow silently falls back to CPU (check with nvidia-smi; 0% GPU utilization means it's not working).
The optimization config file (e.g., examples/config/phase_1_daily/optimization/phase_1_tcn_neat_1d_optimization_config.json) defines:
- Data files (train/val/test CSVs)
- Plugin selection (tcn, neat_optimizer, stl_preprocessor, stl_pipeline)
- NEAT parameters (population_size, n_generations, mutation rates)
- Hyperparameter bounds
- Default values for non-optimized parameters
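A skeleton of such a config file is sketched below. The key names here are illustrative only; consult the shipped example config (and `predictor --help`) for the actual schema:

```json
{
  "plugin": "tcn",
  "optimizer_plugin": "neat_optimizer",
  "preprocessor_plugin": "stl_preprocessor",
  "pipeline_plugin": "stl_pipeline",
  "population_size": 20,
  "n_generations": 50,
  "hyperparameter_bounds": {
    "window_size": [48, 160],
    "learning_rate": [1e-5, 1e-2]
  },
  "batch_size": 32
}
```

Parameters listed in `hyperparameter_bounds` are evolved by NEAT; everything else keeps its default value for every candidate.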
```bash
export TF_FORCE_GPU_ALLOW_GROWTH=true
export TF_GPU_ALLOCATOR=cuda_malloc_async
predictor --load_config examples/config/phase_1_daily/optimization/phase_1_tcn_neat_1d_optimization_config.json
```

See the doin-node README for multi-node deployment instructions.
To retrain the best solution found by the distributed optimization as a standalone candidate:
```bash
predictor --load_config examples/config/phase_1_daily/phase_1_tcn_neat_champion_1d_training_config.json
```

Results are stored in examples/results/phase_1_daily/:
| File | Description |
|---|---|
| phase_1_tcn_neat_1d_optimization_stats.json | Per-generation statistics (champion fitness, MAE, species count) |
| phase_1_tcn_neat_1d_optimization_parameters.json | Best champion hyperparameters found |
| phase_1_tcn_neat_1d_optimization_resume.json | Full NEAT population state for resuming optimization |
| phase_1_tcn_neat_1d_rss.csv | Memory usage log per candidate evaluation |
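The stats file can be post-processed with a few lines of Python. The key names below (`generations`, `generation`, `champion_fitness`) are assumed for illustration; inspect the real `*_optimization_stats.json` for the actual schema:

```python
import json

def champion_progress(stats_json):
    """Return (generation, champion_fitness) pairs from an optimization stats dump."""
    stats = json.loads(stats_json)
    return [(g["generation"], g["champion_fitness"]) for g in stats["generations"]]

# Fabricated sample with the assumed schema, for demonstration only.
sample = json.dumps({
    "generations": [
        {"generation": 1, "champion_fitness": 0.42, "mae": 0.0031, "species": 4},
        {"generation": 2, "champion_fitness": 0.55, "mae": 0.0027, "species": 5},
    ]
})
print(champion_progress(sample))  # [(1, 0.42), (2, 0.55)]
```

The same approach works for the parameters and resume files, or you can import the doin-node blockchain database into Metabase as described below.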
The blockchain SQLite database from doin-node contains the full experiment history across all nodes and can be imported into Metabase for visualization. See the doin-node README for Metabase setup instructions.
```
predictor/
│
├── app/                                  # Main application package
│   ├── __init__.py                       # Package initialization
│   ├── cli.py                            # Command-line interface handling
│   ├── config.py                         # Default configuration values
│   ├── config_handler.py                 # Configuration management
│   ├── config_merger.py                  # Configuration merging logic
│   ├── data_handler.py                   # Data loading and saving functions
│   ├── data_processor.py                 # Core data processing pipeline
│   ├── main.py                           # Application entry point
│   ├── plugin_loader.py                  # Dynamic plugin loading system
│   ├── reconstruction.py                 # Data reconstruction utilities
│   └── plugins/                          # Prediction plugins directory
│       ├── predictor_plugin_ann.py       # Artificial Neural Network plugin
│       ├── predictor_plugin_cnn.py       # Convolutional Neural Network plugin
│       ├── predictor_plugin_lstm.py      # Long Short-Term Memory plugin
│       └── predictor_plugin_transformer.py  # Transformer model plugin
│
├── tests/                                # Test suite directory
│   ├── __init__.py                       # Test package initialization
│   ├── conftest.py                       # pytest configuration
│   ├── acceptance_tests/                 # User acceptance tests
│   ├── integration_tests/                # Integration test modules
│   ├── system_tests/                     # System-wide test cases
│   └── unit_tests/                       # Unit test modules
│
├── examples/                             # Example files directory
│   ├── data/                             # Example training data
│   └── scripts/                          # Example execution scripts
│
├── setup.py                              # Package installation script
├── predictor.bat                         # Windows execution script
├── predictor.sh                          # Linux execution script
├── set_env.bat                           # Windows environment setup
├── set_env.sh                            # Linux environment setup
├── requirements.txt                      # Python dependencies
├── LICENSE.txt                           # Project license
└── prompt.txt                            # Project documentation
```
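Given the `app/plugins/predictor_plugin_*.py` naming above, the dynamic loading done by plugin_loader.py can be sketched with importlib. This is only an illustration of the idea; the real loader's naming scheme and API may differ:

```python
import importlib

def plugin_module_name(name, package="app.plugins"):
    """Map a short plugin name (e.g. "ann") to its module path."""
    return f"{package}.predictor_plugin_{name}"

def load_plugin(name):
    """Dynamically import a predictor plugin module by short name."""
    return importlib.import_module(plugin_module_name(name))

print(plugin_module_name("lstm"))  # app.plugins.predictor_plugin_lstm
# load_plugin("lstm") would import that module, assuming app is on PYTHONPATH
```

This pattern is why adding the cloned directory to PYTHONPATH matters during setup: without it, `app.plugins` cannot be resolved at import time.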
```mermaid
graph TD
    subgraph SP_Input ["Input Processing (Features Only)"]
        I[/"Input (ws, num_channels)"/] --> FS{"Split Features"};
        subgraph SP_Branches ["Feature Branches (Parallel)"]
            FS -- Feature 1 --> F1_FLAT["Flatten"] --> F1_DENSE["Dense x M"];
            FS -- ... --> F_DOTS["..."];
            FS -- Feature n --> Fn_FLAT["Flatten"] --> Fn_DENSE["Dense x M"];
        end
        F1_DENSE --> M{"Merge Concat Features"};
        F_DOTS --> M;
        Fn_DENSE --> M;
    end
    subgraph SP_Heads ["Output Heads (Parallel)"]
        subgraph Head1 ["Head for Horizon 1"]
            M --> H1_DENSE["Dense x K"];
            H1_DENSE --> H1_BAYES{"DenseFlipout (Bayesian)"};
            H1_DENSE --> H1_BIAS["Dense (Bias)"];
            H1_BAYES --> H1_ADD{"Add"};
            H1_BIAS --> H1_ADD;
            H1_ADD --> O1["Output H1"];
        end
        subgraph HeadN ["Head for Horizon N"]
            M --> HN_DENSE["Dense x K"];
            HN_DENSE --> HN_BAYES{"DenseFlipout (Bayesian)"};
            HN_DENSE --> HN_BIAS["Dense (Bias)"];
            HN_BAYES --> HN_ADD{"Add"};
            HN_BIAS --> HN_ADD;
            HN_ADD --> ON["Output HN"];
        end
    end
    O1 --> Z(("Final Output List"));
    ON --> Z;
    subgraph Legend
        NoteM["M = config['intermediate_layers']"];
        NoteK["K = config['intermediate']"];
        NoteNoFB["NOTE: Diagram simplified - Feedback loops not shown."];
    end
    style H1_BAYES fill:#556B2F,stroke:#333,color:#fff;
    style HN_BAYES fill:#556B2F,stroke:#333,color:#fff;
    style H1_BIAS fill:#4682B4,stroke:#333,color:#fff;
    style HN_BIAS fill:#4682B4,stroke:#333,color:#fff;
    style NoteM fill:#8B4513,stroke:#333,stroke-dasharray:5 5,color:#fff;
    style NoteK fill:#8B4513,stroke:#333,stroke-dasharray:5 5,color:#fff;
    style NoteNoFB fill:#8B4513,stroke:#333,stroke-dasharray:5 5,color:#fff;
```