harveybc/predictor
Predictor

Description

The predictor project is a comprehensive tool for timeseries prediction, equipped with a robust plugin architecture. This project allows for both local and remote configuration handling, as well as replicability of experimental results. The system can be extended with custom plugins for various types of neural networks, including artificial neural networks (ANN), convolutional neural networks (CNN), long short-term memory networks (LSTM), and transformer-based models. Examples of the aforementioned models are included alongside historical EURUSD and other training data in the examples directory.

Installation Instructions

To install and set up the predictor application, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/harveybc/predictor.git
    cd predictor
  2. Add the cloned directory to the Windows or Linux PYTHONPATH environment variable:

On Windows, you may need to close and reopen the command prompt for the PYTHONPATH variable to take effect. Confirm that you added the directory to the PYTHONPATH with the following commands:

  • On Windows, run:

    echo %PYTHONPATH%
  • On Linux, run:

    echo $PYTHONPATH 

If the cloned repo directory appears in the PYTHONPATH, continue to the next step.
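On Linux, the step above can be done for the current shell session with a sketch like the following (the clone path shown is an example; substitute your actual clone location, and on Windows set the variable via System Properties or `setx` instead):

```shell
# Append the cloned repository to PYTHONPATH for the current session
# (replace ${HOME}/predictor with the directory you cloned into).
export PYTHONPATH="${PYTHONPATH}:${HOME}/predictor"

# To make the change permanent, add the line above to ~/.bashrc or ~/.profile.
echo "$PYTHONPATH"
```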

  3. Create and Activate a Virtual Environment (Anaconda is required):

    • Using conda:
      conda create --name predictor-env python=3.9
      conda activate predictor-env
  4. Install Dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt
  5. Build the Package:

    python -m build
  6. Install the Package:

    pip install .
  7. (Optional) Run the predictor:

    • On Windows, run the following command to verify the installation (it uses all default values; use predictor.bat --help for a complete description of the command-line arguments):

      predictor.bat --load_config examples\config\phase_1\phase_1_ann_6300_1h_config.json
    • On Linux, run:

      sh predictor.sh --load_config examples/config/phase_1/phase_1_ann_6300_1h_config.json
  8. (Optional) Run Tests: passing the remote tests requires a running instance of harveybc/data-logger.

    • On Windows, run the following commands to run the tests:
      set_env.bat
      pytest
  9. (Optional) Generate Documentation:

    • Run the following command to generate code documentation in HTML format in the docs directory:
      pdoc --html -o docs app
  10. (Optional) Install Nvidia CUDA GPU support:

Please read: Readme - CUDA

Usage

Example config JSON files are located in examples\config. For a list of the individual parameters that can be set via the CLI or in a config JSON file, use: predictor.bat --help

After executing the prediction pipeline, the predictor will generate 4 files:

  • output_file: csv file, predictions for the selected time_horizon (see defaults in app\config.py)
  • results_file: csv file, aggregated results for the configured number of iterations of the training with the selected number of training epochs
  • loss_plot_file: png image, the plot of error vs epoch for training and validation in the last iteration
  • model_plot_file: png image, the plot of the used Keras model

The application supports several command-line arguments to control its behavior, for example:

usage: predictor.bat --load_config examples\config\phase_1\phase_1_ann_6300_1h_config.json --epochs 100 --iterations 5

There are many example config files in the examples\config directory, as well as training data for EURUSD and other timeseries in examples\data. The results of the example config files are written to examples\results, and there are scripts to automate running sequential predictions in examples\scripts.

Distributed NEAT Optimization (via DOIN Network)

The predictor integrates with doin-node for distributed NEAT hyperparameter optimization using an island-model approach. Multiple GPU nodes collaboratively optimize TCN model parameters, sharing champions via blockchain.

Data Format

Input CSVs must contain only two columns: DATE_TIME and the target column (e.g., typical_price). All additional features (temporal encodings, window statistics) are generated online by the stl_preprocessor plugin during training, controlled by NEAT-optimizable parameters.

DATE_TIME,typical_price
2024-01-01 00:00:00,1.10234
2024-01-01 04:00:00,1.10156
...
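As a quick sanity check before training, input files can be validated against this two-column shape. The sketch below uses only the standard library; `validate_input_csv` is a hypothetical helper for illustration, not part of the predictor API:

```python
import csv
import io

def validate_input_csv(text, target_column="typical_price"):
    """Return True if the CSV has exactly a DATE_TIME column plus one numeric target column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != ["DATE_TIME", target_column]:
        return False
    for row in reader:
        if len(row) != 2:          # extra feature columns are not allowed
            return False
        float(row[1])              # raises ValueError if the target is not numeric
    return True

sample = """DATE_TIME,typical_price
2024-01-01 00:00:00,1.10234
2024-01-01 04:00:00,1.10156
"""
print(validate_input_csv(sample))  # → True
```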

NEAT-Optimizable Parameters

The NEAT optimizer can evolve these parameters (defined in hyperparameter_bounds):

| Parameter | Range | Description |
|---|---|---|
| window_size | [48, 160] | Input sliding window length |
| tcn_filters | [16, 128] | TCN convolutional filters |
| tcn_kernel_size | [2, 7] | TCN kernel size |
| tcn_stack_layers | [1, 4] | TCN residual stacks |
| tcn_dilations_per_stack | [2, 6] | Dilations per stack |
| tcn_head_layers | [1, 3] | Dense head layers per horizon |
| tcn_head_units | [16, 64] | Units per head layer |
| use_temporal_features | [0, 1] | Enable sincos temporal features (hod/dow/moy) |
| hod_encoding | [0, 2] | Hour-of-day encoding: 0=none, 1=onehot, 2=sincos |
| dow_encoding | [0, 2] | Day-of-week encoding |
| moy_encoding | [0, 2] | Month-of-year encoding |
| add_window_stats | [0, 1] | Enable rolling std/ema/price-minus-ema features |
| add_multi_scale_returns | [0, 1] | Enable multi-scale return features |
| loss_type | [0, 4] | Loss: 0=mae, 1=huber, 2=trend_sigma, 3=pearson, 4=soft_dtw |
| use_log1p_features | [0, 1] | Apply log1p transform to target column |
| positional_encoding | [0, 1] | Sinusoidal positional encoding on input |
| learning_rate | [1e-5, 1e-2] | AdamW learning rate |
| batch_size | [16, 64] | Training batch size |
| tcn_dropout | [0.0, 0.3] | Dropout rate |
| l2_reg | [1e-7, 1e-3] | L2 regularization |

The base model starts with 7 input features: 1 price + 6 temporal sincos features (when use_temporal_features=1 with sincos encodings). NEAT can optionally add 6 more window-stats features (rolling_std, rolling_ema, and price_minus_ema for 2 periods) by evolving add_window_stats=1.
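As an illustration, the hyperparameter_bounds section of the optimization config might look like the sketch below. The key names follow the table above, but the exact schema of your config file may differ, so treat this as an assumption rather than the canonical format:

```json
{
  "hyperparameter_bounds": {
    "window_size": [48, 160],
    "tcn_filters": [16, 128],
    "tcn_kernel_size": [2, 7],
    "learning_rate": [1e-5, 1e-2],
    "batch_size": [16, 64],
    "tcn_dropout": [0.0, 0.3]
  }
}
```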

GPU Environment

For NVIDIA GPUs, set these environment variables before launching to prevent GPU memory pre-allocation:

export TF_FORCE_GPU_ALLOW_GROWTH=true    # MUST be "true", NOT "1" (TF rejects "1" silently)
export TF_GPU_ALLOCATOR=cuda_malloc_async

Without these, the parent process allocates all GPU memory, leaving none for subprocess candidates.

If CUDA was installed via pip install tensorflow[and-cuda] (no system /usr/local/cuda), you also need:

NB=$CONDA_PREFIX/lib/python3.12/site-packages/nvidia
export LD_LIBRARY_PATH="${NB}/cudnn/lib:${NB}/cublas/lib:${NB}/cuda_runtime/lib:${NB}/cufft/lib:${NB}/curand/lib:${NB}/cusolver/lib:${NB}/cusparse/lib:${NB}/cuda_cupti/lib:${NB}/nvjitlink/lib:${NB}/cuda_nvrtc/lib:${NB}/nccl/lib"

Without LD_LIBRARY_PATH, TensorFlow silently falls back to CPU (check with nvidia-smi — 0% GPU means it's not working).
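When launching from Python rather than a shell, the same variables can be set programmatically, as in this sketch. They must be assigned before tensorflow is imported, because TF reads them at import time:

```python
import os

# TF reads these at import time, so set them before `import tensorflow`.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"   # the literal string "true", not "1"
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

# import tensorflow as tf
# print(tf.config.list_physical_devices("GPU"))  # an empty list means TF fell back to CPU
```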

Optimization Config

The optimization config file (e.g., examples/config/phase_1_daily/optimization/phase_1_tcn_neat_1d_optimization_config.json) defines:

  • Data files (train/val/test CSVs)
  • Plugin selection (tcn, neat_optimizer, stl_preprocessor, stl_pipeline)
  • NEAT parameters (population_size, n_generations, mutation rates)
  • Hyperparameter bounds
  • Default values for non-optimized parameters

Running Locally (Single Node)

export TF_FORCE_GPU_ALLOW_GROWTH=true
export TF_GPU_ALLOCATOR=cuda_malloc_async

predictor --load_config examples/config/phase_1_daily/optimization/phase_1_tcn_neat_1d_optimization_config.json

Running Distributed (DOIN Network)

See the doin-node README for multi-node deployment instructions.

Champion Training (No Optimization)

To retrain the best solution found by the distributed optimization as a standalone candidate:

predictor --load_config examples/config/phase_1_daily/phase_1_tcn_neat_champion_1d_training_config.json

Optimization Results & Metabase Integration

Results are stored in examples/results/phase_1_daily/:

| File | Description |
|---|---|
| phase_1_tcn_neat_1d_optimization_stats.json | Per-generation statistics (champion fitness, MAE, species count) |
| phase_1_tcn_neat_1d_optimization_parameters.json | Best champion hyperparameters found |
| phase_1_tcn_neat_1d_optimization_resume.json | Full NEAT population state for resuming optimization |
| phase_1_tcn_neat_1d_rss.csv | Memory usage log per candidate evaluation |

The blockchain SQLite database from doin-node contains the full experiment history across all nodes and can be imported into Metabase for visualization. See the doin-node README for Metabase setup instructions.

Directory Structure

predictor/
│
├── app/                                 # Main application package
│   ├── __init__.py                     # Package initialization
│   ├── cli.py                          # Command-line interface handling
│   ├── config.py                       # Default configuration values
│   ├── config_handler.py               # Configuration management
│   ├── config_merger.py                # Configuration merging logic
│   ├── data_handler.py                 # Data loading and saving functions
│   ├── data_processor.py               # Core data processing pipeline
│   ├── main.py                         # Application entry point
│   ├── plugin_loader.py                # Dynamic plugin loading system
│   ├── reconstruction.py               # Data reconstruction utilities
│   └── plugins/                        # Prediction plugins directory
│       ├── predictor_plugin_ann.py     # Artificial Neural Network plugin
│       ├── predictor_plugin_cnn.py     # Convolutional Neural Network plugin
│       ├── predictor_plugin_lstm.py    # Long Short-Term Memory plugin
│       └── predictor_plugin_transformer.py # Transformer model plugin
│
├── tests/                              # Test suite directory
│   ├── __init__.py                    # Test package initialization
│   ├── conftest.py                    # pytest configuration
│   ├── acceptance_tests/              # User acceptance tests
│   ├── integration_tests/             # Integration test modules
│   ├── system_tests/                  # System-wide test cases
│   └── unit_tests/                    # Unit test modules
│
├── examples/                           # Example files directory
│   ├── data/                           # Example training data
│   └── scripts/                        # Example execution scripts
│
├── setup.py                           # Package installation script
├── predictor.bat                      # Windows execution script
├── predictor.sh                       # Linux execution script
├── set_env.bat                        # Windows environment setup
├── set_env.sh                         # Linux environment setup
├── requirements.txt                    # Python dependencies
├── LICENSE.txt                        # Project license
└── prompt.txt                         # Project documentation

Example of a plugin model (Mermaid diagram):

graph TD

    subgraph SP_Input ["Input Processing (Features Only)"]
        I[/"Input (ws, num_channels)"/] --> FS{"Split Features"};

        subgraph SP_Branches ["Feature Branches (Parallel)"]
             FS -- Feature 1 --> F1_FLAT["Flatten"] --> F1_DENSE["Dense x M"];
             FS -- ... --> F_DOTS["..."];
             FS -- Feature n --> Fn_FLAT["Flatten"] --> Fn_DENSE["Dense x M"];
        end

        F1_DENSE --> M{"Merge Concat Features"};
        F_DOTS --> M;
        Fn_DENSE --> M;
    end

    subgraph SP_Heads ["Output Heads (Parallel)"]

        subgraph Head1 ["Head for Horizon 1"]
            M --> H1_DENSE["Dense x K"];
            H1_DENSE --> H1_BAYES{"DenseFlipout (Bayesian)"};
            H1_DENSE --> H1_BIAS["Dense (Bias)"];
            H1_BAYES --> H1_ADD{"Add"};
            H1_BIAS --> H1_ADD;
            H1_ADD --> O1["Output H1"];
        end

         subgraph HeadN ["Head for Horizon N"]
            M --> HN_DENSE["Dense x K"];
            HN_DENSE --> HN_BAYES{"DenseFlipout (Bayesian)"};
            HN_DENSE --> HN_BIAS["Dense (Bias)"];
            HN_BAYES --> HN_ADD{"Add"};
            HN_BIAS --> HN_ADD;
            HN_ADD --> ON["Output HN"];
        end

    end

    O1 --> Z((Final Output List));
    ON --> Z;

    subgraph Legend
         NoteM["M = config['intermediate_layers']"];
         NoteK["K = config['intermediate']"];
         NoteNoFB["NOTE: Diagram simplified - Feedback loops not shown."];
    end

    classDef bayes fill:#556B2F,stroke:#333,color:#fff;
    classDef bias fill:#4682B4,stroke:#333,color:#fff;
    classDef legendNote fill:#8B4513,stroke:#333,stroke-dasharray:5 5,color:#fff;
    class H1_BAYES,HN_BAYES bayes;
    class H1_BIAS,HN_BIAS bias;
    class NoteM,NoteK,NoteNoFB legendNote;

About

A predictor that uses a configurable plugin-based supervised learning model to make forecasts for a configurable time horizon in a timeseries, using heterogeneous multivariate timeseries data as input. The input data needs to be aligned with the timeseries used as training signals. Includes 4 built-in deep-learning predictive models.
