NoVacancy is a machine learning application that predicts hotel reservation cancellations using historical booking data. The dataset is sourced from ScienceDirect and contains approximately 36,000 reservations from two hotels (one resort, one urban) with arrival dates between July 1, 2015, and August 31, 2017. The dataset includes 17 features, with cancellation status serving as the binary target variable.
- Docker Desktop (v4.0+)
- Git
- Clone the repository

```bash
git clone https://github.com/Morgan-Sell/no-vacancy.git
cd no-vacancy
```

- Create environment file

```bash
cp .env.example .env
```

Update `.env` with your preferred database credentials or use the defaults.

- Start the application

```bash
docker compose --profile airflow up -d --build
```

The first build takes ~3-5 minutes. Subsequent starts are faster.

- Verify all services are running

```bash
docker ps --format "table {{.Names}}\t{{.Status}}"
```

You should see novacancy-frontend, novacancy-inference, mlflow, airflow-webserver, airflow-scheduler, and the database containers, all with "Up" status.
| Service | URL | Purpose |
|---|---|---|
| Frontend | http://localhost:5050 | Booking form & predictions |
| Airflow | http://localhost:8080 | Training pipeline orchestration |
| MLflow | http://localhost:5001 | Model registry & experiment tracking |
| FastAPI | http://localhost:8000/docs | Inference API documentation |
Airflow credentials: homer / waffles
Before making predictions, you need to train a model:
- Navigate to http://localhost:5050
- Click Train Model
- Watch the training progress panel update in real time
- (Optional) Monitor detailed progress in Airflow at http://localhost:8080
- Verify model registration in MLflow at http://localhost:5001
- Navigate to http://localhost:5050
- Fill out the reservation form with guest and booking details
- Click Predict Cancellation
- View the cancellation risk assessment
To stop the application:

```bash
docker compose --profile airflow down
```

To also remove volumes (database data):

```bash
docker compose --profile airflow down -v
```

NoVacancy is built on a modular, containerized architecture featuring:
- API Framework: FastAPI for RESTful prediction services
- Data Storage: PostgreSQL with medallion architecture (Bronze/Silver/Gold layers)
- ML Operations: MLflow for experiment tracking and model registry
- Infrastructure: Docker Compose for container orchestration
- CI/CD: GitHub Actions for automated linting, testing and deployment
- Database Migrations: Alembic for schema version control (see the migration sketch after this list)
- Orchestration: Airflow for workflow management and scheduling
- Data Validation: Great Expectations for pipeline-level data quality checks
- Model Monitoring: Evidently AI for drift detection and performance monitoring (planned)
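As suggested in the Database Migrations bullet above, schema changes are versioned with Alembic. Here is a minimal migration sketch; the revision IDs, table, and column definitions are illustrative assumptions, not copied from the repo:

```python
"""Hypothetical Alembic migration creating a Bronze-layer raw_data table."""
from alembic import op
import sqlalchemy as sa

# Revision identifiers used by Alembic (illustrative values).
revision = "0001_create_raw_data"
down_revision = None


def upgrade() -> None:
    op.create_table(
        "raw_data",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("hotel", sa.String(50)),
        sa.Column("lead_time", sa.Integer),
        sa.Column("is_canceled", sa.Boolean),
    )


def downgrade() -> None:
    op.drop_table("raw_data")
```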
The machine learning pipeline and preprocessing strategies are based on comprehensive exploratory data analysis available in the EDA notebook.
The system implements a medallion architecture with integrated MLOps workflows, enabling automated model training, validation, and deployment with human oversight for production model promotion.
Implements end-to-end frontend integration connecting the Flask UI to FastAPI inference and Airflow training orchestration.
Frontend
- Flask proxy routes for predictions (`/api/predict`) and training (`/api/train`); see the sketch after this list
- Real-time training progress polling with task-level status updates
- Tropical "Oracle of Occupancy" UI with White Lotus-inspired design
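A minimal sketch of what such a proxy route could look like, assuming the FastAPI inference service is reachable via an `INFERENCE_URL` environment variable; the service address, endpoint path, and payload handling are assumptions, not the repo's actual code:

```python
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumed inference address; the real compose service name may differ.
INFERENCE_URL = os.getenv("INFERENCE_URL", "http://localhost:8000")


@app.route("/api/predict", methods=["POST"])
def predict():
    # Forward the booking form payload to the FastAPI inference service.
    resp = requests.post(
        f"{INFERENCE_URL}/predict", json=request.get_json(), timeout=10
    )
    return jsonify(resp.json()), resp.status_code
```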
MLOps Pipeline
- Gated model promotion workflow: models register to Staging and auto-promote to Production only after validation passes the AUC threshold (see the sketch after this list)
- Idempotent Silver DB writes (TRUNCATE before insert) for reliable pipeline reruns
- MLflow artifact volume mounts shared across Airflow and inference containers
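A hedged sketch of the gated-promotion idea using MLflow's model registry client; the registered model name, metric key, and threshold value are illustrative assumptions:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "NoVacancyModel"  # assumed registry name
AUC_THRESHOLD = 0.85           # assumed validation gate

# Newly trained models land in Staging first.
version = client.get_latest_versions(MODEL_NAME, stages=["Staging"])[0]

# Promote to Production only if the validation AUC clears the gate.
run = client.get_run(version.run_id)
auc = run.data.metrics.get("auc", 0.0)
if auc >= AUC_THRESHOLD:
    client.transition_model_version_stage(
        name=MODEL_NAME, version=version.version, stage="Production"
    )
```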
Infrastructure Fixes
- Airflow init container dependency (`service_completed_successfully` for one-shot containers)
- 12-factor logging: stdout in Airflow context, file logging elsewhere (see the sketch after this list)
- Prediction endpoint returns cancellation probability (not binary outcome)
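A sketch of the 12-factor logging switch described above, assuming an environment variable distinguishes the Airflow context; the variable name and handler setup are illustrative:

```python
import logging
import os
import sys


def get_logger(name: str) -> logging.Logger:
    """Log to stdout inside Airflow; log to a file everywhere else."""
    logger = logging.getLogger(name)
    if logger.handlers:
        return logger  # already configured

    # Hypothetical detection of the Airflow context via an env var.
    if os.getenv("RUNNING_IN_AIRFLOW"):
        handler = logging.StreamHandler(sys.stdout)
    else:
        handler = logging.FileHandler("logs/app.log")

    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```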
Implemented Great Expectations to ensure data quality during ingestion and feature engineering. The validation suite focuses on critical checks, with a few tests marked medium severity; medium severity flags unexpected values that will not break the model pipeline.
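A minimal sketch of this pattern using the classic (pre-1.0) Great Expectations pandas API; the column names, bounds, and severity tagging are illustrative assumptions, not the repo's actual expectation suite:

```python
import great_expectations as ge
import pandas as pd

df = pd.read_csv("data/bookings_raw.csv")
gdf = ge.from_pandas(df)

# Critical check: the target column must never be null.
gdf.expect_column_values_to_not_be_null("booking_status")

# Medium-severity check: out-of-range values are flagged but tolerated.
gdf.expect_column_values_to_be_between(
    "lead_time", min_value=0, max_value=1000, meta={"severity": "medium"}
)

results = gdf.validate()
print(results.success)
```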
Integrated Apache Airflow orchestration for the machine learning pipeline via Docker containers. The system executes a five-task workflow: data import, model training, prediction generation, artifact validation, and cleanup operations. This branch demonstrates production-ready ML pipeline orchestration with proper dependency management and error handling.
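A hedged sketch of what the five-task DAG could look like; the task IDs, script paths, and operator choice are illustrative assumptions, not the repo's actual DAG file:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="training_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    # Five linear tasks: import -> train -> predict -> validate -> cleanup.
    import_data = BashOperator(
        task_id="import_data", bash_command="python services/data_import.py"
    )
    train_model = BashOperator(
        task_id="train_model", bash_command="python services/trainer.py"
    )
    generate_predictions = BashOperator(
        task_id="generate_predictions", bash_command="python services/predictor.py"
    )
    validate_artifacts = BashOperator(
        task_id="validate_artifacts", bash_command="python services/validator.py"
    )
    cleanup = BashOperator(
        task_id="cleanup", bash_command="python services/cleanup.py"
    )

    import_data >> train_model >> generate_predictions >> validate_artifacts >> cleanup
```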
1. Clone the repo.

   ```bash
   git clone https://github.com/Morgan-Sell/no-vacancy.git
   ```

2. Switch to the `orchestration` feature branch.

   ```bash
   git checkout orchestration
   ```

3. Start the orchestration and training services.

   ```bash
   docker compose --profile airflow --profile training up -d
   ```

4. Access the Airflow web interface at http://localhost:8080. You'll see the login page.

5. Log in using:
   - Username: `homer`
   - Password: `waffles`

6. After successful authentication, the Airflow dashboard displays the `training_pipeline` DAG with its linear task dependencies.

7. The DAG can be triggered manually or runs on its scheduled interval (weekly). Each task executes in isolated Docker containers with proper dependency management between data processing, training, and validation stages.
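If you prefer to trigger the DAG programmatically rather than through the UI, a sketch using Airflow's stable REST API might look like this (assuming the credentials above and that the API's basic-auth backend is enabled):

```python
import requests

# Trigger the training_pipeline DAG via Airflow's stable REST API.
resp = requests.post(
    "http://localhost:8080/api/v1/dags/training_pipeline/dagRuns",
    auth=("homer", "waffles"),
    json={"conf": {}},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])
```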
Implemented a CI pipeline with GitHub Actions for automated linting and testing. Enhanced the MLflow integration to provide complete model lifecycle management - the system now logs trained models and preprocessing artifacts in trainer.py and retrieves them for inference in predictor.py, enabling consistent model versioning and deployment.
1. Clone the repo.

   ```bash
   git clone https://github.com/Morgan-Sell/no-vacancy.git
   ```

2. Switch to the `ci-pipe-v2` feature branch.

   ```bash
   git checkout ci-pipe-v2
   ```

3. Add the following GitHub secrets:
   - BRONZE_DB
   - BRONZE_DB_HOST
   - DB_PASSWORD
   - DB_PORT
   - DB_USER
   - GOLD_DB
   - GOLD_DB_HOST
   - MLFLOW_DB
   - MLFLOW_DB_HOST
   - MLFLOW_TRACKING_URI
   - SILVER_DB
   - SILVER_DB_HOST
   - TEST_DB
   - TEST_DB_HOST
   - TEST_DB_PASSWORD
   - TEST_DB_PORT
   - TEST_DB_USER

4. Every time code changes are pushed to GitHub, the CI pipeline is executed. To see the results, navigate to the Actions tab, where you will find a list of all completed workflows.

5. The CI pipeline only checks code style; it doesn't fix the code. To enable auto-fixing, `.pre-commit-config.yaml` was created. Before committing/pushing your code changes to GitHub, run the following command in your terminal:

   ```bash
   pre-commit run --all-files --config .pre-commit-config.yaml
   ```
When the NoVacancyPipeline is trained, model artifacts and experiment results are saved to the MLflow repository.
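A hedged sketch of the log-and-retrieve pattern this implies. The experiment name matches the one shown in the MLflow UI below, but the toy pipeline, metric value, and artifact path are illustrative assumptions:

```python
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

mlflow.set_tracking_uri("http://localhost:5001")
mlflow.set_experiment("NoVacancyModelTraining")

# Toy stand-in for the real NoVacancyPipeline.
pipeline = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])
pipeline.fit(np.array([[0.0], [1.0], [2.0], [3.0]]), np.array([0, 0, 1, 1]))

# Training side (trainer.py): log the fitted pipeline and its metrics.
with mlflow.start_run() as run:
    mlflow.log_metric("auc", 0.92)  # illustrative value
    mlflow.sklearn.log_model(pipeline, artifact_path="model")

# Inference side (predictor.py): reload the exact artifact for predictions.
model = mlflow.sklearn.load_model(f"runs:/{run.info.run_id}/model")
```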
1. Clone the repo.

   ```bash
   git clone https://github.com/Morgan-Sell/no-vacancy.git
   ```

2. Switch to the `mlflow` feature branch.

   ```bash
   git checkout mlflow
   ```

3. Create a `.env` file in the project root directory with the following variables:
   - DB_USER
   - DB_PASSWORD
   - DB_PORT
   - BRONZE_DB_HOST
   - BRONZE_DB
   - SILVER_DB_HOST
   - SILVER_DB
   - GOLD_DB_HOST
   - GOLD_DB
   - TEST_DB_USER
   - TEST_DB_PASSWORD
   - TEST_DB_HOST
   - TEST_DB
   - TEST_DB_PORT
   - MLFLOW_DB_HOST
   - MLFLOW_DB
   - MLFLOW_TRACKING_URI

4. Build the Docker images defined in `docker-compose.yaml`.

   ```bash
   docker compose build
   ```

5. Start all Docker services in detached mode.

   ```bash
   docker compose up -d
   ```

6. Identify the container ID by running:

   ```bash
   docker ps -a
   ```

7. Once you've identified the container ID associated with the image called `no-vacancy-app`, enter the following to access the application:

   ```bash
   docker exec -it <container_id> /bin/bash
   ```

8. Train the NoVacancyPipeline in the Docker container by running:

   ```bash
   python services/trainer.py
   ```

9. Enter http://localhost:5001 into your web browser. Your browser will display the MLflow UI. In the Experiments sidebar, you should see NoVacancyModelTraining. Select it and you will see the model artifacts.
Implements the medallion architecture and reads `/data/bookings_raw.csv` into the `raw_data` table in the Bronze database.
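A minimal sketch of this ingestion step using pandas and SQLAlchemy, assuming the `.env` variable names listed in the steps below; the repo's actual loader code may differ:

```python
import os

import pandas as pd
from sqlalchemy import create_engine

# Build the Bronze DB connection from .env-style settings
# (variable names per the list below).
url = (
    f"postgresql://{os.environ['DB_USER']}:{os.environ['DB_PASSWORD']}"
    f"@{os.environ['BRONZE_DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['BRONZE_DB']}"
)
engine = create_engine(url)

# Load the raw bookings CSV and write it to the raw_data table.
df = pd.read_csv("data/bookings_raw.csv")
df.to_sql("raw_data", engine, if_exists="replace", index=False)
```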
1. Clone the repo.

   ```bash
   git clone https://github.com/Morgan-Sell/no-vacancy.git
   ```

2. Switch to the `postgres` feature branch.

   ```bash
   git checkout postgres
   ```

3. Create a `.env` file in the project root directory with the following variables:
   - DB_USER
   - DB_PASSWORD
   - DB_PORT
   - BRONZE_DB_HOST
   - BRONZE_DB
   - SILVER_DB_HOST
   - SILVER_DB
   - GOLD_DB_HOST
   - GOLD_DB
   - TEST_DB_USER
   - TEST_DB_PASSWORD
   - TEST_DB_HOST
   - TEST_DB
   - TEST_DB_PORT

4. Build the Docker images defined in `docker-compose.yaml`.

   ```bash
   docker compose build
   ```

5. Start all Docker services in detached mode.

   ```bash
   docker compose up -d
   ```

6. Enter http://127.0.0.1:8000/ into your web browser. Your browser should display the following text:

   ```json
   {"message":"Welcome to the No Vacancy API!"}
   ```

7. Identify the container ID by running:

   ```bash
   docker ps -a
   ```

8. Once you've identified the container ID associated with the image called `no-vacancy-app`, enter the following to access the application:

   ```bash
   docker exec -it <container_id> /bin/bash
   ```

9. Now that you're in the application, you can query the `raw_data` table. Enter the commands below to access PostgreSQL and see the first 10 rows of the `raw_data` table.

   ```bash
   docker exec -it bronze-db psql -U <db_user_from_dotenv> -d <bronze_db_from_dotenv>
   ```

   ```sql
   SELECT * FROM raw_data LIMIT 10;
   ```

10. Since the `raw_data` table has been populated, you can process the data, save it to the `novacancy-silver` database, and train the model using the `NoVacancyPipeline` class. Once you're inside the Docker container (by following instruction #8), execute the following:

    ```bash
    python services/trainer.py
    ```
If the application runs successfully, you should see something like the following:

```
2025-05-14 00:43:16,688 - __main__ - INFO -train_pipeline:116 - ✅ Loaded raw data
2025-05-14 00:43:19,010 - __main__ - INFO -train_pipeline:126 - ✅ Saved preprocessed data to Silver database
Fitting 3 folds for each of 20 candidates, totalling 60 fits
2025-05-14 00:43:34,575 - services.pipeline_management - INFO -save_pipeline:39 - ✅ Pipeline and processor successfully saved at models/no_vacancy_pipeline.pkl
2025-05-14 00:43:34,740 - __main__ - INFO -evaluate_model:98 - 0.0.3 - AUC: 1.0
```
FastAPI web server comprising the routers and services required to process the data, perform feature engineering, train the model, save model artifacts, and produce predictions.
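As a rough illustration of such a router, here is a minimal FastAPI endpoint sketch; the request fields and response shape are assumptions, not the repo's actual schema (the real dataset has 17 features):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Booking(BaseModel):
    # Illustrative subset of booking features.
    lead_time: int
    no_of_special_requests: int
    avg_price_per_room: float


@app.post("/predict")
def predict(booking: Booking) -> dict:
    # Placeholder score; the real service loads the trained pipeline
    # from MLflow and returns a cancellation probability.
    score = min(1.0, booking.lead_time / 365)
    return {"cancellation_probability": score}


@app.get("/")
def root() -> dict:
    return {"message": "Welcome to the No Vacancy API!"}
```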
1. Clone the repo.

   ```bash
   git clone https://github.com/Morgan-Sell/no-vacancy.git
   ```

2. Switch to the `web-server` feature branch.

   ```bash
   git checkout web-server
   ```

3. Build the Docker image (replace <docker_username> with your Docker Hub username). Make sure Docker is running on your local PC.

   ```bash
   docker build -t <docker_username>/no-vacancy:v1 .
   ```

4. Run the container.

   ```bash
   docker run -it -p 8000:8000 <docker_username>/no-vacancy:v1
   ```

5. Test the API by going to http://0.0.0.0:8000/docs in your browser.







