Build a dbt project analyzing global airport data on Snowflake.
- Copy this repository by clicking the "Use Template" button
- Explore the dataset: DATASETS.md
- Get your repo running by going through one of these options:
- Use as a Codespace: Click "Code" > "Codespaces" > "Create codespace on main"
- Clone locally: Clone your fork, then run:
uv sync source .venv/bin/activate # (or equivalent on your platform, see the README of the course)
- Create a new dbt project called
airstats. You only need to do thedbt init ...step, your uv/virtualenv is already up and running
Copy the profiles.yml from the airbnb project to the airstats folder:
Now edit airstats/profiles.yml and change the profile name to airstats and database name to AIRSTATS:
Before (airbnb configuration):
airbnb:
outputs:
dev:
type: snowflake
account: "..."
user: dbt
role: TRANSFORM
private_key: "..."
private_key_passphrase: q
database: AIRBNB
schema: DEV
threads: 1
warehouse: COMPUTE_WH
target: devAfter (airstats configuration):
airstats: # <-- CHANGED
outputs:
dev:
type: snowflake
account: "..."
user: dbt
role: TRANSFORM
private_key: "..."
private_key_passphrase: q
database: AIRSTATS # <-- CHANGED
schema: DEV
threads: 1
warehouse: COMPUTE_WH
target: devcd airstats
dbt debugYou should see "All checks passed!" if the connection is configured correctly.
Once done Add airstats to git, ensure that your profiles.yml is added too (this is OK as it's an assignment in a private repository; never add credentials to git in a real-world project)
Remove the example models that dbt created by default:
rm -rf models/exampleAlso remove the example model configuration from dbt_project.yml. Delete these lines at the end of the file:
models:
airstats:
example:
+materialized: viewIf you receive deprecation warnings when you execute dbt once you removed the example configuration, ignore these; it's a reported bug in dbt.
Before building models, explore the data in Snowflake. Here are a few SQL queries to get you started:
Data source: https://ourairports.com/data/
USE AIRSTATS.RAW;
-- Check the airports table
SELECT * FROM airports LIMIT 10;
-- Count airports by type
SELECT type, COUNT(*) as count
FROM airports
GROUP BY type
ORDER BY count DESC;
-- Check the runways table
SELECT * FROM runways LIMIT 10;
-- Check the comments table
SELECT * FROM airport_comments LIMIT 10;- In your SQL files, always use CTEs for "importing" refs/sources, even if it only adds boilerplate - this is dbt convention.
- Keep your warehouse clean. If you changed the name or materialization of a model, check if the one with the old name/materialization is still around and drop it in Snowflake (use
DROP VIEWorDROP TABLEas appropriate).
The AIRSTATS database has been set up in Snowflake with the following tables in the RAW schema:
| Table | Description | Key Columns |
|---|---|---|
airports |
Global airport data (~72K rows) | id, ident, type, name, iso_country |
airport_comments |
User comments about airports | id, airport_ref, airport_ident, date |
runways |
Runway information (~44K rows) | id, airport_ref, airport_ident, closed |
Create a new file models/sources.yml that defines these three source tables. The airport_comments table must have the source name comments.
After creating the sources file, you can check for syntax errors by running:
dbt compileThe staging layer (bronze) is responsible for:
- Selecting only the columns we need from source tables
- Renaming columns to follow consistent naming conventions
- Referencing sources using the dbt
source()function
Create a folder models/bronze/ for these models.
Create models/bronze/src_airports.sql
Requirements:
- Use a CTE to reference the
airportssource - Select and rename the following columns:
| Source Column Name | Target Column Name |
|---|---|
ident |
airport_ident |
type |
airport_type |
name |
airport_name |
latitude_deg |
airport_lat |
longitude_deg |
airport_long |
continent |
continent (no rename) |
iso_country |
iso_country (no rename) |
iso_region |
iso_region (no rename) |
Create models/bronze/src_airport_comments.sql
Requirements:
- Use a CTE to reference the
airport_commentssource - Select and rename the following columns:
| Source Column Name | Target Column Name |
|---|---|
id |
comment_id |
airport_ident |
airport_ident (no rename) |
date |
comment_timestamp (it's actually a timestamp column, not a date column) |
member_nickname |
member_nickname (no rename) |
subject |
comment_subject |
body |
comment_body |
Create models/bronze/src_runways.sql
Requirements:
- Use a CTE to reference the
runwayssource - Select and rename the following columns:
| Source Column Name | Target Column Name |
|---|---|
id |
runway_id |
airport_ident |
airport_ident (no rename) |
length_ft |
runway_length_ft |
width_ft |
runway_width_ft |
surface |
runway_surface |
lighted |
runway_lighted |
closed |
runway_closed |
Run dbt to build the staging models:
dbt runAll three models should complete successfully.
Create a silver_airports model: Just a copy (SELECT *) from src_airports, no transformations needed.
Create a silver_runways model:
- Source model:
src_runways - If the surface is null or empty, change it to
__UNKNOWN__; keep this column's name asrunway_surface - The columns of this model must be exactly the same (and in the same order) as those of
src_runways
Create a silver_airport_comments model:
- Source model:
src_airport_comments - Filter out records with null / empty values for the comment body
- If the member's nickname is null, change it to
__UNKNOWN__ - Make this an incremental model that uses
comment_idto identify new records (hint: compare against the maximum existingcomment_idin the target table) - Add a new column:
loaded_at, which should be thecurrent_timestamp()by default - The columns of this model must be exactly the same (and in the same order) as those of
src_airport_comments, plus the extraloaded_atas the last column
Add a new record to RAW.airport_comments. Then materialize the incremental model again (but only that model).
Add your solution in the next lines:
- Adding a new record:
REPLACE THIS CODE BLOCK BY PASTING THE SQL for adding a new record to `RAW.airport_comments` - Command to execute to update this model (but only this model, not all the models):
REPLACE THIS CODE BLOCK BY PASTING THE dbt COMMAND YOU EXECUTED - Execute an SQL on the Snowflake UI to ensure the new record has been added:
REPLACE THIS CODE BLOCK BY PASTING 1) THE SQL to extract the new record from `silver_airport_comments` 2) THE result you see in Snowflake
Requirements
- Every table in the silver layer must be materialized as a table by default through instructions in
dbt_project.yml silver_airport_commentswill be overridden to use anincrementalmaterialization- Also change every src table to ephemeral materialization - not in
dbt_project.yml, but in the corresponding sql files individually
- Create a snapshot on
silver_airports, call itscd_silver_airports. Research, understand and use the check snapshot strategy on all columns as this model doesn't have a timestamp column we can work with. - Execute the snapshot
The airport Los Angeles County Sheriff's Department Heliport (airport_ident: 01CN) must be closed. Simulate this change by updating the type column of this heliport to closed in RAW.AIRPORTS, then run dbt run --select silver_airports followed by dbt snapshot.
- Updating the record to "closed":
REPLACE THIS BLOCK BY PASTING THE SQL you executed - Command to execute and snapshot update:
REPLACE THIS CODE BLOCK BY PASTING THE dbt COMMAND YOU EXECUTED
- Create
analyses/la_heliport_closed.sqlwhere you validate if the snapshot went through - select every line corresponding to this airport in the snapshot table. - Execute the analysis and print the values to screen
- Create a snapshot for
silver_runways, call itscd_silver_runways. Use the same check strategy as forscd_silver_airports. - Execute the snapshot
Implement the following:
- Test every silver table: Figure out which values are unique and should not be null (use your educated guess for not null) and add these tests
- Find a column where an
accepted_valuestest makes sense and use it - Make relations between the three silver tables explicit by implementing
relationshipstests. Set the severity of all relationship tests to "warn" (since referential integrity may not hold across all source data). - Use at least three dbt-expectations tests (in total) on these three models
- Implement two singular tests
- Add configuration to store test failures into a database table
- Add descriptions to the silver tables and their columns
- Use a '{{ doc("...") }}'-based documentation at least once
- Create an overview.md where you discuss in a few sentences how the silver tables interconnect