Commit b51eb45

Merge pull request #890 from keboola/Branched-storage-feature
docs: document Branched Storage for Snowflake projects in Development Branches
2 parents 50cc64a + 177efa5 commit b51eb45

2 files changed

Lines changed: 62 additions & 50 deletions


components/branches/index.md

Lines changed: 62 additions & 50 deletions
@@ -6,78 +6,90 @@ permalink: /components/branches/
 * TOC
 {:toc}

-*If you already know how development branches work in general and want to create and start using your first branch,
+*If you already know how development branches work in general and want to create and start using your first branch,
 go to our [Getting Started tutorial](/tutorial/branches/).*

-Development Branches allow you to modify [component configurations](/components/) without interfering with running
-configurations or entire [orchestrated pipelines](/flows/orchestrator/). They are ideal to use when making bigger changes
-to a project or when you need to be extra careful about performing your changes safely.
+Development Branches allow you to modify [component configurations](/components/) without interfering with running
+configurations or entire [orchestrated pipelines](/flows/orchestrator/). They are ideal to use when making bigger changes
+to a project or when you need to be extra careful about performing your changes safely.

-To give an example, let's say that you have an ordinary orchestration that extracts, transforms and writes data
-to a target system, and you need to remove a column from the source. To do that, you must modify several configurations,
-and ideally, also perform a dry run to check that the data in the target system is correct. However, modifying a pipeline
-that runs, e.g., every ten minutes, is difficult without an outage of the pipeline. Development Branches are designed
+To give an example, let's say that you have an ordinary orchestration that extracts, transforms and writes data
+to a target system, and you need to remove a column from the source. To do that, you must modify several configurations,
+and ideally, also perform a dry run to check that the data in the target system is correct. However, modifying a pipeline
+that runs, e.g., every ten minutes, is difficult without an outage of the pipeline. Development Branches are designed
 to help in such situations.

 {% include public-beta-warning.html %}

 ## How Branches Work
-When you create a development branch in your project, you obtain an exact copy of the project and all its current
-configurations. You can then modify these configurations without ever touching the original ones in production,
-and these will keep running in orchestrations.

-When you run a configuration in a branch, it can **read** the [tables](/storage/tables/) and [files](/storage/files/)
+When you create a development branch in your project, you obtain an exact copy of the project and all its current
+configurations. You can then modify these configurations without ever touching the original ones in production,
+and these will keep running in orchestrations.
+
+When you run a configuration in a branch, it can **read** the [tables](/storage/tables/) and [files](/storage/files/)
 from Storage as if it were a normal configuration. However, when your branch configuration attempts to **write** data
-(tables or files), the data is written to the branchs isolated storage layer. This means that production data and branch data
+(tables or files), the data is written to the branch's isolated storage layer. This means that production data and branch data
 are completely separated. There is no need to duplicate your entire project's data when creating a new branch.

-### Branched Storage Architecture
+## Branched Storage
+
+Branched Storage is an improved storage isolation model for development branches. Instead of cluttering your project
+with prefixed bucket names (like `in.c-1234-bucket`), each branch gets its own fully isolated storage namespace.
+Production data is never touched, and no data is copied up front — a copy is created only when you actually write to or modify a table within the branch.
+
+{% include tip.html content="Branched Storage is available for projects running on <strong>Snowflake</strong>. If your project uses BigQuery, the classic prefix-based model still applies — Branched Storage support for BigQuery is coming." %}

-Instead of creating prefixed buckets immediately upon branch creation, Keboola now uses *branched storage*
-a dedicated storage namespace that behaves like an isolated copy of your production environment,
-but without duplicating data up front. Tables and files are only materialized when they are cloned or written to.
-The isolation is handled by automatically prefixing schema names, without injecting branch IDs into bucket names.
+### Why It Matters

-This approach provides:
-- **Full isolation** – each branch has its own Storage environment that does not affect production.
-- **On-demand materialization** – tables and files appear in the branched storage only once they are accessed, cloned, or written to within the branch.
-- **Transparent behavior** – from the user’s perspective, reading and writing works exactly the same as in production.
-When a job in a branch reads from a table that has not been modified, the data is transparently loaded from production.
-- **Safety** – all write operations are performed within the branch’s own isolated context, ensuring that production data remains untouched.
+Without Branched Storage, every write in a branch produced new buckets with prefixed names that were visible in
+production Storage and cluttered the namespace. You had to be careful about what you ran and where.

-<div class="alert alert-info" markdown="1">
-Branched Storage is currently available **only for projects using Snowflake** as the backend.
-Projects running on **BigQuery** continue to use the previous branch model with prefixed buckets (e.g., `in.c-1234-bucket`) until Branched Storage support is added.
-</div>
+With Branched Storage:
+
+- **Production is safe** — writes in a branch never affect production data.
+- **No data duplication up front** — creating a branch is instant and doesn't copy your storage. Tables are only materialized when you write to them.
+- **Reads are transparent** — if a table hasn't been modified in the branch, you're reading live production data, with no extra cost.
+- **Clean Storage** — the branch has its own storage namespace. No prefixed buckets visible in production.

 {: .image-popup}
-![Screenshot - Branched Storage](branched_storage.png)
+![Screenshot - Branched Storage in Storage UI](/components/branches/branched_storage.png)

----
+### Enabling Branched Storage
+
+Branched Storage is enabled per project in **Project Settings**. Look for the **Branched storage** toggle under the Features section.

-### Data Pipelines
+{: .image-popup}
+![Screenshot - Branched Storage Toggle](/components/branches/feature-branched-storage.png)
+
+Once enabled, all new branches in that project will automatically use the isolated storage model.
+
+### How It Behaves in Practice

-When you create a data source connector and then transform the data it produces using a transformation, it behaves the following way in branches:
+When you create a branch and run a job:

-In production, you might have a data source connector that extracts website requests data to a bucket called `in.c-requests`. Then you create a transformation that takes data from `in.c-requests` and transforms it into aggregated visits stored in `out.c-visits`. Both buckets contain production data.
+1. **Reading** from a table that hasn't been modified → the branch reads directly from production. Nothing is copied.
+2. **Writing** to a table for the first time → the table is cloned into the branch's isolated storage. All subsequent reads and writes for that table within the branch use this isolated copy.
+3. **Production is never touched** — regardless of how many times you write in a branch.

-When you switch to a new branch in a **Snowflake project**, no data is copied immediately. The branched storage references production data until you start modifying or writing data.
+When you delete or merge the branch, the branched storage is cleaned up accordingly.

-If you run a transformation that writes to a new table or modifies existing data, the table will be created or cloned inside the branched storage.
-Any subsequent reads or writes within that branch will operate only on this isolated copy. Your production data in `out.c-visits` remains untouched.
+## Data Pipelines in Branches

-For **BigQuery projects**, the classic prefix-based model still applies — new tables written from a branch are prefixed with the branch ID (e.g., `out.c-1234-visits`).
+In production, you might have a data source connector that extracts data into a bucket `in.c-requests`, and a transformation
+that reads from it and writes results to `out.c-visits`.

-<div class="alert alert-info" markdown="1">
-In Branched Storage (Snowflake), data is materialized only when it is written to or cloned within a branch.
-Reading from unmodified tables uses production data transparently.
-</div>
+When you switch to a branch on a **Snowflake project with Branched Storage enabled**, no data is copied immediately.
+The branch reads from production Storage until you run a job that writes data — at that point, only the affected tables
+are materialized in the branch's own storage. Your production `out.c-visits` remains untouched throughout.

-This allows you to test the entire pipeline with real data, in complete isolation from production, without duplicating all storage content at branch creation.
+This allows you to test the entire pipeline with real data, in complete isolation from production, without duplicating
+all storage content at branch creation.

 ## Creating a Branch
-If you have your configurations ready in production and want to create a branch to test some changes, click on your project’s name
-at the top of the screen. Then click on the green icon **New** displayed next to your project’s name.
+
+If you have your configurations ready in production and want to create a branch to test some changes, click on your project's name
+at the top of the screen. Then click on the green icon **New** displayed next to your project's name.

 {: .image-popup}
 ![Screenshot - Create Development Branch](/tutorial/branches/figures/08-create-dev-branch.png)
@@ -90,19 +102,20 @@ Name your new branch and click **Create Development Branch** to open it.
 The branch will appear right below the name of your production project.

 {: .image-popup}
-![Screenshot - Created Development Branch](/tutorial/branches/figures/10-dev-branch-created.png).
+![Screenshot - Created Development Branch](/tutorial/branches/figures/10-dev-branch-created.png)

-Now you can start modifying your configurations, run them, and analyze the results.
+Now you can start modifying your configurations, run them, and analyze the results.

 If you want to learn more about working in a branch, follow our [tutorial](/tutorial/branches/).

 ## Closing a Branch
-Before you merge your development branch back to production, check a detailed [diff of the configuration changes](/tutorial/branches/project-diff/).
+
+Before you merge your development branch back to production, check a detailed [diff of the configuration changes](/tutorial/branches/project-diff/).

 You can end your branch's lifecycle in two ways:

-- **Deleting** -- if you do not wish to use the changes you've made and want to simply discard them. The data associated with the branch is discarded when the branch is deleted.
-- [**Merging into production**](/tutorial/branches/merge-to-production/) -- all changes in the configurations are brought back to the respective production configurations. All the changes are applied at once (after you approve them) and produce new [versions](/components/#configuration-versions) of the respective configurations. The branch can be either deleted or kept for further reference after merging.
+- **Deleting** if you do not wish to use the changes you've made and want to simply discard them. The data associated with the branch is discarded when the branch is deleted.
+- [**Merging into production**](/tutorial/branches/merge-to-production/) all changes in the configurations are brought back to the respective production configurations. All the changes are applied at once (after you approve them) and produce new [versions](/components/#configuration-versions) of the respective configurations. The branch can be either deleted or kept for further reference after merging.

 ***Important:** All of this happens within the same project, enabling collaboration with other project members on the modifications.*

@@ -125,4 +138,3 @@ Components using OAuth do not allow authorizing nor changing the OAuth in a deve
 *****

 ***Important:** Development branches are for development and testing only, so setting up status notifications on Flows is not supported.*
-
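
The read and write behavior documented in the new "How It Behaves in Practice" section can be pictured with a toy copy-on-write model. This is only a sketch under stated assumptions, not Keboola's implementation: the `BranchView` class and its methods are hypothetical, tables are plain Python lists, each bucket name from the docs (`in.c-requests`, `out.c-visits`) stands in for a single table, and a write is modeled as an append.

```python
from typing import Dict, List

Row = Dict[str, object]
Table = List[Row]


class BranchView:
    """Hypothetical toy model of a branch's view over production tables."""

    def __init__(self, production: Dict[str, Table]) -> None:
        self._production = production        # shared, never modified here
        self._branch: Dict[str, Table] = {}  # tables materialized in the branch

    def read(self, table_id: str) -> Table:
        # Unmodified tables are served straight from production.
        if table_id in self._branch:
            return self._branch[table_id]
        return self._production[table_id]

    def write(self, table_id: str, rows: Table) -> None:
        # First write clones the production table (or starts empty if new);
        # later reads and writes use the branch copy only.
        if table_id not in self._branch:
            source = self._production.get(table_id, [])
            self._branch[table_id] = [dict(row) for row in source]
        self._branch[table_id].extend(dict(row) for row in rows)

    def materialized(self) -> List[str]:
        # Names of tables that exist in the branch's own storage.
        return sorted(self._branch)


# Illustrative data only; the row contents are made up.
production = {
    "in.c-requests": [{"url": "/home"}, {"url": "/docs"}],
    "out.c-visits": [{"url": "/home", "visits": 1}],
}
branch = BranchView(production)
requests_seen = branch.read("in.c-requests")  # read-through, nothing copied
branch.write("out.c-visits", [{"url": "/docs", "visits": 1}])
print(branch.materialized())
print(len(production["out.c-visits"]), len(branch.read("out.c-visits")))
```

In this model only `out.c-visits` ends up in the branch, and the production copy keeps its original single row, which mirrors the three numbered points in the added documentation.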