From 705349c123b3b3cde462d921578ab84a74a687c8 Mon Sep 17 00:00:00 2001 From: zubednarova Date: Mon, 13 Apr 2026 07:58:21 +0200 Subject: [PATCH 01/15] Create Storage Access --- data-apps/Storage Access | 467 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 467 insertions(+) create mode 100644 data-apps/Storage Access diff --git a/data-apps/Storage Access b/data-apps/Storage Access new file mode 100644 index 000000000..e50a3938c --- /dev/null +++ b/data-apps/Storage Access @@ -0,0 +1,467 @@ +# Storage Access + +## Overview + +Storage Access allows your Data App to read data from and write data back to Keboola Storage tables in real-time. Your app connects directly to Keboola's storage through Query Service via SQL, enabling: + +- **Real-time data access**: Always work with the latest data, no redeployment needed +- **Write-back capability**: Update, insert into **existing** Storage tables directly from your app +- **Interactive applications**: Build data entry forms, approval workflows, and collaborative tools + +This feature is available for both **Streamlit** and **Python/JS** Data Apps. + +## When to Use Storage Access + +**Use Storage Access when you need to:** + +- Build interactive data entry or editing applications +- Work with large datasets more efficiently - no need to load them via input mapping +- Enable business users to update data directly from the app + +**Stick with Input Mapping when:** + +- You don't need write-back capability - the app only reads and displays data + +## How It Works + +### Architecture Overview + +When you enable Storage Access, Keboola creates a dedicated **workspace** for your Data App. This workspace contains a database user with specific permissions (INSERT, SELECT, UPDATE, TRUNCATE, DELETE) on the tables you've selected. 
+ +``` +Your Data App + │ + ▼ +Query Service ────► Workspace User ────► Storage Tables + │ │ + │ │ + └── Handles authentication, └── Your selected tables + billing, metadata refresh with granted permissions +``` + +Your app communicates with Storage through the **Query Service API**, not directly with Snowflake. This provides: + +- Automatic authentication using your app's token +- Usage tracking for billing +- Automatic metadata refresh after writes +- Abstraction from the underlying backend + +### Workspace Lifecycle + +The workspace is **ephemeral** - a fresh workspace is created each time your app starts (including wake-up from sleep): + +| Event | Workspace Action | +| --- | --- | +| App deploys | New workspace created | +| App wakes from sleep | New workspace created (old one deleted) | +| App redeployed | New workspace created (old one deleted) | +| App deleted | Workspace deleted | + +This design ensures: + +- Permission changes take effect on next app start +- No stale credentials or connections +- Clean isolation between app runs + +## Setting Up Storage Access + +### Step 1: Storage Access + +1. Open your Data App configuration in Keboola. +2. Go to the **Advanced Settings** tab. +3. Find the **Storage Access** section. +4. Click **+ Add Writable Table**. +2. Select a bucket and table from Storage. +3. For each table, the app will have **SELECT**, **UPDATE**, and **TRUNCATE** permissions. + +**Notes:** + +- You can add multiple tables from different buckets. +- All selected tables must exist before deploying. +- Column-level permissions are not supported - the app has access to all columns in selected tables. + +### Step 3: Deploy Your App + +Click **Deploy** (or **Redeploy** for existing apps). During deployment: + +1. Keboola creates a new workspace with your selected table permissions. +2. The workspace ID is passed to your app as an environment variable. +3. Your app code can now use the Query Service to read and write data. 
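The deployment steps above end with the platform handing configuration to the app. A minimal sketch of a startup check follows, assuming the variable names listed in the later revision of this page (PATCH 09/15): `BRANCH_ID`, `WORKSPACE_ID`, `QUERY_SERVICE_URL`, and `KBC_TOKEN`. The helper name `load_storage_access_env` is hypothetical and not part of any Keboola SDK. The sketch only inspects the environment and does not call the Query Service.

```python
import os

# Variable names as listed in the later revision of this page (PATCH 09/15);
# treat them as assumptions until verified against your runtime.
REQUIRED_VARS = ("BRANCH_ID", "WORKSPACE_ID", "QUERY_SERVICE_URL", "KBC_TOKEN")


def load_storage_access_env(environ=None):
    """Return the Storage Access settings as a dict, or raise if any are missing.

    Hypothetical helper for illustration; pass a mapping to test without
    touching the real process environment.
    """
    env = os.environ if environ is None else environ
    # Treat unset and empty values the same way: both mean "not configured".
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Storage Access does not appear to be enabled; missing: "
            + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling this once at startup, before any route or page is served, makes a misconfigured deployment fail with one clear message instead of a `KeyError` on the first query.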
+ +## Reading Data from Storage + +### Using the Query Service Client + +Install the Keboola Query Service client: + +**In `pyproject.toml` (Python):** + +```toml +dependencies = [ + "kbcstorage>=0.8.0", + "keboola.query-service-client>=0.2.0", +] +``` + +**In your Python code:** + +```python +import os +import json +from keboola.query_service_client import QueryServiceClient + +# Read workspace ID from the manifest file +manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", + "/var/run/secrets/keboola.com/workspace/manifest.json") + +with open(manifest_path) as f: + manifest = json.load(f) + workspace_id = manifest["workspaceId"] + +# Initialize the Query Service client +client = QueryServiceClient( + token=os.environ["KBC_TOKEN"], + url=os.environ["KBC_URL"] +) +``` + +### Reading Selected Tables + +To read a table you've selected in the UI: + +```python +import pandas as pd + +# Query a table - use the full table ID (bucket.table) +result = client.execute_query( + workspace_id=workspace_id, + query='SELECT * FROM "in.c-main"."customers" LIMIT 1000' +) + +# Convert to DataFrame +df = pd.DataFrame(result["data"], columns=result["columns"]) +print(df.head()) +``` + +**Table naming convention:** + +- Use the full table ID in quotes: `"bucket_stage.bucket_name"."table_name"` +- Example: `"in.c-sales"."orders"` for a table `orders` in bucket `in.c-sales` + +### Running Custom Queries + +You can run any SELECT query against your permitted tables: + +```python +# Join multiple tables +query = """ + SELECT + c.customer_name, + SUM(o.amount) as total_spent + FROM "in.c-main"."customers" c + JOIN "in.c-main"."orders" o ON c.id = o.customer_id + GROUP BY c.customer_name + ORDER BY total_spent DESC + LIMIT 10 +""" + +result = client.execute_query(workspace_id=workspace_id, query=query) +``` + +## Writing Data Back to Storage + +Storage Access allows your app to modify data in Storage tables using standard SQL statements via the Query Service. 
This is useful for: + +- Data entry forms +- Approval workflows +- Data correction interfaces +- Collaborative editing + +### Inserting and Updating Data + +You can use standard SQL INSERT and UPDATE statements directly via the Query Service: + +```python +# INSERT new records +client.execute_query( + workspace_id=workspace_id, + query=''' + INSERT INTO "in.c-main"."approvals" ("id", "name", "status", "updated_at") + VALUES (1, 'New Record', 'pending', CURRENT_TIMESTAMP) + ''' +) + +# UPDATE existing records +client.execute_query( + workspace_id=workspace_id, + query=''' + UPDATE "in.c-main"."approvals" + SET status = 'approved', updated_at = CURRENT_TIMESTAMP + WHERE id = 123 + ''' +) + +# DELETE records +client.execute_query( + workspace_id=workspace_id, + query=''' + DELETE FROM "in.c-main"."approvals" + WHERE status = 'cancelled' + ''' +) +``` + +The Query Service automatically handles metadata refresh in Storage after write operations, so row counts and table statistics stay current without any additional calls. + +### Truncating Tables + +To remove all data from a table: + +```python +client.execute_query( + workspace_id=workspace_id, + query='TRUNCATE TABLE "in.c-main"."temp_data"' +) +``` + +**Warning:** Truncation is immediate and cannot be undone. Use with caution. + +### Important Considerations + +**Metadata refresh:** After any write operation, Keboola automatically refreshes the table metadata. This ensures: + +- Row counts are accurate in the Storage UI +- Other components see the updated data +- Table statistics are current + +**Concurrency:** Multiple users of your app may write simultaneously. 
If you need to prevent conflicts, you must handle this in your application logic: + +```python +# Example: Optimistic locking with version column +query = """ + UPDATE "in.c-main"."records" + SET status = 'approved', version = version + 1 + WHERE id = 123 AND version = 5 +""" +result = client.execute_query(workspace_id=workspace_id, query=query) + +if result["rows_affected"] == 0: + raise Exception("Record was modified by another user. Please refresh and try again.") +``` + +## Environment Variables + +When Storage Access is enabled, these environment variables are available to your app: + +| Variable | Description | +| --- | --- | +| `KBC_WORKSPACE_MANIFEST_PATH` | Path to the workspace manifest file (contains `workspaceId`) | +| `WORKSPACE_ID` | The workspace ID directly (alternative to reading from manifest) | +| `KBC_TOKEN` | Storage API token (always available) | +| `KBC_URL` | Keboola Connection URL (always available) | + +**Reading the workspace ID:** + +```python +import os +import json + +# Method 1: From environment variable (simpler) +workspace_id = os.environ.get("WORKSPACE_ID") + +# Method 2: From manifest file (more robust) +manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", + "/var/run/secrets/keboola.com/workspace/manifest.json") +if os.path.exists(manifest_path): + with open(manifest_path) as f: + workspace_id = json.load(f)["workspaceId"] +``` + +## Comparison: Input Mapping vs Direct Storage Access + +| Aspect | Input Mapping | Direct Storage Access | +| --- | --- | --- | +| **Data freshness** | Snapshot at deploy time | Real-time, always current | +| **Data loading** | CSV files loaded to `/data/in/tables/` | Query on demand via API | +| **Write capability** | None (read-only) | SELECT, UPDATE, TRUNCATE, DELETE | +| **Dataset size** | Limited by container memory | Virtually unlimited (pagination) | +| **Configuration** | Select tables in UI | Select tables + enable toggle | +| **Use case** | Static dashboards, reports | Interactive 
apps, data entry | + +**You can use both together:** Input Mapping for reference data that rarely changes, Storage Access for data you need to read/write in real-time. + +## Example: Read-Write Data App + +This example shows a simple Flask app that reads records from Storage and allows users to update their status. + +**`app.py`:** + +```python +from flask import Flask, request, jsonify, render_template_string +import os +import json +import pandas as pd +from keboola.query_service_client import QueryServiceClient + +app = Flask(__name__) + +# Initialize Query Service client +def get_qs_client(): + manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", + "/var/run/secrets/keboola.com/workspace/manifest.json") + with open(manifest_path) as f: + workspace_id = json.load(f)["workspaceId"] + + return QueryServiceClient( + token=os.environ["KBC_TOKEN"], + url=os.environ["KBC_URL"] + ), workspace_id + + +@app.route("/", methods=["GET", "POST"]) +def index(): + client, workspace_id = get_qs_client() + + if request.method == "POST": + # Handle status update + record_id = request.form["record_id"] + new_status = request.form["status"] + + client.execute_query( + workspace_id=workspace_id, + query=f''' + UPDATE "in.c-main"."approvals" + SET status = '{new_status}', updated_at = CURRENT_TIMESTAMP + WHERE id = {record_id} + ''' + ) + + # Load current records + result = client.execute_query( + workspace_id=workspace_id, + query='SELECT id, name, status, updated_at FROM "in.c-main"."approvals" ORDER BY id' + ) + records = [dict(zip(result["columns"], row)) for row in result["data"]] + + return render_template_string(TEMPLATE, records=records) + + +TEMPLATE = """ + + +Approval Manager + +

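+<h1>Pending Approvals</h1>
+<table>
+  <tr><th>ID</th><th>Name</th><th>Status</th><th>Action</th></tr>
+  {% for r in records %}
+  <tr>
+    <td>{{ r.id }}</td><td>{{ r.name }}</td><td>{{ r.status }}</td>
+    <td>
+      <form method="post">
+        <input type="hidden" name="record_id" value="{{ r.id }}">
+        <select name="status">
+          <option value="approved">Approve</option>
+          <option value="rejected">Reject</option>
+        </select>
+        <button type="submit">Update</button>
+      </form>
+    </td>
+  </tr>
+  {% endfor %}
+</table>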
+ + +""" + +if __name__ == "__main__": + app.run(host="0.0.0.0", port=5000) +``` + +**`pyproject.toml`:** + +```toml +[project] +name = "approval-app" +version = "0.1.0" +requires-python = ">=3.11" +dependencies = [ + "flask>=3.0.0", + "pandas>=2.0.0", + "kbcstorage>=0.8.0", + "keboola.query-service-client>=0.2.0", +] + +[build-system] +requires = ["setuptools>=61.0"] +build-backend = "setuptools.build_meta" +``` + +## Best Practices + +**1. Handle missing workspace gracefully** + +```python +workspace_id = os.environ.get("WORKSPACE_ID") +if not workspace_id: + # Fall back to Input Mapping or show error + st.error("Direct Storage Access is not enabled for this app.") + st.stop() +``` + +**2. Use parameterized queries to prevent SQL injection** + +```python +# ❌ DANGEROUS - never do this +query = f"SELECT * FROM table WHERE id = {user_input}" + +# ✅ SAFE - use parameterized queries or sanitize input +safe_id = int(user_input) # Validate it's actually a number +query = f"SELECT * FROM table WHERE id = {safe_id}" +``` + +**3. Implement pagination for large datasets** + +```python +page_size = 1000 +offset = 0 + +while True: + result = client.execute_query( + workspace_id=workspace_id, + query=f"SELECT * FROM table LIMIT {page_size} OFFSET {offset}" + ) + if not result["data"]: + break + process_batch(result["data"]) + offset += page_size +``` + +**4. Cache frequently-used data** + +```python +import streamlit as st + +@st.cache_data(ttl=300) # Cache for 5 minutes +def load_reference_data(): + client, workspace_id = get_qs_client() + result = client.execute_query(...) + return pd.DataFrame(result["data"], columns=result["columns"]) +``` + +**5. Log write operations for auditability** + +```python +import logging + +logging.info(f"User {current_user} updated record {record_id} to status {new_status}") +``` + +## Limitations + +- **Snowflake only**: Storage Access currently works only with Snowflake backends. BigQuery support is planned for a future release. 
+- **Column-level permissions not supported**: If you grant access to a table, the app can read/write all columns. +- **Permission changes require app restart**: If you add or remove tables from the Storage Access configuration, the changes take effect on the next app start (deploy, redeploy, or wake from sleep). From 2e286bea359415fd909c172c753e456ef672adec Mon Sep 17 00:00:00 2001 From: zubednarova Date: Mon, 13 Apr 2026 08:05:25 +0200 Subject: [PATCH 02/15] Rename data-apps/Storage Access to data-apps/storage-access/index.md --- data-apps/{Storage Access => storage-access/index.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename data-apps/{Storage Access => storage-access/index.md} (100%) diff --git a/data-apps/Storage Access b/data-apps/storage-access/index.md similarity index 100% rename from data-apps/Storage Access rename to data-apps/storage-access/index.md From eeba645c97d50681137d064191c6508692b655df Mon Sep 17 00:00:00 2001 From: zubednarova Date: Mon, 13 Apr 2026 08:31:12 +0200 Subject: [PATCH 03/15] Update index.md --- data-apps/storage-access/index.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index e50a3938c..cb2cc2d8a 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -1,4 +1,10 @@ -# Storage Access +--- +title: Storage Access +permalink: /data-apps/storage-access/ +--- + +* TOC +{:toc} ## Overview @@ -65,7 +71,10 @@ This design ensures: ## Setting Up Storage Access -### Step 1: Storage Access +### Step 1: Enable Direct Grant feature +-- need to verify but it looks like private beta - check the exact feature name + +### Step 2: Storage Access 1. Open your Data App configuration in Keboola. 2. Go to the **Advanced Settings** tab. 
From 705d7e9933e93b25bc79e8f9c313900ba60b7c49 Mon Sep 17 00:00:00 2001 From: zubednarova Date: Mon, 13 Apr 2026 08:33:30 +0200 Subject: [PATCH 04/15] Update index.md --- data-apps/storage-access/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index cb2cc2d8a..2cca1043b 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -73,6 +73,7 @@ This design ensures: ### Step 1: Enable Direct Grant feature -- need to verify but it looks like private beta - check the exact feature name +direct-grant-output-mapping ### Step 2: Storage Access From 3e3dccb6cfb127762731f3ad8f52e1de351f1cea Mon Sep 17 00:00:00 2001 From: zubednarova Date: Wed, 15 Apr 2026 07:34:43 +0200 Subject: [PATCH 05/15] Update index.md --- data-apps/storage-access/index.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 2cca1043b..570bddaa6 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -71,9 +71,10 @@ This design ensures: ## Setting Up Storage Access -### Step 1: Enable Direct Grant feature --- need to verify but it looks like private beta - check the exact feature name -direct-grant-output-mapping +### Step 1: Enable Storage Access +1. Go to the **Project Settings**. +2. Go to the **Features**. +3. Find the **Storage Access** feature and activate it. 
### Step 2: Storage Access From b55ce783148dc7b0b5a859f8f8562166cb16735e Mon Sep 17 00:00:00 2001 From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Date: Thu, 16 Apr 2026 05:49:36 +0000 Subject: [PATCH 06/15] docs: address review comments on Storage Access documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add links to Query Service API docs and recommended client library - Remove duplicate column-level permissions note (kept in Limitations) - Add error handling for workspace manifest reading - Fix Flask example: initialize client once at startup, add input validation - Rename 'parameterized queries' to 'validate and sanitize input' with allowlist examples - Replace OFFSET pagination with keyset (cursor-based) pagination - Add generic Python cache example alongside Streamlit-specific one - Clarify logging: mention stdout destination and Terminal Log tab - Make workspace ID reading consistent (always from manifest with error handling) - Remove WORKSPACE_ID env var (use manifest file consistently) - Add Storage Access page to site navigation Co-Authored-By: Zuzana Bednářová --- _data/navigation.yml | 2 + data-apps/storage-access/index.md | 147 +++++++++++++++++++++--------- 2 files changed, 104 insertions(+), 45 deletions(-) diff --git a/_data/navigation.yml b/_data/navigation.yml index 99dafc633..e3b604bd4 100644 --- a/_data/navigation.yml +++ b/_data/navigation.yml @@ -196,6 +196,8 @@ items: title: Git Repository Deployment - url: /components/data-apps/backend-versions/ title: Backend Versions + - url: /data-apps/storage-access/ + title: Storage Access - url: /data-apps/terminal-log-tab/ title: Terminal Log Tab diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 570bddaa6..883de71a5 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -45,13 +45,15 @@ Query Service ────► Workspace User ────► Storage Tables 
billing, metadata refresh with granted permissions ``` -Your app communicates with Storage through the **Query Service API**, not directly with Snowflake. This provides: +Your app communicates with Storage through the [**Query Service API**](https://query.keboola.com/api-docs/), not directly with Snowflake. This provides: - Automatic authentication using your app's token - Usage tracking for billing - Automatic metadata refresh after writes - Abstraction from the underlying backend +The recommended Python client library is [keboola.query-service-client](https://pypi.org/project/keboola.query-service-client/). + ### Workspace Lifecycle The workspace is **ephemeral** - a fresh workspace is created each time your app starts (including wake-up from sleep): @@ -89,7 +91,6 @@ This design ensures: - You can add multiple tables from different buckets. - All selected tables must exist before deploying. -- Column-level permissions are not supported - the app has access to all columns in selected tables. ### Step 3: Deploy Your App @@ -121,13 +122,19 @@ import os import json from keboola.query_service_client import QueryServiceClient -# Read workspace ID from the manifest file +# Read workspace ID from the manifest file (once at startup) manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", "/var/run/secrets/keboola.com/workspace/manifest.json") -with open(manifest_path) as f: - manifest = json.load(f) - workspace_id = manifest["workspaceId"] +try: + with open(manifest_path) as f: + manifest = json.load(f) + workspace_id = manifest["workspaceId"] +except (FileNotFoundError, KeyError) as e: + raise RuntimeError( + "Storage Access is not enabled for this app. " + "Enable it in Advanced Settings and redeploy." 
+ ) from e # Initialize the Query Service client client = QueryServiceClient( @@ -267,25 +274,26 @@ When Storage Access is enabled, these environment variables are available to you | Variable | Description | | --- | --- | | `KBC_WORKSPACE_MANIFEST_PATH` | Path to the workspace manifest file (contains `workspaceId`) | -| `WORKSPACE_ID` | The workspace ID directly (alternative to reading from manifest) | | `KBC_TOKEN` | Storage API token (always available) | | `KBC_URL` | Keboola Connection URL (always available) | **Reading the workspace ID:** +The recommended way to obtain the workspace ID is from the manifest file: + ```python import os import json -# Method 1: From environment variable (simpler) -workspace_id = os.environ.get("WORKSPACE_ID") - -# Method 2: From manifest file (more robust) manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", "/var/run/secrets/keboola.com/workspace/manifest.json") -if os.path.exists(manifest_path): +try: with open(manifest_path) as f: workspace_id = json.load(f)["workspaceId"] +except (FileNotFoundError, KeyError) as e: + raise RuntimeError( + "Storage Access is not enabled. Enable it in Advanced Settings and redeploy." 
+ ) from e ``` ## Comparison: Input Mapping vs Direct Storage Access @@ -316,30 +324,31 @@ from keboola.query_service_client import QueryServiceClient app = Flask(__name__) -# Initialize Query Service client -def get_qs_client(): - manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", - "/var/run/secrets/keboola.com/workspace/manifest.json") - with open(manifest_path) as f: - workspace_id = json.load(f)["workspaceId"] - - return QueryServiceClient( - token=os.environ["KBC_TOKEN"], - url=os.environ["KBC_URL"] - ), workspace_id +# Initialize Query Service client once at startup +manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", + "/var/run/secrets/keboola.com/workspace/manifest.json") +with open(manifest_path) as f: + WORKSPACE_ID = json.load(f)["workspaceId"] + +qs_client = QueryServiceClient( + token=os.environ["KBC_TOKEN"], + url=os.environ["KBC_URL"] +) + +ALLOWED_STATUSES = {"pending", "approved", "rejected"} @app.route("/", methods=["GET", "POST"]) def index(): - client, workspace_id = get_qs_client() - if request.method == "POST": - # Handle status update - record_id = request.form["record_id"] + # Validate and sanitize user input + record_id = int(request.form["record_id"]) # ensure integer new_status = request.form["status"] + if new_status not in ALLOWED_STATUSES: + return "Invalid status", 400 - client.execute_query( - workspace_id=workspace_id, + qs_client.execute_query( + workspace_id=WORKSPACE_ID, query=f''' UPDATE "in.c-main"."approvals" SET status = '{new_status}', updated_at = CURRENT_TIMESTAMP @@ -348,8 +357,8 @@ def index(): ) # Load current records - result = client.execute_query( - workspace_id=workspace_id, + result = qs_client.execute_query( + workspace_id=WORKSPACE_ID, query='SELECT id, name, status, updated_at FROM "in.c-main"."approvals" ORDER BY id' ) records = [dict(zip(result["columns"], row)) for row in result["data"]] @@ -416,59 +425,107 @@ build-backend = "setuptools.build_meta" **1. 
Handle missing workspace gracefully** ```python -workspace_id = os.environ.get("WORKSPACE_ID") -if not workspace_id: - # Fall back to Input Mapping or show error - st.error("Direct Storage Access is not enabled for this app.") +import os, json + +manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", + "/var/run/secrets/keboola.com/workspace/manifest.json") +try: + with open(manifest_path) as f: + workspace_id = json.load(f)["workspaceId"] +except (FileNotFoundError, KeyError): + # Storage Access is not enabled — show a user-friendly error + import streamlit as st # or use your framework's error handling + st.error("Storage Access is not enabled for this app. Enable it in Advanced Settings and redeploy.") st.stop() ``` -**2. Use parameterized queries to prevent SQL injection** +**2. Validate and sanitize user input to prevent SQL injection** + +Since the Query Service accepts raw SQL strings, you must validate all user input before including it in queries: ```python # ❌ DANGEROUS - never do this query = f"SELECT * FROM table WHERE id = {user_input}" -# ✅ SAFE - use parameterized queries or sanitize input -safe_id = int(user_input) # Validate it's actually a number +# ✅ SAFE - validate types and use allowlists +safe_id = int(user_input) # Ensure it's actually a number query = f"SELECT * FROM table WHERE id = {safe_id}" + +# ✅ For string values, use an allowlist of permitted values +ALLOWED_STATUSES = {"pending", "approved", "rejected"} +if status not in ALLOWED_STATUSES: + raise ValueError(f"Invalid status: {status}") +query = f"UPDATE table SET status = '{status}' WHERE id = {safe_id}" ``` -**3. Implement pagination for large datasets** +**3. 
Implement keyset pagination for large datasets** + +Use keyset (cursor-based) pagination instead of OFFSET, which can produce duplicates or gaps on live data: ```python page_size = 1000 -offset = 0 +last_id = 0 # Start from the beginning while True: result = client.execute_query( workspace_id=workspace_id, - query=f"SELECT * FROM table LIMIT {page_size} OFFSET {offset}" + query=f''' + SELECT * FROM "in.c-main"."my_table" + WHERE id > {last_id} + ORDER BY id ASC + LIMIT {page_size} + ''' ) if not result["data"]: break process_batch(result["data"]) - offset += page_size + last_id = result["data"][-1][0] # Update cursor to last row's id ``` **4. Cache frequently-used data** +For **Streamlit** apps, use `st.cache_data`: + ```python import streamlit as st @st.cache_data(ttl=300) # Cache for 5 minutes def load_reference_data(): - client, workspace_id = get_qs_client() - result = client.execute_query(...) + result = client.execute_query( + workspace_id=workspace_id, + query='SELECT * FROM "in.c-main"."reference_data"' + ) return pd.DataFrame(result["data"], columns=result["columns"]) ``` -**5. Log write operations for auditability** +For **Python/JS** (non-Streamlit) apps, use a simple in-memory cache: + +```python +from functools import lru_cache +import time + +_cache = {} +_cache_ttl = 300 # seconds + +def get_cached_data(key, query_fn): + now = time.time() + if key in _cache and now - _cache[key]["ts"] < _cache_ttl: + return _cache[key]["data"] + data = query_fn() + _cache[key] = {"data": data, "ts": now} + return data +``` + +**5. Track write operations** + +Write operations are automatically tracked by the Query Service for billing purposes. 
For additional application-level auditing, log to stdout (visible in Data App container logs): ```python import logging +logging.basicConfig(level=logging.INFO) logging.info(f"User {current_user} updated record {record_id} to status {new_status}") +# Output goes to stdout → visible in the Terminal Log tab of your Data App ``` ## Limitations From 285e541fcc303cc96dae7ee02c88341cdcc03921 Mon Sep 17 00:00:00 2001 From: zubednarova Date: Thu, 16 Apr 2026 08:06:21 +0200 Subject: [PATCH 07/15] Update index.md --- data-apps/storage-access/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 883de71a5..c9e2025cc 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -85,7 +85,7 @@ This design ensures: 3. Find the **Storage Access** section. 4. Click **+ Add Writable Table**. 2. Select a bucket and table from Storage. -3. For each table, the app will have **SELECT**, **UPDATE**, and **TRUNCATE** permissions. +3. For each table, the app will have **INSERT**, **SELECT**, **UPDATE**, **DELETE** and **TRUNCATE** permissions. 
**Notes:** @@ -302,7 +302,7 @@ except (FileNotFoundError, KeyError) as e: | --- | --- | --- | | **Data freshness** | Snapshot at deploy time | Real-time, always current | | **Data loading** | CSV files loaded to `/data/in/tables/` | Query on demand via API | -| **Write capability** | None (read-only) | SELECT, UPDATE, TRUNCATE, DELETE | +| **Write capability** | None (read-only) | INSERT, SELECT, UPDATE, TRUNCATE, DELETE | | **Dataset size** | Limited by container memory | Virtually unlimited (pagination) | | **Configuration** | Select tables in UI | Select tables + enable toggle | | **Use case** | Static dashboards, reports | Interactive apps, data entry | From 11f2256f912cc46950d5c0a96b7af83c9719326b Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Mon, 20 Apr 2026 08:58:37 +0200 Subject: [PATCH 08/15] docs(storage-access): fix numbering, clean deps, and standardize permission list - Rename Step 2 heading to "Configure Writable Tables" and renumber list (was 1,2,3,4,2,3) - Use consistent permission ordering (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) across the page - Drop SELECT from the "Write capability" cell in the comparison table - Remove unused kbcstorage dependency from both pyproject.toml snippets - Remove unused pandas and jsonify imports from the Flask example - Clarify that code examples are Python; same concepts apply to JavaScript - Convert the truncation warning to {% include warning.html %} and soften the undo claim Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/storage-access/index.md | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index c9e2025cc..6e7b88280 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -14,7 +14,7 @@ Storage Access allows your Data App to read data from and write data back to Keb - **Write-back capability**: Update, insert into **existing** Storage tables directly 
from your app - **Interactive applications**: Build data entry forms, approval workflows, and collaborative tools -This feature is available for both **Streamlit** and **Python/JS** Data Apps. +This feature is available for both **Streamlit** and **Python/JS** Data Apps. Code examples on this page use Python; the same concepts apply when calling the Query Service API from JavaScript. ## When to Use Storage Access @@ -32,7 +32,7 @@ This feature is available for both **Streamlit** and **Python/JS** Data Apps. ### Architecture Overview -When you enable Storage Access, Keboola creates a dedicated **workspace** for your Data App. This workspace contains a database user with specific permissions (INSERT, SELECT, UPDATE, TRUNCATE, DELETE) on the tables you've selected. +When you enable Storage Access, Keboola creates a dedicated **workspace** for your Data App. This workspace contains a database user with specific permissions (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) on the tables you've selected. ``` Your Data App @@ -78,14 +78,14 @@ This design ensures: 2. Go to the **Features**. 3. Find the **Storage Access** feature and activate it. -### Step 2: Storage Access +### Step 2: Configure Writable Tables 1. Open your Data App configuration in Keboola. 2. Go to the **Advanced Settings** tab. 3. Find the **Storage Access** section. 4. Click **+ Add Writable Table**. -2. Select a bucket and table from Storage. -3. For each table, the app will have **INSERT**, **SELECT**, **UPDATE**, **DELETE** and **TRUNCATE** permissions. +5. Select a bucket and table from Storage. +6. For each table, the app will have **SELECT**, **INSERT**, **UPDATE**, **DELETE**, and **TRUNCATE** permissions. **Notes:** @@ -110,7 +110,6 @@ Install the Keboola Query Service client: ```toml dependencies = [ - "kbcstorage>=0.8.0", "keboola.query-service-client>=0.2.0", ] ``` @@ -242,7 +241,7 @@ client.execute_query( ) ``` -**Warning:** Truncation is immediate and cannot be undone. Use with caution. 
+{% include warning.html content="Truncation removes every row in the target table immediately and cannot be undone through the Query Service. Use with caution." %} ### Important Considerations @@ -302,7 +301,7 @@ except (FileNotFoundError, KeyError) as e: | --- | --- | --- | | **Data freshness** | Snapshot at deploy time | Real-time, always current | | **Data loading** | CSV files loaded to `/data/in/tables/` | Query on demand via API | -| **Write capability** | None (read-only) | INSERT, SELECT, UPDATE, TRUNCATE, DELETE | +| **Write capability** | None (read-only) | INSERT, UPDATE, DELETE, TRUNCATE | | **Dataset size** | Limited by container memory | Virtually unlimited (pagination) | | **Configuration** | Select tables in UI | Select tables + enable toggle | | **Use case** | Static dashboards, reports | Interactive apps, data entry | @@ -316,10 +315,9 @@ This example shows a simple Flask app that reads records from Storage and allows **`app.py`:** ```python -from flask import Flask, request, jsonify, render_template_string +from flask import Flask, request, render_template_string import os import json -import pandas as pd from keboola.query_service_client import QueryServiceClient app = Flask(__name__) @@ -410,8 +408,6 @@ version = "0.1.0" requires-python = ">=3.11" dependencies = [ "flask>=3.0.0", - "pandas>=2.0.0", - "kbcstorage>=0.8.0", "keboola.query-service-client>=0.2.0", ] From 71176f0f5d4e823772740d815e49d29643881a32 Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Wed, 22 Apr 2026 10:32:47 +0200 Subject: [PATCH 09/15] docs(storage-access): correct SDK name, API signature, and Data App env vars Previously the docs referenced a fictional package (keboola.query-service-client), a nonexistent class (QueryServiceClient), and a non-matching execute_query() signature (`workspace_id=`, `query=` returning a dict). 
The real SDK is keboola-query-service on PyPI with a Client class and execute_query(branch_id=, workspace_id=, statements=[...]) returning list[QueryResult] with .columns (Column objects) and .data attributes. Also replaces the KBC_WORKSPACE_MANIFEST_PATH manifest-file flow with the direct env vars the Data App runtime actually sets: BRANCH_ID, WORKSPACE_ID, QUERY_SERVICE_URL, KBC_TOKEN. The Data Integration summary on the Data Apps overview page is updated to match, with a pointer to the canonical env vars list in keboola/data-app-python-js. Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/index.md | 2 +- data-apps/storage-access/index.md | 166 ++++++++++++++++-------------- 2 files changed, 92 insertions(+), 76 deletions(-) diff --git a/data-apps/index.md b/data-apps/index.md index 2acea294d..3fc08778d 100644 --- a/data-apps/index.md +++ b/data-apps/index.md @@ -140,7 +140,7 @@ Keboola provides built-in authentication methods to protect your apps: * **Input Mapping**: Automatically load specific tables into your app. * **Storage API Client**: Programmatic access to all Storage features. -* **Environment Variables**: Pre-configured `KBC_URL` and `KBC_TOKEN`. +* **Environment Variables**: Platform-provided env vars include `BRANCH_ID` (always set), `KBC_TOKEN` and `DATA_LOADER_API_URL` (with Data Loader), and `WORKSPACE_ID` / `QUERY_SERVICE_URL` (with [Storage Access](/data-apps/storage-access/)). See the [runtime README](https://github.com/keboola/data-app-python-js/blob/main/README.md#environment-variables) for the full list. 
### Configuration & Secrets diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 6e7b88280..4c7905781 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -52,7 +52,7 @@ Your app communicates with Storage through the [**Query Service API**](https://q - Automatic metadata refresh after writes - Abstraction from the underlying backend -The recommended Python client library is [keboola.query-service-client](https://pypi.org/project/keboola.query-service-client/). +The recommended Python client library is [keboola-query-service](https://pypi.org/project/keboola-query-service/) (also available for JavaScript/TypeScript as [@keboola/query-service](https://www.npmjs.com/package/@keboola/query-service)). ### Workspace Lifecycle @@ -110,7 +110,7 @@ Install the Keboola Query Service client: ```toml dependencies = [ - "keboola.query-service-client>=0.2.0", + "keboola-query-service>=0.2.0", ] ``` @@ -118,27 +118,23 @@ dependencies = [ ```python import os -import json -from keboola.query_service_client import QueryServiceClient - -# Read workspace ID from the manifest file (once at startup) -manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", - "/var/run/secrets/keboola.com/workspace/manifest.json") +from keboola_query_service import Client +# Storage Access env vars are set by the platform when the feature is enabled. try: - with open(manifest_path) as f: - manifest = json.load(f) - workspace_id = manifest["workspaceId"] -except (FileNotFoundError, KeyError) as e: + branch_id = os.environ["BRANCH_ID"] + workspace_id = os.environ["WORKSPACE_ID"] + query_service_url = os.environ["QUERY_SERVICE_URL"] +except KeyError as e: raise RuntimeError( "Storage Access is not enabled for this app. " "Enable it in Advanced Settings and redeploy." 
) from e # Initialize the Query Service client -client = QueryServiceClient( +client = Client( + base_url=query_service_url, token=os.environ["KBC_TOKEN"], - url=os.environ["KBC_URL"] ) ``` @@ -150,13 +146,17 @@ To read a table you've selected in the UI: import pandas as pd # Query a table - use the full table ID (bucket.table) -result = client.execute_query( +results = client.execute_query( + branch_id=branch_id, workspace_id=workspace_id, - query='SELECT * FROM "in.c-main"."customers" LIMIT 1000' + statements=['SELECT * FROM "in.c-main"."customers" LIMIT 1000'], ) +# One QueryResult per statement — we sent one statement, so take the first. +result = results[0] + # Convert to DataFrame -df = pd.DataFrame(result["data"], columns=result["columns"]) +df = pd.DataFrame(result.data, columns=[c.name for c in result.columns]) print(df.head()) ``` @@ -172,7 +172,7 @@ You can run any SELECT query against your permitted tables: ```python # Join multiple tables query = """ - SELECT + SELECT c.customer_name, SUM(o.amount) as total_spent FROM "in.c-main"."customers" c @@ -182,7 +182,12 @@ query = """ LIMIT 10 """ -result = client.execute_query(workspace_id=workspace_id, query=query) +results = client.execute_query( + branch_id=branch_id, + workspace_id=workspace_id, + statements=[query], +) +result = results[0] ``` ## Writing Data Back to Storage @@ -196,35 +201,38 @@ Storage Access allows your app to modify data in Storage tables using standard S ### Inserting and Updating Data -You can use standard SQL INSERT and UPDATE statements directly via the Query Service: +You can use standard SQL INSERT and UPDATE statements directly via the Query Service. 
Pass `statements` as a list — the SDK will execute them (transactionally by default) and return one result per statement: ```python # INSERT new records client.execute_query( + branch_id=branch_id, workspace_id=workspace_id, - query=''' + statements=[''' INSERT INTO "in.c-main"."approvals" ("id", "name", "status", "updated_at") VALUES (1, 'New Record', 'pending', CURRENT_TIMESTAMP) - ''' + '''], ) # UPDATE existing records client.execute_query( + branch_id=branch_id, workspace_id=workspace_id, - query=''' + statements=[''' UPDATE "in.c-main"."approvals" SET status = 'approved', updated_at = CURRENT_TIMESTAMP WHERE id = 123 - ''' + '''], ) # DELETE records client.execute_query( + branch_id=branch_id, workspace_id=workspace_id, - query=''' + statements=[''' DELETE FROM "in.c-main"."approvals" WHERE status = 'cancelled' - ''' + '''], ) ``` @@ -236,8 +244,9 @@ To remove all data from a table: ```python client.execute_query( + branch_id=branch_id, workspace_id=workspace_id, - query='TRUNCATE TABLE "in.c-main"."temp_data"' + statements=['TRUNCATE TABLE "in.c-main"."temp_data"'], ) ``` @@ -260,41 +269,44 @@ query = """ SET status = 'approved', version = version + 1 WHERE id = 123 AND version = 5 """ -result = client.execute_query(workspace_id=workspace_id, query=query) +results = client.execute_query( + branch_id=branch_id, + workspace_id=workspace_id, + statements=[query], +) -if result["rows_affected"] == 0: +if results[0].rows_affected == 0: raise Exception("Record was modified by another user. 
Please refresh and try again.") ``` ## Environment Variables -When Storage Access is enabled, these environment variables are available to your app: +When Storage Access is enabled, the platform sets these environment variables in your Data App container: | Variable | Description | | --- | --- | -| `KBC_WORKSPACE_MANIFEST_PATH` | Path to the workspace manifest file (contains `workspaceId`) | -| `KBC_TOKEN` | Storage API token (always available) | -| `KBC_URL` | Keboola Connection URL (always available) | +| `WORKSPACE_ID` | ID of the provisioned workspace for this app. | +| `BRANCH_ID` | Storage API branch ID of the project. | +| `QUERY_SERVICE_URL` | URL of the Query Service API (stack-specific). | +| `KBC_TOKEN` | Keboola Storage API token. | -**Reading the workspace ID:** - -The recommended way to obtain the workspace ID is from the manifest file: +If Storage Access is not enabled, `WORKSPACE_ID` / `BRANCH_ID` / `QUERY_SERVICE_URL` are not set. Read them with a clear error message for users: ```python import os -import json -manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", - "/var/run/secrets/keboola.com/workspace/manifest.json") try: - with open(manifest_path) as f: - workspace_id = json.load(f)["workspaceId"] -except (FileNotFoundError, KeyError) as e: + branch_id = os.environ["BRANCH_ID"] + workspace_id = os.environ["WORKSPACE_ID"] + query_service_url = os.environ["QUERY_SERVICE_URL"] +except KeyError as e: raise RuntimeError( "Storage Access is not enabled. Enable it in Advanced Settings and redeploy." ) from e ``` +For the full list of environment variables exposed to Data Apps, see the [data-app-python-js runtime README](https://github.com/keboola/data-app-python-js/blob/main/README.md#environment-variables). 
+ ## Comparison: Input Mapping vs Direct Storage Access | Aspect | Input Mapping | Direct Storage Access | @@ -317,20 +329,17 @@ This example shows a simple Flask app that reads records from Storage and allows ```python from flask import Flask, request, render_template_string import os -import json -from keboola.query_service_client import QueryServiceClient +from keboola_query_service import Client app = Flask(__name__) -# Initialize Query Service client once at startup -manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", - "/var/run/secrets/keboola.com/workspace/manifest.json") -with open(manifest_path) as f: - WORKSPACE_ID = json.load(f)["workspaceId"] +# Read Storage Access env vars once at startup +BRANCH_ID = os.environ["BRANCH_ID"] +WORKSPACE_ID = os.environ["WORKSPACE_ID"] -qs_client = QueryServiceClient( +qs_client = Client( + base_url=os.environ["QUERY_SERVICE_URL"], token=os.environ["KBC_TOKEN"], - url=os.environ["KBC_URL"] ) ALLOWED_STATUSES = {"pending", "approved", "rejected"} @@ -344,23 +353,27 @@ def index(): new_status = request.form["status"] if new_status not in ALLOWED_STATUSES: return "Invalid status", 400 - + qs_client.execute_query( + branch_id=BRANCH_ID, workspace_id=WORKSPACE_ID, - query=f''' + statements=[f''' UPDATE "in.c-main"."approvals" SET status = '{new_status}', updated_at = CURRENT_TIMESTAMP WHERE id = {record_id} - ''' + '''], ) - + # Load current records - result = qs_client.execute_query( + results = qs_client.execute_query( + branch_id=BRANCH_ID, workspace_id=WORKSPACE_ID, - query='SELECT id, name, status, updated_at FROM "in.c-main"."approvals" ORDER BY id' + statements=['SELECT id, name, status, updated_at FROM "in.c-main"."approvals" ORDER BY id'], ) - records = [dict(zip(result["columns"], row)) for row in result["data"]] - + result = results[0] + column_names = [c.name for c in result.columns] + records = [dict(zip(column_names, row)) for row in result.data] + return render_template_string(TEMPLATE, 
records=records) @@ -408,7 +421,7 @@ version = "0.1.0" requires-python = ">=3.11" dependencies = [ "flask>=3.0.0", - "keboola.query-service-client>=0.2.0", + "keboola-query-service>=0.2.0", ] [build-system] @@ -421,14 +434,13 @@ build-backend = "setuptools.build_meta" **1. Handle missing workspace gracefully** ```python -import os, json +import os -manifest_path = os.environ.get("KBC_WORKSPACE_MANIFEST_PATH", - "/var/run/secrets/keboola.com/workspace/manifest.json") try: - with open(manifest_path) as f: - workspace_id = json.load(f)["workspaceId"] -except (FileNotFoundError, KeyError): + branch_id = os.environ["BRANCH_ID"] + workspace_id = os.environ["WORKSPACE_ID"] + query_service_url = os.environ["QUERY_SERVICE_URL"] +except KeyError: # Storage Access is not enabled — show a user-friendly error import streamlit as st # or use your framework's error handling st.error("Storage Access is not enabled for this app. Enable it in Advanced Settings and redeploy.") @@ -463,19 +475,21 @@ page_size = 1000 last_id = 0 # Start from the beginning while True: - result = client.execute_query( + results = client.execute_query( + branch_id=branch_id, workspace_id=workspace_id, - query=f''' + statements=[f''' SELECT * FROM "in.c-main"."my_table" WHERE id > {last_id} ORDER BY id ASC LIMIT {page_size} - ''' + '''], ) - if not result["data"]: + rows = results[0].data + if not rows: break - process_batch(result["data"]) - last_id = result["data"][-1][0] # Update cursor to last row's id + process_batch(rows) + last_id = rows[-1][0] # Update cursor to last row's id ``` **4. 
Cache frequently-used data** @@ -487,11 +501,13 @@ import streamlit as st @st.cache_data(ttl=300) # Cache for 5 minutes def load_reference_data(): - result = client.execute_query( + results = client.execute_query( + branch_id=branch_id, workspace_id=workspace_id, - query='SELECT * FROM "in.c-main"."reference_data"' + statements=['SELECT * FROM "in.c-main"."reference_data"'], ) - return pd.DataFrame(result["data"], columns=result["columns"]) + result = results[0] + return pd.DataFrame(result.data, columns=[c.name for c in result.columns]) ``` For **Python/JS** (non-Streamlit) apps, use a simple in-memory cache: From 556d9c4115ef96fe213a1a3d725c65bfeb518267 Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Wed, 22 Apr 2026 10:33:40 +0200 Subject: [PATCH 10/15] docs(storage-access): add SQL-injection warning and SDK-helper forward note Adds a prominent warning callout at the top of the Writing Data Back section stating plainly that the Query Service does not support parameterized queries and the app is responsible for validating untrusted values. Pairs with a tip callout in Best Practices that points at the upcoming SQL escape helpers in the Python and JS SDKs. Addresses review feedback from PR #910 that the existing allowlist / type-coercion pattern is insufficient guidance on its own, especially for arbitrary string input. Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/storage-access/index.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 4c7905781..9f62c0e3f 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -199,6 +199,8 @@ Storage Access allows your app to modify data in Storage tables using standard S - Data correction interfaces - Collaborative editing +{% include warning.html content="The Query Service accepts raw SQL and does not support parameterized queries or server-side bind variables. 
Your application is responsible for validating every untrusted value before interpolating it into a statement. See the [Validate and sanitize user input](#best-practices) guidance for patterns." %} + ### Inserting and Updating Data You can use standard SQL INSERT and UPDATE statements directly via the Query Service. Pass `statements` as a list — the SDK will execute them (transactionally by default) and return one result per statement: @@ -466,6 +468,8 @@ if status not in ALLOWED_STATUSES: query = f"UPDATE table SET status = '{status}' WHERE id = {safe_id}" ``` +{% include tip.html title="Safer interpolation helpers are coming" content="First-class `SQL.literal()` / `SQL.ident()` / `sql.format()` helpers (with dialect-aware escaping and a `SafeSql` trust marker) are in development in the [Keboola Query Service Python SDK](https://github.com/keboola/query-service-api-python-sdk) and [JavaScript SDK](https://github.com/keboola/query-service-api-js-sdk) and will replace the allowlist/type-coercion patterns above once a release ships. Until then, the validation approach shown here is the recommended interim solution — especially for arbitrary string input, which is genuinely hard to sanitize by hand." %} + **3. 
Implement keyset pagination for large datasets** Use keyset (cursor-based) pagination instead of OFFSET, which can produce duplicates or gaps on live data: From b6f4e547453b321d998da42367edced7bf73f1c6 Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Thu, 23 Apr 2026 21:30:25 +0200 Subject: [PATCH 11/15] docs(storage-access): address review nits - Expand "Stick with Input Mapping when" from one bullet to three - Remove trailing space on Architecture Overview heading - Fix Step 1 numbering (double-space before list items) - Flask example: expand validation comment to explicitly name the allowlist + int() coercion as the reason the f-string is safe, and warn against adding new form fields without the same guard - Remove unused `from functools import lru_cache` in cache example Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/storage-access/index.md | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 9f62c0e3f..881f30f1c 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -26,11 +26,13 @@ This feature is available for both **Streamlit** and **Python/JS** Data Apps. Co **Stick with Input Mapping when:** -- You don't need write-back capability - the app only reads and displays data +- You don't need write-back capability — the app only reads and displays data. +- Your dataset is small and changes infrequently (e.g., static reference data loaded at deploy time). +- You want the simplest possible setup with no additional configuration. ## How It Works -### Architecture Overview +### Architecture Overview When you enable Storage Access, Keboola creates a dedicated **workspace** for your Data App. This workspace contains a database user with specific permissions (SELECT, INSERT, UPDATE, DELETE, TRUNCATE) on the tables you've selected. 
@@ -74,9 +76,10 @@ This design ensures: ## Setting Up Storage Access ### Step 1: Enable Storage Access + 1. Go to the **Project Settings**. -2. Go to the **Features**. -3. Find the **Storage Access** feature and activate it. +2. Go to the **Features**. +3. Find the **Storage Access** feature and activate it. ### Step 2: Configure Writable Tables @@ -350,8 +353,12 @@ ALLOWED_STATUSES = {"pending", "approved", "rejected"} @app.route("/", methods=["GET", "POST"]) def index(): if request.method == "POST": - # Validate and sanitize user input - record_id = int(request.form["record_id"]) # ensure integer + # Validate and sanitize user input BEFORE it reaches SQL. + # int() guarantees record_id is a number; the allowlist guarantees + # new_status is one of three exact strings. This is the only reason + # the f-string below is safe — do not add other form fields to the + # query without analogous validation. + record_id = int(request.form["record_id"]) new_status = request.form["status"] if new_status not in ALLOWED_STATUSES: return "Invalid status", 400 @@ -517,7 +524,6 @@ def load_reference_data(): For **Python/JS** (non-Streamlit) apps, use a simple in-memory cache: ```python -from functools import lru_cache import time _cache = {} From a65270376f3b5f0f5e7f5c566a76a86bc697dcd7 Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Fri, 24 Apr 2026 11:06:04 +0200 Subject: [PATCH 12/15] docs(storage-access): read workspace_id from manifest file MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Switches all code examples to read the workspace ID from the manifest file at KBC_WORKSPACE_MANIFEST_PATH (which contains workspaceId plus other workspace metadata) instead of the WORKSPACE_ID env var. This matches the recommended pattern from the Data App runtime — the env var is still set, but the manifest is the canonical source. 
- Adds KBC_WORKSPACE_MANIFEST_PATH row to the env vars table and notes WORKSPACE_ID is still available but manifest is preferred - Updates all four example snippets (Using the Client, env vars section example, Flask app, and Best Practices #1) to read the manifest with proper (KeyError, FileNotFoundError) error handling - Mentions KBC_WORKSPACE_MANIFEST_PATH in the Data Apps overview page env vars summary Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/index.md | 2 +- data-apps/storage-access/index.md | 34 +++++++++++++++++++++---------- 2 files changed, 24 insertions(+), 12 deletions(-) diff --git a/data-apps/index.md b/data-apps/index.md index 3fc08778d..ce5816915 100644 --- a/data-apps/index.md +++ b/data-apps/index.md @@ -140,7 +140,7 @@ Keboola provides built-in authentication methods to protect your apps: * **Input Mapping**: Automatically load specific tables into your app. * **Storage API Client**: Programmatic access to all Storage features. -* **Environment Variables**: Platform-provided env vars include `BRANCH_ID` (always set), `KBC_TOKEN` and `DATA_LOADER_API_URL` (with Data Loader), and `WORKSPACE_ID` / `QUERY_SERVICE_URL` (with [Storage Access](/data-apps/storage-access/)). See the [runtime README](https://github.com/keboola/data-app-python-js/blob/main/README.md#environment-variables) for the full list. +* **Environment Variables**: Platform-provided env vars include `BRANCH_ID` (always set), `KBC_TOKEN` and `DATA_LOADER_API_URL` (with Data Loader), and `WORKSPACE_ID` / `QUERY_SERVICE_URL` / `KBC_WORKSPACE_MANIFEST_PATH` (with [Storage Access](/data-apps/storage-access/)). See the [runtime README](https://github.com/keboola/data-app-python-js/blob/main/README.md#environment-variables) for the full list. 
### Configuration & Secrets diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 881f30f1c..56e1b958c 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -120,15 +120,19 @@ dependencies = [ **In your Python code:** ```python +import json import os from keboola_query_service import Client -# Storage Access env vars are set by the platform when the feature is enabled. +# Storage Access config is set by the platform when the feature is enabled. +# workspace_id is read from the manifest file (recommended); the other values +# are plain env vars. try: branch_id = os.environ["BRANCH_ID"] - workspace_id = os.environ["WORKSPACE_ID"] query_service_url = os.environ["QUERY_SERVICE_URL"] -except KeyError as e: + with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: + workspace_id = json.load(f)["workspaceId"] +except (KeyError, FileNotFoundError) as e: raise RuntimeError( "Storage Access is not enabled for this app. " "Enable it in Advanced Settings and redeploy." @@ -290,21 +294,24 @@ When Storage Access is enabled, the platform sets these environment variables in | Variable | Description | | --- | --- | -| `WORKSPACE_ID` | ID of the provisioned workspace for this app. | +| `KBC_WORKSPACE_MANIFEST_PATH` | Path to the workspace manifest JSON file. The file contains `workspaceId` (and other workspace metadata). **Recommended source for the workspace ID.** | +| `WORKSPACE_ID` | ID of the provisioned workspace for this app. Also available in the manifest file (above) — prefer reading the manifest in new code. | | `BRANCH_ID` | Storage API branch ID of the project. | | `QUERY_SERVICE_URL` | URL of the Query Service API (stack-specific). | | `KBC_TOKEN` | Keboola Storage API token. | -If Storage Access is not enabled, `WORKSPACE_ID` / `BRANCH_ID` / `QUERY_SERVICE_URL` are not set. 
Read them with a clear error message for users: +If Storage Access is not enabled, `KBC_WORKSPACE_MANIFEST_PATH` / `WORKSPACE_ID` / `BRANCH_ID` / `QUERY_SERVICE_URL` are not set. Read them with a clear error message for users: ```python +import json import os try: branch_id = os.environ["BRANCH_ID"] - workspace_id = os.environ["WORKSPACE_ID"] query_service_url = os.environ["QUERY_SERVICE_URL"] -except KeyError as e: + with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: + workspace_id = json.load(f)["workspaceId"] +except (KeyError, FileNotFoundError) as e: raise RuntimeError( "Storage Access is not enabled. Enable it in Advanced Settings and redeploy." ) from e @@ -333,14 +340,17 @@ This example shows a simple Flask app that reads records from Storage and allows ```python from flask import Flask, request, render_template_string +import json import os from keboola_query_service import Client app = Flask(__name__) -# Read Storage Access env vars once at startup +# Read Storage Access config once at startup. +# workspace_id is read from the manifest (recommended); other values are env vars. BRANCH_ID = os.environ["BRANCH_ID"] -WORKSPACE_ID = os.environ["WORKSPACE_ID"] +with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: + WORKSPACE_ID = json.load(f)["workspaceId"] qs_client = Client( base_url=os.environ["QUERY_SERVICE_URL"], @@ -443,13 +453,15 @@ build-backend = "setuptools.build_meta" **1. Handle missing workspace gracefully** ```python +import json import os try: branch_id = os.environ["BRANCH_ID"] - workspace_id = os.environ["WORKSPACE_ID"] query_service_url = os.environ["QUERY_SERVICE_URL"] -except KeyError: + with open(os.environ["KBC_WORKSPACE_MANIFEST_PATH"]) as f: + workspace_id = json.load(f)["workspaceId"] +except (KeyError, FileNotFoundError): # Storage Access is not enabled — show a user-friendly error import streamlit as st # or use your framework's error handling st.error("Storage Access is not enabled for this app. 
Enable it in Advanced Settings and redeploy.") From 239b66c6afbf0d399af6a62de2d7b8f24b6d1103 Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Tue, 28 Apr 2026 12:23:14 +0200 Subject: [PATCH 13/15] docs(storage-access): document JSON config for programmatic writable-table setup Adds a subsection under Step 2 showing the storage.output.tables shape that the platform writes when a writable table is added through the UI, so developers and agents can do the same via the Storage API directly. Documents the destination + unload_strategy: "direct-grant" marker that flags a table as exposed through Storage Access. Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/storage-access/index.md | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 56e1b958c..1dd8542c8 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -95,6 +95,30 @@ This design ensures: - You can add multiple tables from different buckets. - All selected tables must exist before deploying. +#### Configuring Writable Tables Programmatically + +If you manage Data App configurations through the Storage API (or via automation/agents) rather than the UI, the same writable-table selection is expressed in the configuration JSON under `storage.output.tables`. Each entry is a table the app gets read/write permissions on: + +```json +{ + "storage": { + "output": { + "tables": [ + { + "destination": "out.c-data-app.mvc-crashes", + "unload_strategy": "direct-grant" + } + ] + } + } +} +``` + +- **`destination`** — the full Storage table ID (`<stage>.<bucket>.<table>`) the app should be able to read and write. The table must exist before the app is deployed. +- **`unload_strategy: "direct-grant"`** — required marker that tells the platform "grant the app's workspace direct SELECT/INSERT/UPDATE/DELETE/TRUNCATE on this table."
Tables without this strategy in `storage.output.tables` are not exposed via Storage Access. + +To add or remove writable tables programmatically, update the Data App's configuration via the Storage API ([Component Configurations endpoint](https://keboola.docs.apiary.io/#reference/component-configurations)) and redeploy the app for the new permissions to take effect. + ### Step 3: Deploy Your App Click **Deploy** (or **Redeploy** for existing apps). During deployment: From e872ced1b7b7f9258da54715f39e2af3f82c75b5 Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Wed, 29 Apr 2026 09:04:46 +0200 Subject: [PATCH 14/15] docs(storage-access): fix Query Service API docs link, surface Snowflake-only MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Correct https://query.keboola.com/api-docs/ → https://query.keboola.com/api/v1/documentation - Add a prominent warning callout in the Overview that Storage Access currently supports only Snowflake-backed projects, BigQuery coming soon. The same constraint was already in the Limitations section but is easy to miss at the bottom of the page. Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/storage-access/index.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 1dd8542c8..1410299c9 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -16,6 +16,8 @@ Storage Access allows your Data App to read data from and write data back to Keb This feature is available for both **Streamlit** and **Python/JS** Data Apps. Code examples on this page use Python; the same concepts apply when calling the Query Service API from JavaScript. +{% include warning.html content="**Snowflake only.** Storage Access currently works only on projects using the Snowflake storage backend. BigQuery support is coming soon." 
%} + ## When to Use Storage Access **Use Storage Access when you need to:** @@ -47,7 +49,7 @@ Query Service ────► Workspace User ────► Storage Tables billing, metadata refresh with granted permissions ``` -Your app communicates with Storage through the [**Query Service API**](https://query.keboola.com/api-docs/), not directly with Snowflake. This provides: +Your app communicates with Storage through the [**Query Service API**](https://query.keboola.com/api/v1/documentation), not directly with Snowflake. This provides: - Automatic authentication using your app's token - Usage tracking for billing From 3e00abb31eb66c3509efbb9d5489c8ea3fa4b450 Mon Sep 17 00:00:00 2001 From: MiroCillik Date: Wed, 29 Apr 2026 09:12:42 +0200 Subject: [PATCH 15/15] docs(storage-access): update Query Service API docs link Switch to https://api.keboola.com/?service=query (the canonical service-discovery URL) instead of the previous query.keboola.com path. Co-Authored-By: Claude Opus 4.7 (1M context) --- data-apps/storage-access/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/data-apps/storage-access/index.md b/data-apps/storage-access/index.md index 1410299c9..056368250 100644 --- a/data-apps/storage-access/index.md +++ b/data-apps/storage-access/index.md @@ -49,7 +49,7 @@ Query Service ────► Workspace User ────► Storage Tables billing, metadata refresh with granted permissions ``` -Your app communicates with Storage through the [**Query Service API**](https://query.keboola.com/api/v1/documentation), not directly with Snowflake. This provides: +Your app communicates with Storage through the [**Query Service API**](https://api.keboola.com/?service=query), not directly with Snowflake. This provides: - Automatic authentication using your app's token - Usage tracking for billing