Commit 81a11d5

docs: add Custom Python component documentation
Co-Authored-By: Monika Feigler <monika@feigler.cz>
1 parent e4f9542 commit 81a11d5

2 files changed

Lines changed: 336 additions & 0 deletions

File tree

_data/navigation.yml

Lines changed: 2 additions & 0 deletions
@@ -567,6 +567,8 @@ items:
       title: dbt Cloud Job Trigger
     - url: /components/applications/triggers/deepnote-notebook-execution-trigger/
       title: Deepnote Notebook Execution Trigger
+    - url: /components/applications/custom-python/
+      title: Custom Python
     - url: /components/applications/data-gateway/
       title: Data Gateway
     - url: /components/applications/ai/
Lines changed: 334 additions & 0 deletions
@@ -0,0 +1,334 @@
---
title: Custom Python
permalink: /components/applications/custom-python/
---

* TOC
{:toc}

This component lets you run custom Python code directly within Keboola. Its primary purpose is to enable fast creation of
custom integrations (connectors) that can be configured and executed inside Keboola, removing the need to build and maintain
a separate dedicated component.

## Comparison to Python Transformations
- **[Python Transformations](/transformations/python-plain/)**: Designed exclusively for transforming data already present
  in Keboola Storage. Results are written back into Storage.
- **Custom Python Component**: Ideal for creating integrations or applications that interact with external systems,
  download or push data, or require user-provided secure parameters (e.g., API keys, passwords).

{% include warning.html content="Do not use Python Transformations for integrations with external systems requiring secure parameters. Use the Custom Python component instead." %}

## Key Features
- **Secure Parameters**: Safely provide encrypted parameters (API keys, tokens, passwords) via the Keboola UI
  using the `#` prefix.
- **Customizable Environment**: A clean container image to install only the dependencies you need, with support
  for multiple Python versions.
- **Flexible Code Execution**: Run code from a public or private Git repository, or directly within the Keboola UI
  configuration.

## Configuration
[Create a new configuration](/components/#creating-component-configuration) of the **Custom Python** application.

The configuration has the following parameters:

- **Python Version & Environment Isolation** (`venv`): Select the Python version and isolation mode:
  - `3.13` *(recommended)*: Isolated environment with only the packages of your choice.
  - `3.14`: Isolated environment (release candidate).
  - `3.12`: Isolated environment.
  - `base`: Shared environment (Python 3.10) containing many pre-installed packages in legacy versions.
    This option is not recommended and will be deprecated in the future.
- **User Parameters** (`user_properties`): A JSON object containing custom configuration parameters.
  Key names prefixed with `#` will be encrypted upon saving.
  These parameters are accessible in your code via `CommonInterface`.
- **Source Code & Dependencies** (`source`): Choose where the code comes from:
  - `code` *(default)*: Enter code manually in the Keboola UI.
  - `git`: Fetch code from a Git repository.
- **Python Packages** (`packages`): An array of extra packages to install (available only when `source` is `code`).
- **Python Code** (`code`): The Python code to run (available only when `source` is `code`).
- **Git Repository Source Settings** (`git`): Configuration for the Git repository
  (available only when `source` is `git`). See [Git Configuration](#git-configuration).

{% include tip.html content="We recommend using an **isolated environment** (Python 3.12+). The shared `base` environment
may lead to package collisions and will be deprecated. Update your code regularly to run with the latest
package versions to avoid security vulnerabilities." %}
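
The dependencies between these parameters can be sketched in a few lines of Python. The helper below is hypothetical and not part of the component, which performs its own validation. It only mirrors the rules listed above, assuming the parameters are available as a plain dictionary and that the `git` settings are expected whenever `source` is `git`.

```python
def check_parameters(params: dict) -> list[str]:
    """Return a list of problems found in a Custom Python parameters dict.

    Hypothetical helper for illustration only.
    """
    problems = []
    source = params.get("source", "code")  # `code` is the documented default

    if source not in ("code", "git"):
        problems.append(f"unknown source: {source!r}")
    elif source == "code":
        if "git" in params:
            problems.append("`git` is only used when source is `git`")
    else:  # source == "git"
        if "git" not in params:
            problems.append("`git` settings are expected when source is `git`")
        for key in ("packages", "code"):
            if key in params:
                problems.append(f"`{key}` is only used when source is `code`")

    return problems


# A Git-based configuration that still carries a leftover `packages` list
print(check_parameters({"source": "git", "venv": "3.13", "packages": ["requests"]}))
```

An empty list means the dictionary is consistent with the rules as described here.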

### Git Configuration
When using a Git repository as the code source, configure the following:

- **Repository URL** (`url`): Supports both HTTPS and SSH formats.
- **Repository Visibility & Authentication** (`auth`):
  - `none` *(default)*: Public repository, no authentication.
  - `pat`: Private repository, authenticating with a Personal Access Token.
  - `ssh`: Private repository, authenticating with an SSH key.
- **Personal Access Token** (`#token`): Required when `auth` is `pat`. This value is encrypted in Keboola Storage.
- **SSH Keys** (`ssh_keys`): Required when `auth` is `ssh`. Contains:
  - `public`: Public key saved in your Git project (stored for reference only).
  - `#private`: Private key used for authentication (encrypted in Keboola Storage).
- **Branch Name** (`branch`): The branch to check out. If left empty, `main` is used.
  The UI provides a dynamic branch selector.
- **Script Filename** (`filename`): The Python script to execute. If left empty, `main.py` is used.
  The UI lists available files from the repository.

When using a Git repository, specify your dependencies in one of the following ways (files must be in the repository root):

- A `pyproject.toml` file accompanied by a `uv.lock` file *(recommended, modern approach)*.
- A `requirements.txt` file *(legacy approach)*.

If both are present, `pyproject.toml` with `uv.lock` takes precedence.
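
The defaults and authentication requirements described above can be illustrated with a small sketch. `resolve_git_settings` is a hypothetical helper, not the component's actual code. It assumes the `git` settings arrive as a dictionary using the key names listed in this section.

```python
def resolve_git_settings(git: dict) -> dict:
    """Apply the documented defaults and check auth-specific fields.

    Hypothetical helper for illustration only.
    """
    auth = git.get("auth") or "none"  # `none` is the documented default
    resolved = {
        "url": git["url"],
        "auth": auth,
        "branch": git.get("branch") or "main",  # empty branch falls back to `main`
        "filename": git.get("filename") or "main.py",  # empty filename falls back to `main.py`
    }
    if auth == "pat" and not git.get("#token"):
        raise ValueError("`#token` is required when auth is `pat`")
    if auth == "ssh" and not git.get("ssh_keys", {}).get("#private"):
        raise ValueError("`ssh_keys.#private` is required when auth is `ssh`")
    return resolved


print(resolve_git_settings({"url": "https://github.com/keboola/component-custom-python-example-repo-1.git"}))
```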

## Code Examples

### Accessing User Parameters
The code below is pre-populated in every new configuration. It shows how to access user parameters
via `CommonInterface`:

```python
from keboola.component import CommonInterface

ci = CommonInterface()
# Access user parameters
print(ci.configuration.parameters)
```
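
User parameters may contain secrets, so printing the whole dictionary can expose them in the job log. The sketch below masks such values before printing. It assumes that secret keys keep their `#` prefix in the parameters dictionary at run time. `mask_secrets` is a hypothetical helper, not part of `keboola.component`.

```python
def mask_secrets(params):
    """Return a copy of params with the values of `#`-prefixed keys masked."""
    if isinstance(params, dict):
        return {
            key: "*****" if key.startswith("#") else mask_secrets(value)
            for key, value in params.items()
        }
    if isinstance(params, list):
        return [mask_secrets(item) for item in params]
    return params


# Sample parameters; in a real run you would pass ci.configuration.parameters
example = {
    "endpoint": "https://example.com",
    "#api_key": "secret",
    "auth": {"#password": "pw"},
}
print(mask_secrets(example))
```

The original dictionary is left untouched, so the real values remain available to the rest of the script.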

### Loading Configuration Parameters

```python
import logging

from keboola.component import CommonInterface

SOME_PARAMETER = "some_user_parameter"
REQUIRED_PARAMETERS = [SOME_PARAMETER]

# Initialize the interface
ci = CommonInterface()

# Validate required parameters (raises ValueError if any is missing)
ci.validate_configuration_parameters(REQUIRED_PARAMETERS)

# Log the Keboola project ID from environment variables
logging.info(ci.environment_variables.project_id)

# Load a specific configuration parameter
logging.info(ci.configuration.parameters[SOME_PARAMETER])
```

### Creating an Output Table with Schema

```python
import csv

from keboola.component import CommonInterface
from keboola.component.dao import BaseType, ColumnDefinition

ci = CommonInterface()

# Define the table schema
schema = {
    "id": ColumnDefinition(
        data_types=BaseType.integer(),
        primary_key=True,
    ),
    "created_at": ColumnDefinition(data_types=BaseType.timestamp()),
    "status": ColumnDefinition(),  # Default type is string
    "value": ColumnDefinition(data_types=BaseType.numeric(length="38,2")),
}

# Create table definition
out_table = ci.create_out_table_definition(
    name="results.csv",
    destination="out.c-data.results",
    schema=schema,
    incremental=True,
    has_header=True,
)

# Write data to the output file
with open(out_table.full_path, "w+", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=out_table.column_names)
    writer.writeheader()
    writer.writerow({
        "id": "1",
        "created_at": "2023-01-15T14:30:00Z",
        "status": "completed",
        "value": "123.45",
    })

# Write the manifest file
ci.write_manifest(out_table)
```

### Accessing Input Tables from Mapping

```python
import csv

from keboola.component import CommonInterface

ci = CommonInterface()

# Access input mapping configuration
input_tables = ci.configuration.tables_input_mapping

for table in input_tables:
    table_name = table.destination

    # Load table definition from manifest
    table_def = ci.get_input_table_definition_by_name(table_name)

    print(f"Processing table: {table_name}")
    print(f"  Columns: {table_def.column_names}")

    # Read data from the CSV file
    with open(table_def.full_path, "r") as input_file:
        csv_reader = csv.DictReader(input_file)
        for row in csv_reader:
            print(f"  Row: {row}")
```

### Processing Input Files

```python
import logging

from keboola.component import CommonInterface

ci = CommonInterface()

# Get input files with specific tags (only latest versions)
input_files = ci.get_input_files_definitions(
    tags=["images", "documents"],
    only_latest_files=True,
)

for file in input_files:
    logging.info(f"Processing file: {file.name}")
    logging.info(f"  Full path: {file.full_path}")
    logging.info(f"  Tags: {file.tags}")
```

### Creating Output Files

```python
import json

from keboola.component import CommonInterface

ci = CommonInterface()

output_file = ci.create_out_file_definition(
    name="results.json",
    tags=["processed", "results"],
    is_public=False,
    is_permanent=True,
)

with open(output_file.full_path, "w") as f:
    json.dump({"status": "success", "processed_records": 42}, f)

ci.write_manifest(output_file)
```

### Using State Files for Incremental Processing
[State files](https://developers.keboola.com/extend/common-interface/config-file/#state-file) allow your component to
store and retrieve information between runs. This is useful for incremental processing or for tracking the last processed data.

```python
from datetime import datetime

from keboola.component import CommonInterface

ci = CommonInterface()

# Load state from the previous run
state = ci.get_state_file()

last_updated = state.get("last_updated", "1970-01-01T00:00:00Z")
print(f"Last processed data up to: {last_updated}")

# Sample data standing in for the items fetched since last_updated
processed_items = [
    {"id": 1, "timestamp": "2023-05-15T10:30:00Z"},
    {"id": 2, "timestamp": "2023-05-16T14:45:00Z"},
]

if processed_items:
    processed_items.sort(key=lambda x: x["timestamp"])
    new_last_updated = processed_items[-1]["timestamp"]
else:
    new_last_updated = last_updated

# Store the new state for the next run
ci.write_state_file({
    "last_updated": new_last_updated,
    "processed_count": len(processed_items),
    "last_run": datetime.now().isoformat(),
})
```
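
The example above uses hard-coded items and does not show the filtering step itself. A minimal sketch of that step follows, assuming every item carries an ISO 8601 `timestamp` string as in the sample data. `parse_ts` and `select_new_items` are hypothetical helper names.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing `Z` is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_new_items(items: list[dict], last_updated: str) -> list[dict]:
    """Keep items strictly newer than last_updated, oldest first."""
    cutoff = parse_ts(last_updated)
    new_items = [item for item in items if parse_ts(item["timestamp"]) > cutoff]
    return sorted(new_items, key=lambda item: parse_ts(item["timestamp"]))


items = [
    {"id": 2, "timestamp": "2023-05-16T14:45:00Z"},
    {"id": 1, "timestamp": "2023-05-15T10:30:00Z"},
]
# Only the item newer than the stored state is returned
print(select_new_items(items, "2023-05-15T10:30:00Z"))
```

Comparing parsed timestamps rather than raw strings keeps the ordering correct even if the source mixes UTC offsets.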

### Handling Errors

```python
import logging
import sys

from keboola.component.exceptions import UserException


def do_something():
    """Placeholder for your own logic."""


try:
    do_something()
except UserException as exc:
    logging.exception(exc)
    sys.exit(1)  # Exit code 1 = user error
except Exception as exc:
    logging.exception(exc)
    sys.exit(2)  # Exit code 2 = application error
```

### Logging
Always use the `logging` library, as it integrates with Keboola's rich logger after `CommonInterface` initialization:

```python
import logging

from keboola.component import CommonInterface

ci = CommonInterface()

logging.info("Info message")
logging.warning("Warning message")

try:
    raise ValueError("Something went wrong")
except ValueError as exc:
    # logging.exception logs at ERROR level and includes the traceback
    logging.exception(exc, extra={"additional_detail": "xxx"})
```

## Running Code from a Git Repository
We have prepared a
[simple example project](https://github.com/keboola/component-custom-python-example-repo-1)
to help you get started with running custom Python code from a Git repository. It can also serve as a template
for your future projects.

Example configuration for running code from a public Git repository:

```json
{
  "parameters": {
    "source": "git",
    "venv": "3.13",
    "git": {
      "url": "https://github.com/keboola/component-custom-python-example-repo-1.git",
      "branch": "main",
      "filename": "main.py",
      "auth": "none"
    },
    "user_properties": {
      "debug": true
    }
  }
}
```

## Listing Pre-Installed Packages
If you are using the shared `base` environment and want to see which packages are available, run the following code:

```python
import subprocess

subprocess.check_call(["uv", "pip", "list"])
```
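
As an alternative sketch that does not depend on the `uv` executable, the standard library's `importlib.metadata` can list the distributions visible to the running interpreter. This is a generic Python technique rather than anything specific to the component, and `installed_packages` is a hypothetical helper name.

```python
from importlib.metadata import distributions


def installed_packages() -> dict[str, str]:
    """Map distribution names to versions for the current interpreter."""
    packages = {}
    for dist in distributions():
        meta = dist.metadata
        if meta is None:  # skip distributions without readable metadata
            continue
        name = meta.get("Name")
        version = meta.get("Version")
        if name and version:
            packages[name] = version
    return packages


# Print the packages in a pip-like `name==version` format
for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
    print(f"{name}=={version}")
```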
