| 1 | +--- |
| 2 | +title: Custom Python |
| 3 | +permalink: /components/applications/custom-python/ |
| 4 | +--- |
| 5 | + |
| 6 | +* TOC |
| 7 | +{:toc} |
| 8 | + |
| 9 | +This component allows you to run custom Python code directly within Keboola. Its primary purpose is to enable fast creation of |
| 10 | +custom integrations (connectors) that can be configured and executed inside Keboola — removing the need to build and maintain |
| 11 | +a separate dedicated component. |
| 12 | + |
| 13 | +## Comparison to Python Transformations |
| 14 | +- **[Python Transformations](/transformations/python-plain/)**: Designed exclusively for transforming data already present |
| 15 | + in Keboola Storage. Results are written back into Storage. |
| 16 | +- **Custom Python Component**: Ideal for creating integrations or applications that interact with external systems, |
| 17 | + download or push data, or require user-provided secure parameters (e.g., API keys, passwords). |
| 18 | + |
| 19 | +{% include warning.html content="Do not use Python Transformations for integrations with external systems requiring secure parameters. Use the Custom Python component instead." %} |
| 20 | + |
| 21 | +## Key Features |
| 22 | +- **Secure Parameters** — Safely provide encrypted parameters (API keys, tokens, passwords) via the Keboola UI |
| 23 | + using the `#` prefix. |
| 24 | +- **Customizable Environment** — A clean container image to install only the dependencies you need, with support |
| 25 | + for multiple Python versions. |
| 26 | +- **Flexible Code Execution** — Run code from a public or private Git repository, or directly within the Keboola UI |
| 27 | + configuration. |
| 28 | + |
| 29 | +## Configuration |
| 30 | +[Create a new configuration](/components/#creating-component-configuration) of the **Custom Python** application. |
| 31 | + |
| 32 | +The configuration has the following parameters: |
| 33 | + |
| 34 | +- **Python Version & Environment Isolation** (`venv`) — Select the Python version and isolation mode: |
| 35 | +  - `3.13` *(recommended)* — Isolated environment with only the packages of your choice. |
| 36 | +  - `3.14` — Isolated environment (release candidate). |
| 37 | +  - `3.12` — Isolated environment. |
| 38 | +  - `base` — Shared environment (Python 3.10) containing many pre-installed packages in legacy versions. |
| 39 | +    This option is not recommended and will be deprecated in the future. |
| 40 | +- **User Parameters** (`user_properties`) — A JSON object containing custom configuration parameters. |
| 41 | + Key names prefixed with `#` will be encrypted upon saving. |
| 42 | + These parameters are accessible in your code via `CommonInterface`. |
| 43 | +- **Source Code & Dependencies** (`source`) — Choose where the code comes from: |
| 44 | +  - `code` *(default)* — Enter code manually in the Keboola UI. |
| 45 | +  - `git` — Fetch code from a Git repository. |
| 46 | +- **Python Packages** (`packages`) — An array of extra packages to install (available only when `source` is `code`). |
| 47 | +- **Python Code** (`code`) — The Python code to run (available only when `source` is `code`). |
| 48 | +- **Git Repository Source Settings** (`git`) — Configuration for the Git repository |
| 49 | + (available only when `source` is `git`). See [Git Configuration](#git-configuration). |
| 50 | + |
| 51 | +{% include tip.html content="We recommend using an **isolated environment** (Python 3.12+). The shared `base` environment |
| 52 | +may lead to package collisions and will be deprecated. Update your code regularly so that it runs with current |
| 53 | +package versions and avoids known security vulnerabilities." %} |
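To make the parameter list above more concrete, here is a minimal sketch of a configuration that uses `code` as the source, written as a Python dictionary and printed as JSON. Only the key names (`venv`, `source`, `packages`, `code`, `user_properties`) come from the list above; every value, including the `requests` package and the `#api_token` key, is a hypothetical placeholder, and the exact shape of `packages` (a list of strings) is an assumption.

```python
import json

# Hypothetical values; only the key names come from the parameter list above.
config = {
    "parameters": {
        "venv": "3.13",
        "source": "code",
        "packages": ["requests"],  # assumed to be a list of package names
        "code": "print('Hello from Custom Python')",
        "user_properties": {
            "base_url": "https://api.example.com",
            "#api_token": "replace-me",  # '#' prefix: encrypted upon saving
        },
    }
}


def secret_keys(user_properties: dict) -> list:
    """Return the keys that would be encrypted (those starting with '#')."""
    return sorted(key for key in user_properties if key.startswith("#"))


print(json.dumps(config, indent=2))
print(secret_keys(config["parameters"]["user_properties"]))
```

The helper `secret_keys` is not part of any Keboola library; it only illustrates which keys the `#` prefix rule applies to.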
| 54 | + |
| 55 | +### Git Configuration |
| 56 | +When using a Git repository as the code source, configure the following: |
| 57 | + |
| 58 | +- **Repository URL** (`url`) — Supports both HTTPS and SSH formats. |
| 59 | +- **Repository Visibility & Authentication** (`auth`): |
| 60 | +  - `none` *(default)* — Public repository, no authentication. |
| 61 | +  - `pat` — Private repository, authenticating with a Personal Access Token. |
| 62 | +  - `ssh` — Private repository, authenticating with an SSH key. |
| 63 | +- **Personal Access Token** (`#token`) — Required when `auth` is `pat`. This value is encrypted in Keboola Storage. |
| 64 | +- **SSH Keys** (`ssh_keys`) — Required when `auth` is `ssh`. Contains: |
| 65 | +  - `public` — Public key saved in your Git project (stored for reference only). |
| 66 | +  - `#private` — Private key used for authentication (encrypted in Keboola Storage). |
| 67 | +- **Branch Name** (`branch`) — The branch to check out. If left empty, `main` is used. |
| 68 | + The UI provides a dynamic branch selector. |
| 69 | +- **Script Filename** (`filename`) — The Python script to execute. If left empty, `main.py` is used. |
| 70 | + The UI lists available files from the repository. |
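The defaults and auth requirements listed above can be summarized in a small validation sketch. This is not the component's actual implementation; `resolve_git_settings` is a hypothetical helper that only mirrors the documented rules (`main` and `main.py` as defaults, `#token` for `pat`, `ssh_keys` for `ssh`). Treating `#private` as the mandatory part of `ssh_keys` is an assumption.

```python
def resolve_git_settings(git: dict) -> dict:
    """Apply the documented defaults and check auth-specific requirements."""
    auth = git.get("auth") or "none"
    if auth not in ("none", "pat", "ssh"):
        raise ValueError(f"Unknown auth mode: {auth!r}")
    if auth == "pat" and not git.get("#token"):
        raise ValueError("'#token' is required when auth is 'pat'")
    # Assumption: the private key is the part of ssh_keys that must be present.
    if auth == "ssh" and not (git.get("ssh_keys") or {}).get("#private"):
        raise ValueError("'ssh_keys.#private' is required when auth is 'ssh'")
    return {
        **git,
        "auth": auth,
        "branch": git.get("branch") or "main",  # empty branch falls back to main
        "filename": git.get("filename") or "main.py",  # empty filename falls back to main.py
    }


settings = resolve_git_settings({"url": "https://example.com/repo.git"})
print(settings["auth"], settings["branch"], settings["filename"])
```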
| 71 | + |
| 72 | +When using a Git repository, specify your dependencies in one of the following ways (files must be in the repository root): |
| 73 | + |
| 74 | +- A `pyproject.toml` file accompanied by a `uv.lock` file *(recommended, modern approach)*. |
| 75 | +- A `requirements.txt` file *(legacy approach)*. |
| 76 | + |
| 77 | +If both are present, `pyproject.toml` with `uv.lock` takes precedence. |
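The precedence rule can be pictured with a short sketch that inspects a checked-out repository root. `pick_dependency_source` is a hypothetical helper, not the component's code. How the component behaves when `pyproject.toml` exists without `uv.lock` is not stated here, so falling back to `requirements.txt` in that case is an assumption.

```python
from pathlib import Path
from typing import Optional


def pick_dependency_source(repo_root: Path) -> Optional[str]:
    """Return which dependency definition would win in the repository root."""
    has_pyproject = (repo_root / "pyproject.toml").is_file()
    has_lock = (repo_root / "uv.lock").is_file()
    if has_pyproject and has_lock:
        return "pyproject.toml + uv.lock"  # takes precedence when both styles exist
    if (repo_root / "requirements.txt").is_file():
        return "requirements.txt"
    return None  # no dependency definition found


print(pick_dependency_source(Path(".")))
```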
| 78 | + |
| 79 | +## Code Examples |
| 80 | + |
| 81 | +### Accessing User Parameters |
| 82 | +The code below is pre-populated in every new configuration. It shows how to access user parameters |
| 83 | +via `CommonInterface`: |
| 84 | + |
| 85 | +```python |
| 86 | +from keboola.component import CommonInterface |
| 87 | + |
| 88 | +ci = CommonInterface() |
| 89 | +# Access user parameters |
| 90 | +print(ci.configuration.parameters) |
| 91 | +``` |
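Printing the whole parameters dictionary is convenient while prototyping, but values stored under `#`-prefixed keys are typically available to your code in decrypted form at run time, so they could end up in the job log. A minimal sketch of masking them before printing follows; `mask_secrets` is a hypothetical helper, and the sample dictionary stands in for `ci.configuration.parameters`.

```python
def mask_secrets(params: dict) -> dict:
    """Return a copy of params with values of '#'-prefixed keys replaced."""
    masked = {}
    for key, value in params.items():
        if key.startswith("#"):
            masked[key] = "*****"
        elif isinstance(value, dict):
            masked[key] = mask_secrets(value)  # handle nested objects
        else:
            masked[key] = value
    return masked


# Stand-in for ci.configuration.parameters
example = {
    "debug": True,
    "#api_token": "secret",
    "nested": {"#password": "x", "user": "u"},
}
print(mask_secrets(example))
```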
| 92 | + |
| 93 | +### Loading Configuration Parameters |
| 94 | + |
| 95 | +```python |
| 96 | +import logging |
| 97 | + |
| 98 | +from keboola.component import CommonInterface |
| 99 | + |
| 100 | +SOME_PARAMETER = "some_user_parameter" |
| 101 | +REQUIRED_PARAMETERS = [SOME_PARAMETER] |
| 102 | + |
| 103 | +# Initialize the interface |
| 104 | +ci = CommonInterface() |
| 105 | + |
| 106 | +# Validate required parameters (raises ValueError if any is missing) |
| 107 | +ci.validate_configuration_parameters(REQUIRED_PARAMETERS) |
| 108 | + |
| 109 | +# Log the Keboola project ID from environment variables |
| 110 | +logging.info(ci.environment_variables.project_id) |
| 111 | + |
| 112 | +# Load a specific configuration parameter |
| 113 | +logging.info(ci.configuration.parameters[SOME_PARAMETER]) |
| 114 | +``` |
| 115 | + |
| 116 | +### Creating an Output Table with Schema |
| 117 | + |
| 118 | +```python |
| 119 | +import csv |
| 120 | + |
| 121 | +from keboola.component import CommonInterface |
| 122 | +from keboola.component.dao import BaseType, ColumnDefinition |
| 123 | + |
| 124 | +ci = CommonInterface() |
| 125 | + |
| 126 | +# Define the table schema |
| 127 | +schema = { |
| 128 | +    "id": ColumnDefinition( |
| 129 | +        data_types=BaseType.integer(), |
| 130 | +        primary_key=True, |
| 131 | +    ), |
| 132 | +    "created_at": ColumnDefinition(data_types=BaseType.timestamp()), |
| 133 | +    "status": ColumnDefinition(),  # Default type is string |
| 134 | +    "value": ColumnDefinition(data_types=BaseType.numeric(length="38,2")), |
| 135 | +} |
| 136 | + |
| 137 | +# Create table definition |
| 138 | +out_table = ci.create_out_table_definition( |
| 139 | +    name="results.csv", |
| 140 | +    destination="out.c-data.results", |
| 141 | +    schema=schema, |
| 142 | +    incremental=True, |
| 143 | +    has_header=True, |
| 144 | +) |
| 145 | + |
| 146 | +# Write data to the output file |
| 147 | +with open(out_table.full_path, "w+", newline="") as f: |
| 148 | +    writer = csv.DictWriter(f, fieldnames=out_table.column_names) |
| 149 | +    writer.writeheader() |
| 150 | +    writer.writerow({ |
| 151 | +        "id": "1", |
| 152 | +        "created_at": "2023-01-15T14:30:00Z", |
| 153 | +        "status": "completed", |
| 154 | +        "value": "123.45", |
| 155 | +    }) |
| 156 | + |
| 157 | +# Write the manifest file |
| 158 | +ci.write_manifest(out_table) |
| 159 | +``` |
| 160 | + |
| 161 | +### Accessing Input Tables from Mapping |
| 162 | + |
| 163 | +```python |
| 164 | +import csv |
| 165 | + |
| 166 | +from keboola.component import CommonInterface |
| 167 | + |
| 168 | +ci = CommonInterface() |
| 169 | + |
| 170 | +# Access input mapping configuration |
| 171 | +input_tables = ci.configuration.tables_input_mapping |
| 172 | + |
| 173 | +for table in input_tables: |
| 174 | +    table_name = table.destination |
| 175 | + |
| 176 | +    # Load table definition from manifest |
| 177 | +    table_def = ci.get_input_table_definition_by_name(table_name) |
| 178 | + |
| 179 | +    print(f"Processing table: {table_name}") |
| 180 | +    print(f" Columns: {table_def.column_names}") |
| 181 | + |
| 182 | +    # Read data from the CSV file |
| 183 | +    with open(table_def.full_path, "r") as input_file: |
| 184 | +        csv_reader = csv.DictReader(input_file) |
| 185 | +        for row in csv_reader: |
| 186 | +            print(f" Row: {row}") |
| 187 | +``` |
| 188 | + |
| 189 | +### Processing Input Files |
| 190 | + |
| 191 | +```python |
| 192 | +import logging |
| 193 | + |
| 194 | +from keboola.component import CommonInterface |
| 195 | + |
| 196 | +ci = CommonInterface() |
| 197 | + |
| 198 | +# Get input files with specific tags (only latest versions) |
| 199 | +input_files = ci.get_input_files_definitions( |
| 200 | +    tags=["images", "documents"], |
| 201 | +    only_latest_files=True, |
| 202 | +) |
| 203 | + |
| 204 | +for file in input_files: |
| 205 | +    logging.info("Processing file: %s", file.name) |
| 206 | +    logging.info(" Full path: %s", file.full_path) |
| 207 | +    logging.info(" Tags: %s", file.tags) |
| 208 | +``` |
| 209 | + |
| 210 | +### Creating Output Files |
| 211 | + |
| 212 | +```python |
| 213 | +import json |
| 214 | + |
| 215 | +from keboola.component import CommonInterface |
| 216 | + |
| 217 | +ci = CommonInterface() |
| 218 | + |
| 219 | +output_file = ci.create_out_file_definition( |
| 220 | +    name="results.json", |
| 221 | +    tags=["processed", "results"], |
| 222 | +    is_public=False, |
| 223 | +    is_permanent=True, |
| 224 | +) |
| 225 | + |
| 226 | +with open(output_file.full_path, "w") as f: |
| 227 | +    json.dump({"status": "success", "processed_records": 42}, f) |
| 228 | + |
| 229 | +ci.write_manifest(output_file) |
| 230 | +``` |
| 231 | + |
| 232 | +### Using State Files for Incremental Processing |
| 233 | +[State files](https://developers.keboola.com/extend/common-interface/config-file/#state-file) allow your component to |
| 234 | +store and retrieve information between runs. This is useful for incremental processing or tracking the last processed data. |
| 235 | + |
| 236 | +```python |
| 237 | +from datetime import datetime |
| 238 | + |
| 239 | +from keboola.component import CommonInterface |
| 240 | + |
| 241 | +ci = CommonInterface() |
| 242 | + |
| 243 | +# Load state from the previous run |
| 244 | +state = ci.get_state_file() |
| 245 | + |
| 246 | +last_updated = state.get("last_updated", "1970-01-01T00:00:00Z") |
| 247 | +print(f"Last processed data up to: {last_updated}") |
| 248 | + |
| 249 | +# Process data (only new data since last_updated) |
| 250 | +processed_items = [ |
| 251 | +    {"id": 1, "timestamp": "2023-05-15T10:30:00Z"}, |
| 252 | +    {"id": 2, "timestamp": "2023-05-16T14:45:00Z"}, |
| 253 | +] |
| 254 | + |
| 255 | +if processed_items: |
| 256 | +    processed_items.sort(key=lambda x: x["timestamp"]) |
| 257 | +    new_last_updated = processed_items[-1]["timestamp"] |
| 258 | +else: |
| 259 | +    new_last_updated = last_updated |
| 260 | + |
| 261 | +# Store the new state for the next run |
| 262 | +ci.write_state_file({ |
| 263 | +    "last_updated": new_last_updated, |
| 264 | +    "processed_count": len(processed_items), |
| 265 | +    "last_run": datetime.now().isoformat(), |
| 266 | +}) |
| 267 | +``` |
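The example above uses a hard-coded list in place of real data. As a rough sketch of the "only new data since `last_updated`" step, the helper below keeps items whose timestamp is strictly newer than the stored value. The function names and the item shape (`id`, `timestamp`) are illustrative assumptions that follow the sample data above.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_new_items(items: list, last_updated: str) -> list:
    """Return items newer than last_updated, ordered by timestamp."""
    cutoff = parse_ts(last_updated)
    fresh = [item for item in items if parse_ts(item["timestamp"]) > cutoff]
    return sorted(fresh, key=lambda item: parse_ts(item["timestamp"]))


items = [
    {"id": 1, "timestamp": "2023-05-15T10:30:00Z"},
    {"id": 2, "timestamp": "2023-05-16T14:45:00Z"},
]
new_items = select_new_items(items, "2023-05-15T12:00:00Z")
print([item["id"] for item in new_items])  # [2]
```

Comparing parsed timestamps rather than raw strings keeps the filter correct even if the source mixes `Z` and explicit UTC offsets.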
| 268 | + |
| 269 | +### Handling Errors |
| 270 | + |
| 271 | +```python |
| 272 | +import logging |
| 273 | + |
| 274 | +from keboola.component.exceptions import UserException |
| 275 | + |
| 276 | +try: |
| 277 | +    do_something()  # Placeholder for your own logic |
| 278 | +except UserException as exc: |
| 279 | +    logging.exception(exc) |
| 280 | +    raise SystemExit(1)  # Exit code 1 = user error |
| 281 | +except Exception as exc: |
| 282 | +    logging.exception(exc) |
| 283 | +    raise SystemExit(2)  # Exit code 2 = application error |
| 284 | +``` |
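For a version of this pattern that runs on its own, the sketch below wraps the entry point in a function that returns the exit code. `UserError` is a local stand-in for `keboola.component.exceptions.UserException` so that the snippet has no external dependencies; `run` and `failing_job` are hypothetical names.

```python
import logging


class UserError(Exception):
    """Local stand-in for keboola.component.exceptions.UserException."""


def run(main, user_error=UserError) -> int:
    """Call main() and translate failures into the exit codes described above."""
    try:
        main()
    except user_error as exc:
        logging.error("User error: %s", exc)
        return 1  # user error
    except Exception:
        logging.exception("Unexpected application error")
        return 2  # application error
    return 0


def failing_job():
    raise UserError("Missing parameter 'api_url'")


exit_code = run(failing_job)
print(exit_code)  # 1
```

In a real script you would likely pass `UserException` as `user_error` and finish with `raise SystemExit(run(main))`.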
| 285 | + |
| 286 | +### Logging |
| 287 | +Always use the `logging` library, as it integrates with Keboola's rich logger after `CommonInterface` initialization: |
| 288 | + |
| 289 | +```python |
| 290 | +import logging |
| 291 | + |
| 292 | +from keboola.component import CommonInterface |
| 293 | + |
| 294 | +ci = CommonInterface() |
| 295 | + |
| 296 | +logging.info("Info message") |
| 297 | +logging.warning("Warning message") |
| 298 | +logging.exception(exception, extra={"additional_detail": "xxx"})  # Call inside an except block; `exception` is the caught error |
| 299 | +``` |
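`logging.exception` is meant to be called from inside an `except` block, where it logs at the `ERROR` level and appends the current traceback. A self-contained sketch using only the standard library follows; `logging.basicConfig` stands in for the setup that `CommonInterface` would normally perform, and `parse_amount` is a hypothetical function used only to trigger an error.

```python
import logging

# Stand-in setup; in the component, CommonInterface configures logging instead.
logging.basicConfig(level=logging.INFO)


def parse_amount(raw: str) -> float:
    return float(raw)


try:
    parse_amount("not-a-number")
except ValueError:
    # Logs at ERROR level with the traceback; `extra` attaches custom fields to the record.
    logging.exception(
        "Could not parse amount",
        extra={"additional_detail": "raw=not-a-number"},
    )
```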
| 300 | + |
| 301 | +## Running Code from a Git Repository |
| 302 | +We have prepared a |
| 303 | +[simple example project](https://github.com/keboola/component-custom-python-example-repo-1) |
| 304 | +to help you get started with running custom Python code from a Git repository. It can also serve as a template |
| 305 | +for your future projects. |
| 306 | + |
| 307 | +Example configuration for running code from a public Git repository: |
| 308 | + |
| 309 | +```json |
| 310 | +{ |
| 311 | +  "parameters": { |
| 312 | +    "source": "git", |
| 313 | +    "venv": "3.13", |
| 314 | +    "git": { |
| 315 | +      "url": "https://github.com/keboola/component-custom-python-example-repo-1.git", |
| 316 | +      "branch": "main", |
| 317 | +      "filename": "main.py", |
| 318 | +      "auth": "none" |
| 319 | +    }, |
| 320 | +    "user_properties": { |
| 321 | +      "debug": true |
| 322 | +    } |
| 323 | +  } |
| 324 | +} |
| 325 | +``` |
| 326 | + |
| 327 | +## Listing Pre-Installed Packages |
| 328 | +If you are using the shared `base` environment and want to see which packages are available, run the following code: |
| 329 | + |
| 330 | +```python |
| 331 | +import subprocess |
| 332 | + |
| 333 | +subprocess.check_call(["uv", "pip", "list"]) |
| 334 | +``` |
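If you would rather inspect the packages from within Python, a standard-library alternative is sketched below. It assumes the script runs with the same interpreter and environment you want to inspect, and its output may differ in format from `uv pip list`. `installed_packages` is a hypothetical helper name.

```python
from importlib.metadata import distributions


def installed_packages() -> list:
    """Return (name, version) pairs for distributions visible to this interpreter."""
    pairs = {
        (dist.metadata["Name"], dist.version)
        for dist in distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    }
    return sorted(pairs, key=lambda pair: pair[0].lower())


for name, version in installed_packages():
    print(f"{name}=={version}")
```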