04. Cloud Storage
Grok supports reading JPEG 2000 files from AWS S3 and S3-compatible object storage (MinIO, DigitalOcean Spaces, Backblaze B2, Cloudflare R2, etc.), as well as Azure Blob Storage, Google Cloud Storage, Azure Data Lake Gen2, and generic HTTP/HTTPS URLs.
All AWS environment variables are compatible with standard AWS SDK conventions.
Grok-specific configuration uses the GRK_ prefix.
| Path Prefix | Fetcher | Service |
|---|---|---|
| `/vsis3/` | S3Fetcher | AWS S3, MinIO, R2, B2, etc. |
| `/vsis3_streaming/` | S3Fetcher | AWS S3 streaming |
| `/vsigs/` | GSFetcher | Google Cloud Storage |
| `/vsiaz/` | AZFetcher | Azure Blob Storage |
| `/vsiadls/` | ADLSFetcher | Azure Data Lake Gen2 |
| `/vsicurl/` | HTTPFetcher | Generic HTTP/HTTPS |
| `https://` | HTTPFetcher | Direct HTTPS URL |
Cloud storage paths are used directly in the `-i` option:

```bash
grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif
grk_decompress -i /vsiaz/mycontainer/image.jp2 -o output.tif
grk_decompress -i https://example.com/image.jp2 -o output.tif
```

| Format | Example |
|---|---|
| VSI path | /vsis3/bucket/path/to/file.jp2 |
| VSI streaming | /vsis3_streaming/bucket/path/to/file.jp2 |
| HTTPS URL | https://s3.us-east-1.amazonaws.com/bucket/file.jp2 |
| HTTP URL | http://localhost:9000/bucket/file.jp2 |
| Virtual-hosted URL | https://bucket.s3.us-east-1.amazonaws.com/file.jp2 |
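
The path formats can be assembled in a script rather than typed by hand. The sketch below is a minimal illustration: the function names and the short service names (`s3`, `gs`, `az`, `adls`) are hypothetical and not part of Grok; only the prefixes come from the path-prefix table.

```bash
# Map a short service name (hypothetical) to a Grok path prefix.
vsi_prefix() {
    case "$1" in
        s3)           printf '%s\n' '/vsis3/' ;;
        s3-streaming) printf '%s\n' '/vsis3_streaming/' ;;
        gs)           printf '%s\n' '/vsigs/' ;;
        az)           printf '%s\n' '/vsiaz/' ;;
        adls)         printf '%s\n' '/vsiadls/' ;;
        *)            echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Join prefix, bucket (or container), and object key into an input path.
# $1 = service name, $2 = bucket or container, $3 = object key.
vsi_path() {
    prefix=$(vsi_prefix "$1") || return 1
    printf '%s%s/%s\n' "$prefix" "$2" "$3"
}

# Usage (requires grk_decompress on PATH):
#   grk_decompress -i "$(vsi_path s3 mybucket image.jp2)" -o output.tif
```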
Credentials are resolved in the following order (first match wins). This matches GDAL's credential resolution order.
| Variable | Values | Description |
|---|---|---|
| `AWS_NO_SIGN_REQUEST` | `YES` / `NO` | Skip authentication entirely (public buckets) |

| Variable | Description |
|---|---|
| `AWS_ACCESS_KEY_ID` | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key |
| `AWS_SESSION_TOKEN` | Temporary session token (STS, SSO, etc.) |
Previously obtained temporary credentials (from STS, SSO, EC2, etc.) are cached in memory and reused until they expire, with a 60-second safety margin. The cache is thread-safe and shared across all S3Fetcher instances.
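
The reuse rule can be expressed as a single comparison. This is only a sketch of the rule as described, not Grok's implementation: the function name is hypothetical, and treating the exact 60-second boundary as "refresh" is an assumption.

```bash
# Safety margin in seconds, as stated in the text.
SAFETY_MARGIN=60

# Succeeds (exit status 0) when a cached credential may still be reused.
# $1 = expiry as a Unix timestamp, $2 = current time (defaults to now).
credentials_reusable() {
    expiry=$1
    now=${2:-$(date +%s)}
    # Reusable only while strictly more than SAFETY_MARGIN seconds remain
    # (boundary handling is an assumption).
    [ "$now" -lt $((expiry - SAFETY_MARGIN)) ]
}

# Usage:
#   if credentials_reusable "$expiry_epoch"; then echo reuse; else echo refresh; fi
```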
Grok reads `~/.aws/credentials` and `~/.aws/config`, or the overridden paths if set.
| Variable | Default | Description |
|---|---|---|
| `GRK_AWS_CREDENTIALS_FILE` | `~/.aws/credentials` | Override credentials file path |
| `AWS_CONFIG_FILE` | `~/.aws/config` | Override config file path |
| `AWS_PROFILE` | `default` | AWS profile to use |
| `AWS_DEFAULT_PROFILE` | `default` | Deprecated alias for `AWS_PROFILE` |
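
As a rough illustration of how the profile variables interact, the sketch below picks a profile name and prints the matching section header in the AWS config file. The function names are hypothetical, and giving `AWS_PROFILE` precedence over the deprecated alias is an assumption that the table does not state.

```bash
# Pick the profile name: AWS_PROFILE, then the deprecated alias, then "default".
# The precedence between the two variables is an assumption.
resolve_profile() {
    printf '%s\n' "${AWS_PROFILE:-${AWS_DEFAULT_PROFILE:-default}}"
}

# Print the section header used for that profile in the AWS config file.
# The default profile is "[default]"; named profiles use "[profile NAME]".
config_section() {
    profile=$(resolve_profile)
    if [ "$profile" = "default" ]; then
        printf '%s\n' '[default]'
    else
        printf '[profile %s]\n' "$profile"
    fi
}

# Usage (requires grk_decompress on PATH):
#   AWS_PROFILE=my-profile grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif
```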
The config file supports these advanced credential sources:
For EKS/Kubernetes service accounts (IRSA) configured in the AWS config file:

```ini
[profile my-profile]
role_arn = arn:aws:iam::123456789012:role/my-role
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
```

For cross-account access or role chaining:

```ini
[profile cross-account]
role_arn = arn:aws:iam::987654321098:role/target-role
source_profile = default
external_id = optional-external-id
role_session_name = optional-session-name
```

The source profile can itself use web identity credentials (role chaining).
For AWS Single Sign-On / IAM Identity Center:
```ini
[profile sso-profile]
sso_start_url = https://my-org.awsapps.com/start
sso_account_id = 123456789012
sso_role_name = MyRole

[profile sso-session-profile]
sso_session = my-session
sso_account_id = 123456789012
sso_role_name = MyRole

[sso-session my-session]
sso_start_url = https://my-org.awsapps.com/start
```

Grok reads cached SSO tokens from `~/.aws/sso/cache/`. Run `aws sso login` to refresh them.

| Variable | Default | Description |
|---|---|---|
| `GRK_AWS_SSO_ENDPOINT` | `portal.sso.<region>.amazonaws.com` | Override SSO endpoint |
For external credential providers:
```ini
[profile custom]
credential_process = /path/to/credential-provider --arg
```

The command must output JSON with `Version`, `AccessKeyId`, `SecretAccessKey`,
`SessionToken`, and optionally `Expiration` fields.
See: https://docs.aws.amazon.com/sdkref/latest/guide/feature-process-credentials.html
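
A provider can be as small as a shell script that prints the JSON document. The sketch below is hypothetical: the `MY_*` variable names are invented for illustration, and the values are assumed to contain no characters that need JSON escaping. `Version` is set to 1, the value the AWS process-credentials format requires.

```bash
# Hypothetical credential provider body for use with credential_process.
# Values come from MY_* variables (names invented for this sketch) and are
# assumed to need no JSON escaping.
emit_credentials() {
    printf '{\n'
    printf '  "Version": 1,\n'
    printf '  "AccessKeyId": "%s",\n' "${MY_ACCESS_KEY_ID:-}"
    printf '  "SecretAccessKey": "%s",\n' "${MY_SECRET_ACCESS_KEY:-}"
    printf '  "SessionToken": "%s",\n' "${MY_SESSION_TOKEN:-}"
    printf '  "Expiration": "%s"\n' "${MY_EXPIRATION:-}"
    printf '}\n'
}

# A provider script would end by calling the function:
#   emit_credentials
```

A real provider would typically fetch the values from a secrets manager or vault instead of environment variables, and `Expiration` should be an ISO 8601 timestamp such as `2030-01-01T00:00:00Z`.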
For EKS pods and OIDC-federated workloads where credentials come from env vars:
| Variable | Description |
|---|---|
| `AWS_ROLE_ARN` | IAM role ARN to assume |
| `AWS_WEB_IDENTITY_TOKEN_FILE` | Path to OIDC token file |
| `AWS_ROLE_SESSION_NAME` | Session name (default: grok-session) |
| `GRK_AWS_WEB_IDENTITY_ENABLE` | YES (default) / NO (disables this method) |
For tasks running on Amazon ECS or AWS Fargate:
| Variable | Description |
|---|---|
| `AWS_CONTAINER_CREDENTIALS_FULL_URI` | Full URL to credential endpoint |
| `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` | Relative path (uses http://169.254.170.2) |
| `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` | Path to auth token file |
| `AWS_CONTAINER_AUTHORIZATION_TOKEN` | Auth token value |
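
The container variables combine into one endpoint URL and one optional token. The sketch below shows one plausible way to resolve them; the function names are hypothetical, and both precedence choices (full URI before relative URI, token file before inline token) are assumptions rather than documented Grok behavior.

```bash
# Print the credential endpoint URL, or fail when neither variable is set.
# Preferring the full URI over the relative one is an assumption.
ecs_credentials_url() {
    if [ -n "${AWS_CONTAINER_CREDENTIALS_FULL_URI:-}" ]; then
        printf '%s\n' "$AWS_CONTAINER_CREDENTIALS_FULL_URI"
    elif [ -n "${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI:-}" ]; then
        printf 'http://169.254.170.2%s\n' "$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"
    else
        return 1
    fi
}

# Print the authorization token; the token file wins over the inline value
# (assumed order). Prints an empty line when neither is set.
ecs_auth_token() {
    if [ -n "${AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE:-}" ]; then
        cat "$AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE"
    else
        printf '%s\n' "${AWS_CONTAINER_AUTHORIZATION_TOKEN:-}"
    fi
}
```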
Last resort: fetches temporary credentials from the EC2 instance metadata service. Uses IMDSv2 (PUT token request) with automatic fallback to IMDSv1.
| Variable | Default | Description |
|---|---|---|
| `GRK_AWS_EC2_API_ROOT_URL` | `http://169.254.169.254` | Override metadata endpoint |
| `GRK_AWS_AUTODETECT_EC2_DISABLE` | `NO` | Set to YES to skip EC2 metadata |
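
For debugging, the same two-step exchange can be reproduced with `curl` on an EC2 instance. This is a sketch of the general IMDSv2 flow with an IMDSv1 fallback, not Grok's code; the function names, the 2-second timeout, and the 21600-second token TTL are choices made for this example.

```bash
# Root URL of the metadata service, honoring the GRK_AWS_EC2_API_ROOT_URL override.
imds_root() {
    printf '%s\n' "${GRK_AWS_EC2_API_ROOT_URL:-http://169.254.169.254}"
}

# IMDSv2 step 1: request a session token with a PUT.
imds_token() {
    curl -s -f --max-time 2 -X PUT "$(imds_root)/latest/api/token" \
        -H 'X-aws-ec2-metadata-token-ttl-seconds: 21600'
}

# Step 2: list the instance role name. If no token could be obtained,
# the request is sent without one, which is an IMDSv1 request.
imds_role_name() {
    token=$(imds_token) || token=''
    if [ -n "$token" ]; then
        curl -s -f --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
            "$(imds_root)/latest/meta-data/iam/security-credentials/"
    else
        curl -s -f --max-time 2 \
            "$(imds_root)/latest/meta-data/iam/security-credentials/"
    fi
}

# Usage (only meaningful on an EC2 instance):
#   imds_role_name
```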
The AWS region is resolved in this order:

| Source | Description |
|---|---|
| `AWS_REGION` | GDAL-compatible region setting (highest precedence) |
| `AWS_DEFAULT_REGION` | Standard AWS SDK region variable |
| Config file `region` | From the `[profile X]` section in `~/.aws/config` |
| (fallback) | us-east-1 |
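
The lookup order can be mimicked in a few lines of shell, which is handy for checking which region a given environment would select. This is a sketch under assumptions: the function names are hypothetical, and the config parser only handles plain `region = value` lines inside `[default]` or `[profile NAME]` sections.

```bash
# Print the region configured for a profile in an AWS config file, if any.
# $1 = profile name, $2 = config file path.
config_region() {
    profile=$1
    file=$2
    [ -r "$file" ] || return 0
    if [ "$profile" = "default" ]; then
        header='[default]'
    else
        header="[profile $profile]"
    fi
    awk -v header="$header" '
        /^\[/ { in_section = ($0 == header); next }
        in_section && /^[ \t]*region[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop "region =" and leading blanks
            sub(/[ \t]+$/, "")         # drop trailing blanks
            print
            exit
        }
    ' "$file"
}

# Apply the documented order: AWS_REGION, AWS_DEFAULT_REGION, config file, us-east-1.
resolve_region() {
    if [ -n "${AWS_REGION:-}" ]; then
        printf '%s\n' "$AWS_REGION"
        return 0
    fi
    if [ -n "${AWS_DEFAULT_REGION:-}" ]; then
        printf '%s\n' "$AWS_DEFAULT_REGION"
        return 0
    fi
    region=$(config_region "${AWS_PROFILE:-default}" "${AWS_CONFIG_FILE:-$HOME/.aws/config}")
    printf '%s\n' "${region:-us-east-1}"
}
```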
By default, requests go to s3.<region>.amazonaws.com.
| Variable | Example | Description |
|---|---|---|
| `AWS_S3_ENDPOINT` | `http://localhost:9000` | Custom S3 endpoint |
| `AWS_HTTPS` | `YES` / `NO` | Use HTTPS (default: YES) |
| `AWS_VIRTUAL_HOSTING` | `TRUE` / `FALSE` | Virtual-hosted style URLs (default: FALSE) |
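
To make the interaction of these three settings concrete, the sketch below composes a request URL from a bucket, key, and region. It illustrates the general path-style versus virtual-hosted layout and is not taken from Grok's source; the function name is hypothetical, and letting a scheme embedded in `AWS_S3_ENDPOINT` override `AWS_HTTPS` is an assumption.

```bash
# Build the request URL for an object.
# $1 = bucket, $2 = object key, $3 = region.
s3_object_url() {
    bucket=$1
    key=$2
    region=$3
    # Scheme from AWS_HTTPS (default: YES).
    case "${AWS_HTTPS:-YES}" in
        NO|no|FALSE|false) scheme=http ;;
        *)                 scheme=https ;;
    esac
    if [ -n "${AWS_S3_ENDPOINT:-}" ]; then
        # A scheme in the endpoint takes priority over AWS_HTTPS (assumption).
        case "$AWS_S3_ENDPOINT" in
            http://*)  scheme=http;  host=${AWS_S3_ENDPOINT#http://} ;;
            https://*) scheme=https; host=${AWS_S3_ENDPOINT#https://} ;;
            *)         host=$AWS_S3_ENDPOINT ;;
        esac
    else
        host="s3.$region.amazonaws.com"
    fi
    case "${AWS_VIRTUAL_HOSTING:-FALSE}" in
        TRUE|true|YES|yes)
            # Virtual-hosted style: the bucket becomes part of the host name.
            printf '%s://%s.%s/%s\n' "$scheme" "$bucket" "$host" "$key" ;;
        *)
            # Path style: the bucket is the first path segment.
            printf '%s://%s/%s/%s\n' "$scheme" "$host" "$bucket" "$key" ;;
    esac
}
```

With no overrides, `s3_object_url bucket file.jp2 us-east-1` yields the path-style form of the HTTPS URL shown in the path format table.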
For example, to read from a local MinIO server:

```bash
export AWS_S3_ENDPOINT=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_HTTPS=NO
export AWS_VIRTUAL_HOSTING=FALSE
grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif
```

| Variable | Default | Description |
|---|---|---|
| `AWS_STS_REGIONAL_ENDPOINTS` | `regional` | `regional` → `sts.<region>.amazonaws.com`, other → `sts.amazonaws.com` |
| `GRK_AWS_STS_ROOT_URL` | (auto) | Override STS endpoint entirely |
| Variable | Values | Description |
|---|---|---|
| `AWS_REQUEST_PAYER` | `requester` | Adds `x-amz-request-payer: requester` header to all requests |
These variables apply to all cloud storage requests.
| Variable | Default | Description |
|---|---|---|
| `GRK_CURL_ALLOW_INSECURE` | `NO` | Disable SSL certificate verification |
| `GRK_HTTP_UNSAFESSL` | `NO` | Disable SSL verification (inherited from CurlFetcher) |

| Variable | Default | Description |
|---|---|---|
| `GRK_CURL_TIMEOUT` | (none) | Request timeout in seconds |

| Variable | Default | Description |
|---|---|---|
| `GRK_CURL_CACHE_SIZE` | (none) | Curl buffer size in bytes |

| Variable | Description |
|---|---|
| `GRK_CURL_NON_CACHED` | Colon-separated list of prefixes to disable connection reuse for (e.g. /vsis3/) |

| Variable | Description |
|---|---|
| `GRK_CURL_PROXY` | Proxy URL |
| `GRK_CURL_PROXYUSERPWD` | Proxy credentials (user:password) |
| `GRK_CURL_PROXYAUTH` | Proxy auth type (any value enables CURLAUTH_ANY) |
The CurlFetcher base class provides retry logic with configurable limits. Default: 3 retries with 1-second delay between attempts.
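
The same retry policy is easy to apply around a shell command, for example when pre-fetching a file with `curl`. The wrapper below is a generic sketch, not the CurlFetcher logic: the function name is hypothetical, and counting "3 retries" as three additional attempts after the first one is an assumption.

```bash
# Run a command, retrying on failure.
# $1 = maximum number of retries, $2 = delay in seconds, rest = the command.
retry() {
    max_retries=$1
    delay=$2
    shift 2
    attempt=0
    until "$@"; do
        # Give up once the retry budget is used (first attempt + max_retries).
        if [ "$attempt" -ge "$max_retries" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage, mirroring the documented defaults (3 retries, 1-second delay):
#   retry 3 1 curl -s -f -o image.jp2 https://example.com/image.jp2
```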
| Variable | Default | Description |
|---|---|---|
| `GRK_AWS_ROOT_DIR` | `~/.aws` | Override AWS config root directory |
S3 requests are signed using AWS Signature Version 4 via libcurl's built-in
CURLOPT_AWS_SIGV4 support. The signing region is derived from the resolved
region configuration. The x-amz-date and x-amz-security-token headers
are added automatically.
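
The `curl` command-line tool exposes the same signing feature through its `--aws-sigv4` option (curl 7.75.0 or later), which can help when checking credentials outside Grok. The sketch below is an illustration with hypothetical function names; unlike the library path described above, it passes the session token header explicitly.

```bash
# Build the provider string that curl's --aws-sigv4 option expects
# for the S3 service in region $1.
sigv4_provider() {
    printf 'aws:amz:%s:s3\n' "$1"
}

# Fetch one object with a signed GET.
# $1 = object URL, $2 = region, $3 = output file.
signed_get() {
    if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
        # Temporary credentials: send the session token header as well.
        curl -s -f --aws-sigv4 "$(sigv4_provider "$2")" \
            --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
            -H "x-amz-security-token: $AWS_SESSION_TOKEN" \
            -o "$3" "$1"
    else
        curl -s -f --aws-sigv4 "$(sigv4_provider "$2")" \
            --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
            -o "$3" "$1"
    fi
}

# Usage (needs valid credentials in the environment):
#   signed_get https://s3.us-east-1.amazonaws.com/bucket/file.jp2 us-east-1 file.jp2
```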
Temporary credentials (from STS AssumeRole, Web Identity, SSO, ECS, EC2) are
cached in a static, thread-safe CredentialCache shared across all S3Fetcher
instances. Credentials are reused until 60 seconds before expiration, then
automatically refreshed on the next request.
The CurlFetcher base class maintains a curl_multi handle with up to 100
concurrent connections. Tile and chunk fetch requests are batched and processed
by a background worker thread using a producer/consumer pattern.