
04. Cloud Storage

Aaron Boxer edited this page Mar 25, 2026 · 1 revision

Grok supports reading JPEG 2000 files from AWS S3 and S3-compatible object storage (MinIO, DigitalOcean Spaces, Backblaze B2, Cloudflare R2, etc.), as well as Azure Blob Storage, Google Cloud Storage, Azure Data Lake Gen2, and generic HTTP/HTTPS URLs.

All AWS environment variables are compatible with standard AWS SDK conventions. Grok-specific configuration uses the GRK_ prefix.

Supported Cloud Backends

| Path Prefix | Fetcher | Service |
| --- | --- | --- |
| /vsis3/ | S3Fetcher | AWS S3, MinIO, R2, B2, etc. |
| /vsis3_streaming/ | S3Fetcher | AWS S3 streaming |
| /vsigs/ | GSFetcher | Google Cloud Storage |
| /vsiaz/ | AZFetcher | Azure Blob Storage |
| /vsiadls/ | ADLSFetcher | Azure Data Lake Gen2 |
| /vsicurl/ | HTTPFetcher | Generic HTTP/HTTPS |
| https:// | HTTPFetcher | Direct HTTPS URL |

Cloud storage paths are used directly in the -i option:

grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif
grk_decompress -i /vsiaz/mycontainer/image.jp2 -o output.tif
grk_decompress -i https://example.com/image.jp2 -o output.tif

URL Formats (S3)

| Format | Example |
| --- | --- |
| VSI path | /vsis3/bucket/path/to/file.jp2 |
| VSI streaming | /vsis3_streaming/bucket/path/to/file.jp2 |
| HTTPS URL | https://s3.us-east-1.amazonaws.com/bucket/file.jp2 |
| HTTP URL | http://localhost:9000/bucket/file.jp2 |
| Virtual-hosted URL | https://bucket.s3.us-east-1.amazonaws.com/file.jp2 |
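
To make the relationship between the VSI path and the two AWS URL styles concrete, here is a minimal shell sketch. `vsis3_to_url` is a hypothetical helper written for this page, not part of Grok, and it assumes the path contains both a bucket and an object key.

```shell
#!/bin/sh
# Illustrative only: map a /vsis3/ path to the path-style and
# virtual-hosted AWS URL forms listed in the table above.
# vsis3_to_url PATH [REGION] [STYLE]   (hypothetical helper)
vsis3_to_url() (
  path=$1                  # e.g. /vsis3/bucket/path/to/file.jp2
  region=${2:-us-east-1}
  style=${3:-path}         # "path" or "virtual"
  rest=${path#/vsis3/}     # strip the VSI prefix
  bucket=${rest%%/*}       # first component is the bucket
  key=${rest#*/}           # remainder is the object key
  if [ "$style" = "virtual" ]; then
    printf 'https://%s.s3.%s.amazonaws.com/%s\n' "$bucket" "$region" "$key"
  else
    printf 'https://s3.%s.amazonaws.com/%s/%s\n' "$region" "$bucket" "$key"
  fi
)

vsis3_to_url /vsis3/bucket/file.jp2
vsis3_to_url /vsis3/bucket/file.jp2 us-east-1 virtual
```

The two calls print the HTTPS URL and the virtual-hosted URL from the table, in that order.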

S3 Credential Chain

Credentials are resolved in the following order (first match wins). This matches GDAL's credential resolution order.

1. Anonymous Access

| Variable | Values | Description |
| --- | --- | --- |
| AWS_NO_SIGN_REQUEST | YES / NO | Skip authentication entirely (public buckets) |

2. Environment Variables

| Variable | Description |
| --- | --- |
| AWS_ACCESS_KEY_ID | AWS access key |
| AWS_SECRET_ACCESS_KEY | AWS secret key |
| AWS_SESSION_TOKEN | Temporary session token (STS, SSO, etc.) |

3. Cached Temporary Credentials

Previously obtained temporary credentials (from STS, SSO, EC2, etc.) are cached in memory and reused until they expire, with a 60-second safety margin. The cache is thread-safe and shared across all S3Fetcher instances.
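
The 60-second safety margin can be expressed as a simple comparison. The sketch below is not Grok's implementation; `creds_usable` is a hypothetical helper, and treating credentials at exactly 60 seconds before expiry as stale is an assumption.

```shell
#!/bin/sh
# creds_usable EXPIRY_EPOCH [NOW_EPOCH]   (hypothetical helper)
# Succeeds while the credentials are more than 60 seconds from expiry.
creds_usable() (
  expiry=$1
  now=${2:-$(date +%s)}    # default to the current time
  margin=60                # safety margin in seconds
  [ "$now" -lt $((expiry - margin)) ]
)

# 100 seconds of lifetime left: still reusable.
if creds_usable 1700000100 1700000000; then echo reuse; else echo refresh; fi
# 30 seconds left: inside the margin, so refresh.
if creds_usable 1700000030 1700000000; then echo reuse; else echo refresh; fi
```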

4. AWS Config Files

Reads ~/.aws/credentials and ~/.aws/config (or overridden paths).

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_CREDENTIALS_FILE | ~/.aws/credentials | Override credentials file path |
| AWS_CONFIG_FILE | ~/.aws/config | Override config file path |
| AWS_PROFILE | default | AWS profile to use |
| AWS_DEFAULT_PROFILE | default | Deprecated alias for AWS_PROFILE |

The config file supports these advanced credential sources:

4a. Web Identity Token (role_arn + web_identity_token_file)

For EKS/Kubernetes service accounts (IRSA) configured in the AWS config file:

[profile my-profile]
role_arn = arn:aws:iam::123456789012:role/my-role
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token

4b. STS Assume Role (role_arn + source_profile)

For cross-account access or role chaining:

[profile cross-account]
role_arn = arn:aws:iam::987654321098:role/target-role
source_profile = default
external_id = optional-external-id
role_session_name = optional-session-name

The source profile can itself use web identity credentials (role chaining).

4c. SSO (sso_start_url + sso_account_id + sso_role_name)

For AWS Single Sign-On / IAM Identity Center:

[profile sso-profile]
sso_start_url = https://my-org.awsapps.com/start
sso_account_id = 123456789012
sso_role_name = MyRole

[profile sso-session-profile]
sso_session = my-session
sso_account_id = 123456789012
sso_role_name = MyRole

[sso-session my-session]
sso_start_url = https://my-org.awsapps.com/start

Reads cached SSO tokens from ~/.aws/sso/cache/. Run aws sso login to refresh.

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_SSO_ENDPOINT | `portal.sso.<region>.amazonaws.com` | Override SSO endpoint |

4d. Credential Process

For external credential providers:

[profile custom]
credential_process = /path/to/credential-provider --arg

The command must output JSON with Version, AccessKeyId, SecretAccessKey, SessionToken, and optionally Expiration fields. See: https://docs.aws.amazon.com/sdkref/latest/guide/feature-process-credentials.html
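
As an illustration of the expected output shape, a credential provider could be as small as the following script. This is a sketch only: `MY_KEY_ID`, `MY_SECRET`, `MY_TOKEN` and `MY_EXPIRATION` are placeholder variable names invented for the example, the default values are dummies, and no JSON escaping is performed. A real provider would fetch the values from a vault or broker.

```shell
#!/bin/sh
# Hypothetical credential_process provider.
# Prints the JSON document described above; Version is 1 in the AWS spec.
emit_credentials() {
  printf '{\n'
  printf '  "Version": 1,\n'
  printf '  "AccessKeyId": "%s",\n' "${MY_KEY_ID:-AKIAEXAMPLE}"
  printf '  "SecretAccessKey": "%s",\n' "${MY_SECRET:-example-secret}"
  printf '  "SessionToken": "%s",\n' "${MY_TOKEN:-example-token}"
  printf '  "Expiration": "%s"\n' "${MY_EXPIRATION:-2030-01-01T00:00:00Z}"
  printf '}\n'
}

emit_credentials
```

Saved as an executable file, the script path would go in the credential_process line of the profile.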

5. Web Identity Token (from environment)

For EKS pods and OIDC-federated workloads where credentials come from env vars:

| Variable | Description |
| --- | --- |
| AWS_ROLE_ARN | IAM role ARN to assume |
| AWS_WEB_IDENTITY_TOKEN_FILE | Path to OIDC token file |
| AWS_ROLE_SESSION_NAME | Session name (default: grok-session) |
| GRK_AWS_WEB_IDENTITY_ENABLE | YES (default) / NO; set to NO to disable this method |
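
A quick way to check whether a shell environment satisfies the variables above is sketched below. The helper names are hypothetical, and the exact conditions Grok applies (for example, case sensitivity of YES/NO or whether the token file must be readable up front) are assumptions.

```shell
#!/bin/sh
# web_identity_available   (hypothetical helper)
# Succeeds when the method is not disabled, a role ARN is set,
# and the token file exists and is readable.
web_identity_available() {
  [ "${GRK_AWS_WEB_IDENTITY_ENABLE:-YES}" != "NO" ] &&
    [ -n "${AWS_ROLE_ARN:-}" ] &&
    [ -r "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]
}

# session_name: the configured session name, or the documented default.
session_name() {
  printf '%s\n' "${AWS_ROLE_SESSION_NAME:-grok-session}"
}

if web_identity_available; then
  echo "web identity: configured, session $(session_name)"
else
  echo "web identity: not configured"
fi
```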

6. ECS Container Credentials

For tasks running on Amazon ECS or AWS Fargate:

| Variable | Description |
| --- | --- |
| AWS_CONTAINER_CREDENTIALS_FULL_URI | Full URL to credential endpoint |
| AWS_CONTAINER_CREDENTIALS_RELATIVE_URI | Relative path (uses http://169.254.170.2) |
| AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE | Path to auth token file |
| AWS_CONTAINER_AUTHORIZATION_TOKEN | Auth token value |

7. EC2 Instance Metadata

Last resort: fetches temporary credentials from the EC2 instance metadata service. Uses IMDSv2 (PUT token request) with automatic fallback to IMDSv1.

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_EC2_API_ROOT_URL | http://169.254.169.254 | Override metadata endpoint |
| GRK_AWS_AUTODETECT_EC2_DISABLE | NO | Set to YES to skip EC2 metadata |
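
For debugging on an instance, the IMDSv2-then-IMDSv1 flow can be reproduced by hand with curl. The sketch below uses the metadata paths and headers documented by AWS; the helper names, the 2-second timeout and the 21600-second token TTL are assumptions for this example and are not taken from Grok.

```shell
#!/bin/sh
# imds_root: metadata endpoint, honoring the documented override.
imds_root() {
  printf '%s\n' "${GRK_AWS_EC2_API_ROOT_URL:-http://169.254.169.254}"
}

# imds_role_name   (hypothetical helper; only meaningful on EC2)
# Tries IMDSv2 (PUT token request) and falls back to IMDSv1
# when no token can be obtained.
imds_role_name() {
  [ "${GRK_AWS_AUTODETECT_EC2_DISABLE:-NO}" = "YES" ] && return 1
  _root=$(imds_root)
  _token=$(curl -sf --max-time 2 -X PUT "$_root/latest/api/token" \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 21600') || _token=
  if [ -n "$_token" ]; then
    curl -sf --max-time 2 -H "X-aws-ec2-metadata-token: $_token" \
      "$_root/latest/meta-data/iam/security-credentials/"
  else
    curl -sf --max-time 2 "$_root/latest/meta-data/iam/security-credentials/"
  fi
}

imds_root
```

On an EC2 instance with an attached role, calling `imds_role_name` should print the role name; elsewhere it fails after the timeout.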

Region Configuration

Resolved in this order:

| Variable | Description |
| --- | --- |
| AWS_REGION | GDAL-compatible region setting (highest precedence) |
| AWS_DEFAULT_REGION | Standard AWS SDK region variable |
| Config file region | From [profile X] section in ~/.aws/config |
| (fallback) | us-east-1 |
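
The precedence above can be sketched as a small shell function. `resolve_region` is a hypothetical helper; it takes the config-file region as an optional argument rather than parsing ~/.aws/config, and it assumes an empty variable counts as unset.

```shell
#!/bin/sh
# resolve_region [CONFIG_FILE_REGION]   (hypothetical helper)
# Mirrors the documented order: AWS_REGION, AWS_DEFAULT_REGION,
# config file region, then the us-east-1 fallback.
resolve_region() {
  if [ -n "${AWS_REGION:-}" ]; then
    printf '%s\n' "$AWS_REGION"
  elif [ -n "${AWS_DEFAULT_REGION:-}" ]; then
    printf '%s\n' "$AWS_DEFAULT_REGION"
  elif [ -n "${1:-}" ]; then
    printf '%s\n' "$1"
  else
    printf 'us-east-1\n'
  fi
}

resolve_region eu-central-1
```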

Endpoint Configuration

AWS S3

By default, requests go to s3.<region>.amazonaws.com.

S3-Compatible Storage (MinIO, R2, etc.)

| Variable | Example | Description |
| --- | --- | --- |
| AWS_S3_ENDPOINT | http://localhost:9000 | Custom S3 endpoint |
| AWS_HTTPS | YES / NO | Use HTTPS (default: YES) |
| AWS_VIRTUAL_HOSTING | TRUE / FALSE | Virtual-hosted style URLs (default: FALSE) |

MinIO Example

export AWS_S3_ENDPOINT=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_HTTPS=NO
export AWS_VIRTUAL_HOSTING=FALSE

grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif

STS Endpoint

| Variable | Default | Description |
| --- | --- | --- |
| AWS_STS_REGIONAL_ENDPOINTS | regional | regional → `sts.<region>.amazonaws.com`; any other value → `sts.amazonaws.com` |
| GRK_AWS_STS_ROOT_URL | (auto) | Override STS endpoint entirely |
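
Reading the table as "regional selects the per-region host, any other value selects the global host", the endpoint choice can be sketched as follows. `sts_endpoint` is a hypothetical helper, and the https scheme is an assumption.

```shell
#!/bin/sh
# sts_endpoint REGION   (hypothetical helper)
# GRK_AWS_STS_ROOT_URL wins outright; otherwise the value of
# AWS_STS_REGIONAL_ENDPOINTS picks the regional or global host.
sts_endpoint() {
  if [ -n "${GRK_AWS_STS_ROOT_URL:-}" ]; then
    printf '%s\n' "$GRK_AWS_STS_ROOT_URL"
  elif [ "${AWS_STS_REGIONAL_ENDPOINTS:-regional}" = "regional" ]; then
    printf 'https://sts.%s.amazonaws.com\n' "$1"
  else
    printf 'https://sts.amazonaws.com\n'
  fi
}

sts_endpoint us-east-1
```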

Requester Pays

| Variable | Values | Description |
| --- | --- | --- |
| AWS_REQUEST_PAYER | requester | Adds x-amz-request-payer: requester header to all requests |

HTTP / Curl Configuration

These variables apply to all cloud storage requests.

SSL / TLS

| Variable | Default | Description |
| --- | --- | --- |
| GRK_CURL_ALLOW_INSECURE | NO | Disable SSL certificate verification |
| GRK_HTTP_UNSAFESSL | NO | Disable SSL verification (inherited from CurlFetcher) |

Timeouts

| Variable | Default | Description |
| --- | --- | --- |
| GRK_CURL_TIMEOUT | (none) | Request timeout in seconds |

Caching

| Variable | Default | Description |
| --- | --- | --- |
| GRK_CURL_CACHE_SIZE | (none) | Curl buffer size in bytes |

Connection Reuse

| Variable | Description |
| --- | --- |
| GRK_CURL_NON_CACHED | Colon-separated list of prefixes to disable connection reuse for (e.g. /vsis3/) |
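
To show how a colon-separated prefix list might be evaluated, here is a minimal sketch. `path_is_non_cached` is a hypothetical helper, and plain string-prefix matching is an assumption about the semantics, not a description of Grok's code.

```shell
#!/bin/sh
# path_is_non_cached PATH   (hypothetical helper)
# Succeeds if PATH starts with any prefix listed in GRK_CURL_NON_CACHED.
path_is_non_cached() (
  path=$1
  list=${GRK_CURL_NON_CACHED:-}
  while [ -n "$list" ]; do
    prefix=${list%%:*}             # text before the first colon
    case $list in
      *:*) list=${list#*:} ;;      # drop the prefix just taken
      *)   list= ;;                # last entry consumed
    esac
    [ -n "$prefix" ] || continue   # skip empty entries
    case $path in
      "$prefix"*) exit 0 ;;
    esac
  done
  exit 1
)

GRK_CURL_NON_CACHED=/vsis3/:/vsigs/
if path_is_non_cached /vsis3/mybucket/image.jp2; then
  echo "connection reuse disabled"
fi
```

Because the separator is a colon, a prefix that itself contains a colon (such as https://) would be split by this sketch.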

Proxy

| Variable | Description |
| --- | --- |
| GRK_CURL_PROXY | Proxy URL |
| GRK_CURL_PROXYUSERPWD | Proxy credentials (user:password) |
| GRK_CURL_PROXYAUTH | Proxy auth type (any value enables CURLAUTH_ANY) |

Retry

The CurlFetcher base class provides retry logic with configurable limits. Default: 3 retries with 1-second delay between attempts.
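
The default policy can be mimicked in a shell wrapper, for example when scripting downloads around Grok. This is a sketch, not the CurlFetcher logic: `with_retry`, `MAX_RETRIES` and `RETRY_DELAY` are hypothetical names, and "3 retries" is read here as one initial attempt plus up to three more.

```shell
#!/bin/sh
# with_retry CMD [ARGS...]   (hypothetical helper)
# Runs CMD; on failure waits RETRY_DELAY seconds (default 1) and tries
# again, giving up after MAX_RETRIES retries (default 3).
with_retry() {
  _attempt=0
  until "$@"; do
    _attempt=$((_attempt + 1))
    if [ "$_attempt" -gt "${MAX_RETRIES:-3}" ]; then
      return 1
    fi
    sleep "${RETRY_DELAY:-1}"
  done
}

# A command that succeeds immediately is run exactly once.
with_retry true && echo ok
```

A typical use would be `with_retry grk_decompress -i /vsis3/mybucket/image.jp2 -o output.tif`.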

File Paths

| Variable | Default | Description |
| --- | --- | --- |
| GRK_AWS_ROOT_DIR | ~/.aws | Override AWS config root directory |

Request Signing

S3 requests are signed using AWS Signature Version 4 via libcurl's built-in CURLOPT_AWS_SIGV4 support. The signing region is derived from the resolved region configuration. The x-amz-date and x-amz-security-token headers are added automatically.
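
The curl command-line tool exposes the same feature as `--aws-sigv4` (curl 7.75.0 or later), which can help when reproducing a request outside Grok. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the session token header is added by hand, and older curl versions may need extra headers for S3.

```shell
#!/bin/sh
# sigv4_provider REGION: value for curl's --aws-sigv4 option for S3.
sigv4_provider() {
  printf 'aws:amz:%s:s3\n' "$1"
}

# signed_get URL REGION OUTPUT   (hypothetical helper)
# Performs a SigV4-signed GET using credentials from the environment.
signed_get() {
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    curl -sf --aws-sigv4 "$(sigv4_provider "$2")" \
      --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
      -H "x-amz-security-token: $AWS_SESSION_TOKEN" \
      -o "$3" "$1"
  else
    curl -sf --aws-sigv4 "$(sigv4_provider "$2")" \
      --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
      -o "$3" "$1"
  fi
}

sigv4_provider us-east-1
```

For example, `signed_get https://s3.us-east-1.amazonaws.com/bucket/file.jp2 us-east-1 file.jp2` would fetch the object from the URL formats table with the credentials in the environment.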

Architecture

Credential Caching

Temporary credentials (from STS AssumeRole, Web Identity, SSO, ECS, EC2) are cached in a static, thread-safe CredentialCache shared across all S3Fetcher instances. Credentials are reused until 60 seconds before expiration, then automatically refreshed on the next request.

Connection Pooling

The CurlFetcher base class maintains a curl_multi handle with up to 100 concurrent connections. Tile and chunk fetch requests are batched and processed by a background worker thread using a producer/consumer pattern.
