S3 Storage

Both the TileDB point-cloud array and the Zarr gridded store can be hosted on any S3-compatible object storage (e.g., AWS S3, Ceph/RadosGW). The API is the same as for local storage; only the constructor arguments change.

TileDB array on S3

Pass storage_type="s3" and supply the array URI, endpoint URL, region, and credentials (as a dictionary):

from alsdb import ALSDatabase, ALSProvider

s3_kwargs = dict(
    storage_type="s3",
    uri="s3://my-bucket/als_array",
    url="https://s3.example.com",       # endpoint URL (omit for AWS)
    region="eu-central-1",
    credentials={
        "AccessKeyId": "YOUR_ACCESS_KEY",
        "SecretAccessKey": "YOUR_SECRET_KEY",
    },
)

# Ingest
db = ALSDatabase(**s3_kwargs)
db.ingest("path/to/tile.laz")

# Query
reader = ALSProvider(**s3_kwargs)
df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000, year=2021)
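To avoid putting keys in the script itself (see Security considerations), the same keyword arguments can be assembled from environment variables. This is a minimal sketch, not part of alsdb: the helper name s3_kwargs_from_env and the ALS_S3_ENDPOINT variable are hypothetical, and the eu-central-1 fallback simply mirrors the example above. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION are the conventional AWS variable names.

```python
import os
from typing import Mapping, Optional


def s3_kwargs_from_env(uri: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Build ALSDatabase/ALSProvider S3 kwargs from environment variables.

    Hypothetical helper: ALS_S3_ENDPOINT is an illustrative name that alsdb
    does not read on its own.
    """
    env = os.environ if env is None else env
    try:
        access_key = env["AWS_ACCESS_KEY_ID"]
        secret_key = env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"missing S3 credential variable: {missing}") from None

    kwargs = {
        "storage_type": "s3",
        "uri": uri,
        # Fallback region copied from the example above (an assumption).
        "region": env.get("AWS_DEFAULT_REGION", "eu-central-1"),
        "credentials": {"AccessKeyId": access_key, "SecretAccessKey": secret_key},
    }
    # Only set a custom endpoint for non-AWS stores such as Ceph/RadosGW.
    endpoint = env.get("ALS_S3_ENDPOINT")
    if endpoint:
        kwargs["url"] = endpoint
    return kwargs
```

Usage would then be db = ALSDatabase(**s3_kwargs_from_env("s3://my-bucket/als_array")). Failing early with a clear message is preferable to letting an unsigned or half-configured request reach the object store.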

Using boto3 to load credentials

If you manage credentials via AWS profiles or environment variables, use boto3 to retrieve them:

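import boto3
from alsdb import ALSDatabase

# Load credentials from the "my-profile" AWS profile.
session = boto3.Session(profile_name="my-profile")
# A frozen snapshot keeps access_key, secret_key and token consistent.
creds = session.get_credentials().get_frozen_credentials()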

db = ALSDatabase(
    storage_type="s3",
    uri="s3://my-bucket/als_array",
    url="https://s3.example.com",
    region="eu-central-1",
    credentials={
        "AccessKeyId": creds.access_key,
        "SecretAccessKey": creds.secret_key,
    },
)
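Session.get_credentials() returns None when boto3 cannot resolve any credentials, in which case the chained .get_frozen_credentials() call above fails with an AttributeError. A small guard makes that failure explicit. This is a sketch with a hypothetical helper name (tiledb_credentials); it relies only on the access_key, secret_key and token attributes of botocore's frozen credentials. Temporary (STS/SSO) credentials also carry a session token, and since this page does not say whether ALSDatabase accepts one, the sketch only warns about it.

```python
import warnings
from typing import Any, Optional


def tiledb_credentials(creds: Optional[Any]) -> dict:
    """Turn the result of boto3 Session.get_credentials() into the
    credentials= dict used by ALSDatabase/ALSProvider.

    Hypothetical helper, not part of alsdb.
    """
    if creds is None:
        # get_credentials() returns None when nothing could be resolved.
        raise RuntimeError("boto3 could not resolve any AWS credentials")
    frozen = creds.get_frozen_credentials()
    if frozen.token:
        # Session token present but not forwarded: the dict format shown in
        # this guide only has AccessKeyId and SecretAccessKey.
        warnings.warn("session token present but not forwarded", stacklevel=2)
    return {"AccessKeyId": frozen.access_key, "SecretAccessKey": frozen.secret_key}
```

With boto3 installed, this would be called as credentials=tiledb_credentials(session.get_credentials()).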

Zarr store on S3

ALSZarrStore accepts an s3:// URI directly. Pass your S3 credentials and endpoint through storage_options:

from alsdb.storage import ALSZarrStore

store = ALSZarrStore(
    "s3://my-bucket/forest.zarr",
    storage_options={
        "key": "YOUR_ACCESS_KEY",
        "secret": "YOUR_SECRET_KEY",
        "endpoint_url": "https://s3.example.com",
    },
)

# Use exactly like a local store (reader is the ALSProvider created above)
from alsdb.processing.chm import compute_chm
compute_chm(provider=reader, store=store, resolution=1.0, year=2021)
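The TileDB and Zarr constructors name the same settings differently: the TileDB side uses url and a credentials dict, while the storage_options keys in the example above (key, secret, endpoint_url) follow s3fs naming. The sketch below, a hypothetical helper called zarr_storage_options, renames one into the other so both stores can share a single source of configuration. Forwarding the region as client_kwargs={"region_name": ...} uses s3fs's client_kwargs option and is an assumption about how ALSZarrStore passes storage_options through.

```python
def zarr_storage_options(s3_kwargs: dict) -> dict:
    """Translate ALSDatabase-style S3 kwargs into s3fs-style storage_options.

    Hypothetical helper: it only renames keys, it does not contact S3.
    """
    creds = s3_kwargs["credentials"]
    options = {"key": creds["AccessKeyId"], "secret": creds["SecretAccessKey"]}
    # TileDB side calls the endpoint `url`; s3fs calls it `endpoint_url`.
    if s3_kwargs.get("url"):
        options["endpoint_url"] = s3_kwargs["url"]
    # Region goes to the underlying botocore client via client_kwargs.
    if s3_kwargs.get("region"):
        options["client_kwargs"] = {"region_name": s3_kwargs["region"]}
    return options
```

For example, ALSZarrStore("s3://my-bucket/forest.zarr", storage_options=zarr_storage_options(s3_kwargs)) would reuse the s3_kwargs dictionary from the TileDB example.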

TileDB S3 performance tuning

For large-scale campaigns on S3, tune the TileDB S3 settings via the tiledb_config argument:

db = ALSDatabase(
    storage_type="s3",
    uri="s3://my-bucket/als_array",
    url="https://s3.example.com",
    region="eu-central-1",
    credentials={
        "AccessKeyId": "...",
        "SecretAccessKey": "...",
    },
    tiledb_config={
        "vfs.s3.connect_timeout_ms": "300000",
        "vfs.s3.request_timeout_ms": "600000",
        "vfs.s3.connect_max_tries": "10",
        "vfs.s3.multipart_part_size": "52428800",   # 50 MiB parts (50 * 1024 * 1024 bytes)
        "vfs.s3.backoff_scale": "2.0",
        "vfs.s3.backoff_max_ms": "120000",
    },
)

Recommended settings for production S3 ingest:

  • multipart_part_size: 50–100 MiB (52428800–104857600 bytes) for large datasets.

  • connect_max_tries: 10 connection attempts to ride out transient S3 errors.

  • request_timeout_ms: 600000 ms (10 min) for slow object stores.
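The recommended values map onto the string-valued tiledb_config entries shown earlier. The sketch below, with hypothetical helpers s3_tuning_config and max_object_bytes, computes them from human units and checks the part size against AWS S3's multipart limits (at least 5 MiB per part except the last, at most 10,000 parts per upload). Other S3-compatible stores such as Ceph/RadosGW may enforce different limits.

```python
MIB = 1024 * 1024
S3_MIN_PART_BYTES = 5 * MIB   # AWS S3 minimum for every part except the last
S3_MAX_PARTS = 10_000         # AWS S3 limit on parts per multipart upload


def s3_tuning_config(part_size_mib: int = 50, max_tries: int = 10,
                     request_timeout_min: int = 10) -> dict:
    """Return TileDB VFS S3 settings as strings, like the tiledb_config example.

    Hypothetical helper; defaults follow the recommendations above.
    """
    part_bytes = part_size_mib * MIB
    if part_bytes < S3_MIN_PART_BYTES:
        raise ValueError("S3 multipart parts must be at least 5 MiB")
    return {
        "vfs.s3.multipart_part_size": str(part_bytes),
        "vfs.s3.connect_max_tries": str(max_tries),
        "vfs.s3.request_timeout_ms": str(request_timeout_min * 60 * 1000),
    }


def max_object_bytes(part_size_mib: int) -> int:
    """Largest single object a multipart upload can reach at this part size."""
    return part_size_mib * MIB * S3_MAX_PARTS
```

The returned dictionary can be merged into tiledb_config alongside the timeout and backoff settings. At 50 MiB parts, a single object can reach about 488 GiB (50 MiB x 10,000 parts), which is one reason larger parts are suggested for large datasets.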

Security considerations

  • Never hardcode credentials in scripts. Use environment variables or an AWS credentials file.

  • Restrict bucket policies so that only the ingest user has write access; read-only access can be wider.

  • For public datasets, use anonymous (unsigned) access by setting "vfs.s3.no_sign_request": "true" in tiledb_config.

# Anonymous read-only access (public bucket)
reader = ALSProvider(
    storage_type="s3",
    uri="s3://public-bucket/als_array",
    tiledb_config={"vfs.s3.no_sign_request": "true"},
)
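Read-only tools often need to work against both private and public buckets. The sketch below, a hypothetical helper named provider_kwargs, signs requests when credentials are present in the environment and otherwise falls back to the unsigned configuration from the public-bucket example. It only builds keyword arguments for ALSProvider; the environment variable names are the conventional AWS ones.

```python
import os
from typing import Mapping, Optional


def provider_kwargs(uri: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Signed access when credentials are available, anonymous otherwise.

    Hypothetical helper; the anonymous branch mirrors the public-bucket
    example above (vfs.s3.no_sign_request).
    """
    env = os.environ if env is None else env
    kwargs = {"storage_type": "s3", "uri": uri}
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        kwargs["credentials"] = {"AccessKeyId": key, "SecretAccessKey": secret}
    else:
        # Unsigned requests only succeed on buckets that allow public reads.
        kwargs["tiledb_config"] = {"vfs.s3.no_sign_request": "true"}
    return kwargs
```

A silent fallback can hide a misconfigured environment, so this pattern suits read paths only; ingest code should require credentials and fail loudly, in line with the write-access restriction above.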