Data Provider

alsdb.ALSProvider is the interface for querying point clouds stored in the TileDB array. It supports spatial queries by bounding box, optionally restricted to a single survey year, and returns results as a pandas.DataFrame or an xarray.Dataset.

Key capabilities

  • Spatial queries: retrieve all points within a bounding box.

  • Temporal filtering: restrict queries to a single survey year.

  • Attribute selection: choose which LAS attributes to return.

  • Multi-year inspection: list all years stored in the array.

  • xarray output: get results as a labelled xarray.Dataset.

Basic query example

from alsdb import ALSProvider

reader = ALSProvider(storage_type="local", uri="my_array")

# All points in a bounding box, returned as a pandas DataFrame
df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000)

print(df.columns.tolist())
# ['X', 'Y', 'Z', 'Intensity', 'ReturnNumber', 'Classification',
#  'R', 'G', 'B', 'HeightAboveGround', 'Year', ...]

# Restrict to a single survey year
df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000, year=2021)

# Which years are available?
print(reader.available_years())   # e.g. [2019, 2021, 2023]
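
The two calls above can be combined to compare survey years over the same area. The following is a minimal sketch rather than part of alsdb: points_per_year is a hypothetical helper, and it assumes only that the reader exposes query_bbox(..., year=...) and available_years() as used in the example above, and that the returned object supports len().

```python
def points_per_year(reader, bbox):
    """Count the points returned for `bbox` in each available survey year.

    Hypothetical helper (not part of alsdb). `bbox` holds the four
    coordinates in the same order they are passed to query_bbox above.
    """
    counts = {}
    for year in sorted(reader.available_years()):
        # One query per year keeps each result limited to a single survey.
        df = reader.query_bbox(*bbox, year=year)
        counts[year] = len(df)
    return counts
```

With the reader from the example, this would be called as points_per_year(reader, (308_000, 4_688_000, 310_000, 4_690_000)). Passing attrs with a single attribute to each query would likely reduce the amount of data read, since only the row count is used.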

Selecting attributes

By default all stored attributes are returned. Pass attrs to restrict the output:

df = reader.query_bbox(
    308_000, 4_688_000, 310_000, 4_690_000,
    year=2021,
    attrs=["Z", "Classification", "HeightAboveGround"],
)

To see what attributes are available:

print(reader.get_available_attributes())
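
Because the stored attributes depend on how the array was ingested, it can help to check a requested list against the available ones before querying. The sketch below is illustrative only: split_attrs is a hypothetical helper, and it assumes that get_available_attributes() returns an iterable of attribute name strings. How query_bbox reacts to an unknown attribute name is not covered here.

```python
def split_attrs(requested, available):
    """Split requested attribute names into (present, missing) lists.

    Hypothetical helper (not part of alsdb). The order of `requested`
    is preserved in both returned lists.
    """
    available = set(available)
    present = [name for name in requested if name in available]
    missing = [name for name in requested if name not in available]
    return present, missing
```

A typical use would be present, missing = split_attrs(["Z", "HeightAboveGround"], reader.get_available_attributes()), followed by passing attrs=present to query_bbox and reporting anything left in missing.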

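xarray output

to_xarray accepts the same bounding box and year arguments as query_bbox and returns the points as an xarray.Dataset:
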
ds = reader.to_xarray(308_000, 4_688_000, 310_000, 4_690_000, year=2021)

print(ds)
# <xarray.Dataset>
# Dimensions: (points: 1_482_310)
# Coordinates:
#   * X     (points) float64
#   * Y     (points) float64
# Data variables:
#     Z                (points) float64
#     Classification   (points) uint8
#     ReturnNumber     (points) uint8
#     ...

S3 provider

The API is identical for S3-backed arrays; only the constructor arguments change:

reader = ALSProvider(
    storage_type="s3",
    uri="s3://owner.bucket/als_array",
    url="https://s3.example.com",
    region="eu-central-1",
    credentials={
        "AccessKeyId": "...",
        "SecretAccessKey": "...",
    },
)

df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000, year=2021)

See S3 Storage for a full S3 setup walkthrough.
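
To avoid hard-coding secrets in scripts, the credentials dictionary can be assembled from environment variables. This is a minimal sketch under stated assumptions: s3_credentials_from_env is a hypothetical helper, the variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY follow the common AWS convention and are not prescribed by alsdb, and the dictionary keys mirror the constructor example above.

```python
import os


def s3_credentials_from_env(environ=os.environ):
    """Build the credentials dict from environment variables.

    Hypothetical helper (not part of alsdb). The variable names are an
    assumption; the dict keys mirror the constructor example above.
    """
    try:
        return {
            "AccessKeyId": environ["AWS_ACCESS_KEY_ID"],
            "SecretAccessKey": environ["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as exc:
        # Fail early with the name of the variable that is not set.
        raise RuntimeError(f"Missing environment variable: {exc.args[0]}") from None
```

The result can then be passed as credentials=s3_credentials_from_env() when constructing the provider.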

Output columns

The following columns are available depending on what was stored during ingestion:

Column             Type     Description
-----------------  -------  -----------
X                  float64  UTM easting (m)
Y                  float64  UTM northing (m)
Z                  float64  Ellipsoidal height (m)
HeightAboveGround  float32  Height above the Delaunay ground TIN (m); available if ingested with HAG filter
Intensity          uint16   Return pulse intensity
ReturnNumber       uint8    Return number (1 = first, 2 = second, …)
NumberOfReturns    uint8    Total number of returns for this pulse
Classification     uint8    LAS classification code (1=unclassified, 2=ground, 3=low veg, 4=medium veg, 5=high veg, …)
R, G, B            uint16   RGB colour values (0–65535)
Year               int16    Survey acquisition year
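
The Classification codes listed above are easier to work with once they have readable names. The following sketch covers only the five codes named in this section; LAS_CLASS_NAMES, VEGETATION_CODES, class_name and is_vegetation are hypothetical names introduced for illustration and are not part of alsdb.

```python
# Names for the LAS classification codes listed in this section.
LAS_CLASS_NAMES = {
    1: "unclassified",
    2: "ground",
    3: "low vegetation",
    4: "medium vegetation",
    5: "high vegetation",
}

# The three vegetation classes from the list above.
VEGETATION_CODES = frozenset({3, 4, 5})


def class_name(code):
    """Return a readable name, or 'other (<code>)' for codes not listed here."""
    code = int(code)
    return LAS_CLASS_NAMES.get(code, f"other ({code})")


def is_vegetation(code):
    """True for low, medium, or high vegetation codes."""
    return int(code) in VEGETATION_CODES
```

On a DataFrame returned by query_bbox, the same mapping can be applied column-wise with pandas, for example df["Classification"].map(LAS_CLASS_NAMES) to label points, or df[df["Classification"].isin(list(VEGETATION_CODES))] to keep vegetation returns only.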

Performance notes

  • Queries scan only the TileDB fragments that overlap the requested X/Y/Year range. Fragment consolidation (see Data Ingestion) significantly reduces per-query overhead for large arrays.

  • For very large bounding boxes, consider using the Processing Pipeline (run_tiled) instead of a single query_bbox call.

  • On S3, query performance depends on network bandwidth and TileDB S3 timeout settings.
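
To illustrate the idea behind splitting a large area into smaller requests, the sketch below divides a bounding box into square tiles by hand. It is not how run_tiled is implemented and is not a replacement for it: iter_tiles is a hypothetical helper, and the (xmin, ymin, xmax, ymax) argument order is an assumption inferred from the query examples above.

```python
def iter_tiles(xmin, ymin, xmax, ymax, tile_size):
    """Yield (xmin, ymin, xmax, ymax) tiles covering the bounding box.

    Hypothetical helper (not part of alsdb). Tiles are generated row by
    row; tiles on the far edges are clipped to the box.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            yield (x, y, min(x + tile_size, xmax), min(y + tile_size, ymax))
            x += tile_size
        y += tile_size
```

Each tile could then be passed to reader.query_bbox(*tile, year=2021) and processed before the next one is loaded. Whether points lying exactly on a shared tile edge are returned by both neighbouring queries depends on how query_bbox treats its bounds, which is not specified here, so results may need de-duplication.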