Data Provider
alsdb.ALSProvider is the interface for querying point clouds stored in the TileDB array. It supports spatial queries by bounding box, optionally restricted to a survey year, and returns results as a pandas.DataFrame or an xarray.Dataset.
Key capabilities
Spatial queries: retrieve all points within a bounding box.
Temporal filtering: restrict queries to a single survey year.
Attribute selection: choose which LAS attributes to return.
Multi-year inspection: list all years stored in the array.
xarray output: get results as a labelled xarray.Dataset.
Basic query example
from alsdb import ALSProvider
reader = ALSProvider(storage_type="local", uri="my_array")
# All points in a bounding box; returns a pandas DataFrame
df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000)
print(df.columns.tolist())
# ['X', 'Y', 'Z', 'Intensity', 'ReturnNumber', 'Classification',
# 'R', 'G', 'B', 'HeightAboveGround', 'Year', ...]
# Restrict to a single survey year
df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000, year=2021)
# Which years are available?
print(reader.available_years()) # e.g. [2019, 2021, 2023]
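Because the year filter takes a single year, comparing surveys means issuing one query per year. The helper below is a minimal sketch of that loop. `count_points_per_year` is a hypothetical name, not part of alsdb, and the bounding-box order `(xmin, ymin, xmax, ymax)` is an assumption inferred from the order of the values in the examples above.

```python
def count_points_per_year(reader, bbox):
    """Return {year: number_of_points} for one bounding box.

    `reader` is any object exposing `available_years()` and
    `query_bbox(..., year=...)`, as ALSProvider does in the examples above.
    `bbox` is assumed to be (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = bbox
    counts = {}
    for year in reader.available_years():
        # One query per survey year, using the documented `year` keyword.
        result = reader.query_bbox(xmin, ymin, xmax, ymax, year=year)
        counts[year] = len(result)  # len() of a DataFrame is its row count
    return counts
```

With a real provider this would be called as `count_points_per_year(reader, (308_000, 4_688_000, 310_000, 4_690_000))`.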
Selecting attributes
By default all stored attributes are returned. Pass attrs to restrict the output:
df = reader.query_bbox(
    308_000, 4_688_000, 310_000, 4_690_000,
    year=2021,
    attrs=["Z", "Classification", "HeightAboveGround"],
)
To see what attributes are available:
print(reader.get_available_attributes())
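Since the stored attributes depend on how the array was ingested, it can help to check a requested list against the available names before querying. The following is a sketch only. `split_attrs` is a hypothetical helper, and it assumes `get_available_attributes()` returns an iterable of attribute names, which this page does not state explicitly.

```python
def split_attrs(requested, available):
    """Split `requested` into (usable, missing) against `available` names.

    The order of `requested` is preserved and duplicates are dropped.
    """
    available_set = set(available)
    usable, missing = [], []
    for name in dict.fromkeys(requested):  # de-duplicate while keeping order
        if name in available_set:
            usable.append(name)
        else:
            missing.append(name)
    return usable, missing
```

A call such as `usable, missing = split_attrs(["Z", "HeightAboveGround"], reader.get_available_attributes())` would then let you pass `attrs=usable` to `query_bbox` and report anything in `missing`.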
xarray output
ds = reader.to_xarray(308_000, 4_688_000, 310_000, 4_690_000, year=2021)
print(ds)
# <xarray.Dataset>
# Dimensions: (points: 1_482_310)
# Coordinates:
# * X (points) float64
# * Y (points) float64
# Data variables:
# Z (points) float64
# Classification (points) uint8
# ReturnNumber (points) uint8
# ...
S3 provider
The API is identical for S3-backed arrays; only the constructor arguments change:
reader = ALSProvider(
    storage_type="s3",
    uri="s3://owner.bucket/als_array",
    url="https://s3.example.com",
    region="eu-central-1",
    credentials={
        "AccessKeyId": "...",
        "SecretAccessKey": "...",
    },
)
df = reader.query_bbox(308_000, 4_688_000, 310_000, 4_690_000, year=2021)
See S3 Storage for a full S3 setup walkthrough.
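To keep secrets out of source files, the constructor arguments shown above can be assembled from environment variables. This is a sketch under assumptions: `s3_provider_kwargs` is a hypothetical helper, and `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the conventional AWS variable names. Whether alsdb reads them on its own is not stated here.

```python
import os


def s3_provider_kwargs(uri, url, region, env=None):
    """Build ALSProvider keyword arguments with credentials taken from `env`.

    `env` defaults to os.environ; pass a plain dict to test without
    touching the real environment.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    # Keys mirror the constructor call shown above.
    return {
        "storage_type": "s3",
        "uri": uri,
        "url": url,
        "region": region,
        "credentials": {
            "AccessKeyId": env["AWS_ACCESS_KEY_ID"],
            "SecretAccessKey": env["AWS_SECRET_ACCESS_KEY"],
        },
    }
```

The result can be unpacked into the constructor, for example `reader = ALSProvider(**s3_provider_kwargs("s3://owner.bucket/als_array", "https://s3.example.com", "eu-central-1"))`.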
Output columns
The following columns are available depending on what was stored during ingestion:
| Column | Type | Description |
|---|---|---|
| X | float64 | UTM easting (m) |
| Y | float64 | UTM northing (m) |
| Z | float64 | Ellipsoidal height (m) |
| HeightAboveGround | float32 | Height above the Delaunay ground TIN (m); available only if ingested with the HAG filter |
| Intensity | uint16 | Return pulse intensity |
| ReturnNumber | uint8 | Return number (1 = first, 2 = second, …) |
| NumberOfReturns | uint8 | Total number of returns for this pulse |
| Classification | uint8 | LAS classification code (1 = unclassified, 2 = ground, 3 = low veg, 4 = medium veg, 5 = high veg, …) |
| R, G, B | uint16 | RGB colour values (0–65535) |
| Year | int16 | Survey acquisition year |
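The classification codes are easier to read once mapped to labels. The sketch below counts points per class from any iterable of codes, such as the Classification column of a query result. `LAS_CLASS_LABELS` and `summarize_classes` are illustrative names, and only the codes listed on this page are labelled.

```python
from collections import Counter

# Labels for the codes listed on this page; other codes stay numeric.
LAS_CLASS_LABELS = {
    1: "unclassified",
    2: "ground",
    3: "low vegetation",
    4: "medium vegetation",
    5: "high vegetation",
}


def summarize_classes(codes):
    """Count points per class label, given an iterable of LAS class codes."""
    counts = Counter(int(code) for code in codes)
    return {
        LAS_CLASS_LABELS.get(code, f"code {code}"): n
        for code, n in sorted(counts.items())
    }
```

For a DataFrame returned by `query_bbox`, `summarize_classes(df["Classification"])` would give the per-class point counts.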
Performance notes
Queries scan only the TileDB fragments that overlap the requested X/Y/Year range. Fragment consolidation (see Data Ingestion) significantly reduces per-query overhead for large arrays.
For very large bounding boxes, consider using Processing Pipeline (run_tiled) rather than a single query_bbox call.
On S3, query performance depends on network bandwidth and TileDB S3 timeout settings.
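To illustrate the idea behind splitting a very large bounding box, here is a minimal sketch that divides a box into square tiles which can then be queried one at a time. It is not how run_tiled is implemented. `split_bbox` and `tile_size` are hypothetical names, and the `(xmin, ymin, xmax, ymax)` order is assumed from the query examples.

```python
def split_bbox(xmin, ymin, xmax, ymax, tile_size):
    """Yield (xmin, ymin, xmax, ymax) tiles covering the bounding box.

    Tiles on the upper and right edges are clipped to the box, so they
    may be smaller than `tile_size`.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    y0 = ymin
    while y0 < ymax:
        y1 = min(y0 + tile_size, ymax)
        x0 = xmin
        while x0 < xmax:
            x1 = min(x0 + tile_size, xmax)
            yield (x0, y0, x1, y1)
            x0 = x1
        y0 = y1
```

Each tile can be passed on as `reader.query_bbox(*tile, year=2021)`. Whether `query_bbox` includes points lying exactly on a box edge is not stated here, so points on shared tile edges may need de-duplication.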