Frequently Asked Questions (FAQ)

How should I cite alsDB?

Please use the following citation when referencing alsDB in your work:

Besnard, S. alsDB [Computer software]. simonbesnard1/alsdb

Why can’t I install alsDB with pip alone?

pdal and python-pdal are only available through conda-forge and cannot be installed via pip. Use pixi, which resolves both conda-forge and PyPI dependencies automatically:

git clone https://github.com/simonbesnard1/alsdb.git
cd alsdb && pixi install

See Installation for full instructions.

Why does ingestion say “already ingested, skipping”?

The manifest records every file that has been successfully ingested. Re-running ingest() or ingest_many() on the same files is therefore safe: files already in the manifest are skipped by default. To force re-ingestion, pass overwrite=True:

db.ingest("tile.laz", overwrite=True)
db.ingest_many(paths, overwrite=True)
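
The skip-unless-forced behaviour described above can be pictured with a small sketch. This is an illustration only: the Manifest class and its method names are hypothetical and are not alsDB's actual implementation, which is not shown in this FAQ.

```python
from pathlib import Path


class Manifest:
    """Hypothetical in-memory manifest; not alsDB's actual class."""

    def __init__(self):
        self._done = set()

    def should_ingest(self, path, overwrite=False):
        # Ingest when forced, or when the file has not been recorded yet.
        return overwrite or Path(path).name not in self._done

    def record(self, path):
        # Call only after a file has been ingested successfully.
        self._done.add(Path(path).name)


manifest = Manifest()
ingested = []
for path in ["a/tile_1.laz", "a/tile_2.laz", "a/tile_1.laz"]:
    if manifest.should_ingest(path):
        ingested.append(path)  # stand-in for the real ingestion work
        manifest.record(path)
```

Because a file is recorded only after it succeeds, a run that fails halfway can simply be restarted: finished files are skipped and the rest are picked up.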

What datasets does alsDB support?

alsDB is dataset-agnostic. CRS, bounding box, and acquisition year are read from the LAZ/LAS header automatically via PDAL. Any dataset that follows the LAS specification works, including:

Spanish PNOA (Plan Nacional de Ortofotografía Aérea)
German ATKIS DGM
French RGE ALTI
UK Environment Agency National LiDAR Programme
US 3DEP (USGS 3D Elevation Program)
Any custom or research ALS campaign

For PNOA tiles, the tile name parser (PNOATileName) extracts year, CRS, and bounding box from the filename as a fallback if the LAZ header is incomplete.
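
For orientation, the LAS public header block keeps the bounding box and the file creation date at fixed byte offsets, and LAZ files carry the same uncompressed header. alsDB reads these values through PDAL; the sketch below uses only the standard library to show which fields are involved. The helper name is hypothetical, the offsets follow the LAS 1.2 layout (227 bytes), and the header holds the file creation year, which may differ from the acquisition year. The CRS is stored in variable length records and is not parsed here.

```python
import struct


def read_las_header_summary(header: bytes) -> dict:
    """Extract a few fields from a LAS/LAZ public header block (hypothetical helper)."""
    if len(header) < 227 or header[:4] != b"LASF":
        raise ValueError("not a LAS/LAZ public header block")
    major, minor = struct.unpack_from("<BB", header, 24)
    day_of_year, year = struct.unpack_from("<HH", header, 90)
    # Extents are stored as max/min pairs, in that order, for X and then Y.
    max_x, min_x, max_y, min_y = struct.unpack_from("<4d", header, 179)
    return {
        "version": f"{major}.{minor}",
        "creation_year": year,
        "creation_day_of_year": day_of_year,
        "bbox": (min_x, min_y, max_x, max_y),
    }


# Usage with a real file (only the first 227 bytes are needed):
# with open("tile.laz", "rb") as f:
#     summary = read_las_header_summary(f.read(227))
```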

What happens to cells with no data?

Cells in the Zarr store that have no contributing points remain NaN. This happens for:

Sub-tiles outside the flight swath.
Cells where all points were filtered out by the classification filter.
Cells in raster products where insufficient points exist to compute a statistic.

NaN cells are preserved faithfully through all processing steps and are visible in to_dataset() output.
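
Since NaN propagates through ordinary arithmetic, summaries over an area that contains empty cells need NaN-aware handling. The sketch below shows the idea with the standard library; the values are made up and nan_summary is a hypothetical helper, not part of alsDB. With NumPy arrays, np.nanmean does the same job.

```python
import math


def nan_summary(values):
    """Return (valid_fraction, mean_of_valid) for an iterable of floats."""
    values = list(values)
    valid = [v for v in values if not math.isnan(v)]
    fraction = len(valid) / len(values) if values else 0.0
    mean = sum(valid) / len(valid) if valid else math.nan
    return fraction, mean


chm_cells = [12.0, math.nan, 18.0, math.nan]  # made-up canopy heights in metres
fraction, mean = nan_summary(chm_cells)
naive_mean = sum(chm_cells) / len(chm_cells)  # NaN, because NaN propagates
```

Reporting the valid fraction next to the statistic makes it clear how much of the area actually contributed.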

How do I process multiple survey years?

Run the same processing function once per year. With the default overwrite=False, years that are already present in the store are not recomputed:

for year in [2017, 2021, 2023]:
    compute_chm(provider=reader, store=store, resolution=1.0,
                year=year, tile_size=500.0, n_workers=4)

Results are stored in the time axis of the Zarr arrays and can be accessed with .sel(time=year).

My query returns empty results. What is wrong?

Check the following:

Year: call reader.available_years() to confirm the year is stored.
Bounding box: the coordinates must be in the same CRS as the array (typically UTM). Passing geographic coordinates (lon/lat) to a UTM array returns no data.
Fragment consolidation: if you just ingested data, try db.consolidate() first.
Domain bounds: data ingested outside the TileDB domain bounds is silently clipped. Check db.stored_crs() and verify that your query coordinates are expressed in that CRS and fall inside the domain.
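
The bounding-box check in the list above can be partly automated. The sketch below is a heuristic under a stated assumption: a box whose values all fit within longitude/latitude ranges is probably geographic. The function name is hypothetical and is not an alsDB API.

```python
def looks_geographic(bbox):
    """Heuristic: True if (xmin, ymin, xmax, ymax) fits within lon/lat ranges."""
    xmin, ymin, xmax, ymax = bbox
    return (-180.0 <= xmin <= xmax <= 180.0) and (-90.0 <= ymin <= ymax <= 90.0)


lonlat_box = (-3.8, 40.3, -3.6, 40.5)                  # degrees
utm_box = (440000.0, 4470000.0, 441000.0, 4471000.0)   # metres

suspicious = looks_geographic(lonlat_box)
fine = not looks_geographic(utm_box)
```

If the check fires for a query against a UTM array, reproject the coordinates first, for example with pyproj's Transformer.from_crs(..., always_xy=True).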

Why is my CHM noisy at tile edges?

This is a height-above-ground (HAG) artefact. The Delaunay TIN built from ground points becomes unreliable near tile edges, where only one side of the boundary has data. Increase tile_buffer from 50 m to 100 m:

compute_chm(provider=reader, store=store, resolution=1.0,
            year=2021, tile_size=500.0, tile_buffer=100.0)
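
To see what a larger buffer costs, it helps to picture the extents involved. The sketch below assumes the buffer is applied on all four sides of a square tile and that results are cropped back to the core extent afterwards. That is how tile buffers commonly work, but alsDB's internals are not described here, and buffered_window is a hypothetical helper.

```python
def buffered_window(x0, y0, tile_size, tile_buffer):
    """Return (core, padded) extents as (xmin, ymin, xmax, ymax) tuples."""
    core = (x0, y0, x0 + tile_size, y0 + tile_size)
    padded = (
        x0 - tile_buffer,
        y0 - tile_buffer,
        x0 + tile_size + tile_buffer,
        y0 + tile_size + tile_buffer,
    )
    return core, padded


core, padded = buffered_window(440000.0, 4470000.0, tile_size=500.0, tile_buffer=100.0)

# Area read per tile relative to the core area.
overhead = ((padded[2] - padded[0]) * (padded[3] - padded[1])) / (500.0 * 500.0)
```

Under these assumptions, a 100 m buffer on a 500 m tile reads 1.96 times the core area, compared with 1.44 times for a 50 m buffer, so cleaner edges come with more I/O per tile.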

How do I use a scikit-learn biomass model?

Use alsdb.processing.biomass.wrap_sklearn_model to wrap any sklearn-compatible estimator. The features must be in the same order as the training data. The default feature set is all 16 metrics produced by compute_metrics() (_METRIC_NAMES): h50, h75, h95, hmax, hmean, cc, density, fhd, vci, crr, pv_0_2, pv_2_5, pv_5_10, pv_10_20, pv_20_40, pv_above40. Pass an explicit features list when the model was trained on a subset:

from alsdb.processing.biomass import compute_biomass, wrap_sklearn_model
from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor()
model.fit(X_train, y_agb)   # X columns must match _METRIC_NAMES order

compute_biomass(
    provider=reader, store=store, resolution=10.0, year=2021,
    model_fn=wrap_sklearn_model(model),
)

# If the model was trained on a subset, declare the feature order explicitly:
model_subset = GradientBoostingRegressor()
model_subset.fit(X_train[["h95", "cc", "density"]], y_agb)
compute_biomass(
    provider=reader, store=store, resolution=10.0, year=2021,
    model_fn=wrap_sklearn_model(model_subset, features=["h95", "cc", "density"]),
)
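
Feature order matters because the estimator only sees column positions, not names. The sketch below shows the general idea of assembling columns in a declared order. stack_features is a hypothetical helper with made-up values; it is not a description of what wrap_sklearn_model does internally.

```python
def stack_features(metrics, features):
    """Build per-cell rows with columns in exactly the order given by `features`."""
    missing = [name for name in features if name not in metrics]
    if missing:
        raise KeyError(f"metrics missing: {missing}")
    columns = [metrics[name] for name in features]
    return [list(row) for row in zip(*columns)]


# The dict order differs from the training order on purpose.
metrics = {"cc": [0.8, 0.4], "h95": [22.0, 9.5], "density": [6.1, 3.2]}
X = stack_features(metrics, ["h95", "cc", "density"])
```

Swapping two names in the features list would give a matrix of the same shape, so predictions would be wrong without any error being raised. This is why the order should be declared explicitly for subset models.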

How do I detect forest change between two surveys?

Use alsdb.processing.change.compute_change after computing the same variable for both years. Three derived products are written: absolute change, relative change (%), and a gain/loss flag:

from alsdb.processing.change import compute_change

# CHM must already be present for both years in the store
compute_change(
    store, "chm",
    year_from=2017, year_to=2021,
    resolution=1.0,
    min_delta=0.5,    # suppress sub-0.5 m noise in the flag layer
    pct_min_abs=0.5,  # avoid extreme % changes where base value < 0.5 m
)

ds = store.to_dataset(resolution=1.0)
flag = ds["chm_change_flag"].sel(time=2021)   # +1 gain, −1 loss, 0 stable
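
The role of the two thresholds can be illustrated for a single cell. The sketch below is an assumption-laden reading of the comments above: the exact inequalities, and how alsDB marks suppressed percentages, are not stated in this FAQ, and change_products is a hypothetical helper.

```python
import math


def change_products(before, after, min_delta=0.5, pct_min_abs=0.5):
    """Return (absolute, relative_pct, flag) for one cell (illustrative only)."""
    delta = after - before
    # Assumption: relative change is left undefined (NaN) for small base values.
    relative = 100.0 * delta / before if abs(before) >= pct_min_abs else math.nan
    if delta >= min_delta:
        flag = 1       # gain
    elif delta <= -min_delta:
        flag = -1      # loss
    else:
        flag = 0       # change smaller than min_delta counts as stable
    return delta, relative, flag


loss = change_products(20.0, 15.0)      # canopy dropped by 5 m
regrowth = change_products(0.2, 3.2)    # base below pct_min_abs
stable = change_products(10.0, 10.3)    # within the noise threshold
```

Without pct_min_abs, the regrowth cell would report a 1500 % increase, which says more about the near-zero base value than about the forest.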

How do I calibrate the Næsset biomass model?

Use alsdb.processing.biomass.calibrate_naesset with co-located field inventory plots. At least 20 valid plots are required; with fewer than 50, a quality warning is emitted:

import numpy as np

from alsdb.processing.biomass import calibrate_naesset, compute_biomass, naesset_model

# h95_plots, cc_plots, agb_plots: 1-D arrays, one value per field plot
(a, b, c), pcov = calibrate_naesset(
    h95_plots, cc_plots, agb_plots, return_cov=True
)
# The diagonal of the covariance matrix holds the parameter variances
print(f"a={a:.3f} ± {np.sqrt(pcov[0, 0]):.3f}")

compute_biomass(
    provider=reader, store=store, resolution=10.0, year=2021,
    model_fn=lambda m: naesset_model(m, a=a, b=b, c=c),
)
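
The uncertainty printed for a above generalises to all three parameters: each standard error is the square root of the matching diagonal entry of the covariance matrix. The sketch below assumes pcov is a 3x3 matrix ordered as (a, b, c), as the example above suggests. The matrix values are made up and the helper is hypothetical.

```python
import math


def parameter_std_errors(pcov):
    """Return one standard error per parameter from a square covariance matrix."""
    return [math.sqrt(pcov[i][i]) for i in range(len(pcov))]


# Made-up 3x3 covariance matrix for (a, b, c), for illustration only.
pcov_example = [
    [0.04, 0.001, 0.0],
    [0.001, 0.0009, 0.0],
    [0.0, 0.0, 0.0025],
]
se_a, se_b, se_c = parameter_std_errors(pcov_example)
```

With a NumPy array, np.sqrt(np.diag(pcov)) gives the same values in one call.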

Can I store the Zarr output on S3?

Yes. Pass an s3:// URI and storage_options with your credentials:

store = ALSZarrStore(
    "s3://my-bucket/forest.zarr",
    storage_options={"key": "...", "secret": "...", "endpoint_url": "..."},
)
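
To keep credentials out of source files, the storage_options dict can be built from environment variables. The sketch below is one possible approach: s3_storage_options is a hypothetical helper, and the variable names are the conventional AWS ones, which may differ from what your deployment uses.

```python
import os


def s3_storage_options(env=os.environ):
    """Build a storage_options dict from environment variables, skipping unset ones."""
    mapping = {
        "key": "AWS_ACCESS_KEY_ID",
        "secret": "AWS_SECRET_ACCESS_KEY",
        "endpoint_url": "AWS_ENDPOINT_URL",
    }
    return {opt: env[var] for opt, var in mapping.items() if env.get(var)}


# Example with an explicit mapping instead of the real environment:
options = s3_storage_options(
    {"AWS_ACCESS_KEY_ID": "k", "AWS_SECRET_ACCESS_KEY": "s"}
)
```

The result can then be passed as storage_options=s3_storage_options() when constructing ALSZarrStore.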

See S3 Storage for a complete S3 setup guide.

How do I contribute to alsDB?

Contributions are welcome! See Contributing to alsDB for guidelines on submitting bug reports, feature requests, and pull requests.