alsdb.processing.waveform.simulate_batch

alsdb.processing.waveform.simulate_batch(provider: TileDBProvider, shots: DataFrame, x_col: str = 'center_x', y_col: str = 'center_y', beam_col: str = 'beam', year: int | None = None, n_workers: int = 1, output_path: str | None = None, rng: Generator | None = None, **kwargs) → DataFrame

Simulate waveforms for a batch of shot locations, in parallel when n_workers > 1.
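The batching pattern can be pictured with the minimal sketch below. It is not the library's implementation: fake_simulate_waveform is a hypothetical stand-in for simulate_waveform (the real call reads ALS points through the provider), and simulate_batch_sketch is a hypothetical function that mirrors only the documented x_col / y_col / beam_col / n_workers / rng / **kwargs handling. Each row's beam value overrides beam_id from **kwargs, the shared rng is forwarded to every call, and the metrics are appended to the original columns.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def fake_simulate_waveform(x, y, beam_id=None, rng=None, **kwargs):
    """Hypothetical stand-in for simulate_waveform.

    The real function would read ALS points around (x, y) via the provider;
    this one only draws a noisy ground height so the batching logic can run.
    """
    rng = rng if rng is not None else np.random.default_rng()
    return {"z_ground": 100.0 + rng.normal(scale=0.1), "beam_used": beam_id}


def simulate_batch_sketch(shots, x_col="center_x", y_col="center_y",
                          beam_col="beam", n_workers=1, rng=None, **kwargs):
    """Hypothetical mirror of simulate_batch's per-shot dispatch."""
    has_beam = beam_col in shots.columns

    def run_one(row):
        call_kwargs = dict(kwargs)
        if has_beam:
            # The per-row beam overrides any beam_id passed via **kwargs.
            call_kwargs["beam_id"] = row[beam_col]
        return fake_simulate_waveform(row[x_col], row[y_col], rng=rng, **call_kwargs)

    rows = [row for _, row in shots.iterrows()]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in input order, but with n_workers > 1 the
        # order in which shots draw from the shared rng can vary between runs.
        results = list(pool.map(run_one, rows))

    metrics = pd.DataFrame(results, index=shots.index)
    # All original columns are kept; the simulated metrics are appended.
    return pd.concat([shots, metrics], axis=1)


shots = pd.DataFrame({
    "shot_id": [1, 2, 3],
    "center_x": [500000.0, 500030.0, 500060.0],     # UTM easting (m), made-up values
    "center_y": [4100000.0, 4100000.0, 4100000.0],  # UTM northing (m), made-up values
    "beam": ["BEAM0000", "BEAM0101", "BEAM0000"],
})
out = simulate_batch_sketch(shots, beam_id="BEAM1011", rng=np.random.default_rng(42))
print(out["beam_used"].tolist())  # ['BEAM0000', 'BEAM0101', 'BEAM0000']
```

With n_workers=1 the single worker thread processes shots in submission order, so the same seed reproduces the same draws. With more workers the rows still come back in input order, but which shot receives which random draw can change from run to run.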

Parameters:
  • provider (TileDBProvider) – Provider used to read the ALS point data from TileDB.

  • shots (pd.DataFrame) – Must contain x_col and y_col columns (UTM metres). If a beam_col column is present (with values such as "BEAM0000"), the mean TX pulse for each shot's beam is used automatically. All other columns are preserved in the output.

  • x_col (str) – Column name for easting (UTM metres).

  • y_col (str) – Column name for northing (UTM metres).

  • beam_col (str) – Column name for the GEDI beam identifier. Ignored if the column is not present in shots. When present, each row's beam value overrides any beam_id passed via **kwargs.

  • year (int, optional) – Survey year filter applied to all shots.

  • n_workers (int) – Number of worker threads used to simulate shots concurrently (TileDB reads are thread-safe).

  • output_path (str, optional) – If provided, the result DataFrame is written to this path as a Parquet file (pyarrow engine). The file is created or overwritten. The DataFrame is still returned as usual.

  • rng (np.random.Generator, optional) – Random number generator forwarded to each simulate_waveform call for reproducible noise, e.g. np.random.default_rng(seed). Note: when n_workers > 1, shots are processed concurrently, so the order in which shots draw from rng is non-deterministic even with a fixed seed; use n_workers=1 when the noise must be reproducible.

  • **kwargs – Forwarded to simulate_waveform.

Returns:

The original columns plus z_ground, home, cover, n_points and rh0 through rh100. Shots with insufficient ALS coverage have NaN metric values.

Return type:

pd.DataFrame
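Because shots with insufficient ALS coverage come back with NaN metrics, downstream code usually needs to separate those rows. The sketch below works on a small hand-built frame that imitates a subset of the documented output columns; all values are made up, and treating a row as uncovered when any rh column is NaN is an assumption rather than documented behaviour.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a simulate_batch result (values are made up):
# shots 1 and 2 have ALS coverage, shot 3 does not.
rh_names = [f"rh{i}" for i in range(101)]
profile = np.linspace(0.0, 25.0, 101)  # made-up rh0..rh100 heights (m)
base = pd.DataFrame({
    "shot_id": [1, 2, 3],
    "z_ground": [101.2, 99.8, np.nan],
    "home": [8.1, 6.4, np.nan],
    "cover": [0.72, 0.55, np.nan],
})
rh = pd.DataFrame(np.vstack([profile, 0.8 * profile, np.full(101, np.nan)]),
                  columns=rh_names)
result = pd.concat([base, rh], axis=1)

# Select rh0..rh100 by name; "home" and the other columns do not match.
rh_cols = result.filter(regex=r"^rh\d+$").columns
has_coverage = result[rh_cols].notna().all(axis=1)

covered = result[has_coverage]
missing = result.loc[~has_coverage, "shot_id"].tolist()
print(missing)  # [3]
print(covered[["shot_id", "z_ground", "rh50", "rh98"]])
```

If output_path was given, the same frame can be reloaded later with pd.read_parquet(output_path).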