KDB returns mixed-typed columns (most commonly temporal nulls of varying precision, such as 0Np, 0Nz, and 0Nd, interleaved) as q general lists. The faithful Arrow projection emits a `Union` of the per-element DataTypes. Polars and most DataFrame consumers reject `Union` outright, so these columns are unusable downstream without an extra conversion round trip.

Add `HeterogeneousListMode::CoalesceTemporals` (off by default) on `ProjectionOptions`. When set, `project_heterogeneous_list` checks whether every arm is temporal (Timestamp, Date32, Date64, Time32, Time64, Duration) and, if so, casts each child to `Timestamp(Nanosecond, None)` via `arrow_cast::cast`, concatenates the results, and emits a flat array. Any non-temporal arm or cast error falls back to the existing Union path, so the flag is safe to enable globally.

Plumbed through the Python `DecodeOptions` API as `with_coalesce_temporals(bool)`, with a matching getter and pyi stub. The default stays `False`; users opt in when they know the consumer (e.g. Polars) can't handle Union and they accept the lossy precision promotion.

Tests cover (a) the default Union output, (b) all-temporal coalescing, and (c) non-temporal fallback to Union.
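The decision logic described above can be sketched roughly as follows. This is a simplified, std-only model and not the code in this change: `Arm`, `Projected`, `cast_to_ns`, and `project` are hypothetical stand-ins, plain vectors replace Arrow arrays, only three temporal kinds are modelled, and arms are assumed to be contiguous runs already in row order. Overflow is treated here as a cast error, which may differ from how `arrow_cast::cast` behaves.

```rust
// Hypothetical stand-ins for Union arms; the real code works on Arrow arrays.
#[derive(Debug, Clone, PartialEq)]
enum Arm {
    /// Nanoseconds since the epoch; `None` is a null.
    TimestampNs(Vec<Option<i64>>),
    /// Milliseconds since the epoch.
    TimestampMs(Vec<Option<i64>>),
    /// Days since the epoch (models Date32).
    Date32(Vec<Option<i32>>),
    /// A non-temporal arm, which must force the Union fallback.
    Utf8(Vec<Option<String>>),
}

/// Result of the projection: one flat nanosecond column, or the untouched arms.
#[derive(Debug, PartialEq)]
enum Projected {
    Flat(Vec<Option<i64>>),
    Union(Vec<Arm>),
}

const NS_PER_MS: i64 = 1_000_000;
const NS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// Scale every non-null value; return `None` if any multiplication overflows.
fn scale(values: impl Iterator<Item = Option<i64>>, factor: i64) -> Option<Vec<Option<i64>>> {
    values
        .map(|v| match v {
            None => Some(None),
            Some(x) => x.checked_mul(factor).map(Some),
        })
        .collect()
}

/// Cast one arm to nanoseconds; `None` means "not temporal" or "cast failed".
fn cast_to_ns(arm: &Arm) -> Option<Vec<Option<i64>>> {
    match arm {
        Arm::TimestampNs(v) => Some(v.clone()),
        Arm::TimestampMs(v) => scale(v.iter().copied(), NS_PER_MS),
        Arm::Date32(v) => scale(v.iter().map(|d| d.map(i64::from)), NS_PER_DAY),
        Arm::Utf8(_) => None,
    }
}

/// With `coalesce` off, always keep the Union. With it on, flatten only when
/// every arm casts cleanly; otherwise fall back to the Union unchanged.
fn project(arms: Vec<Arm>, coalesce: bool) -> Projected {
    if !coalesce {
        return Projected::Union(arms);
    }
    let casted: Option<Vec<Vec<Option<i64>>>> = arms.iter().map(cast_to_ns).collect();
    match casted {
        Some(chunks) => Projected::Flat(chunks.into_iter().flatten().collect()),
        None => Projected::Union(arms),
    }
}
```

In this model, calling `project(arms, true)` on arms that are all temporal yields a single `Flat` column, while the presence of one `Utf8` arm returns the original arms as `Union`, mirroring the three test cases listed above.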
25 lines
559 B
TOML
[package]
name = "qroissant-arrow"
version.workspace = true
edition.workspace = true
license.workspace = true
publish = false

[lib]
name = "qroissant_arrow"
path = "src/lib.rs"

[dependencies]
arrow-array = "58.0.0"
arrow-buffer = "58.0.0"
arrow-cast = "58.0.0"
arrow-schema = "58.0.0"
arrow-select = "58.0.0"
bytemuck = { version = "1", features = ["derive", "extern_crate_alloc"] }
bytes = "1.11.1"
chrono = "0.4.44"
qroissant-core = { path = "../qroissant-core" }
qroissant-kernels = { path = "../qroissant-kernels" }
rayon = "1.10"
thiserror = "2.0.18"