feat(arrow): opt-in coalesce of heterogeneous temporal lists to Timestamp(ns)

KDB returns mixed-typed columns as q general lists; the most common case
is temporal nulls of varying precision (0Np, 0Nz, 0Nd) interleaved in
one column. The faithful Arrow projection emits a `Union` of the
per-element DataTypes. Polars and most DataFrame consumers reject `Union`
outright, so such columns are unusable downstream without an extra
conversion round trip.

Add `HeterogeneousListMode::CoalesceTemporals` (off by default) on
`ProjectionOptions`. When set, `project_heterogeneous_list` checks
whether every arm is temporal (Timestamp, Date32, Date64, Time32,
Time64, Duration) and, if so, casts each child to
`Timestamp(Nanosecond, None)` via `arrow_cast::cast`, concatenates, and
emits a flat array. Any non-temporal arm or cast error falls back to
the existing Union path, so the flag is safe to enable globally.
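
The all-or-nothing rule above can be sketched with the standard library
alone. This is a simplified per-element model, not the qroissant_arrow
implementation: `Value`, `scale` and `coalesce_temporals` are hypothetical
names, only three temporal kinds are modelled, and a checked multiply
stands in for a failing `arrow_cast::cast`.

```rust
// Simplified stand-in for the coalesce step; NOT the qroissant_arrow code.
// All names in this block are hypothetical and exist only for illustration.

const NANOS_PER_MILLI: i64 = 1_000_000;
const NANOS_PER_DAY: i64 = 86_400 * 1_000_000_000;

/// One element of a general list, reduced to the cases that matter here.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    /// Nanoseconds since the epoch; `None` is a typed null.
    TimestampNs(Option<i64>),
    /// Milliseconds since the epoch.
    DateMs(Option<i64>),
    /// Days since the epoch.
    DateDays(Option<i32>),
    /// Any non-temporal element.
    Symbol(String),
}

/// Scales a nullable value to nanoseconds.
/// Outer `None` means overflow (the "cast error" case); nulls pass through.
fn scale(value: Option<i64>, factor: i64) -> Option<Option<i64>> {
    match value {
        None => Some(None),
        Some(v) => v.checked_mul(factor).map(Some),
    }
}

/// Returns `Some(flat)` only if every element is temporal and converts
/// cleanly. `None` tells the caller to keep the existing Union projection.
pub fn coalesce_temporals(values: &[Value]) -> Option<Vec<Option<i64>>> {
    values
        .iter()
        .map(|value| match value {
            Value::TimestampNs(ns) => Some(*ns),
            Value::DateMs(ms) => scale(*ms, NANOS_PER_MILLI),
            Value::DateDays(days) => scale(days.map(i64::from), NANOS_PER_DAY),
            // A single non-temporal element abandons the whole coalesce.
            Value::Symbol(_) => None,
        })
        .collect()
}
```

Collecting into `Option<Vec<_>>` stops at the first element that cannot be
converted, which mirrors the fallback behaviour described above: the
caller either gets a complete flat column or nothing.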

Plumbed through the Python `DecodeOptions` API as
`with_coalesce_temporals(bool)` with matching getter and pyi stub. The
default stays `False`; users opt in when they know the consumer
(e.g. Polars) cannot handle Union and they accept the lossy promotion of
every value to a nanosecond timestamp.
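
A minimal Rust sketch of the opt-in shape, assuming a builder-style
setter. The `DecodeOptions` struct and `HeterogeneousListMode` enum below
are stand-ins written for this example, and `heterogeneous_list_mode` is a
hypothetical helper; only the method and variant names are taken from
this commit.

```rust
// Hypothetical stand-ins; the real types live in qroissant and
// qroissant_arrow and carry more fields than shown here.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum HeterogeneousListMode {
    /// Faithful projection; remains the default.
    #[default]
    Union,
    /// Opt-in: flatten all-temporal lists to nanosecond timestamps.
    CoalesceTemporals,
}

#[derive(Debug, Clone, Default)]
pub struct DecodeOptions {
    // `bool::default()` is false, so the flag is off unless set.
    coalesce_temporals: bool,
}

impl DecodeOptions {
    /// Builder-style setter (assumed shape of the real API).
    pub fn with_coalesce_temporals(mut self, enabled: bool) -> Self {
        self.coalesce_temporals = enabled;
        self
    }

    /// Matching getter.
    pub fn coalesce_temporals_value(&self) -> bool {
        self.coalesce_temporals
    }
}

/// Maps the boolean flag onto the projection mode.
pub fn heterogeneous_list_mode(opts: &DecodeOptions) -> HeterogeneousListMode {
    if opts.coalesce_temporals_value() {
        HeterogeneousListMode::CoalesceTemporals
    } else {
        HeterogeneousListMode::Union
    }
}
```

Keeping the public surface a plain boolean while the projection layer
uses an enum leaves room for further modes later without changing the
user-facing flag.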

Tests cover (a) default-Union, (b) all-temporal coalesce, and
(c) non-temporal fallback to Union.
Cam Zalewski 2026-05-20 14:42:00 +01:00
parent aa2c0a2ec7
commit a1a621ddfd
8 changed files with 186 additions and 0 deletions


@@ -3,6 +3,7 @@ use std::sync::Arc;
use pyo3::prelude::*;
use pyo3::types::PyAny;
use pyo3::types::PyBytes;
use qroissant_arrow::HeterogeneousListMode;
use qroissant_arrow::ListProjection;
use qroissant_arrow::ProjectionOptions;
use qroissant_arrow::StringProjection;
@@ -56,6 +57,11 @@ pub fn decode_options_to_proj_opts(opts: Option<&DecodeOptions>) -> Arc<Projecti
            crate::types::UnionMode::Dense => qroissant_arrow::UnionMode::Dense,
            crate::types::UnionMode::Sparse => qroissant_arrow::UnionMode::Sparse,
        },
        heterogeneous_list_mode: if opts.coalesce_temporals_value() {
            HeterogeneousListMode::CoalesceTemporals
        } else {
            HeterogeneousListMode::Union
        },
        treat_infinity_as_null: opts.treat_infinity_as_null(),
        parallel: opts.parallel_value(),
        assume_symbol_utf8: opts.assume_symbol_utf8_value(),