feat(arrow): opt-in coalesce of heterogeneous temporal lists to Timestamp(ns)
KDB returns mixed-typed columns as q general lists. The most common case is temporal nulls of varying precision (0Np, 0Nz, 0Nd) interleaved in one column. The faithful Arrow projection emits a `Union` of the per-element DataTypes. Polars and most other DataFrame consumers reject `Union` outright, so such columns are unusable downstream without an extra conversion round-trip.

Add `HeterogeneousListMode::CoalesceTemporals` (off by default) on `ProjectionOptions`. When it is set, `project_heterogeneous_list` checks whether every arm is temporal (Timestamp, Date32, Date64, Time32, Time64, Duration). If so, it casts each child to `Timestamp(Nanosecond, None)` via `arrow_cast::cast`, concatenates the results, and emits a flat array. Any non-temporal arm or cast error falls back to the existing Union path, so the flag is safe to enable globally.

The option is plumbed through the Python `DecodeOptions` API as `with_coalesce_temporals(bool)`, with a matching getter and pyi stub. The default stays `False`; users opt in when they know the consumer (e.g. Polars) can't handle Union and they accept the lossy promotion to nanoseconds.

Tests cover (a) the default Union output, (b) coalescing of an all-temporal list, and (c) fallback to Union when a non-temporal arm is present.
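The coalesce-or-fall-back decision described above can be sketched in plain Python. This is a minimal model under stated assumptions, not the Rust implementation: each list element is a hypothetical `(type_name, unit, raw_value)` tuple rather than an Arrow array, `project_heterogeneous` is a hypothetical stand-in for `project_heterogeneous_list`, and int64 overflow plays the role of an `arrow_cast::cast` error. Elements are modeled one at a time so the flattened output keeps the original element order.

```python
from typing import List, Optional, Tuple

# Hypothetical model of one q general-list element after per-element
# projection: (arrow_type_name, unit, raw int64 value or None for a null).
Element = Tuple[str, Optional[str], Optional[int]]

# The arm types the commit message treats as temporal.
TEMPORAL_TYPES = {"Timestamp", "Date32", "Date64", "Time32", "Time64", "Duration"}

# Nanoseconds per unit of an element's raw value ("D" = days, as in Date32).
NS_PER_UNIT = {
    "D": 86_400 * 1_000_000_000,
    "s": 1_000_000_000,
    "ms": 1_000_000,
    "us": 1_000,
    "ns": 1,
}

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def cast_to_ns(element: Element) -> Optional[int]:
    """Promote one temporal value to nanoseconds; raise if it overflows int64."""
    _, unit, raw = element
    if raw is None:
        return None  # nulls (0Np, 0Nz, 0Nd, ...) stay null after promotion
    ns = raw * NS_PER_UNIT[unit]
    if not INT64_MIN <= ns <= INT64_MAX:
        # Assumption: overflow stands in for a cast error in this model.
        raise OverflowError(f"{element!r} does not fit in Timestamp(ns)")
    return ns


def project_heterogeneous(elements: List[Element], coalesce_temporals: bool = False):
    """Return ("Timestamp(ns)", values) if coalescing succeeds, else ("Union", elements)."""
    if coalesce_temporals and all(e[0] in TEMPORAL_TYPES for e in elements):
        try:
            return ("Timestamp(ns)", [cast_to_ns(e) for e in elements])
        except OverflowError:
            pass  # any failed cast falls back to the faithful Union projection
    return ("Union", elements)
```

With the flag off, when any element is non-temporal, or when a cast fails, the input comes back unchanged as a Union, which is why enabling the flag globally is described as safe.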
parent aa2c0a2ec7
commit a1a621ddfd
8 changed files with 186 additions and 0 deletions
@@ -260,6 +260,14 @@ class DecodeOptions:
     def treat_infinity_as_null(self) -> bool:
         """Whether ±∞ sentinels are mapped to ``None`` in Arrow arrays."""
         ...
+    @property
+    def coalesce_temporals(self) -> bool:
+        """Whether heterogeneous all-temporal lists are flattened to ``Timestamp(ns)``
+        instead of being emitted as Arrow ``Union``. Use this when the
+        downstream consumer (e.g. Polars) cannot ingest ``Union`` types.
+        Lossy: mixed precisions are promoted to nanoseconds.
+        """
+        ...
 
 
 class DecodeOptionsBuilder:
@@ -303,6 +311,15 @@ class DecodeOptionsBuilder:
     def with_treat_infinity_as_null(self, value: bool, /) -> DecodeOptionsBuilder:
         """Set whether ±∞ sentinels are mapped to ``None`` in Arrow arrays."""
         ...
+    def with_coalesce_temporals(self, value: bool, /) -> DecodeOptionsBuilder:
+        """Set whether heterogeneous all-temporal lists are flattened to
+        ``Timestamp(ns)`` instead of being emitted as Arrow ``Union``.
+
+        Set to ``True`` for consumers (e.g. Polars) that reject Arrow union
+        types. Lossy: mixed precisions are promoted to nanoseconds. Default
+        ``False`` (faithful ``Union`` representation).
+        """
+        ...
     def build(self) -> DecodeOptions:
         """Finalize the builder into an immutable :class:`DecodeOptions` instance."""
         ...
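For illustration, opting in from Python might look like the sketch below. The two classes are hypothetical pure-Python stand-ins that mirror only the stub signatures added in this diff (`with_coalesce_temporals`, `build`, and the `coalesce_temporals` getter); the real classes live in the compiled extension module, whose import path is not shown here, and the no-argument `DecodeOptionsBuilder()` constructor is an assumption.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for the compiled DecodeOptions; models only this flag.
@dataclass(frozen=True)
class DecodeOptions:
    _coalesce_temporals: bool = False  # default: faithful Union output

    @property
    def coalesce_temporals(self) -> bool:
        return self._coalesce_temporals


# Hypothetical stand-in for the compiled builder, following the stub signatures.
class DecodeOptionsBuilder:
    def __init__(self) -> None:  # assumed no-argument constructor
        self._opts = DecodeOptions()

    def with_coalesce_temporals(self, value: bool, /) -> "DecodeOptionsBuilder":
        self._opts = replace(self._opts, _coalesce_temporals=value)
        return self  # returning the builder allows chained calls

    def build(self) -> DecodeOptions:
        return self._opts  # frozen dataclass, so the result is immutable


# Opt in for a Polars-bound decode; the default keeps the faithful Union.
polars_opts = DecodeOptionsBuilder().with_coalesce_temporals(True).build()
default_opts = DecodeOptionsBuilder().build()
```

Returning the builder from each `with_*` setter is what makes the chained one-liner possible, and freezing the built options matches the stub's description of `DecodeOptions` as immutable.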