Commit graph

6 commits

Author SHA1 Message Date
Cam
de214336a9 next try 2026-05-20 13:57:06 -04:00
a1ec8ba292 bug fixes 2026-05-20 18:29:19 +01:00
a1a621ddfd feat(arrow): opt-in coalesce of heterogeneous temporal lists to Timestamp(ns)
KDB returns mixed-typed columns (most commonly: temporal nulls of
varying precision -- 0Np, 0Nz, 0Nd -- interleaved) as q general lists.
The faithful Arrow projection emits a `Union` of the per-element
DataTypes. Polars and most DataFrame consumers reject `Union` outright,
making such columns unusable downstream without a re-roundtrip dance.

Add `HeterogeneousListMode::CoalesceTemporals` (off by default) on
`ProjectionOptions`. When set, `project_heterogeneous_list` checks
whether every arm is temporal (Timestamp, Date32, Date64, Time32,
Time64, Duration) and, if so, casts each child to
`Timestamp(Nanosecond, None)` via `arrow_cast::cast`, concatenates, and
emits a flat array. Any non-temporal arm or cast error falls back to
the existing Union path, so the flag is safe to enable globally.
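
The all-or-nothing check described above can be sketched with the standard library alone. This is an illustration only, not the crate's code: the `Arm` enum, `to_ns`, and `coalesce_temporals` are hypothetical names, only three temporal kinds are modelled, and a multiplication overflow is treated as a cast error. The real path works on Arrow arrays and casts through `arrow_cast::cast`.

```rust
/// Hypothetical stand-in for one element of a q general list.
#[derive(Debug, Clone, PartialEq)]
enum Arm {
    TimestampNs(Option<i64>), // already nanoseconds since the epoch
    TimestampMs(Option<i64>), // milliseconds since the epoch
    Date32(Option<i32>),      // days since the epoch
    Utf8(String),             // a non-temporal arm
}

/// Outer `None`: the arm cannot be coalesced (non-temporal or overflow).
/// Inner `None`: a temporal null, which is preserved as a null.
fn to_ns(arm: &Arm) -> Option<Option<i64>> {
    const NS_PER_MS: i64 = 1_000_000;
    const NS_PER_DAY: i64 = 86_400_000_000_000;
    match arm {
        Arm::TimestampNs(v) => Some(*v),
        Arm::TimestampMs(None) | Arm::Date32(None) => Some(None),
        Arm::TimestampMs(Some(ms)) => ms.checked_mul(NS_PER_MS).map(Some),
        Arm::Date32(Some(d)) => i64::from(*d).checked_mul(NS_PER_DAY).map(Some),
        Arm::Utf8(_) => None,
    }
}

/// Returns the flat nanosecond column, or `None` to signal that the
/// caller should fall back to the Union projection.
fn coalesce_temporals(arms: &[Arm]) -> Option<Vec<Option<i64>>> {
    // Collecting into `Option<Vec<_>>` stops at the first arm that fails.
    arms.iter().map(to_ns).collect()
}
```

Because a single failing arm yields `None` for the whole column, a caller in this sketch never sees a partially coalesced result, which mirrors the fallback behaviour the message describes.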

Plumbed through the Python `DecodeOptions` API as
`with_coalesce_temporals(bool)` with matching getter and pyi stub. The
default stays `False`; users opt in when they know the consumer
(Polars) can't handle Union and accept the lossy precision promotion.

Tests cover (a) default-Union, (b) all-temporal coalesce, and
(c) non-temporal fallback to Union.
2026-05-20 14:42:00 +01:00
aa2c0a2ec7 fix(kernels): drop removed std::simd::Select import
`std::simd::Select` no longer exists in current nightly's portable_simd
API; methods now hang off `Mask` inherently. The unused import was
preventing the crate (and everything depending on it -- qroissant-arrow,
qroissant-python) from building on a current nightly toolchain.
2026-05-20 14:41:37 +01:00
f24af467ec fix: align typed column buffers to T in decode paths
cast_slice::<u8, T> panics with TargetAlignmentGreaterAndInputNotAligned
on KDB IPC payloads where a variable-length column leaves a numeric
column at a misaligned wire offset. The sync decode path's alignment
fallback used Bytes::copy_from_slice (Vec<u8> layout, align=1), which
only happens to work because most allocators over-align byte blocks --
not guaranteed by Rust's allocator API. The async pipelined path went
through read_bytes(len * size) directly, with no alignment branch at
all, and panicked in arrow projection's as_*_slice on Windows release
builds under AsyncPool.query.

Both paths now back typed columns with Vec<T> (Layout::array::<T>
guarantees align_of::<T>()), exposed as bytes::Bytes via a new
AlignedTBuf<T> AsRef<[u8]> owner passed to Bytes::from_owner. Sync
fallback uses the same wrapper. Pipelined typed reads route through
a new read_typed_bytes::<T> helper that swaps in for every typed
Primitive arm in decode_vector_async.
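
A minimal sketch of the aligned-owner idea, using only the standard library. The `AlignedU64Buf` type and its `from_le_bytes` constructor are hypothetical and fixed to `u64`; the commit's `AlignedTBuf<T>` is generic and is handed to `Bytes::from_owner`, which is not reproduced here. The payload is assumed to be little-endian.

```rust
use std::mem::size_of;

/// Hypothetical, u64-only analogue of the commit's `AlignedTBuf<T>`.
/// Backing the column with `Vec<u64>` means the allocation is aligned
/// to `align_of::<u64>()`, regardless of the wire offset.
struct AlignedU64Buf {
    values: Vec<u64>,
}

impl AlignedU64Buf {
    /// Copies a possibly misaligned little-endian payload into aligned
    /// storage. Returns `None` if the length is not a multiple of 8.
    fn from_le_bytes(raw: &[u8]) -> Option<Self> {
        if raw.len() % size_of::<u64>() != 0 {
            return None;
        }
        let values = raw
            .chunks_exact(size_of::<u64>())
            .map(|chunk| {
                let mut word = [0u8; 8];
                word.copy_from_slice(chunk);
                u64::from_le_bytes(word)
            })
            .collect();
        Some(Self { values })
    }
}

impl AsRef<[u8]> for AlignedU64Buf {
    fn as_ref(&self) -> &[u8] {
        // SAFETY: the pointer comes from a live Vec<u64>, covers exactly
        // len * 8 initialized bytes, and u8 has no alignment requirement.
        unsafe {
            std::slice::from_raw_parts(
                self.values.as_ptr().cast::<u8>(),
                self.values.len() * size_of::<u64>(),
            )
        }
    }
}
```

In this sketch the byte view returned by `as_ref` always starts at an address that is a multiple of `align_of::<u64>()`, so a later cast back to `&[u64]` cannot fail on alignment even when the source slice began at an odd offset.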

Regression test in pipelined::tests constructs a table with an
odd-length symbol column followed by Long, exercising the previously
panicking path.
2026-05-20 14:13:41 +01:00
53ac90fe84 Vendor qroissant 0.3.0 baseline 2026-05-20 14:11:30 +01:00