No description
Find a file
CamZalewski a1a621ddfd feat(arrow): opt-in coalesce of heterogeneous temporal lists to Timestamp(ns)
KDB returns mixed-typed columns (most commonly: temporal nulls of
varying precision -- 0Np, 0Nz, 0Nd -- interleaved) as q general lists.
The faithful Arrow projection emits a `Union` of the per-element
DataTypes. Polars and most DataFrame consumers reject `Union` outright,
making such columns unusable downstream without a re-roundtrip dance.

Add `HeterogeneousListMode::CoalesceTemporals` (off by default) on
`ProjectionOptions`. When set, `project_heterogeneous_list` checks
whether every arm is temporal (Timestamp, Date32, Date64, Time32,
Time64, Duration) and, if so, casts each child to
`Timestamp(Nanosecond, None)` via `arrow_cast::cast`, concatenates, and
emits a flat array. Any non-temporal arm or cast error falls back to
the existing Union path, so the flag is safe to enable globally.

Plumbed through the Python `DecodeOptions` API as
`with_coalesce_temporals(bool)` with matching getter and pyi stub. The
default stays `False`; users opt in when they know the consumer
(Polars) can't handle Union and accept the lossy precision promotion.

Tests cover (a) default-Union, (b) all-temporal coalesce, and
(c) non-temporal fallback to Union.
2026-05-20 14:42:00 +01:00
crates feat(arrow): opt-in coalesce of heterogeneous temporal lists to Timestamp(ns) 2026-05-20 14:42:00 +01:00
python/qroissant feat(arrow): opt-in coalesce of heterogeneous temporal lists to Timestamp(ns) 2026-05-20 14:42:00 +01:00
.gitignore Vendor qroissant 0.3.0 baseline 2026-05-20 14:11:30 +01:00
Cargo.lock feat(arrow): opt-in coalesce of heterogeneous temporal lists to Timestamp(ns) 2026-05-20 14:42:00 +01:00
Cargo.toml Vendor qroissant 0.3.0 baseline 2026-05-20 14:11:30 +01:00
PKG-INFO Vendor qroissant 0.3.0 baseline 2026-05-20 14:11:30 +01:00
pyproject.toml Vendor qroissant 0.3.0 baseline 2026-05-20 14:11:30 +01:00
README.md Vendor qroissant 0.3.0 baseline 2026-05-20 14:11:30 +01:00

qroissant

qroissant is a minimal q/kdb+ IPC client library with first-class support for the Apache Arrow ecosystem.

  • Lightweight — qroissant is a minimal library weighing in at less than 4 MiB with no required dependencies.
  • Fast — qroissant is written in Rust, a safe and high-performance systems programming language. Moreover, qroissant uses your system resources to the best extent possible by leveraging zero-copy, multithreading, and other vectorization techniques such as SIMD.
  • Modular — qroissant relies heavily on the Apache Arrow PyCapsule Interface for communicating with other libraries from the Apache Arrow ecosystem with zero-copy. This includes pyarrow, polars, duckdb, pandas, datafusion, and more.
  • Type hints — qroissant provides type annotations for all of its functionality.

Installation

pip install qroissant

Requires Python 3.10+. Wheels are available for Linux (x86_64, aarch64), macOS (universal2), and Windows (x86_64).


Quick start

Connect and query

import qroissant as q

endpoint = q.Endpoint.tcp("localhost", 5000)

with q.Connection(endpoint) as conn:
    result = conn.query("select from trade where date = .z.d")
    print(result)  # Table

To Arrow / Polars / PyArrow

Decoded values implement the Arrow PyCapsule protocol — pass them straight to any Arrow-aware library:

import polars as pl
import pyarrow as pa

with q.Connection(endpoint) as conn:
    table = conn.query("select from trade")

# zero-copy — no intermediate Python objects
df = pl.from_arrow(table)
pa_table = pa.RecordBatch.from_batches([pa.record_batch(table)])

Async

import asyncio
import qroissant as q

async def main():
    endpoint = q.Endpoint.tcp("localhost", 5000)
    async with q.AsyncConnection(endpoint) as conn:
        result = await conn.query("1 + 1")
        print(result)  # Atom → 2

asyncio.run(main())

Connection pool

pool_opts = q.PoolOptions(
    max_size=10,
    min_idle=2,
    checkout_timeout_ms=5_000,
    test_on_checkout=True,
)

with q.Pool(endpoint, pool=pool_opts) as pool:
    pool.prewarm()                        # open idle connections eagerly
    result = pool.query("count trade")    # checked out and returned automatically
    print(pool.metrics())                 # PoolMetrics(connections=2, idle=2, …)

Streaming raw response

For large results you can stream the raw IPC bytes before decoding:

with q.Connection(endpoint) as conn:
    with conn.query("select from trade", raw=True) as resp:
        print(resp.header)        # MessageHeader(size=…, compression=…)
        value = resp.decode()     # decode on demand

Standalone encode / decode

# decode an IPC payload you already have
payload: bytes = ...
value = q.decode(payload)

# encode a value back to IPC bytes
frame = q.encode(value, message_type=q.MessageType.SYNCHRONOUS)

Value types

Every conn.query() call returns a Value subclass:

q type Python type Arrow export
scalar (atom) Atom __arrow_c_array__
typed list Vector __arrow_c_array__
mixed list List __arrow_c_array__
dictionary Dictionary __arrow_c_array__ (StructArray)
table Table __arrow_c_stream__

Decode options

Control how IPC data is projected into Arrow:

opts = (
    q.DecodeOptions.builder()
    .with_symbol_interpretation(q.SymbolInterpretation.DICTIONARY)  # dict-encode symbols
    .with_temporal_nulls(True)          # map q null sentinels → None
    .with_treat_infinity_as_null(True)  # map ±∞ → None
    .with_parallel(True)                # decode table columns in parallel
    .build()
)

with q.Connection(endpoint, options=opts) as conn:
    result = conn.query("select from trade")

Endpoints

# TCP
endpoint = q.Endpoint.tcp(
    "localhost", 5000,
    username="user",
    password="pass",
    timeout_ms=3_000,
)

# Unix domain socket
endpoint = q.Endpoint.unix(
    "/tmp/qroissant.sock",
    username="user",
    password="pass",
)

Error handling

from qroissant import (
    QroissantError,   # base class
    DecodeError,      # malformed IPC payload
    ProtocolError,    # bad frame header
    TransportError,   # socket / IO failure
    QRuntimeError,    # q process returned an error
    PoolError,        # pool management failure
    PoolClosedError,  # operation on a closed pool
)

try:
    result = conn.query("invalid expression")
except q.QRuntimeError as e:
    print(f"q error: {e}")
except q.TransportError as e:
    print(f"connection lost: {e}")

Architecture

qroissant is organized as a Rust workspace with strict crate boundaries:

crates/
├── qroissant-core       # q protocol, value types, encode/decode
├── qroissant-transport  # sync & async TCP/Unix socket connections
├── qroissant-arrow      # zero-copy Arrow projection
├── qroissant-kernels    # SIMD / nightly-sensitive hot paths
└── qroissant-python     # PyO3 bindings (the _native extension module)

The Python package at python/qroissant/ re-exports everything from the compiled _native extension. The .pyi stub files in that directory define the public API contract.


Development

# Install Python dependencies
uv sync --group dev --group docs

# Build the Rust extension (required before running Python tests)
uv run maturin develop

# Run tests
uv run pytest
cargo test --workspace

# Lint and format
uv run ruff check python/ tests/
cargo fmt --all

Transport integration tests require a q binary. Set Q_BIN to the path of your q executable before running pytest.


License

Apache 2.0 — see LICENSE.