attempt 3

Cam 2026-05-20 14:16:56 -04:00
parent de214336a9
commit 7c3fbd73fd
12 changed files with 594832 additions and 64277 deletions

114
BUILD.md Normal file

@@ -0,0 +1,114 @@
# Building qroissant wheels
This project ships as a [maturin](https://www.maturin.rs/) / PyO3 mixed
Rust+Python package. The native extension uses `#![feature(portable_simd)]`, so
**nightly Rust is required**. `rust-toolchain.toml` selects the `nightly`
channel; it does not pin a dated nightly.
All builds run inside a Docker image (`scripts/Dockerfile.build`) so the host
needs only Docker and Python 3 — no Rust, no MSVC SDK, no mingw to install.
## Prerequisites
- Docker (tested with 29.x)
- Python 3.11+ on the host, **only** if you want to install/test the produced
Linux wheel locally via `scripts/build.sh check`.
## One-time setup
```sh
scripts/build.sh image
```
This builds `qroissant-build:latest`: it pulls `ghcr.io/rust-cross/cargo-xwin`,
then installs the nightly toolchain, the `x86_64-pc-windows-msvc` target, and
maturin. Expect about 5 minutes on a cold cache; later runs hit Docker's layer
cache and finish almost immediately.
## Build commands
```sh
scripts/build.sh linux # -> dist-linux/qroissant-*-manylinux_*_x86_64.whl
scripts/build.sh windows # -> dist-windows/qroissant-*-win_amd64.whl
scripts/build.sh all # image + linux + windows
scripts/build.sh check # install latest linux wheel into .venv, import-smoke,
# then `zipfile -l` the windows wheel
scripts/build.sh clean # rm dist-linux dist-windows
scripts/build.sh clean-cache # also drop the cargo/target/xwin Docker volumes
```
The wheels target **abi3-py311**, so a single artifact per platform covers
CPython 3.11, 3.12, 3.13, and onward.
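As a rough illustration of what the `abi3-py311` tag means for an installer, the sketch below reads the Python and ABI tags out of a wheel filename and checks them against a CPython minor version. The function name `abi3_wheel_covers` and the example filenames are hypothetical, and real installers apply more rules than this.

```sh
# Sketch: does an abi3 wheel cover CPython 3.<minor>?
# Assumes the standard wheel filename layout:
#   {name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl
# The body runs in a subshell, so its variables do not leak to the caller.
abi3_wheel_covers() (
  file=${1##*/}        # strip any directory prefix
  minor=$2             # CPython minor version to test, e.g. 13
  stem=${file%.whl}
  rest=${stem%-*}      # drop the platform tag
  abi=${rest##*-}      # e.g. abi3
  rest=${rest%-*}      # drop the abi tag
  py=${rest##*-}       # e.g. cp311
  [ "$abi" = abi3 ] || return 1
  case $py in
    cp3[0-9]*) ;;
    *) return 1 ;;
  esac
  floor=${py#cp3}      # "11" for cp311
  [ "$minor" -ge "$floor" ]
)

# Hypothetical usage:
#   abi3_wheel_covers dist-windows/qroissant-0.3.1-cp311-abi3-win_amd64.whl 13 \
#     && echo "installable on CPython 3.13"
```

The check succeeds for any minor version at or above the floor encoded in the Python tag, which is the property that lets one wheel per platform serve several CPython releases.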
## What lives where
| Artifact | Path |
| --- | --- |
| Build image definition | `scripts/Dockerfile.build` |
| Wrapper script | `scripts/build.sh` |
| Toolchain pin | `rust-toolchain.toml` |
| Linux wheels | `dist-linux/` (gitignored) |
| Windows wheels | `dist-windows/` (gitignored) |
| Cargo registry cache | Docker volume `qroissant-cargo-registry` |
| Cargo target dir | Docker volume `qroissant-target` |
| cargo-xwin MSVC cache | Docker volume `qroissant-xwin` |
The three Docker volumes persist between runs so reruns only rebuild changed
crates; nuke them with `scripts/build.sh clean-cache` if anything looks stale.
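Before clearing the caches it can help to see which of them exist. The following is a minimal sketch using the volume names from the table above; the function name `cache_volume_report` is hypothetical and not part of `scripts/build.sh`.

```sh
# Sketch: report which of the three cache volumes currently exist.
# Note: a missing docker binary or an unreachable daemon also shows up
# as "absent", because any failure of `docker volume inspect` is treated
# the same way.
cache_volume_report() (
  for v in qroissant-cargo-registry qroissant-target qroissant-xwin; do
    if docker volume inspect "$v" >/dev/null 2>&1; then
      echo "$v: present"
    else
      echo "$v: absent"
    fi
  done
)
```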
## Debug symbols on Windows
The release profile sets `debug = "line-tables-only"`, so a `_native.pdb` is
produced alongside the wheel. `scripts/build.sh windows` copies it into
`dist-windows/`. To get resolved frames in a Rust panic backtrace on a Windows
machine:
1. `pip install qroissant-*-win_amd64.whl`
2. Locate the installed extension, e.g.
`Lib\site-packages\qroissant\_native.pyd`
3. Drop `_native.pdb` next to it (same directory, same basename).
Symbol info is line-tables-only, so frames resolve to `file:line` but local
variables and types aren't included. That is enough for backtraces and keeps
the PDB at about 30 MB; full debug info would be much larger.
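The placement rule in step 3 can be sketched as a small helper. This assumes a POSIX-style shell on the Windows machine (for example Git Bash); the function name `pdb_path_for` and the commented usage lines are illustrative only.

```sh
# Sketch: given the path of the installed extension, print where the PDB
# has to go (same directory, same basename, .pdb extension).
pdb_path_for() {
  printf '%s\n' "${1%.pyd}.pdb"
}

# Hypothetical usage:
#   pkg_dir=$(python -c 'import os, qroissant; print(os.path.dirname(qroissant.__file__))')
#   cp _native.pdb "$(pdb_path_for "$pkg_dir/_native.pyd")"
#   RUST_BACKTRACE=1 python your_script.py   # Rust prints backtraces only when this is set
```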
## How Windows cross-compile works
The Windows path uses [cargo-xwin](https://github.com/rust-cross/cargo-xwin),
which downloads the Microsoft CRT + Windows SDK headers (cached in the xwin
volume; license auto-accepted via `XWIN_ACCEPT_LICENSE=1`). PyO3's
`generate-import-lib` feature — enabled in
`crates/qroissant-python/Cargo.toml` — lets us produce the `python3.dll`
import library at build time, so no Windows Python install is required on the
host.
## Quick wheel sanity checks
```sh
# Listing
python3 -m zipfile -l dist-windows/qroissant-*-win_amd64.whl
# Validate the dist-info
pipx run twine check dist-linux/*.whl dist-windows/*.whl
```
## Iterating on Python code
For Python-side changes (no Rust touched), the fastest loop is still
`maturin develop` against a local rustup install. The Docker flow above is
optimized for producing wheels, not for inner-loop iteration. If you have a
nightly rustup toolchain on the host, you can do:
```sh
python3 -m venv .venv && .venv/bin/pip install 'maturin>=1.8,<2.0'
.venv/bin/maturin develop --release
.venv/bin/python -c 'import qroissant'
```
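Since `maturin develop` fails on a stable toolchain for this project, a quick pre-flight check may save a confusing compile error. The helper below is a sketch; `is_nightly_version` is a hypothetical name, and it only inspects the version string it is given.

```sh
# Sketch: succeed only if a `rustc --version` string names a nightly build.
is_nightly_version() {
  case $1 in
    *-nightly*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Hypothetical usage before running maturin:
#   is_nightly_version "$(rustc --version)" \
#     || echo "warning: active rustc is not a nightly toolchain" >&2
```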
## Troubleshooting
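- **"manylinux_2_34" wheel won't install on old distros** — the image base is
Debian 13 (glibc 2.41), so the extension links against a recent glibc and the
wheel is tagged accordingly. For broader compatibility, build inside a
manylinux2014 image. `auditwheel repair` will not help here: it bundles shared
libraries and corrects the tag, but it cannot lower the glibc version the
binary requires.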
- **"failed to generate python3.dll import library"** — means `llvm-dlltool`
is missing inside the image. Rebuild it with `scripts/build.sh image`.
- **xwin download is slow / fails** — Microsoft's CDN occasionally rate-limits.
Drop the cache (`docker volume rm qroissant-xwin`) and retry.
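For the first item above, the glibc floor of a wheel can be read straight from its platform tag. The sketch below assumes a PEP 600 style `manylinux_X_Y` tag; the function name `glibc_floor` and the example filename are hypothetical.

```sh
# Sketch: print the minimum glibc version encoded in a manylinux_X_Y tag.
# Prints nothing and returns 1 for wheels without such a tag (for example
# win_amd64 wheels or legacy manylinux2014-only tags).
glibc_floor() (
  file=${1##*/}
  case $file in
    *-manylinux_[0-9]*_[0-9]*_*) ;;
    *) return 1 ;;
  esac
  tag=${file##*-manylinux_}   # e.g. "2_34_x86_64.whl"
  major=${tag%%_*}            # "2"
  rest=${tag#*_}              # "34_x86_64.whl"
  minor=${rest%%_*}           # "34"
  echo "$major.$minor"
)

# Hypothetical usage:
#   glibc_floor dist-linux/qroissant-0.3.1-cp311-abi3-manylinux_2_34_x86_64.whl
# On a glibc-based target, compare the result with: getconf GNU_LIBC_VERSION
```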

10
Cargo.lock generated

@@ -868,6 +868,7 @@ version = "0.28.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8bf94ee265674bf76c09fa430b0e99c26e319c945d96ca0d5a8215f31bf81cf7"
dependencies = [
"python3-dll-a",
"target-lexicon",
]
@@ -906,6 +907,15 @@ dependencies = [
"syn 2.0.117",
]
[[package]]
name = "python3-dll-a"
version = "0.2.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d80ba7540edb18890d444c5aa8e1f1f99b1bdf26fb26ae383135325f4a36042b"
dependencies = [
"cc",
]
[[package]]
name = "qroissant-arrow"
version = "0.3.1"

View file

@@ -12,6 +12,11 @@ repository = "https://github.com/qroissant/qroissant"
lto = "fat"
codegen-units = 1
opt-level = 3
# Line tables only: keeps optimisations identical to a normal release build
# but emits enough info for Rust panic backtraces to resolve to file:line.
# On windows-msvc this produces a PDB next to _native.pyd (see BUILD.md).
debug = "line-tables-only"
strip = "none"
[workspace.dependencies]
pyo3 = "0.28.2"

View file

@@ -14,7 +14,7 @@ path = "src/lib.rs"
bb8 = "0.9.0"
bytes = "1.11.1"
chrono = "0.4.44"
-pyo3 = { workspace = true, features = ["extension-module", "abi3-py311"] }
+pyo3 = { workspace = true, features = ["extension-module", "abi3-py311", "generate-import-lib"] }
pyo3-arrow = { version = "0.17.0", default-features = false }
pyo3-async-runtimes = { version = "0.28.0", features = ["tokio-runtime"] }
qroissant-arrow = { path = "../qroissant-arrow" }

BIN
dist-windows/_native.pdb Normal file

Binary file not shown.

530253
dist-windows/_native.pdb.txt Normal file

File diff suppressed because it is too large

4
rust-toolchain.toml Normal file

@@ -0,0 +1,4 @@
[toolchain]
channel = "nightly"
components = ["rustfmt", "clippy", "rust-src"]
targets = ["x86_64-unknown-linux-gnu"]

25
scripts/Dockerfile.build Normal file

@@ -0,0 +1,25 @@
# Build environment for qroissant.
#
# Base: ghcr.io/rust-cross/cargo-xwin — Debian 13, ships with cargo-xwin,
# llvm-dlltool, and the x86_64-pc-windows-msvc rustup target preinstalled.
# We layer on nightly Rust (project uses #![feature(portable_simd)]),
# lld (so cargo-xwin can invoke lld-link), and maturin.
FROM ghcr.io/rust-cross/cargo-xwin:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get install -y --no-install-recommends lld python3-pip python3-venv \
&& rm -rf /var/lib/apt/lists/* \
&& ln -sf /usr/bin/lld /usr/bin/lld-link \
&& ln -sf /usr/bin/clang /usr/bin/clang-cl
RUN rustup toolchain install nightly --profile minimal \
--component rustfmt --component clippy --component rust-src \
--component llvm-tools-preview \
&& rustup target add --toolchain nightly x86_64-pc-windows-msvc \
&& rustup default nightly
RUN pip3 install --no-cache-dir --break-system-packages 'maturin>=1.8,<2.0'
ENV PATH="/usr/local/cargo/bin:${PATH}"

158
scripts/build.sh Executable file

@@ -0,0 +1,158 @@
#!/usr/bin/env bash
# Build qroissant wheels via Docker. Both Linux and Windows builds run inside
# the qroissant-build:latest image (defined in scripts/Dockerfile.build) so the
# host needs only Docker + Python.
#
# Usage:
# scripts/build.sh image Build the qroissant-build Docker image.
# scripts/build.sh linux Build a manylinux x86_64 wheel -> dist-linux/
# scripts/build.sh windows Build a win_amd64 wheel -> dist-windows/
# scripts/build.sh all image + linux + windows.
# scripts/build.sh check Install the latest linux wheel into .venv,
# import qroissant as a smoke test, and list
# the Windows wheel's key entries if present.
# scripts/build.sh clean Remove dist-*/ output dirs (volumes kept).
# scripts/build.sh clean-cache Also drop the Docker volumes (cargo
# registry, target dir, xwin SDK cache).
set -euo pipefail
SCRIPT_DIR=$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" &>/dev/null && pwd)
ROOT_DIR=$(cd -- "$SCRIPT_DIR/.." &>/dev/null && pwd)
cd "$ROOT_DIR"
IMAGE=qroissant-build:latest
VOL_CARGO=qroissant-cargo-registry
VOL_TARGET=qroissant-target
VOL_XWIN=qroissant-xwin
need_docker() {
command -v docker >/dev/null 2>&1 || {
echo "error: docker is required but not on PATH" >&2
exit 1
}
}
ensure_volumes() {
local uid gid v
uid=$(id -u); gid=$(id -g)
for v in "$VOL_CARGO" "$VOL_TARGET" "$VOL_XWIN"; do
if ! docker volume inspect "$v" >/dev/null 2>&1; then
docker volume create "$v" >/dev/null
fi
# Re-chown the volume to the invoking user every run; cheap and idempotent.
docker run --rm -v "$v":/v --entrypoint=sh "$IMAGE" -c "chown -R $uid:$gid /v" >/dev/null
done
}
ensure_image() {
if ! docker image inspect "$IMAGE" >/dev/null 2>&1; then
echo ">>> building $IMAGE"
docker build -t "$IMAGE" -f "$SCRIPT_DIR/Dockerfile.build" "$SCRIPT_DIR"
fi
}
run_in_image() {
# Run as the invoking user so produced wheels are owned by them, not root.
# CARGO_HOME / XDG_CACHE_HOME are redirected into the persistent volumes,
# which must therefore be owned by the same uid (handled by ensure_volumes).
docker run --rm \
--user "$(id -u):$(id -g)" \
-e CARGO_HOME=/cargo \
-e XDG_CACHE_HOME=/xdg-cache \
-e HOME=/tmp \
-e XWIN_ACCEPT_LICENSE=1 \
-v "$ROOT_DIR":/io -w /io \
-v "$VOL_CARGO":/cargo \
-v "$VOL_TARGET":/io/target \
-v "$VOL_XWIN":/xdg-cache/cargo-xwin \
--entrypoint=sh \
"$IMAGE" -c "$1"
}
cmd_image() {
need_docker
echo ">>> (re)building $IMAGE"
docker build -t "$IMAGE" -f "$SCRIPT_DIR/Dockerfile.build" "$SCRIPT_DIR"
}
cmd_linux() {
need_docker
ensure_image
ensure_volumes
echo ">>> building Linux wheel -> dist-linux/"
run_in_image 'maturin build --release --out /io/dist-linux'
ls -1 dist-linux/*.whl
}
cmd_windows() {
need_docker
ensure_image
ensure_volumes
echo ">>> building Windows wheel -> dist-windows/"
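run_in_image '
# Abort on the first failing command so that a failed maturin build is not
# masked by the exit status of the PDB-copy step below.
set -e
maturin build --release --target x86_64-pc-windows-msvc --out /io/dist-windows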
# Copy the PDB next to the wheel when one was produced (release profile
# has debug = "line-tables-only"). Place it next to _native.pyd at
# runtime on Windows to get resolved panic backtraces.
pdb=$(find /io/target/x86_64-pc-windows-msvc/release -maxdepth 2 -name "_native.pdb" 2>/dev/null | head -1)
if [ -n "$pdb" ]; then
cp "$pdb" /io/dist-windows/
echo "+ copied $(basename "$pdb") -> dist-windows/"
fi
'
ls -1 dist-windows/
}
cmd_check() {
local venv="$ROOT_DIR/.venv"
if [[ ! -x "$venv/bin/python" ]]; then
echo ">>> creating venv at $venv"
python3 -m venv "$venv"
"$venv/bin/pip" install --quiet --upgrade pip
fi
local wheel w
wheel=$(ls -t dist-linux/qroissant-*-linux*.whl dist-linux/qroissant-*-manylinux*.whl 2>/dev/null | head -1 || true)
if [[ -z "$wheel" ]]; then
echo "error: no Linux wheel in dist-linux/; run '$0 linux' first" >&2
exit 1
fi
echo ">>> installing $wheel"
"$venv/bin/pip" install --quiet --force-reinstall "$wheel"
"$venv/bin/python" -c 'import qroissant; print("ok: qroissant", getattr(qroissant, "__version__", "(no __version__)"))'
# [[ -f ... ]] does not expand globs, so test for a match with compgen -G.
if compgen -G "dist-windows/qroissant-*-win_amd64.whl" >/dev/null; then
echo ">>> inspecting Windows wheel"
for w in dist-windows/qroissant-*-win_amd64.whl; do
python3 -m zipfile -l "$w" | grep -E '_native\.pyd|WHEEL' || true
done
fi
}
cmd_clean() {
rm -rf dist-linux dist-windows
echo ">>> removed dist-linux/ dist-windows/"
}
cmd_clean_cache() {
cmd_clean
for v in "$VOL_CARGO" "$VOL_TARGET" "$VOL_XWIN"; do
docker volume rm "$v" >/dev/null 2>&1 && echo ">>> removed volume $v" || true
done
}
case "${1:-}" in
image) cmd_image ;;
linux) cmd_linux ;;
windows) cmd_windows ;;
all) cmd_image; cmd_linux; cmd_windows ;;
check) cmd_check ;;
clean) cmd_clean ;;
clean-cache) cmd_clean_cache ;;
""|-h|--help)
sed -n '2,/^set -euo/p' "${BASH_SOURCE[0]}" | sed 's/^# \{0,1\}//' | sed '/^set -euo/d'
;;
*)
echo "error: unknown command: $1" >&2
echo "run '$0 --help' for usage" >&2
exit 2
;;
esac