Skip to content

Fix Android NDK relocation for Python 3.13+#8

Merged
FeodorFitsner merged 1 commit into
mainfrom
fix/android-ndk-relocation-3.13
Jun 3, 2026
Merged

Fix Android NDK relocation for Python 3.13+#8
FeodorFitsner merged 1 commit into
mainfrom
fix/android-ndk-relocation-3.13

Conversation

@FeodorFitsner
Copy link
Copy Markdown
Contributor

Summary

  • android/build.sh: add a third \$toolchain fallback that walks \$ANDROID_HOME/ndk/\$cpython_ndk_version/toolchains/llvm/prebuilt/*. For 3.13+, CPython's in-tree Android tooling resolves the NDK there without ever exporting NDK_HOME, so neither of the existing fallbacks fires and normalize_mobile_forge_install.py --ndk-toolchain '' ends up baking _build_ndk = '' into the sysconfig.
  • android/normalize_mobile_forge_install.py: reorder NDK substitution to run before install-prefix substitution. For 3.13+ the CI's NDK lives under /usr/local/lib/android/sdk/ndk/..., which is inside one of _install_prefixes — so the prefix rule was mangling the NDK path string before the NDK rule could match it.

Symptom (3.13/3.14 on macOS)

forge android:arm64-v8a cryptography
ERROR: Cannot find cross-compiler ('<install>/lib/android/sdk/ndk/27.3.13750724/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android24-clang')!

The path points at linux-x86_64 (binaries built and shipped for Linux only) on a Mac, where the right slot would be darwin-x86_64. After both fixes, the relocation block finds the user's local NDK (via ~/Library/Android/sdk/ndk/<ver>, NDK_HOME, etc.) and rewrites CC/CXX/AR/STRIP/RANLIB to the matching darwin-x86_64 toolchain.

Why 3.12 was unaffected

3.12's build path sources android-env.sh (which sets \$toolchain directly) and the CI's NDK lives under ~/ndk/r27d/..., which is not inside _install_prefixes — so neither bug fires for that version.

Test plan

  • Verified the 3.13 / 3.14 sysconfig files on disk carry _build_ndk = '' and the resulting CC path with the wrong host slot
  • Re-run the python-build CI on this branch and confirm 3.13 / 3.14 tarballs land with non-empty _build_ndk in the relocation block
  • Download the new tarballs on macOS and confirm `forge android:arm64-v8a ` (Python 3.13 and 3.14) resolves CC to the local NDK's darwin-x86_64 toolchain
  • Confirm Python 3.12 builds still pass (no regression)

🤖 Generated with Claude Code

The mobile-forge sysconfig relocation block was a no-op on macOS for
3.13/3.14 builds because two related bugs combined to leave
`_build_ndk` empty and—even when set—matched the wrong substring.

1. android/build.sh: the existing `$toolchain` fallbacks only fired for
   the 3.12 build path (which sources `android-env.sh`, setting
   `$toolchain`) or when `$HOME/ndk/r27d` exists. For 3.13+, CPython's
   in-tree Android tooling resolves the NDK under
   `$ANDROID_HOME/ndk/$cpython_ndk_version/` without ever exporting
   `NDK_HOME`, so `$toolchain` stayed empty and
   `normalize_mobile_forge_install.py --ndk-toolchain ""` baked
   `_build_ndk = ''` into the sysconfig. Add a third fallback that
   reads the NDK location CPython actually used.

2. android/normalize_mobile_forge_install.py: the relocation block did
   install-prefix substitution before NDK substitution. For 3.13+ the
   CI's NDK lives under `/usr/local/lib/android/sdk/ndk/...`, which is
   inside one of `_install_prefixes` — so the prefix rule rewrote the
   NDK path string first, leaving nothing for the NDK rule to match.
   Reorder so NDK substitution runs first.

Symptom on macOS: `forge android:arm64-v8a <pkg>` crashed in crossenv
with "Cannot find cross-compiler" pointing at
`<install>/lib/android/sdk/ndk/.../prebuilt/linux-x86_64/bin/...` —
binaries that aren't shipped (the tarball only contains the host the
build ran on) and the wrong host slot for macOS anyway. After both
fixes, the relocation block finds the user's local NDK
(via `~/Library/Android/sdk/ndk/<ver>`, `NDK_HOME`, etc.) and rewrites
CC/CXX/AR/STRIP/RANLIB to the matching `darwin-x86_64` toolchain.

3.12 was unaffected because its build path sourced `android-env.sh`
(setting `$toolchain`) and the CI NDK lived under `~/ndk/r27d/...`,
which is not inside `_install_prefixes`.
@FeodorFitsner FeodorFitsner merged commit 831e74d into main Jun 3, 2026
13 checks passed
@FeodorFitsner FeodorFitsner deleted the fix/android-ndk-relocation-3.13 branch June 3, 2026 00:53
ndonkoHenri added a commit to ndonkoHenri/python-build that referenced this pull request Jun 3, 2026
Five `python3 -m unittest` cases covering the relocator's substitution
chain on the consumer host, all under android/tests/. Each test renders
the relocation block into a stub _sysconfigdata__linux_.py inside a
tempdir, builds the on-disk layout the relocator's NDK lookup expects,
exec's the rendered file as a Python module, and asserts the resulting
build_time_vars.

The first test pins down the regression introduced by flet-dev#8 and fixed in
the previous commit: when the consumer's NDK_HOME contains /usr/local
(GitHub runners exposing the Android SDK there), the previous
`_install_prefixes = (_build_prefix, "/usr/local")` would mangle CC into
e.g. `<consumer_prefix>/lib/android/sdk/ndk/<ver>/.../bin/clang` —
ENOENT, crossenv silently dies. The test was verified to FAIL against
the pre-fix code:

  AssertionError: CC must resolve to the consumer's NDK toolchain.
  '/tmp/runner_root/<consumer_prefix>/lib/android/sdk/ndk/.../bin/clang'
  vs.
  '/tmp/runner_root/usr/local/lib/android/sdk/ndk/.../bin/clang'

Cases covered:

- test_mobile_forge_runner_ndk_under_usr_local — the regression.
- test_macos_dev_ndk_under_library — 3.13+ dev box flow; CC should
  resolve to the darwin-x86_64 toolchain with no /usr/local left.
- test_no_consumer_ndk_keeps_build_path — relocator must leave CC alone
  when no consumer NDK is reachable, not blank it or corrupt it.
- test_build_prefix_substitution — `_build_prefix` correctly rewrites
  to the consumer's install prefix.
- test_compound_value_both_substitutions_apply_cleanly — a single
  baked value containing BOTH the NDK and the install prefix must have
  both rewritten without either rule mangling the other's output;
  guards against future order-of-substitution regressions of the same
  shape as flet-dev#8.

Test scoping infra: `android/tests/_testlib.py` exposes
`requires_python(*versions)` — a `unittest.skipUnless` wrapper driven
by the `PYTHON_VERSION_SHORT` env var that build-python-version.yml
already exports. The relocator test itself is version-agnostic and
takes no decorator; future tests that only apply to e.g. 3.12 can drop
in alongside it and decorate `@requires_python("3.12")` without any
new CI plumbing.

CI: build-python-version.yml gets one extra step right after
setup-python that runs `python3 -m unittest discover` on each matrix
shard. The step is required (no `continue-on-error`) — silent test
failures are worse than no tests.
ndonkoHenri added a commit to ndonkoHenri/python-build that referenced this pull request Jun 3, 2026
A test scaffold under android/tests/ that scales without growing the
workflow file. Tests opt INTO a CI phase via `@pre_build` or
`@post_build` decorators from `_testlib`; the workflow has exactly
two test steps (one per phase) and never has to learn about new
tests as more get added.

Test scoping API (android/tests/_testlib.py):

  @pre_build       — run before build-all.sh (also runs locally when
                     MOBILE_FORGE_TEST_PHASE is unset).
  @post_build      — run after build-all.sh, with the freshly built
                     install tree accessible via MOBILE_FORGE_INSTALL_TREE.
  @requires_python — gate on PYTHON_VERSION_SHORT (one or more values).

All three are stdlib unittest.skipUnless wrappers — no pytest, no
custom test loader. They compose freely (@post_build + @requires_python).
Skip reasons are verbose by design: they tell future-you which env var
was mis-set instead of silently no-running.

Tests landed in this commit:

  android/tests/test_normalize_mobile_forge_install.py — @pre_build
    Five unit tests for `normalize_mobile_forge_install.append_relocation_block()`.
    The first one pins the flet-dev#8 regression that the previous commit
    (63f9fe3) fixed: when consumer NDK_HOME contains /usr/local
    (mobile-forge runner case), the relocator must not mangle the path
    the NDK rule just resolved. Verified to FAIL against pre-fix code
    and PASS against the fix:
        AssertionError:
        '<consumer_prefix>/lib/android/sdk/ndk/<ver>/.../bin/clang'  (broken)
        vs.
        '/.../usr/local/lib/android/sdk/ndk/<ver>/.../bin/clang'    (fixed)

  android/tests/test_built_install_tree.py — @post_build
    Four integration tests that read the freshly built install tree
    and assert structural invariants on every shipped sysconfigdata:

      - at least one sysconfigdata present (catches build-all.sh
        silently producing no artifacts)
      - the relocator block marker is present (catches build-all.sh
        skipping the normalize_mobile_forge_install.py invocation)
      - `_install_prefixes` does NOT contain /usr/local (regression
        guard for flet-dev#8 at the shipped artifact level)
      - `_build_ndk` is not empty (regression guard for the $toolchain
        fallback chain PR flet-dev#8 introduced for 3.13+)

CI wiring (.github/workflows/build-python-version.yml):

  - new step before build-all.sh:
      `Run android tests (pre-build)`
      env: MOBILE_FORGE_TEST_PHASE=pre_build
      → catches source-only regressions in ~1s, before the ~12 min build.

  - new step after build-all.sh:
      `Run android tests (post-build)`
      env: MOBILE_FORGE_TEST_PHASE=post_build,
           MOBILE_FORGE_INSTALL_TREE=$GITHUB_WORKSPACE/android/install
      → verifies the shipped artifacts have the structure consumers
        (mobile-forge crossenv, flet build) depend on.

Both steps are required to pass — no continue-on-error. Adding new
test files later doesn't touch the workflow.
FeodorFitsner pushed a commit that referenced this pull request Jun 3, 2026
* Fix sysconfig relocator mangling NDK paths via overly-broad /usr/local prefix

`_install_prefixes` was `(_build_prefix, "/usr/local")`. The second entry
is overly broad and never legitimately matches anything in the Android
cross-compile sysconfig outside of a build-time NDK path that happens to
live under /usr/local (3.13+ CI). When the consumer also has its NDK
under /usr/local — e.g. mobile-forge CI where the runner exports
`NDK_HOME=/usr/local/lib/android/sdk/ndk/<version>` — the substitution
chain breaks:

  1. NDK rule rewrites _build_ndk → _local_ndk. _local_ndk starts
     with /usr/local on the runner.
  2. install-prefix rule rewrites "/usr/local" → _prefix (the consumer's
     python install dir). This mangles the path the NDK rule just
     correctly resolved.

Result on the mobile-forge runner against the post-#8 v3.12 release:

  CC = '/home/runner/projects/python-build/android/install/android/
        arm64-v8a/python-3.12.12/lib/android/sdk/ndk/<ver>/toolchains/
        llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android24-clang'

That path doesn't exist. crossenv reads CC out of sysconfigdata and
exits ENOENT — which is the silent `CalledProcessError` seen in 30+
Android wheel jobs in ndonkoHenri/mobile-forge run 26856516473.

The fix:

- Drop "/usr/local" from `_install_prefixes`. The dedicated
  `_build_ndk → _local_ndk` substitution already handles every NDK
  path correctly; the install-prefix entry adds no new coverage and
  breaks the case where the consumer's NDK happens to share that
  prefix.

- Restore the original ordering of the two substitution rules
  (install-prefix first, then NDK). #8 swapped these as a workaround
  for the /usr/local entry's order-dependency. With /usr/local gone
  from the tuple, the two rules operate on disjoint substrings and the
  order between them is irrelevant — both orders produce identical
  output for every consumer environment we ship to (3.12/3.13/3.14 ×
  Linux/macOS).

3.12 mobile-forge consumers, post-fix:

  CC = '/usr/local/lib/android/sdk/ndk/<ver>/toolchains/llvm/prebuilt/
        linux-x86_64/bin/aarch64-linux-android24-clang'

3.13+ macOS dev consumers (the case #8 was actually trying to fix),
post-fix:

  CC = '~/Library/Android/sdk/ndk/<ver>/toolchains/llvm/prebuilt/
        darwin-x86_64/bin/aarch64-linux-android24-clang'

Both correct, no mangling.

Follow-up that should land in a separate PR: add a unit test that
exercises the relocator with NDK_HOME under /usr/local so this exact
regression can't slip in again unnoticed.

* Add android unit tests + two-phase CI test framework

A test scaffold under android/tests/ that scales without growing the
workflow file. Tests opt INTO a CI phase via `@pre_build` or
`@post_build` decorators from `_testlib`; the workflow has exactly
two test steps (one per phase) and never has to learn about new
tests as more get added.

Test scoping API (android/tests/_testlib.py):

  @pre_build       — run before build-all.sh (also runs locally when
                     MOBILE_FORGE_TEST_PHASE is unset).
  @post_build      — run after build-all.sh, with the freshly built
                     install tree accessible via MOBILE_FORGE_INSTALL_TREE.
  @requires_python — gate on PYTHON_VERSION_SHORT (one or more values).

All three are stdlib unittest.skipUnless wrappers — no pytest, no
custom test loader. They compose freely (@post_build + @requires_python).
Skip reasons are verbose by design: they tell future-you which env var
was mis-set instead of silently no-running.

Tests landed in this commit:

  android/tests/test_normalize_mobile_forge_install.py — @pre_build
    Five unit tests for `normalize_mobile_forge_install.append_relocation_block()`.
    The first one pins the #8 regression that the previous commit
    (63f9fe3) fixed: when consumer NDK_HOME contains /usr/local
    (mobile-forge runner case), the relocator must not mangle the path
    the NDK rule just resolved. Verified to FAIL against pre-fix code
    and PASS against the fix:
        AssertionError:
        '<consumer_prefix>/lib/android/sdk/ndk/<ver>/.../bin/clang'  (broken)
        vs.
        '/.../usr/local/lib/android/sdk/ndk/<ver>/.../bin/clang'    (fixed)

  android/tests/test_built_install_tree.py — @post_build
    Four integration tests that read the freshly built install tree
    and assert structural invariants on every shipped sysconfigdata:

      - at least one sysconfigdata present (catches build-all.sh
        silently producing no artifacts)
      - the relocator block marker is present (catches build-all.sh
        skipping the normalize_mobile_forge_install.py invocation)
      - `_install_prefixes` does NOT contain /usr/local (regression
        guard for #8 at the shipped artifact level)
      - `_build_ndk` is not empty (regression guard for the $toolchain
        fallback chain PR #8 introduced for 3.13+)

CI wiring (.github/workflows/build-python-version.yml):

  - new step before build-all.sh:
      `Run android tests (pre-build)`
      env: MOBILE_FORGE_TEST_PHASE=pre_build
      → catches source-only regressions in ~1s, before the ~12 min build.

  - new step after build-all.sh:
      `Run android tests (post-build)`
      env: MOBILE_FORGE_TEST_PHASE=post_build,
           MOBILE_FORGE_INSTALL_TREE=$GITHUB_WORKSPACE/android/install
      → verifies the shipped artifacts have the structure consumers
        (mobile-forge crossenv, flet build) depend on.

Both steps are required to pass — no continue-on-error. Adding new
test files later doesn't touch the workflow.

* Add concurrency control and restrict publishing

- Introduced concurrency to cancel in-progress runs on the same ref.
- Limited workflow execution to the `main` branch.

* apply review patches

* Fix post-build tests on 3.13+: glob _sysconfigdata__*.py, not just __linux_.py

The post-build install-tree tests hardcoded the 3.12 sysconfigdata
filename (`_sysconfigdata__linux_.py`), which doesn't exist on 3.13+.
CPython's official Android tooling (used in the 3.13+ build path)
names the file by host triple instead:

  3.12   (legacy android-env.sh):    _sysconfigdata__linux_.py
  3.13+  (CPython's Android tooling): _sysconfigdata__android_<host>-linux-android.py

The relocator itself was unaffected — `find_sysconfigdata()` in
`normalize_mobile_forge_install.py` already globs `_sysconfigdata__*.py`
and applies the relocation block to whatever variant exists. It's
purely the post-build tests in `test_built_install_tree.py` that
needed the same glob. Symptom in CI: 3.13/3.14 matrix shards failed
`test_at_least_one_sysconfigdata_present` (the safety net); the other
three post-build tests iterated zero files and degenerated to silent
vacuous passes.

Fix: switch `_sysconfigdata_files()` to glob `_sysconfigdata__*.py`
inside the per-version lib dir (same pattern the production-side
`find_sysconfigdata()` uses). Updated the docstring + the
`test_at_least_one_sysconfigdata_present` error message to drop the
3.12-only filename mention so future failures point at the right
thing.

Verified locally by synthesizing a 3.13 install tree with
Android-named sysconfigdata files (`_sysconfigdata__android_arm64_v8a-linux-android.py`
and `_sysconfigdata__android_x86_64-linux-android.py`) and running
the post-build suite — all 4 tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant