Kairo Compiler Standards & Compliance

This document covers every standard, ABI, and specification Kairo targets, what compliance means in practice, and where Kairo intentionally deviates and why. It is written for engineers who need to know exactly what guarantees the compiler makes and what it doesn’t.


Floating-Point: IEEE 754-2019

Kairo targets IEEE 754-2019 (the 2019 revision of IEEE 754-2008) for all binary floating-point types: f16, f32, f64, f128.

What this means concretely:

  • All four basic arithmetic operations (+, -, *, /), sqrt, and fma produce correctly rounded results in round-to-nearest-even mode (the IEEE default). The result is the representable value closest to the infinitely precise result, with ties going to even.
  • Intermediate results within a single expression may be fused into FMA operations by the compiler (FP contraction is on by default, not fast). This can produce results that differ from performing the operations separately with two roundings. If you need strict per-operation rounding, use --fp-contract=off.
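  • Overflow produces ±inf. Underflow is gradual: a result too small for a normal value becomes subnormal, and only a result too small for any subnormal rounds to zero. Operations that produce NaN (e.g., 0.0 / 0.0, sqrt(-1.0)) propagate NaN per IEEE 754; no crash, no trap. Check for NaN explicitly with std::is_nan() when needed.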
  • NaN payload bits are not guaranteed to be preserved across arithmetic operations and are not guaranteed to be portable across architectures. Inspecting NaN bit patterns is explicitly unsupported behavior. If you need NaN payloads, use f32::from_bits / f64::from_bits and document why.
  • The 2019 standard removed the non-associative minNum/maxNum operations from 2008 and replaced them with minimum/maximum, which have well-defined NaN handling (a NaN operand yields NaN). Kairo’s min/max builtins follow the 2019 semantics.
  • Subnormal (denormal) numbers are handled per IEEE spec by default. Flush-to-zero mode is not enabled unless you explicitly opt in with --fp-model=fast or the @float_mode(fast) block annotation.
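
The contraction and underflow bullets above can be checked numerically. The sketch below is written in Python rather than Kairo (Python floats are IEEE 754 binary64) and emulates a fused multiply-add with exact rational arithmetic. The helper name fma_emulated is made up for this illustration and is not a Kairo API.

```python
from fractions import Fraction
import math
import sys

def fma_emulated(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once, as a fused multiply-add does."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding to nearest, ties to even

a = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

separate = a * a + c           # product is rounded first, then the sum
fused = fma_emulated(a, a, c)  # only the final result is rounded

# Gradual underflow: values below the smallest normal become subnormal
# before they reach zero.
smallest_normal = sys.float_info.min
subnormal = smallest_normal / 4.0

# NaN propagates through arithmetic instead of trapping.
nan_result = (math.inf - math.inf) + 1.0
```

Here separate is 0.0 while fused is 2**-54: the exact product is 1 + 2**-26 + 2**-54, and the separate multiplication rounds the last term away before the addition sees it. This is the kind of difference --fp-contract=off removes.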

What Kairo does not guarantee:

  • Reproducibility of floating-point results across different optimization levels, different target architectures, or different LLVM versions. If you need bitwise-reproducible results, use --fp-contract=off --fp-model=strict and target a specific CPU with --mcpu.
  • Strict exception observability by default. FP exceptions (divide-by-zero, overflow, underflow, inexact, invalid) are not observable through fenv.h equivalents unless you compile with --fp-model=strict. The default (precise) treats exceptions as ignorable, which is what lets the optimizer e.g. hoist FP computations out of loops.

FP mode quick reference:

| Mode | What it does | When to use |
| --- | --- | --- |
| precise (default) | IEEE arithmetic, no unsafe rewrites, FMA on | General use |
| strict | IEEE + observable exceptions + no FMA | Scientific code, reproducibility audits |
| fast | Everything unsafe on (-ffast-math) | SIMD kernels, numerical code you’ve already validated |
| @float_mode(fast) | Block-scoped fast-math | Hot inner loops only |

Text Encoding: Unicode 15.1 / UTF-8

Source Files

Kairo source files are UTF-8 only. No BOM. No other encoding is accepted. The lexer validates UTF-8 byte sequences and rejects invalid sequences as hard errors with a diagnostic pointing at the exact byte offset and offending bytes. A BOM at the start of a .k file is an error.
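
A rough sketch of the check described above, in Python rather than Kairo. The function name decode_source and the wording of the messages are assumptions for illustration; the real lexer's diagnostics will differ.

```python
UTF8_BOM = b"\xef\xbb\xbf"

def decode_source(data: bytes) -> str:
    """Return the decoded text, or raise ValueError naming the bad byte offset."""
    if data.startswith(UTF8_BOM):
        raise ValueError("byte offset 0: UTF-8 BOM (ef bb bf) is not allowed")
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # err.start/err.end delimit the offending bytes in the input
        bad = data[err.start:err.end].hex(" ")
        raise ValueError(
            f"byte offset {err.start}: invalid UTF-8 sequence ({bad})"
        ) from err
```

For the input b"ab\xffcd" this reports byte offset 2 and the offending byte ff.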

String Internal Representation

Kairo’s string type is a 32-byte value type using UTF-8 encoding internally with small string optimization (SSO). Strings up to 23 bytes are stored inline without a heap allocation. Longer strings are heap-allocated.

  • Codepoint indexing: s[i] returns a char (4-byte Unicode scalar value) at codepoint position i. This is O(1) amortized; the implementation uses a sparse breadcrumb cache to locate the nearest codepoint boundary and decodes from there.
  • Byte indexing: s.bytes[i] returns a raw byte at byte offset i. This is O(1) always but returns raw UTF-8 bytes, not characters.
  • Iteration: for ch in s yields char values (decoded codepoints) sequentially.
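
To make the difference between codepoint indexing and byte indexing concrete, here is one way a sparse breadcrumb table could work, sketched in Python. The class name, the STRIDE value, and the table layout are all assumptions; the source only states that a sparse breadcrumb cache exists, not how it is built.

```python
class BreadcrumbString:
    """UTF-8 bytes plus a sparse table of codepoint-boundary byte offsets."""

    STRIDE = 4  # hypothetical spacing: one breadcrumb every 4 codepoints

    def __init__(self, text: str) -> None:
        self.bytes = text.encode("utf-8")
        self.crumbs = []  # crumbs[k] = byte offset of codepoint k * STRIDE
        self.length = 0   # number of codepoints
        for offset, byte in enumerate(self.bytes):
            if byte & 0xC0 != 0x80:  # not a continuation byte: a codepoint starts here
                if self.length % self.STRIDE == 0:
                    self.crumbs.append(offset)
                self.length += 1

    def _next_boundary(self, offset: int) -> int:
        """Byte offset of the codepoint that follows the one starting at offset."""
        offset += 1
        while offset < len(self.bytes) and self.bytes[offset] & 0xC0 == 0x80:
            offset += 1
        return offset

    def __getitem__(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        offset = self.crumbs[i // self.STRIDE]  # nearest breadcrumb at or before i
        for _ in range(i % self.STRIDE):        # at most STRIDE - 1 hops forward
            offset = self._next_boundary(offset)
        return self.bytes[offset:self._next_boundary(offset)].decode("utf-8")
```

With s = BreadcrumbString("naïve"), s[2] is "ï" while s.bytes[2] is 0xC3, the first byte of that character's two-byte encoding. A lookup walks at most STRIDE - 1 codepoints from a breadcrumb, so its cost does not grow with the string length.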

See Primitives for the full string API.

Identifiers

Kairo identifiers may contain any Unicode scalar value that has the ID_Start or ID_Continue property (Unicode UAX #31), plus emoji and other non-letter code points that UAX #31 excludes. This is an intentional extension beyond the standard identifier grammar. Concretely:

  • Letters and digits from any script: Latin, Arabic, Chinese, Cyrillic, Devanagari, etc. all valid.
  • Emoji: valid.
  • Combining marks: valid as continuation characters.
  • Whitespace, control characters: never valid in identifiers.

Two identifiers are equal if and only if their sequences of scalar values are equal. Kairo does not perform Unicode normalization on identifiers. This matches languages such as Go, Java, and JavaScript, which compare identifiers code point by code point; Python (NFKC) and Rust (NFC) normalize instead. If you want normalization, normalize your source before feeding it to the compiler.
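
The practical consequence is that two identifiers can render identically and still be different names. A small Python sketch of the comparison and of pre-normalizing source text follows; normalize_source is a hypothetical helper, and choosing NFC is an assumption, not a Kairo recommendation.

```python
import unicodedata

composed = "caf\u00e9"     # é as one scalar value (U+00E9)
decomposed = "cafe\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

# Scalar-by-scalar comparison, which is how identifier equality is defined above.
same_identifier = composed == decomposed

def normalize_source(text: str) -> str:
    """Normalize to NFC before handing the text to the compiler."""
    return unicodedata.normalize("NFC", text)

same_after_nfc = normalize_source(composed) == normalize_source(decomposed)
```

same_identifier is False and same_after_nfc is True: without normalization the two spellings would name two different entities.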

String Indexing and Grapheme Clusters

Indexing a string by integer (s[i]) returns the codepoint at position i, not a grapheme cluster. For most text this is the same thing, but for combining sequences and emoji with modifiers it is not:

  • "hello": s[1] returns 'e'. Five codepoints, five elements.
  • A string with a combining sequence (e.g., e + combining acute): the base character and the combining mark are separate codepoints and separate indices.
  • Emoji with ZWJ sequences (e.g., family emoji): multiple codepoints, multiple elements. len() returns the codepoint count, not the visual glyph count.

string.graphemes() returns an iterator of grapheme clusters per UAX #29 for code that needs to operate on user-visible characters.
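
The counts involved are easy to get wrong, so here is a short Python check. Python's len() on a str also counts codepoints, which makes it a convenient stand-in; the grapheme count of 1 is stated in the output rather than computed, because the Python standard library has no UAX #29 segmenter.

```python
combining = "e\u0301"  # "é" built from e + combining acute accent
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man ZWJ woman ZWJ girl

for label, text in (("combining", combining), ("family", family)):
    codepoints = len(text)                   # what len() and s[i] count
    utf8_bytes = len(text.encode("utf-8"))   # what s.bytes counts
    print(f"{label}: {codepoints} codepoints, {utf8_bytes} UTF-8 bytes, 1 grapheme")
```

The combining sequence is 2 codepoints in 3 bytes, and the family emoji is 5 codepoints in 18 bytes, yet each displays as a single user-visible character.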

Conversions and C/C++ Interop

string provides explicit conversion methods. When calling C++ functions with Kairo primitives, conversions to C++ string types are implicit:

| Conversion | Output | Use case |
| --- | --- | --- |
| as libcxx::string | UTF-8 std::string | Natural fit; Kairo strings are UTF-8 internally |
| as libcxx::u16string | UTF-16 std::u16string | Windows WinAPI (LPWSTR), Java interop |
| as libcxx::u32string | UTF-32 std::u32string | APIs expecting char32_t sequences |
| as unsafe *i8 | Null-terminated UTF-8 | Passing to C functions expecting const char* |
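
What the conversions in the table change is the code-unit width, not the text. The Python sketch below shows the unit counts for one string and a null-terminated UTF-8 buffer. to_c_string is a hypothetical helper, and rejecting embedded NULs is an assumption; the source does not say how Kairo's conversion treats them.

```python
text = "hi 😀"

utf8 = text.encode("utf-8")                        # 8-bit code units
utf16_units = len(text.encode("utf-16-le")) // 2   # 16-bit code units
utf32_units = len(text.encode("utf-32-le")) // 4   # 32-bit code units

def to_c_string(s: str) -> bytes:
    """UTF-8 bytes with a trailing NUL, as a const char* parameter expects."""
    if "\0" in s:
        raise ValueError("embedded NUL would truncate the string on the C side")
    return s.encode("utf-8") + b"\0"
```

The same four codepoints take 7 UTF-8 bytes, 5 UTF-16 code units (the emoji needs a surrogate pair), and 4 UTF-32 code units.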

See C/C++ Interop for the full FFI model.

What Kairo Does Not Do

  • Locale-sensitive operations: string comparison is binary (codepoint by codepoint). Locale-sensitive collation is not in the standard library. Use a third-party ICU binding if you need locale-aware sorting.
  • Encoding detection: Kairo does not detect or guess source file encoding. UTF-8, no BOM.
  • Null termination: string is not null-terminated internally. Conversion methods for C interop add a null byte. Passing raw string data directly to a C function expecting a null-terminated string is undefined behavior.
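
Binary comparison gives an ordering that surprises people who expect dictionary order. A minimal Python illustration follows; Python also compares strings by codepoint, so it is used here as a stand-in for Kairo's behavior.

```python
words = ["zebra", "Apple", "éclair", "apple"]

# Codepoint order: uppercase ASCII, then lowercase ASCII, then non-ASCII letters.
by_codepoint = sorted(words)

# Comparing the raw UTF-8 bytes gives the same order, because UTF-8
# preserves codepoint order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
```

Both lists come out as Apple, apple, zebra, éclair. A locale-aware collation would usually place "éclair" next to the other words starting with "e", which is the case an ICU binding covers.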

ABI: Platform psABIs

Kairo follows the platform processor ABI for the target triple. It does not define its own ABI and does not need to because Kairo compiles to native object files via Clang’s backend. The relevant documents per platform:

x86-64 Linux / BSD / macOS

  • System V AMD64 psABI v1.0 - calling convention, register usage, stack alignment (16-byte at call site), parameter passing, return value encoding, varargs layout, TLS model, ELF object format.
  • Kairo respects the red zone (128 bytes below RSP) by default. Kernel code, where interrupt handlers can overwrite the red zone, should compile with --kernel-mode.
  • SIMD types (f32x4, f32x8, etc.) follow the psABI’s XMM/YMM/ZMM classification rules. Vectors passed on the stack are 16-byte aligned minimum.

AArch64 Linux

  • ARM64 ELF psABI (AAPCS64) - 16 argument registers (x0-x7 for integers, v0-v7 for floats/vectors), caller-saved x0-x17, callee-saved x19-x28.
  • Pointer authentication is not enabled by default. Use --mbranch-protection to opt in.

macOS (Mach-O)

  • Apple ARM64 ABI on Apple Silicon - extends AAPCS64. Stack pointer must be 16-byte aligned at all call sites. Clang enforces this.
  • System V AMD64 on Intel Mac.
  • --macos-version-min is required if you target macOS older than the build machine SDK. Kairo does not guess a deployment target.

Windows (PE/COFF)

  • Microsoft x64 calling convention - 4 register parameters (rcx, rdx, r8, r9 for integers; xmm0-xmm3 for floats), shadow space requirement (32 bytes), caller cleans stack.
  • C++ ABI on Windows is MSVC. Set --cxx-abi=msvc when interoperating with MSVC-compiled C++ code. The default is Itanium, which is what Clang uses for MinGW-style (windows-gnu) targets; Clang itself defaults to the MSVC ABI for windows-msvc targets.
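
One detail of the Microsoft x64 convention that trips people up is that the register is chosen by parameter position, not by counting integers and floats separately. The Python sketch below models that rule for scalar parameters only; aggregates, varargs, and vector types are not covered, and assign_win64 is a name invented for this example.

```python
INT_REGS = ("rcx", "rdx", "r8", "r9")
FLOAT_REGS = ("xmm0", "xmm1", "xmm2", "xmm3")

def assign_win64(param_kinds):
    """Map each scalar parameter ('int' or 'float') to a register or stack slot."""
    locations = []
    for position, kind in enumerate(param_kinds):
        if position < 4:
            regs = FLOAT_REGS if kind == "float" else INT_REGS
            locations.append(regs[position])  # register chosen by position
        else:
            # Stack arguments sit above the return address (8 bytes) and the
            # 32-byte shadow space; offsets are relative to RSP at function entry.
            locations.append(f"[rsp+{8 + 32 + 8 * (position - 4)}]")
    return locations
```

For the parameter list (int, float, int, float, int, int) this yields rcx, xmm1, r8, xmm3, [rsp+40], [rsp+48]. Note that xmm0 and rdx go unused, which differs from System V, where integer and floating-point registers are allocated independently.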

WASM / WASI

  • WebAssembly MVP + proposals per the flags you enable. WASI preview1 or preview2 depending on --wasi-version. The WASM calling convention is defined by the Wasm binary format spec; there is no psABI in the traditional sense.

Debug Info: DWARF 5

When compiled with -g / --debug-info, Kairo emits DWARF 5 by default. DWARF 4 is available via --dwarf-version=4 for targets that require it (e.g., older GDB versions, some embedded targets).

What Kairo emits:

  • DW_AT_language is set to DW_LANG_C_plus_plus_14 in Stage 0 (the current compiler transpiles Kairo to C++, so the DWARF reflects the C++ output). A proper DW_LANG_Kairo language tag will be registered with the DWARF committee once the language spec stabilizes and the Stage 1 compiler (which emits LLVM IR directly) is complete.
  • Source file paths in DWARF use the paths as given to the compiler, resolved relative to --working-dir if set. Use --remap-path-prefix or --debug-prefix-map to strip build-machine-specific paths for reproducible builds.
  • Line tables include column information (via the DW_LNS_set_column opcode), which allows debuggers to point at the specific expression within a line.
  • Split DWARF (--split-dwarf) produces .dwo files. This reduces link-time memory usage on large projects. The .dwo files must be accessible at debug time (either in the same directory or via debuginfod).
  • Type units are emitted for types defined in headers, reducing debug info size in multi-TU builds.
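
Path remapping is plain prefix substitution applied to the paths recorded in debug info. The Python sketch below shows the idea; remap_path is a hypothetical helper, and both the first-match rule and the (old, new) pair representation are assumptions, since the source does not specify the flag's syntax or precedence.

```python
def remap_path(path: str, mappings: list[tuple[str, str]]) -> str:
    """Rewrite path using the first (old_prefix, new_prefix) pair that matches."""
    for old_prefix, new_prefix in mappings:
        if path.startswith(old_prefix):
            return new_prefix + path[len(old_prefix):]
    return path  # no mapping applies: the path is recorded as given
```

With the mapping ("/home/ci/build", "."), the path /home/ci/build/src/main.k is recorded as ./src/main.k, so two machines with different checkout directories produce the same debug info.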

Comments and trivia in debug info:

The Kairo CST preserves comments and whitespace trivia. These are currently not emitted into DWARF (there’s no standard DWARF attribute for them). They are preserved in the CST for tooling use (formatters, LSP, etc.) but stripped before codegen.


Object Format: ELF / Mach-O / PE-COFF / WASM

Kairo emits the native object format for the target:

| Platform | Format | Notes |
| --- | --- | --- |
| Linux, BSD, Android | ELF64 (ELF32 with --m32) | |
| macOS, iOS | Mach-O 64-bit | |
| Windows | PE/COFF | |
| WASM | WebAssembly binary | |
| Bare metal | ELF (usually) | depends on target ABI |

Kairo does not emit its own intermediate object format. The output of a compilation is a standard native object file that any conformant linker for that platform can process.
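
Because the output is a standard container, you can identify it from its leading bytes with ordinary tools. The Python sketch below recognizes the formats in the table by their magic numbers. The function name is made up for this example, and the COFF check covers x86-64 objects only, since COFF object files start with a machine type rather than a fixed magic.

```python
def identify_object_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header[:4] == b"\x7fELF":
        elf_class = header[4] if len(header) > 4 else 0  # EI_CLASS byte
        return {1: "ELF32", 2: "ELF64"}.get(elf_class, "ELF")
    if header[:4] == b"\xcf\xfa\xed\xfe":  # MH_MAGIC_64 stored little-endian
        return "Mach-O 64-bit"
    if header[:4] == b"\x00asm":
        return "WebAssembly binary"
    if header[:2] == b"\x64\x86":          # IMAGE_FILE_MACHINE_AMD64 (0x8664)
        return "COFF object (x86-64)"
    if header[:2] == b"MZ":                # DOS stub of a linked PE image
        return "PE image"
    return "unknown"
```

Call it with the first few bytes of a file, for example identify_object_format(open(path, "rb").read(8)).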


C++ Interop ABI

When Kairo calls into C++ via ffi "c++" import, the C++ code is compiled by Clang using the same backend invocation. Kairo has full native interop for all C++ standard versions through C++26.

The ABI boundary between Kairo-emitted code and C++-emitted code is:

  • Itanium C++ ABI on all platforms except Windows MSVC targets.
  • MSVC C++ ABI on Windows when --cxx-abi=msvc is set.
  • Name mangling follows the Itanium C++ ABI scheme by default. Kairo functions exported across the FFI boundary use C++ name mangling to support overloading and seamless interop with Clang-compiled C++ code. Explicit ffi "C" linkage (no mangling) is only used for C interop or when specifically requested.
  • C++ exceptions propagate across the Kairo/C++ boundary normally because both sides use the same unwinding tables (.eh_frame / __eh_frame). Kairo code never throws on its own (panics are returned as tagged values, not thrown), so Kairo frames add no exceptions of their own and the unwinder can pass through them cleanly.
  • Virtual dispatch into C++ classes works without special handling because Kairo lowers method calls on C++ objects directly to Clang, which handles vtable dispatch normally.
  • C++ smart pointers (std::unique_ptr<T>, std::shared_ptr<T>, std::weak_ptr<T>) map to Kairo’s AMT-tracked equivalents (std::Unique<*T>, std::Shared<*T>, std::Weak<*T>) automatically. No unsafe block is required for FFI calls using smart pointers, references, move references, or value types. Only raw pointer parameters (T*, void*) require an unsafe block.

See C/C++ Interop and Unsafe for the full FFI safety model.


Linker: LLD / Platform Linker

kld (Kairo’s linker driver) defaults to LLD for all targets. LLD’s compatibility targets:

| Mode | Compatibility |
| --- | --- |
| ELF | GNU ld compatible |
| Mach-O | Apple ld64 compatible |
| PE/COFF | MSVC link.exe compatible |
| WASM | wasm-ld (LLD’s WASM mode) |

Kairo does not require LLD. You can use the platform linker by passing --ld-flags to forward arguments directly, though the default flag translation assumes LLD syntax. GNU ld is supported for ELF targets.


Reproducible Builds

Kairo supports reproducible builds when the following conditions are met:

  1. Same source, same flags, same --target, same --cxx-std.
  2. --no-timestamps is set (strips all embedded timestamps).
  3. --remap-path-prefix is used to normalize absolute paths.
  4. SOURCE_DATE_EPOCH environment variable is set (Clang reads this for embedded timestamps in DWARF; --source-date-epoch sets it for you).
  5. --lto=off (LTO can produce non-deterministic output across runs due to parallel section ordering in ThinLTO).
  6. Same LLVM version. The backend is not byte-for-byte stable across LLVM major versions.
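
Two pieces of this checklist are easy to show in code: how a tool honors SOURCE_DATE_EPOCH, and how to verify that two builds really are identical. The Python sketch below is illustrative only; build_timestamp, digest, and is_reproducible are hypothetical helpers, not part of the Kairo toolchain.

```python
import hashlib
import os
import time

def build_timestamp() -> int:
    """Prefer SOURCE_DATE_EPOCH over the wall clock so reruns embed the same time."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())

def digest(path: str) -> str:
    """SHA-256 of a build artifact, read in chunks to handle large files."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def is_reproducible(artifact_a: str, artifact_b: str) -> bool:
    """True when two independently built artifacts are byte-for-byte identical."""
    return digest(artifact_a) == digest(artifact_b)
```

Build the same source twice in separate directories with the conditions above, then compare the outputs with is_reproducible. A mismatch usually points at an embedded path or timestamp that was not normalized.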

Kairo does not guarantee reproducibility across different --opt levels or when --fp-model=fast is used (fast-math can produce different reassociations depending on whether the optimizer’s heuristics fire).


What Kairo Explicitly Does Not Target

  • POSIX.1-2024: Kairo’s standard library does not wrap POSIX. You call POSIX functions via FFI if you need them. The Kairo stdlib provides its own file, threading, and memory abstractions that map to the platform without requiring POSIX compliance from the target.
  • C++ standard (ISO/IEC 14882): Kairo is not C++ and does not attempt C++ conformance. C++ interop works through Clang, not through implementing the C++ standard in Kairo.
  • MISRA / CERT / SEI coding standards: These are auditable properties of specific programs, not language-level guarantees. Kairo does not ship a MISRA checker. You can run your Kairo-generated C++ through existing MISRA tools if your target requires this.
  • ISO/IEC 9899 (C standard): Same as C++. C interop works through the FFI layer. Kairo is not a C compiler.

Version Pinning

| Standard | Version targeted |
| --- | --- |
| IEEE floating-point | 754-2019 |
| Unicode | 15.1 |
| UTF-8 encoding | RFC 3629 |
| DWARF debug info | 5 (default), 4 (opt-in) |
| x86-64 psABI | System V AMD64 v1.0 |
| AArch64 psABI | AAPCS64 (ARM IHI0055) |
| Itanium C++ ABI | Current (Clang tracks this) |
| C++ interop | All versions through C++26 |
| WASI | preview1 (default), preview2 (opt-in) |
| WebAssembly binary | MVP + explicit proposals |
| ELF | System V gABI + platform psABI |
| Mach-O | 64-bit Mach-O |
| PE/COFF | Microsoft PE32+ |