Kairo Compiler Standards & Compliance
This document covers every standard, ABI, and specification Kairo targets, what compliance means in practice, and where Kairo intentionally deviates and why. It is written for engineers who need to know exactly what guarantees the compiler makes and what it doesn’t.
Floating-Point: IEEE 754-2019
Kairo targets IEEE 754-2019 (the 2019 revision of IEEE 754-2008) for all
binary floating-point types: f16, f32, f64, f128.
What this means concretely:
- All four basic arithmetic operations (+, -, *, /), sqrt, and fma produce correctly rounded results in round-to-nearest-even mode (the IEEE default). The result is the representable value closest to the infinitely precise result, with ties going to even.
- Intermediate results within a single expression may be fused into FMA operations by the compiler (FP contraction defaults to on, not fast). This can produce results that differ from performing the operations separately with two roundings. If you need strict per-operation rounding, use --fp-contract=off.
- Overflow produces inf; underflow produces a subnormal value or 0.0 (gradual underflow, since subnormals are not flushed by default). Operations that produce NaN (e.g., 0.0 / 0.0, sqrt(-1.0)) propagate NaN per IEEE 754; no crash, no trap. Check for NaN explicitly with std::is_nan() when needed.
- NaN payload bits are not guaranteed to be preserved across arithmetic operations and are not guaranteed to be portable across architectures. Inspecting NaN bit patterns is explicitly unsupported behavior. If you need NaN payloads, use f32::from_bits / f64::from_bits and document why.
- The 2019 standard deprecated the non-associative minNum/maxNum operations from 2008 and replaced them with minimum/maximum, which have well-defined NaN handling. Kairo’s min/max builtins follow the 2019 semantics.
- Subnormal (denormal) numbers are handled per IEEE spec by default. Flush-to-zero mode is not enabled unless you explicitly opt in with --fp-model=fast or the @float_mode(fast) block annotation.
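Two of the points above are easy to see in a short sketch. The Python below is illustrative only: it is not Kairo code, and the helper names (separate_multiply_add, fused_multiply_add, ieee_minimum) are made up for this example. It emulates a fused multiply-add with exact rational arithmetic so that only one rounding happens, and it models the 754-2019 minimum operation, which propagates NaN and orders -0.0 below +0.0.

```python
import math
from fractions import Fraction


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Two roundings: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """One rounding: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def ieee_minimum(x: float, y: float) -> float:
    """Model of the IEEE 754-2019 minimum operation (illustrative helper)."""
    if math.isnan(x) or math.isnan(y):
        return math.nan  # any NaN operand yields NaN
    if x == 0.0 and y == 0.0:
        # For this operation, -0.0 orders below +0.0.
        return x if math.copysign(1.0, x) < 0 else y
    return x if x < y else y


a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# The exact product is 1 - 2**-54, which rounds to 1.0 as a double,
# so the unfused form loses the small term entirely.
unfused = separate_multiply_add(a, b, c)  # 0.0
fused = fused_multiply_add(a, b, c)       # -2**-54
```

The two results differ only because of where rounding happens, which is the effect --fp-contract=off is meant to rule out.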
What Kairo does not guarantee:
- Reproducibility of floating-point results across different optimization levels, different target architectures, or different LLVM versions. If you need bitwise-reproducible results, use --fp-contract=off --fp-model=strict and target a specific CPU with --mcpu.
- Strict exception observability by default. FP exceptions (divide-by-zero, overflow, underflow, inexact, invalid) are not observable through fenv.h equivalents unless you compile with --fp-model=strict. The default (precise) treats exceptions as ignorable, which is what lets the optimizer, for example, hoist FP computations out of loops.
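Part of the reason results can vary is that floating-point addition is not associative, so any transformation that regroups operations can change the answer. A minimal Python illustration (not Kairo code; the variable names are arbitrary):

```python
big = 1e16  # above 2**53, so adjacent doubles are 2 apart here

# Same three operands, two groupings.
left_to_right = (big + -big) + 1.0  # 0.0 + 1.0
reassociated = big + (-big + 1.0)   # -big + 1.0 rounds back to -big

print(left_to_right)  # 1.0
print(reassociated)   # 0.0
```

Both groupings are correctly rounded at every step; they simply round different intermediate values.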
FP mode quick reference:
| Mode | What it does | When to use |
|---|---|---|
| precise (default) | IEEE arithmetic, no unsafe rewrites, FMA on | General use |
| strict | IEEE + observable exceptions + no FMA | Scientific code, reproducibility audits |
| fast | Everything unsafe on (-ffast-math) | SIMD kernels, numerical code you’ve already validated |
| @float_mode(fast) | Block-scoped fast-math | Hot inner loops only |
Text Encoding: Unicode 15.1 / UTF-8
Source Files
Kairo source files are UTF-8 only. No BOM. No other encoding is accepted.
The lexer validates UTF-8 byte sequences and rejects invalid sequences as hard
errors with a diagnostic pointing at the exact byte offset and offending bytes.
A BOM at the start of a .k file is an error.
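The check described above can be sketched as follows. This is Python rather than Kairo's lexer, and the function name check_source_bytes and the message wording are assumptions made for illustration; the document does not show Kairo's actual diagnostic format.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def check_source_bytes(data: bytes) -> Optional[str]:
    """Return an error message for a rejected source file, or None if accepted."""
    if data.startswith(UTF8_BOM):
        return "error: byte order mark at byte offset 0"
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError as err:
        offending = data[err.start:err.end].hex(" ")
        return f"error: invalid UTF-8 at byte offset {err.start}: {offending}"
    return None


print(check_source_bytes(b"fn main() {}"))   # None
print(check_source_bytes(b"ab\xffcd"))       # reports offset 2 and byte ff
```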
String Internal Representation
Kairo’s string type is a 32-byte value type using UTF-8 encoding internally
with small string optimization (SSO). Strings up to 23 bytes are stored inline
without a heap allocation. Longer strings are heap-allocated.
- Codepoint indexing: s[i] returns a char (4-byte Unicode scalar value) at codepoint position i. This is O(1) amortized; the implementation uses a sparse breadcrumb cache to locate the nearest codepoint boundary and decodes from there.
- Byte indexing: s.bytes[i] returns a raw byte at byte offset i. This is O(1) always but returns raw UTF-8 bytes, not characters.
- Iteration: for ch in s yields char values (decoded codepoints) sequentially.
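One way a breadcrumb cache for codepoint indexing could work is sketched below in Python. The layout here (one recorded byte offset every stride codepoints) and the names BreadcrumbIndex, byte_offset, and char_at are assumptions for illustration; Kairo's actual cache structure is not specified in this document.

```python
class BreadcrumbIndex:
    """Hypothetical sparse index: byte offsets of every stride-th codepoint."""

    def __init__(self, data: bytes, stride: int = 4):
        self.data = data
        self.stride = stride
        self.crumbs = []  # crumbs[k] = byte offset of codepoint k * stride
        count = 0
        for offset, byte in enumerate(data):
            if byte & 0xC0 != 0x80:  # not a continuation byte: a codepoint starts here
                if count % stride == 0:
                    self.crumbs.append(offset)
                count += 1
        self.length = count  # number of codepoints

    def byte_offset(self, index: int) -> int:
        if not 0 <= index < self.length:
            raise IndexError(index)
        offset = self.crumbs[index // self.stride]
        # Walk forward at most stride - 1 codepoints from the breadcrumb.
        for _ in range(index % self.stride):
            offset += 1
            while self.data[offset] & 0xC0 == 0x80:
                offset += 1
        return offset

    def char_at(self, index: int) -> str:
        start = self.byte_offset(index)
        end = start + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[start:end].decode("utf-8")


idx = BreadcrumbIndex("h\u00e9llo".encode("utf-8"))
print(idx.char_at(1))   # the codepoint U+00E9
print(idx.data[1])      # 195, the first raw UTF-8 byte of that codepoint
```

The last two lines show the difference between the two indexing modes: position 1 is a whole character when counted in codepoints, but only a lead byte when counted in bytes.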
See Primitives for the full string API.
Identifiers
Kairo identifiers may contain any Unicode scalar value that has the ID_Start
or ID_Continue property (Unicode UAX #31), plus emoji and other
non-letter code points that UAX #31 excludes. This is an intentional extension
beyond the standard identifier grammar. Concretely:
- Letters and digits from any script: Latin, Arabic, Chinese, Cyrillic, Devanagari, etc. all valid.
- Emoji: valid.
- Combining marks: valid as continuation characters.
- Whitespace, control characters: never valid in identifiers.
Two identifiers are equal if and only if their sequences of scalar values are equal. Kairo does not perform Unicode normalization on identifiers. This matches the behavior of most modern languages. If you want normalization, normalize your source before feeding it to the compiler.
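The practical consequence is that two spellings that render identically can be distinct identifiers. A small Python illustration (Python string equality also compares code points; the helper name same_identifier is made up for this example):

```python
import unicodedata

precomposed = "caf\u00e9"   # ends in U+00E9 (e with acute, one scalar value)
decomposed = "cafe\u0301"   # ends in 'e' followed by U+0301 (combining acute)


def same_identifier(a: str, b: str) -> bool:
    """Equality on sequences of scalar values, with no normalization."""
    return a == b


print(same_identifier(precomposed, decomposed))  # False

# Normalizing the source text first (here to NFC) makes the spellings agree.
normalized = unicodedata.normalize("NFC", decomposed)
print(same_identifier(precomposed, normalized))  # True
```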
String Indexing and Grapheme Clusters
Indexing a string by integer (s[i]) returns the codepoint at position i,
not a grapheme cluster. For most text this is the same thing, but for combining
sequences and emoji with modifiers it is not:
- "hello": s[1] returns 'e'. Five codepoints, five elements.
- A string with a combining sequence (e.g., e + combining acute): the base character and the combining mark are separate codepoints and separate indices.
- Emoji with ZWJ sequences (e.g., family emoji): multiple codepoints, multiple elements.
len() returns the codepoint count, not the visual glyph count.
string.graphemes() returns an iterator of grapheme clusters per UAX #29 for
code that needs to operate on user-visible characters.
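The gap between codepoints and user-visible characters can be shown with a deliberately crude Python approximation. This is not UAX #29: the hypothetical helper approximate_clusters only attaches combining marks and zero-width-joiner sequences to the preceding character, which is enough for the examples above but not for real text segmentation.

```python
import unicodedata

ZWJ = "\u200d"  # zero width joiner


def approximate_clusters(text: str) -> list:
    """Crude grouping of codepoints into clusters (NOT full UAX #29)."""
    clusters = []
    for ch in text:
        joins_previous = bool(clusters) and (
            unicodedata.combining(ch) != 0  # combining mark attaches to its base
            or ch == ZWJ                    # joiner attaches to what came before
            or clusters[-1].endswith(ZWJ)   # character after a joiner continues it
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


accented = "e\u0301"                                   # e + combining acute
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # three emoji joined by ZWJ

print(len(accented), len(approximate_clusters(accented)))  # 2 1
print(len(family), len(approximate_clusters(family)))      # 5 1
```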
Conversions and C/C++ Interop
string provides explicit conversion methods. When calling C++ functions with
Kairo primitives, conversions to C++ string types are implicit:
| Conversion | Output | Use case |
|---|---|---|
| as libcxx::string | UTF-8 std::string | Natural fit; Kairo strings are UTF-8 internally |
| as libcxx::u16string | UTF-16 std::u16string | Windows WinAPI (LPWSTR), Java interop |
| as libcxx::u32string | UTF-32 std::u32string | APIs expecting char32_t sequences |
| as unsafe *i8 | Null-terminated UTF-8 | Passing to C functions expecting const char* |
See C/C++ Interop for the full FFI model.
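To make the table concrete, the Python sketch below encodes one string into each target form and counts code units. It only illustrates the encodings themselves; it does not call any Kairo or libcxx API, and the little-endian byte order is an assumption chosen for the example.

```python
text = "hi \U0001F30D"  # three ASCII characters plus U+1F30D (outside the BMP)

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

utf8_units = len(utf8)          # 3 one-byte characters + one 4-byte sequence
utf16_units = len(utf16) // 2   # 3 units + one surrogate pair
utf32_units = len(utf32) // 4   # one unit per codepoint

# A C function expecting const char* needs an explicit terminator appended.
c_string = utf8 + b"\x00"

print(utf8_units, utf16_units, utf32_units)  # 7 5 4
```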
What Kairo Does Not Do
- Locale-sensitive operations: string comparison is binary (codepoint by codepoint). Locale-sensitive collation is not in the standard library. Use a third-party ICU binding if you need locale-aware sorting.
- Encoding detection: Kairo does not detect or guess source file encoding. UTF-8, no BOM.
- Null termination: string is not null-terminated internally. Conversion methods for C interop add a null byte. Passing raw string data directly to a C function expecting a null-terminated string is undefined behavior.
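What binary comparison means for sorting is easy to see in a short Python example (Python also compares strings codepoint by codepoint, so it stands in for the behavior described in the first bullet; this is not Kairo code):

```python
words = ["zebra", "Zebra", "\u00e9lan", "eagle"]

# Codepoint order: 'Z' (U+005A) < 'e' (U+0065) < 'z' (U+007A) < U+00E9.
by_codepoint = sorted(words)

# UTF-8 preserves codepoint order, so comparing the raw bytes agrees.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_codepoint)  # ['Zebra', 'eagle', 'zebra', 'élan']
```

A locale-aware collation would typically place the accented word next to "eagle" and ignore case at the first level, which is why a separate ICU binding is needed for that.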
ABI: Platform psABIs
Kairo follows the platform processor ABI for the target triple. It does not define its own ABI and does not need to because Kairo compiles to native object files via Clang’s backend. The relevant documents per platform:
x86-64 Linux / BSD / macOS
- System V AMD64 psABI v1.0 - calling convention, register usage, stack alignment (16-byte at call site), parameter passing, return value encoding, varargs layout, TLS model, ELF object format.
- Kairo respects the red zone (128 bytes below RSP) by default. Kernel code should compile with --kernel-mode.
- SIMD types (f32x4, f32x8, etc.) follow the psABI’s XMM/YMM/ZMM classification rules. Vectors passed on the stack are 16-byte aligned minimum.
AArch64 Linux
- ARM64 ELF psABI (AAPCS64) - 16 argument registers (x0-x7 for integers, v0-v7 for floats/vectors), caller-saved x0-x17, callee-saved x19-x28.
- Pointer authentication is not enabled by default. Use --mbranch-protection to opt in.
macOS (Mach-O)
- Apple ARM64 ABI on Apple Silicon - extends AAPCS64. Stack pointer must be 16-byte aligned at all call sites. Clang enforces this.
- System V AMD64 on Intel Mac.
- --macos-version-min is required if you target macOS older than the build machine SDK. Kairo does not guess a deployment target.
Windows (PE/COFF)
- Microsoft x64 calling convention - 4 register parameters (rcx, rdx, r8, r9 for integers; xmm0-xmm3 for floats), shadow space requirement (32 bytes), caller cleans stack.
- C++ ABI on Windows is MSVC. Set --cxx-abi=msvc when interoperating with MSVC-compiled C++ code. The default is Itanium, which is what Clang uses on Windows for MinGW-style (windows-gnu) targets.
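The register rules in the platform sections above differ in one way that is easy to miss: System V AMD64 and AAPCS64 count integer and floating-point arguments independently, while the Microsoft x64 convention assigns registers by argument position. The Python model below sketches this for scalar arguments only. The function name assign_registers is hypothetical, the System V register order (rdi, rsi, rdx, rcx, r8, r9 and xmm0-xmm7) comes from the psABI rather than from this document, and aggregates, varargs, and vector types are ignored.

```python
CONVENTIONS = {
    "sysv": {
        "int": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
        "float": [f"xmm{i}" for i in range(8)],
        "positional": False,
    },
    "aapcs64": {
        "int": [f"x{i}" for i in range(8)],
        "float": [f"v{i}" for i in range(8)],
        "positional": False,
    },
    "win64": {
        "int": ["rcx", "rdx", "r8", "r9"],
        "float": ["xmm0", "xmm1", "xmm2", "xmm3"],
        "positional": True,  # the Nth argument uses the Nth slot of either bank
    },
}


def assign_registers(arg_kinds, convention):
    """Map a list of 'int'/'float' scalar arguments to registers or 'stack'."""
    rules = CONVENTIONS[convention]
    used = {"int": 0, "float": 0}
    placement = []
    for position, kind in enumerate(arg_kinds):
        bank = rules[kind]
        slot = position if rules["positional"] else used[kind]
        placement.append(bank[slot] if slot < len(bank) else "stack")
        used[kind] += 1
    return placement


args = ["int", "float", "int"]
print(assign_registers(args, "sysv"))     # ['rdi', 'xmm0', 'rsi']
print(assign_registers(args, "aapcs64"))  # ['x0', 'v0', 'x1']
print(assign_registers(args, "win64"))    # ['rcx', 'xmm1', 'r8']
```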
WASM / WASI
- WebAssembly MVP + proposals per the flags you enable. WASI preview1 or preview2 depending on --wasi-version. The WASM calling convention is defined by the Wasm binary format spec; there is no psABI in the traditional sense.
Debug Info: DWARF 5
When compiled with -g / --debug-info, Kairo emits DWARF 5 by default.
DWARF 4 is available via --dwarf-version=4 for targets that require it
(e.g., older GDB versions, some embedded targets).
What Kairo emits:
- DW_AT_language is set to DW_LANG_C_plus_plus_14 in Stage 0 (the current compiler transpiles Kairo to C++, so the DWARF reflects the C++ output). A proper DW_LANG_Kairo language tag will be registered with the DWARF committee once the language spec stabilizes and the Stage 1 compiler (which emits LLVM IR directly) is complete.
- Source file paths in DWARF use the paths as given to the compiler, resolved relative to --working-dir if set. Use --remap-path-prefix or --debug-prefix-map to strip build-machine-specific paths for reproducible builds.
- Line tables include column information (via the DW_LNS_set_column line-number opcode), which allows debuggers to point at the specific expression within a line.
- Split DWARF (--split-dwarf) produces .dwo files. This reduces link-time memory usage on large projects. The .dwo files must be accessible at debug time (either in the same directory or via debuginfod).
- Type units are emitted for types defined in headers, reducing debug info size in multi-TU builds.
Comments and trivia in debug info:
The Kairo CST preserves comments and whitespace trivia. These are currently not emitted into DWARF (there’s no standard DWARF attribute for them). They are preserved in the CST for tooling use (formatters, LSP, etc.) but stripped before codegen.
Object Format: ELF / Mach-O / PE-COFF / WASM
Kairo emits the native object format for the target:
| Platform | Format | Notes |
|---|---|---|
| Linux, BSD, Android | ELF64 (ELF32 with --m32) | |
| macOS, iOS | Mach-O 64-bit | |
| Windows | PE/COFF | |
| WASM | WebAssembly binary | |
| Bare metal | ELF (usually) | depends on target ABI |
Kairo does not emit its own intermediate object format. The output of a compilation is a standard native object file that any conformant linker for that platform can process.
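Because the outputs are standard formats, they can be recognized by their leading bytes. The Python sketch below checks the well-known magic numbers; the function name identify_object_format is made up for this example, and it covers only little-endian 64-bit Mach-O and the x86-64 and ARM64 COFF machine types.

```python
def identify_object_format(header: bytes) -> str:
    """Guess the object format from the first bytes of a file (illustrative)."""
    if header[:4] == b"\x7fELF":
        # Byte 4 (EI_CLASS) distinguishes 32-bit from 64-bit ELF.
        elf_class = header[4] if len(header) > 4 else 0
        return {1: "ELF32", 2: "ELF64"}.get(elf_class, "ELF (unknown class)")
    if header[:4] == b"\xcf\xfa\xed\xfe":  # MH_MAGIC_64 stored little-endian
        return "Mach-O 64-bit"
    if header[:4] == b"\x00asm":
        return "WebAssembly binary"
    # COFF object files start with a 16-bit machine type, not a fixed magic.
    if header[:2] == b"\x64\x86":  # IMAGE_FILE_MACHINE_AMD64 (0x8664)
        return "PE/COFF (x86-64)"
    if header[:2] == b"\x64\xaa":  # IMAGE_FILE_MACHINE_ARM64 (0xAA64)
        return "PE/COFF (ARM64)"
    return "unknown"


print(identify_object_format(b"\x7fELF\x02\x01\x01\x00"))  # ELF64
print(identify_object_format(b"\x00asm\x01\x00\x00\x00"))  # WebAssembly binary
```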
C++ Interop ABI
When Kairo calls into C++ via ffi "c++" import, the C++ code is compiled by
Clang using the same backend invocation. Kairo has full native interop for all
C++ standard versions through C++26.
The ABI boundary between Kairo-emitted code and C++-emitted code is:
- Itanium C++ ABI on all platforms except Windows MSVC targets.
- MSVC C++ ABI on Windows when --cxx-abi=msvc is set.
- Name mangling follows the Itanium C++ ABI scheme by default. Kairo functions exported across the FFI boundary use C++ name mangling to support overloading and seamless interop with Clang-compiled C++ code. Explicit ffi "C" linkage (no mangling) is only used for C interop or when specifically requested.
- C++ exceptions propagate across the Kairo/C++ boundary normally because both sides use the same unwinding tables (.eh_frame / __eh_frame). Kairo functions are trivially noexcept at the ABI level in the sense that they never throw themselves (panics are returned as tagged values, not thrown), so the unwinder can pass through Kairo frames cleanly.
- Virtual dispatch into C++ classes works without special handling because Kairo lowers method calls on C++ objects directly to Clang, which handles vtable dispatch normally.
- C++ smart pointers (std::unique_ptr<T>, std::shared_ptr<T>, std::weak_ptr<T>) map to Kairo’s AMT-tracked equivalents (std::Unique<*T>, std::Shared<*T>, std::Weak<*T>) automatically. No unsafe block is required for FFI calls using smart pointers, references, move references, or value types. Only raw pointer parameters (T*, void*) require an unsafe block.
See C/C++ Interop and Unsafe for the full FFI safety model.
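To show why mangled names make overloading possible, here is a deliberately tiny Python model of Itanium mangling for free functions with builtin parameter types. It is a sketch, not Kairo's or Clang's mangler: the function name mangle is hypothetical, and it ignores substitutions, the std namespace abbreviation, templates, member functions, and qualifiers.

```python
# Itanium ABI codes for a few builtin types.
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "int": "i", "unsigned int": "j",
    "long": "l", "long long": "x", "float": "f", "double": "d",
}


def mangle(qualified_name: str, param_types: list) -> str:
    """Mangle a free function such as 'ns::foo' with builtin parameters."""
    parts = qualified_name.split("::")
    # Each name component is emitted as <length><identifier>.
    encoded = "".join(f"{len(part)}{part}" for part in parts)
    if len(parts) > 1:
        encoded = f"N{encoded}E"  # nested names are wrapped in N ... E
    # An empty parameter list is encoded as a single 'v'.
    params = "".join(BUILTIN_CODES[t] for t in param_types) or "v"
    return f"_Z{encoded}{params}"


print(mangle("foo", ["int"]))                # _Z3fooi
print(mangle("foo", ["double"]))             # _Z3food
print(mangle("ns::foo", ["int", "double"]))  # _ZN2ns3fooEid
```

The two overloads of foo get distinct symbols, which is what lets the linker tell them apart; with ffi "C" linkage both would collide on the plain name foo.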
Linker: LLD / Platform Linker
kld (Kairo’s linker driver) defaults to LLD for all targets. LLD’s
compatibility targets:
| Mode | Compatibility |
|---|---|
| ELF | GNU ld compatible |
| Mach-O | Apple ld64 compatible |
| PE/COFF | MSVC link.exe compatible |
| WASM | wasm-ld (LLD’s WASM mode) |
Kairo does not require LLD. You can use the platform linker by passing
--ld-flags to forward arguments directly, though the default flag translation
assumes LLD syntax. GNU ld is supported for ELF targets.
Reproducible Builds
Kairo supports reproducible builds when the following conditions are met:
- Same source, same flags, same --target, same --cxx-std.
- --no-timestamps is set (strips all embedded timestamps).
- --remap-path-prefix is used to normalize absolute paths.
- SOURCE_DATE_EPOCH environment variable is set (Clang reads this for embedded timestamps in DWARF; --source-date-epoch sets it for you).
- --lto=off (LTO can produce non-deterministic output across runs due to parallel section ordering in ThinLTO).
- Same LLVM version. The backend is not byte-for-byte stable across LLVM major versions.
Kairo does not guarantee reproducibility across different --opt levels or when
--fp-model=fast is used (fast-math can produce different reassociations
depending on whether the optimizer’s heuristics fire).
What Kairo Explicitly Does Not Target
- POSIX.1-2024: Kairo’s standard library does not wrap POSIX. You call POSIX functions via FFI if you need them. The Kairo stdlib provides its own file, threading, and memory abstractions that map to the platform without requiring POSIX compliance from the target.
- C++ standard (ISO/IEC 14882): Kairo is not C++ and does not attempt C++ conformance. C++ interop works through Clang, not through implementing the C++ standard in Kairo.
- MISRA / CERT / SEI coding standards: These are auditable properties of specific programs, not language-level guarantees. Kairo does not ship a MISRA checker. You can run your Kairo-generated C++ through existing MISRA tools if your target requires this.
- ISO/IEC 9899 (C standard): Same as C++. C interop works through the FFI layer. Kairo is not a C compiler.
Version Pinning
| Standard | Version targeted |
|---|---|
| IEEE floating-point | 754-2019 |
| Unicode | 15.1 |
| UTF-8 encoding | RFC 3629 |
| DWARF debug info | 5 (default), 4 (opt-in) |
| x86-64 psABI | System V AMD64 v1.0 |
| AArch64 psABI | AAPCS64 (ARM IHI0055) |
| Itanium C++ ABI | Current (Clang tracks this) |
| C++ interop | All versions through C++26 |
| WASI | preview1 (default), preview2 (opt-in) |
| WebAssembly binary | MVP + explicit proposals |
| ELF | System V gABI + platform psABI |
| Mach-O | 64-bit Mach-O |
| PE/COFF | Microsoft PE32+ |