
Compilation Pipeline

Every .nox script passes through a multi-stage pipeline before execution. Each stage progressively transforms the code and validates it, ensuring that by the time the VM begins execution, the bytecode is fully verified and optimized.

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Source  │    │  Token   │    │ Abstract │    │ Validated│    │ Optimized│    │ Bytecode │
│   Text   │───▶│  Stream  │───▶│  Syntax  │───▶│   AST    │───▶│   AST    │───▶│+Metadata │
│  (.nox)  │    │          │    │   Tree   │    │          │    │          │    │          │
└──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
Stage 1:        Stage 2:        Stage 3:        Stage 4:        Stage 5:        Stage 6:
Lexing          Parsing         Semantic        Constant        Compilation     Execution
                                Validation      Folding

The raw .nox text is fed into an ANTLR 4-generated lexer and parser.

The lexer breaks the source text into a stream of typed tokens:

Source: int count = 42;
Tokens: [KEYWORD_INT] [IDENTIFIER:"count"] [OP_ASSIGN] [INT_LITERAL:42] [SEMICOLON]
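The real lexer is generated by ANTLR 4 from the NSL grammar. The hand-written scanner below is only a rough illustration of what "a stream of typed tokens" means. It turns the same declaration into the same token stream and records a 1-based line and column for each token. The names `TinyLexer`, `Token`, and `TokenType` are hypothetical, and the scanner recognizes only the token kinds in the example (`int` is the only keyword it knows).

```java
import java.util.ArrayList;
import java.util.List;

// Token kinds taken from the example token stream above.
enum TokenType { KEYWORD_INT, IDENTIFIER, OP_ASSIGN, INT_LITERAL, SEMICOLON }

// A token carries its text plus the 1-based line/column where it starts.
record Token(TokenType type, String text, int line, int column) {
    @Override
    public String toString() {
        return switch (type) {
            case IDENTIFIER -> "[IDENTIFIER:\"" + text + "\"]";
            case INT_LITERAL -> "[INT_LITERAL:" + text + "]";
            default -> "[" + type + "]";
        };
    }
}

// Hypothetical hand-written scanner; the real NOX lexer is ANTLR-generated
// and covers the full grammar.
final class TinyLexer {
    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0, line = 1, col = 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\n') { i++; line++; col = 1; continue; }
            if (Character.isWhitespace(c)) { i++; col++; continue; }
            int start = i, startCol = col;
            if (Character.isLetter(c) || c == '_') {
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                String word = src.substring(start, i);
                TokenType t = word.equals("int") ? TokenType.KEYWORD_INT : TokenType.IDENTIFIER;
                out.add(new Token(t, word, line, startCol));
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Token(TokenType.INT_LITERAL, src.substring(start, i), line, startCol));
            } else if (c == '=') {
                i++;
                out.add(new Token(TokenType.OP_ASSIGN, "=", line, startCol));
            } else if (c == ';') {
                i++;
                out.add(new Token(TokenType.SEMICOLON, ";", line, startCol));
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at " + line + ":" + col);
            }
            col += i - start; // advance the column past the token just consumed
        }
        return out;
    }
}
```

Calling `TinyLexer.tokenize("int count = 42;")` and joining the tokens with spaces reproduces the token line shown above.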

The parser then constructs an Abstract Syntax Tree (AST) from the token stream according to the NSL grammar. The AST preserves line and column metadata for every node, enabling precise error reporting in later stages.

         VariableDeclaration
          ╱       │        ╲
Type:int    Name:"count"    Init:IntLiteral(42)
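The sketch below shows one way such nodes might carry their source positions. It assumes Java records and the hypothetical types `SourcePos` and `Expr`. Only `VariableDeclaration` and `IntLiteral` appear in the tree above.

```java
// Every node records where it came from, so later stages can report errors precisely.
record SourcePos(int line, int column) { }

// Hypothetical common interface for expression nodes.
interface Expr { SourcePos pos(); }

record IntLiteral(long value, SourcePos pos) implements Expr { }

// Mirrors the tree above: Type:int, Name:"count", Init:IntLiteral(42).
record VariableDeclaration(String type, String name, Expr init, SourcePos pos) { }

final class AstDemo {
    static VariableDeclaration countDecl() {
        // Positions assume `int count = 42;` starts at line 1, column 1.
        return new VariableDeclaration("int", "count",
                new IntLiteral(42, new SourcePos(1, 13)), new SourcePos(1, 1));
    }
}
```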

If the source text violates the grammar rules, a SyntaxError is thrown. The parser tries to catch as many syntax errors as possible at this stage. Each error includes the exact line, column, and a snippet of the problematic code.

SyntaxError at line 7, column 15:
7 | int x = ;
            ^
Expected: expression

Before any bytecode is generated, a dedicated Validator walks the AST to catch logical errors that the parser cannot detect. This is the first layer of security: it catches bugs before they become runtime problems. The Validator also drives many of the interpreter's optimizations, which can rely on the guarantees it establishes.

Check | Example | Error
Type Mismatches | int x = "hello"; | “Cannot assign string to int”
Undeclared Variables | y = 10; (where y was never declared) | “Undeclared variable y”
Undeclared Functions | foo(); (where foo doesn’t exist) | “Undeclared function foo”
Arity Mismatches | add(1, 2, 3) when add takes 2 params | “Expected 2 arguments, got 3”
UFCS Resolution | myObj.nonexistent() | “No method nonexistent found for type”
Schema Violations | Point p = {x: 1}; when Point requires x and y | “Missing field y”
Operator Type Checks | "hello" + 42 | “Operator + not defined for (string, int)”
Return Type Checks | A function declared int returns a string |
Invalid yield Usage | yield used outside of any function context |
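Two of these checks can be sketched with a stack of scopes that maps names to types: undeclared variables, and type mismatches on declaration or assignment. `MiniValidator` and its string-typed API are hypothetical simplifications. The real Validator works on AST nodes and covers every row of the table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of two Validator checks from the table above:
// "Undeclared variable <name>" and "Cannot assign <type> to <type>".
final class MiniValidator {
    static final class SemanticError extends RuntimeException {
        SemanticError(String msg) { super(msg); }
    }

    // Lexical scopes; the innermost scope sits at the head of the deque.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    MiniValidator() { scopes.push(new HashMap<>()); }

    void enterScope() { scopes.push(new HashMap<>()); }
    void exitScope() { scopes.pop(); }

    // Declaration such as `int x = <expression of type initType>`.
    void declare(String name, String declaredType, String initType) {
        if (!declaredType.equals(initType)) {
            throw new SemanticError("Cannot assign " + initType + " to " + declaredType);
        }
        scopes.peek().put(name, declaredType);
    }

    // Resolve a name by walking scopes from innermost to outermost.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        throw new SemanticError("Undeclared variable " + name);
    }

    // Assignment such as `y = <expression of type valueType>`.
    void assign(String name, String valueType) {
        String target = lookup(name);
        if (!target.equals(valueType)) {
            throw new SemanticError("Cannot assign " + valueType + " to " + target);
        }
    }
}
```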

Semantic errors include rich context for debugging:

SemanticError at line 14, column 25:
Type Mismatch: Operator '+' cannot be applied to 'string' and 'int'.
Use string interpolation `...` instead.
13 | int count = 10;
14 | string msg = "Total: " + count;
   |                        ^

Static validation provides instant feedback. Whether a program is written by a human developer or generated by an automated system, validation failures are returned immediately with precise, actionable error messages. The author can fix their code without ever reaching runtime, allowing for a tight feedback loop that catches bugs early and dramatically reduces debugging time.

If validation passes, the Compiler transforms the validated, constant-folded AST into a flat array of 64-bit bytecode instructions plus associated metadata.

1. Register Allocation (Liveness Analysis)


The compiler performs a linear scan to determine the exact lifespan of every variable:

Line 1: var x = 10        <- x starts (Reg 0)
Line 2: var y = 20        <- y starts (Reg 1)
Line 3: var z = x + y     <- x and y last used; z starts (Reg 2)
Line 4: var a = z * 2     <- z last used; a starts -> assigned to Reg 0 (reused!)

Impact: A function with 100 sequential variables might only need 5 physical registers. This keeps stack frames minimal and cache-friendly.
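Here is a sketch of the linear scan described above. It assumes each variable's live interval (definition line through last-use line) has already been computed and that names are unique within the function. Intervals are visited in start order. A register is released only once its variable's last use lies strictly before the current definition, and the lowest released register is reused first. `LiveInterval`, `Allocation`, and `LinearScan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeSet;

// A variable's live range: first definition line through last use line (inclusive).
record LiveInterval(String name, int start, int end) { }

// Result: register per variable, plus how many registers the frame needs.
record Allocation(Map<String, Integer> registers, int frameSize) { }

// Hypothetical linear-scan allocator without spilling: VM registers are frame
// slots, so a new slot is added whenever no released slot is available.
final class LinearScan {
    static Allocation allocate(List<LiveInterval> intervals) {
        List<LiveInterval> sorted = new ArrayList<>(intervals);
        sorted.sort(Comparator.comparingInt(LiveInterval::start));

        Map<String, Integer> registers = new HashMap<>();
        // Live intervals ordered by end point, so the earliest to expire is on top.
        PriorityQueue<LiveInterval> active =
                new PriorityQueue<>(Comparator.comparingInt(LiveInterval::end));
        TreeSet<Integer> free = new TreeSet<>(); // released slots, lowest first
        int frameSize = 0;

        for (LiveInterval cur : sorted) {
            // Release slots whose last use came strictly earlier, so operands read
            // on a line never share a slot with a value defined on that same line.
            while (!active.isEmpty() && active.peek().end() < cur.start()) {
                free.add(registers.get(active.poll().name()));
            }
            int reg = free.isEmpty() ? frameSize++ : free.pollFirst();
            registers.put(cur.name(), reg);
            active.add(cur);
        }
        return new Allocation(registers, frameSize);
    }
}
```

The intervals from the example are x: lines 1 to 3, y: 2 to 3, z: 3 to 4, and a: 4 to 4, with z treated as defined on line 3. For these, the sketch assigns x→0, y→1, z→2 and a→0, so the frame needs three registers for four variables.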

All static data (strings, large numbers, type identifiers) are collected into a deduplicated Constant Pool:

Pool Index 0: "user_name" (used 50 times in code -> stored once)
Pool Index 1: 3.14159
Pool Index 2: "status"
Pool Index 3: TypeID(ApiConfig)
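A deduplicating pool can be sketched as a list of values plus a hash map from value to index. Every repeated constant then resolves to the index assigned the first time it was seen. The `ConstantPool` class and the `TypeId` record are hypothetical. Deduplication relies on each constant type implementing `equals` and `hashCode`, which Java strings, boxed numbers, and records do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical type identifier constant; records get equals/hashCode for free.
record TypeId(String name) { }

// Hypothetical deduplicating constant pool: each distinct value is stored once,
// and bytecode refers to it by index.
final class ConstantPool {
    private final List<Object> values = new ArrayList<>();
    private final Map<Object, Integer> indexOf = new HashMap<>();

    /** Returns the index of an equal constant if present, otherwise appends it. */
    int intern(Object value) {
        return indexOf.computeIfAbsent(value, v -> {
            values.add(v);
            return values.size() - 1;
        });
    }

    /** The final Object[] handed to the VM. */
    Object[] toArray() {
        return values.toArray();
    }
}
```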

The VM has no concept of default parameters. The compiler handles them entirely:

// Definition:
void log(string msg, string level = "INFO") { ... }
// Call site:
log("Server started");
// Compiler emits bytecode equivalent to:
log("Server started", "INFO");
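The call-site rewrite can be sketched as padding the argument list with the declared defaults of any trailing parameters the caller omitted. This is a simplification that treats arguments and defaults as plain values rather than expressions. `Param` and `CallSiteLowering` are hypothetical names, and `null` stands for "no default" here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parameter descriptor; defaultValue == null means "no default".
record Param(String name, String type, Object defaultValue) {
    boolean hasDefault() { return defaultValue != null; }
}

final class CallSiteLowering {
    /** Returns the argument list the compiler would emit, with omitted defaults filled in. */
    static List<Object> completeArgs(String function, List<Param> params, List<?> args) {
        if (args.size() > params.size()) {
            throw new IllegalArgumentException(
                    "Expected " + params.size() + " arguments, got " + args.size());
        }
        List<Object> full = new ArrayList<>(args);
        for (int i = args.size(); i < params.size(); i++) {
            Param p = params.get(i);
            if (!p.hasDefault()) {
                throw new IllegalArgumentException(
                        "Missing argument '" + p.name() + "' in call to " + function);
            }
            full.add(p.defaultValue()); // the VM sees an ordinary, explicit argument
        }
        return full;
    }
}
```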

High-level constructs are flattened into jumps:

Source Construct | Bytecode Translation
if/else | JIF (Jump if False) + JMP (unconditional jump)
while | JIF to exit + JMP back to condition
for | Desugared into an equivalent while loop
foreach | Desugared to index-based while with .length()
break | JMP to loop exit
continue | JMP to loop start
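The `while` row can be sketched as a small emitter with three steps. It records the loop start, emits a forward JIF whose target is not yet known, and then backpatches that jump (and any `break` jumps) once the loop body has been emitted. The `LoopLowering` class and its string-based instructions are hypothetical; the real compiler emits packed 64-bit words.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical emitter showing how `while (cond) { body }` is flattened into
// JIF/JMP. Instructions are readable (op, target) pairs, not packed words.
final class LoopLowering {
    static final class Insn {
        final String op;
        int target; // jump target PC; -1 for non-jumps or not-yet-patched jumps
        Insn(String op, int target) { this.op = op; this.target = target; }
        @Override public String toString() { return target < 0 ? op : op + " " + target; }
    }

    final List<Insn> code = new ArrayList<>();
    private final Deque<List<Insn>> breakJumps = new ArrayDeque<>(); // one list per enclosing loop
    private final Deque<Integer> continueTargets = new ArrayDeque<>();

    int pc() { return code.size(); }

    Insn emit(String op, int target) {
        Insn insn = new Insn(op, target);
        code.add(insn);
        return insn;
    }

    void emitWhile(Runnable condition, Runnable body) {
        int start = pc();                   // the condition is re-tested every iteration
        continueTargets.push(start);
        breakJumps.push(new ArrayList<>());
        condition.run();                    // leaves a boolean for JIF to test
        Insn exitJump = emit("JIF", -1);    // forward jump: exit PC not known yet
        body.run();
        emit("JMP", start);                 // back-edge to the condition
        int exit = pc();
        exitJump.target = exit;             // backpatch the loop exit
        for (Insn b : breakJumps.pop()) b.target = exit;
        continueTargets.pop();
    }

    void emitBreak() { breakJumps.peek().add(emit("JMP", -1)); }

    void emitContinue() { emit("JMP", continueTargets.peek()); }
}
```

With `LOAD_COND` and `BODY` as placeholders, `while (cond) { body; break; }` produces `[LOAD_COND, JIF 5, BODY, JMP 5, JMP 0]`. If a `for` loop were lowered this way, it would need to push its increment step, not the condition, as the `continue` target. Otherwise `continue` would skip the increment.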

The compiler builds a separate metadata table for try-catch blocks (see Error Handling):

Start PC | End PC | Exception Type | Jump Target | Message Register
100 | 150 | ArrayOutOfBounds | 200 | Reg 5
100 | 150 | TypeMismatch | 250 | Reg 6
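At runtime, finding a handler amounts to scanning the table for the first entry whose PC range covers the faulting instruction and whose exception type matches. The sketch below makes two assumptions that the table does not specify: the range is half-open, `[startPc, endPc)`, and inner `try` blocks appear before outer ones. `ExceptionDispatch` is a hypothetical name.

```java
import java.util.List;
import java.util.Optional;

// One row of the exception table. The PC range is assumed half-open: [startPc, endPc).
record ExceptionEntry(int startPc, int endPc, String exceptionType,
                      int jumpTarget, int messageRegister) {
    boolean covers(int pc, String thrownType) {
        return pc >= startPc && pc < endPc && exceptionType.equals(thrownType);
    }
}

final class ExceptionDispatch {
    /** First match wins, which assumes inner try blocks are listed before outer ones. */
    static Optional<ExceptionEntry> findHandler(List<ExceptionEntry> table, int pc, String thrownType) {
        for (ExceptionEntry entry : table) {
            if (entry.covers(pc, thrownType)) {
                return Optional.of(entry);
            }
        }
        return Optional.empty();
    }
}
```

On a match, the VM would presumably place the error message in the entry's message register and continue at its jump target. With the two rows above, a `TypeMismatch` raised at PC 120 resumes at PC 250.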

For every reference-type variable, the compiler emits an explicit KILL_REF instruction at the end of its scope:

KILL_REF Reg 3 // Nulls out rMem[bp + 3], enabling GC

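This prevents memory leaks. Without it, a reference left in a dead register slot would keep its object reachable until the slot is overwritten or the frame is discarded, so the GC could not reclaim it even though the program can no longer use it.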
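Both halves of this mechanism can be sketched in a few lines. On the compiler side, the closing scope's locals are walked and one KILL_REF is emitted per reference-type register. On the VM side, the handler clears the corresponding slot. `KillRefDemo`, `Local`, and the string form of the instruction are hypothetical; `rMem` and `bp` follow the comment in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of both halves of KILL_REF.
final class KillRefDemo {
    // Compiler side: a local declared in the scope that is closing.
    record Local(String name, int register, boolean isReference) { }

    /** Instructions emitted at scope exit: one KILL_REF per reference-type local. */
    static List<String> scopeExit(List<Local> scopeLocals) {
        List<String> out = new ArrayList<>();
        for (Local local : scopeLocals) {
            if (local.isReference()) {
                out.add("KILL_REF Reg " + local.register());
            }
        }
        return out;
    }

    // VM side: reference registers live in rMem, addressed relative to the frame base bp.
    static void killRef(Object[] rMem, int bp, int reg) {
        rMem[bp + reg] = null; // the frame no longer keeps the object reachable
    }
}
```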

The compiler produces a compiled unit containing:

Component | Purpose
long[] bytecode | The flat instruction array
Object[] constantPool | Deduplicated constants
ExceptionEntry[] exceptionTable | Try-catch mapping table
FunctionMeta[] functionTable | Per-function metadata (name, arity, register count)
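In Java terms, the compiled unit might look like the records below. The component types and names come from the table. The fields of `ExceptionEntry` mirror the exception table columns, and the `function` lookup helper is an illustrative assumption.

```java
import java.util.Optional;

// One row of the exception table (columns as in the exception table above).
record ExceptionEntry(int startPc, int endPc, String exceptionType,
                      int jumpTarget, int messageRegister) { }

// Per-function metadata: name, arity, and register count, as listed above.
record FunctionMeta(String name, int arity, int registerCount) { }

// The compiled unit handed to the VM. Note: record equality on array
// components compares references, not contents.
record CompiledUnit(long[] bytecode, Object[] constantPool,
                    ExceptionEntry[] exceptionTable, FunctionMeta[] functionTable) {

    /** Hypothetical helper: linear lookup of a function's metadata by name. */
    Optional<FunctionMeta> function(String name) {
        for (FunctionMeta f : functionTable) {
            if (f.name().equals(name)) {
                return Optional.of(f);
            }
        }
        return Optional.empty();
    }
}
```

Because `FunctionMeta` records a register count per function, the VM could size each call frame once, on entry.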

The compiled bytecode is loaded into the VM and executed. See Memory Model and Instruction Set for execution details.

All errors — regardless of which stage catches them — are formatted with full source location context. This is achieved by preserving ANTLR’s line/column metadata through every transformation.

{
"error": {
"type": "SemanticError",
"message": "Type Mismatch: Cannot assign 'string' to 'int'.",
"details": {
"file": "data_processor.nox",
"line": 14,
"column": 25,
"snippet": "13 | string name = \"Alice\";\n14 | int x = name;\n | ^"
}
}
}
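The `snippet` field can be built from the preserved line and column alone. The minimal sketch below assumes one line of context before the error line and a 1-based column. `SnippetFormatter` is a hypothetical name, and the real formatter may differ in gutter width and context size.

```java
import java.util.List;

// Hypothetical formatter for the "snippet" field: one line of context before the
// error line, then a caret under the reported 1-based column.
final class SnippetFormatter {
    static String format(List<String> sourceLines, int line, int column) {
        int first = Math.max(1, line - 1);
        int gutter = String.valueOf(line).length(); // widest line number shown
        StringBuilder sb = new StringBuilder();
        for (int n = first; n <= line; n++) {
            sb.append(pad(String.valueOf(n), gutter))
              .append(" | ")
              .append(sourceLines.get(n - 1))
              .append('\n');
        }
        sb.append(" ".repeat(gutter)).append(" | ")
          .append(" ".repeat(column - 1)).append('^');
        return sb.toString();
    }

    // Right-align line numbers so the "|" separators line up.
    private static String pad(String s, int width) {
        return " ".repeat(width - s.length()) + s;
    }
}
```

Take `int x = name;` on line 2, column 9. The formatter returns both source lines followed by a caret line pointing at `name`.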

Possible fixes are suggested wherever applicable, helping developers and code generators self-correct:

Suggestion: Use `int x = name.length();` to get the string length as an integer,
or change the variable type to `string`.