Compilation Pipeline
Pipeline Overview
Every .nox script passes through a multi-stage pipeline before execution. Each stage progressively transforms the code and validates it, ensuring that by the time the VM begins execution, the bytecode is fully verified and optimized.

    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │  Source  │    │  Token   │    │ Abstract │    │ Validated│    │ Optimized│    │ Bytecode │
    │   Text   │───▶│  Stream  │───▶│  Syntax  │───▶│   AST    │───▶│   AST    │───▶│+Metadata │
    │  (.nox)  │    │          │    │   Tree   │    │          │    │          │    │          │
    └──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘    └──────────┘
      Stage 1:        Stage 2:        Stage 3:        Stage 4:        Stage 5:        Stage 6:
      Lexing          Parsing         Semantic        Constant        Compilation     Execution
                                      Validation      Folding

Stage 1: Parsing (ANTLR 4)
The raw .nox text is fed into an ANTLR 4-generated lexer and parser.
Lexer
Breaks the source text into a stream of typed tokens:

    Source: int count = 42;
    Tokens: [KEYWORD_INT] [IDENTIFIER:"count"] [OP_ASSIGN] [INT_LITERAL:42] [SEMICOLON]

Parser
Constructs an Abstract Syntax Tree (AST) from the token stream according to the NSL grammar. The AST preserves line and column metadata for every node, enabling precise error reporting later.

               VariableDeclaration
              ╱         │         ╲
       Type:int   Name:"count"   Init:IntLiteral(42)

Error: SyntaxError
If the source text violates the grammar rules, a SyntaxError is thrown, although the parser tries to catch as many such errors as possible. The error includes the exact line, column, and a snippet of the problematic code.

    SyntaxError at line 7, column 15:
      7 | int x = ;
                  ^
      Expected: expression

Stage 2: Semantic Validation
Before any bytecode is generated, a dedicated Validator walks the AST to catch logical errors that the parser cannot detect. This is the first layer of security: it catches bugs before they become runtime problems. It is also the driving layer behind many of the optimizations in the interpreter, which helps maximize performance.
Checks Performed
| Check | Example Error |
|---|---|
| Type Mismatches | int x = "hello"; -> “Cannot assign string to int” |
| Undeclared Variables | y = 10; (where y was never declared) -> “Undeclared variable y” |
| Undeclared Functions | foo(); (where foo doesn’t exist) -> “Undeclared function foo” |
| Arity Mismatches | add(1, 2, 3) when add takes 2 params -> “Expected 2 arguments, got 3” |
| UFCS Resolution | myObj.nonexistent() -> “No method nonexistent found for type” |
| Schema Violations | Point p = {x: 1}; when Point requires x and y -> “Missing field y” |
| Operator Type Checks | "hello" + 42 -> “Operator + not defined for (string, int)” |
| Return Type Checks | Function declared int but returns string |
| Invalid yield Usage | yield used outside of any function context |
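
A few of the checks in the table can be sketched as a single pass over a list of nodes. This is an illustrative Python sketch, not the actual Validator: the node classes (`Decl`, `Assign`, `Call`), the `validate` function, and the pre-computed `init_type` and `value_type` fields are all hypothetical simplifications, and only the type-mismatch, undeclared-name, and arity checks are covered.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified AST nodes. Expression types are assumed
# to be already known (a real validator would infer them from the AST).
@dataclass
class Decl:            # e.g. int x = "hello";
    type_name: str
    name: str
    init_type: str
    line: int

@dataclass
class Assign:          # e.g. y = 10;
    name: str
    value_type: str
    line: int

@dataclass
class Call:            # e.g. add(1, 2, 3)
    func: str
    arg_count: int
    line: int

def validate(nodes, functions):
    """Return a list of error strings; an empty list means validation passed.

    functions maps a function name to its parameter count.
    """
    errors = []
    symbols = {}  # variable name -> declared type
    for node in nodes:
        if isinstance(node, Decl):
            if node.init_type != node.type_name:
                errors.append(f"line {node.line}: Cannot assign {node.init_type} to {node.type_name}")
            symbols[node.name] = node.type_name
        elif isinstance(node, Assign):
            if node.name not in symbols:
                errors.append(f"line {node.line}: Undeclared variable {node.name}")
            elif symbols[node.name] != node.value_type:
                errors.append(f"line {node.line}: Cannot assign {node.value_type} to {symbols[node.name]}")
        elif isinstance(node, Call):
            if node.func not in functions:
                errors.append(f"line {node.line}: Undeclared function {node.func}")
            elif functions[node.func] != node.arg_count:
                errors.append(f"line {node.line}: Expected {functions[node.func]} arguments, got {node.arg_count}")
    return errors

program = [
    Decl("int", "x", "string", line=1),   # int x = "hello";
    Assign("y", "int", line=2),           # y = 10;
    Call("foo", 0, line=3),               # foo();
    Call("add", 3, line=4),               # add(1, 2, 3)
]
errors = validate(program, functions={"add": 2})
for e in errors:
    print(e)
```

Collecting errors into a list rather than raising on the first one lets a single run report every problem it finds.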
Error: SemanticError
Semantic errors include rich context for debugging:

    SemanticError at line 14, column 25:
      Type Mismatch: Operator '+' cannot be applied to 'string' and 'int'.
      Use string interpolation `...` instead.

      13 | int count = 10;
      14 | string msg = "Total: " + count;
         |                        ^

Why This Matters
Static validation provides instant feedback. Whether a program is written by a human developer or generated by an automated system, validation failures are returned immediately with precise, actionable error messages. The author can fix their code without ever reaching runtime, allowing for a tight feedback loop that catches bugs early and dramatically reduces debugging time.
Stage 3: Compilation
If validation passes, the Compiler transforms the validated AST into a flat array of 64-bit bytecode instructions plus associated metadata.
The Compiler’s Responsibilities
1. Register Allocation (Liveness Analysis)
The compiler performs a linear scan to determine the exact lifespan of every variable:

    Line 1: var x = 10       <- x starts (Reg 0)
    Line 2: var y = 20       <- y starts (Reg 1)
    Line 3: z = x + y        <- x last used, y last used
    Line 4: var a = z * 2    <- a starts -> assigned to Reg 0 (reused!)

Impact: A function with 100 sequential variables might only need 5 physical registers. This keeps stack frames minimal and cache-friendly.
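
The register reuse described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the compiler's actual allocator: the `allocate_registers` function and the `(target, uses)` statement format are hypothetical, code is assumed to be straight-line (no branches), and a register is freed only after the statement containing its variable's last use.

```python
import heapq

def allocate_registers(statements):
    """Assign registers to variables with a simple linear scan.

    statements: list of (target_var, [used_vars]) in program order.
    Returns (var -> register mapping, number of registers needed).
    """
    # Pass 1: the last statement index at which each variable is mentioned.
    last_use = {}
    for i, (target, uses) in enumerate(statements):
        for var in [target, *uses]:
            last_use[var] = i

    # Pass 2: hand out the lowest free register, recycling dead ones.
    free = []          # min-heap of registers available for reuse
    next_reg = 0       # next never-used register number
    assignment = {}
    for i, (target, uses) in enumerate(statements):
        if target not in assignment:
            if free:
                assignment[target] = heapq.heappop(free)
            else:
                assignment[target] = next_reg
                next_reg += 1
        # Free every variable whose lifetime ends at this statement.
        for var in set(uses) | {target}:
            if last_use[var] == i and var in assignment:
                heapq.heappush(free, assignment[var])
    return assignment, next_reg

# The four-line example from the text.
statements = [
    ("x", []),           # var x = 10
    ("y", []),           # var y = 20
    ("z", ["x", "y"]),   # z = x + y
    ("a", ["z"]),        # var a = z * 2
]
assignment, register_count = allocate_registers(statements)
print(assignment, register_count)
```

In this sketch `z` receives a fresh register because `x` and `y` are released only after line 3 completes; `a` then reuses Reg 0, as in the example above.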
2. Constant Pool Generation
All static data (strings, large numbers, type identifiers) is collected into a deduplicated Constant Pool:

    Pool Index 0: "user_name"  (used 50 times in code -> stored once)
    Pool Index 1: 3.14159
    Pool Index 2: "status"
    Pool Index 3: TypeID(ApiConfig)

3. Default Parameter Injection
The VM has no concept of default parameters. The compiler handles them entirely:

    // Definition:
    void log(string msg, string level = "INFO") { ... }

    // Call site:
    log("Server started");

    // Compiler emits bytecode equivalent to:
    log("Server started", "INFO");

4. Control Flow Linearization
High-level constructs are flattened into jumps:
| Source Construct | Bytecode Translation |
|---|---|
| if/else | JIF (Jump if False) + JMP (unconditional jump) |
| while | JIF to exit + JMP back to condition |
| for | Rewritten as the equivalent while loop |
| foreach | Desugared to index-based while with .length() |
| break | JMP to loop exit |
| continue | JMP to loop start |
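
The translations in the table can be illustrated with a small Python sketch that flattens nested `if/else` and `while` statements into jump instructions using placeholder slots that are patched once the target address is known. Everything here is hypothetical: the tuple-based statement format, the `linearize` function, and the instruction tuples. In particular, conditions are carried inside `JIF` as plain strings, whereas real bytecode would evaluate them into a register first.

```python
def linearize(program):
    """Flatten a nested statement tree into a list of jump-based instructions."""
    code = []
    loops = []  # stack of (loop_start_pc, pending_break_slots)

    def emit(instr):
        code.append(instr)
        return len(code) - 1

    def compile_stmt(stmt):
        kind = stmt[0]
        if kind == "op":                       # ("op", text): opaque straight-line work
            emit(("OP", stmt[1]))
        elif kind == "if":                     # ("if", cond, then_body, else_body)
            _, cond, then_body, else_body = stmt
            jif = emit(None)                   # placeholder, patched below
            for s in then_body:
                compile_stmt(s)
            jmp = emit(None)                   # will skip over the else branch
            code[jif] = ("JIF", cond, len(code))
            for s in else_body:
                compile_stmt(s)
            code[jmp] = ("JMP", len(code))
        elif kind == "while":                  # ("while", cond, body)
            _, cond, body = stmt
            start = emit(None)                 # the condition check is the loop start
            breaks = []
            loops.append((start, breaks))
            for s in body:
                compile_stmt(s)
            loops.pop()
            emit(("JMP", start))               # jump back to the condition
            exit_pc = len(code)
            code[start] = ("JIF", cond, exit_pc)
            for slot in breaks:                # patch every break to the loop exit
                code[slot] = ("JMP", exit_pc)
        elif kind == "break":
            loops[-1][1].append(emit(None))
        elif kind == "continue":
            emit(("JMP", loops[-1][0]))
        else:
            raise ValueError(f"unknown statement kind: {kind}")

    for stmt in program:
        compile_stmt(stmt)
    return code

# while (i < n) { if (done) { break; } else { work; } }
program = [("while", "i < n", [("if", "done", [("break",)], [("op", "work")])])]
code = linearize(program)
for pc, instr in enumerate(code):
    print(pc, instr)
```

The exit address of the loop is one past its final `JMP`, so a `break` lands on whatever instruction follows the loop.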
5. Exception Table Construction
The compiler builds a separate metadata table for try-catch blocks (see Error Handling):
| Start PC | End PC | Exception Type | Jump Target | Message Register |
|---|---|---|---|---|
| 100 | 150 | ArrayOutOfBounds | 200 | Reg 5 |
| 100 | 150 | TypeMismatch | 250 | Reg 6 |
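
One way a VM could consult such a table when an error is raised is sketched below in Python. The field names and the `find_handler` function are hypothetical, and the sketch assumes the PC range is half-open (Start PC inclusive, End PC exclusive) and that entries are searched in table order; the source does not specify either detail.

```python
from typing import NamedTuple, Optional

class ExceptionEntry(NamedTuple):
    start_pc: int      # first protected instruction (inclusive, by assumption)
    end_pc: int        # end of the protected range (exclusive, by assumption)
    exc_type: str
    target_pc: int     # where execution resumes (the catch block)
    message_reg: int   # register that receives the error message

# The two rows from the table above.
TABLE = [
    ExceptionEntry(100, 150, "ArrayOutOfBounds", 200, 5),
    ExceptionEntry(100, 150, "TypeMismatch", 250, 6),
]

def find_handler(table, pc: int, exc_type: str) -> Optional[ExceptionEntry]:
    """Return the first entry covering pc that matches exc_type, or None."""
    for entry in table:
        if entry.start_pc <= pc < entry.end_pc and entry.exc_type == exc_type:
            return entry
    return None

handler = find_handler(TABLE, pc=120, exc_type="TypeMismatch")
print(handler.target_pc, handler.message_reg)
```

Keeping this mapping in a side table means the instruction stream itself carries no try-catch bookkeeping on the non-error path.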
6. Scope Cleanup (KILL_REF)
For every reference-type variable, the compiler emits an explicit KILL_REF instruction at the end of its scope:

    KILL_REF Reg 3    // Nulls out rMem[bp + 3], enabling GC

This prevents memory leaks from objects that are no longer needed.
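
The effect of clearing a register slot can be demonstrated with a small Python analogy. This is not the VM's code: `r_mem`, `bp`, and `kill_ref` are hypothetical stand-ins for the reference register file, the frame base pointer, and the instruction, and Python's own garbage collector plays the role of the VM's GC.

```python
import gc
import weakref

class Payload:
    """Stand-in for a heap object referenced from a register."""

r_mem = [None] * 8   # hypothetical reference register file
bp = 2               # hypothetical base pointer of the current frame

def kill_ref(reg):
    """Null out the slot so the register no longer keeps the object alive."""
    r_mem[bp + reg] = None

obj = Payload()
probe = weakref.ref(obj)   # lets us observe collection without holding the object
r_mem[bp + 3] = obj
del obj                    # the register slot is now the only strong reference

alive_before = probe() is not None
kill_ref(3)
gc.collect()               # make collection explicit on non-refcounting runtimes
alive_after = probe() is not None
print(alive_before, alive_after)
```

Without the `kill_ref` call, the stale slot would keep the object reachable for as long as the register file itself is alive.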
Compilation Output
The compiler produces a compiled unit containing:
| Component | Purpose |
|---|---|
| long[] bytecode | The flat instruction array |
| Object[] constantPool | Deduplicated constants |
| ExceptionEntry[] exceptionTable | Try-catch mapping table |
| FunctionMeta[] functionTable | Per-function metadata (name, arity, register count) |
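
The deduplication behind the constantPool component (step 2 above) can be sketched in Python as an interning table. The `ConstantPool` class and its `intern` method are hypothetical names for illustration; the tuple standing in for `TypeID(ApiConfig)` is likewise an assumption about how a type identifier might be represented.

```python
from dataclasses import dataclass, field

@dataclass
class ConstantPool:
    values: list = field(default_factory=list)    # pool index -> constant
    _index: dict = field(default_factory=dict)    # (type, constant) -> pool index

    def intern(self, value):
        """Return the pool index for value, adding it only on first sight."""
        key = (type(value), value)   # keeps 1, 1.0 and True as distinct entries
        if key not in self._index:
            self._index[key] = len(self.values)
            self.values.append(value)
        return self._index[key]

pool = ConstantPool()
indices = [
    pool.intern("user_name"),
    pool.intern(3.14159),
    pool.intern("status"),
    pool.intern(("TypeID", "ApiConfig")),
    pool.intern("user_name"),   # repeated use maps to the existing entry
]
print(indices, pool.values)
```

Instructions can then refer to a constant by its small integer index, which fits more easily into a fixed-width instruction than the constant itself.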
Stage 4: Execution
The compiled bytecode is loaded into the VM and executed. See Memory Model and Instruction Set for execution details.
Error Reporting Across Stages
All errors, regardless of which stage catches them, are formatted with full source location context. This is achieved by preserving ANTLR’s line/column metadata through every transformation.

    {
      "error": {
        "type": "SemanticError",
        "message": "Type Mismatch: Cannot assign 'string' to 'int'.",
        "details": {
          "file": "data_processor.nox",
          "line": 14,
          "column": 25,
          "snippet": "13 | string name = \"Alice\";\n14 | int x = name;\n | ^"
        }
      }
    }

Possible fixes are suggested wherever applicable, helping developers and code generators self-correct:

    Suggestion: Use `int x = name.length();` to get the string length as an integer, or change the variable type to `string`.
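
A report of the shape shown above could be assembled from a node's preserved line and column roughly as follows. This Python sketch is illustrative only: `build_error_report` is a hypothetical helper, and it assumes 1-based line numbers and a 0-based column offset (the convention ANTLR uses for character positions), which the source does not state explicitly.

```python
import json

def build_error_report(error_type, message, file, source, line, column):
    """Build an error dict with a caret-annotated source snippet.

    line is 1-based; column is a 0-based offset into that line (assumption).
    """
    lines = source.splitlines()
    width = len(str(line))                      # gutter width for line numbers
    parts = []
    if line >= 2:                               # one line of leading context
        parts.append(f"{line - 1:>{width}} | {lines[line - 2]}")
    parts.append(f"{line:>{width}} | {lines[line - 1]}")
    parts.append(" " * width + " | " + " " * column + "^")
    return {
        "error": {
            "type": error_type,
            "message": message,
            "details": {
                "file": file,
                "line": line,
                "column": column,
                "snippet": "\n".join(parts),
            },
        }
    }

source = 'string name = "Alice";\nint x = name;'
report = build_error_report(
    "SemanticError",
    "Type Mismatch: Cannot assign 'string' to 'int'.",
    "data_processor.nox",
    source,
    line=2,
    column=8,
)
print(json.dumps(report, indent=2))
print(report["error"]["details"]["snippet"])
```

Because the gutter width is derived from the line number, the caret stays aligned with the offending token regardless of how many digits the line number has.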