Instruction Set

One complex instruction (a Kotlin function call) is generally faster than ten simple instructions (ten VM loop iterations).

In a JVM-based VM, the fetch-decode-execute loop has a fixed overhead. Every iteration of the while(running) loop costs CPU cycles just to read an instruction and dispatch to the right handler. The strategy is to compress as much work as possible into single instructions, minimizing the time spent in the dispatch loop and maximizing the time spent in HotSpot-optimized Kotlin code.

Example: obj.a += 1

| Approach | Opcodes Used | VM Loop Iterations |
| --- | --- | --- |
| Naive VM | GET_FIELD, LDC, ADD, SET_FIELD | 4 |
| Nox VM | HMOD [SubOp: ADD_INT] | 1 |
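
To make the comparison concrete, here is a minimal sketch of what a fused "read, add, write back" handler might look like. It is illustrative only: the host object is modelled as a mutable map, and the `hmod` function and the numeric value of `ADD_INT` are assumptions, not the VM's actual handler or encoding.

```kotlin
// Illustrative only: the real HMOD handler and its SubOp codes may differ.
const val ADD_INT = 0 // hypothetical SubOp value

// Performs the whole "obj.key += value" update inside one handler call.
fun hmod(subOp: Int, target: MutableMap<String, Any>, key: String, value: Long) {
    when (subOp) {
        // Read, add, and write back in one dispatch instead of four.
        ADD_INT -> target[key] = (target[key] as Long) + value
        else -> throw IllegalArgumentException("unsupported SubOp: $subOp")
    }
}
```

With a host object `mutableMapOf<String, Any>("a" to 41L)`, calling `hmod(ADD_INT, obj, "a", 1L)` leaves `obj["a"]` at 42 after a single handler invocation.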

Every instruction is encoded as a single long (64 bits) with a fixed layout:

 63       56 55       48 47             32 31             16 15              0
┌───────────┬───────────┬─────────────────┬─────────────────┬─────────────────┐
│  Opcode   │ Sub-Opcode│    Operand A    │    Operand B    │    Operand C    │
│ (8 bits)  │ (8 bits)  │    (16 bits)    │    (16 bits)    │    (16 bits)    │
└───────────┴───────────┴─────────────────┴─────────────────┴─────────────────┘

| Field | Bits | Purpose |
| --- | --- | --- |
| Opcode | 63–56 (8 bits) | The primary operation (e.g., IADD, MOV, CALL). Supports up to 256 unique opcodes. |
| Sub-Opcode | 55–48 (8 bits) | Secondary intent for “super-instructions” (e.g., ADD_INT, SET_STRING). Unused by standard opcodes. |
| Operand A | 47–32 (16 bits) | Typically the destination register. Address space: 0–65,535. |
| Operand B | 31–16 (16 bits) | Typically source 1 or a constant pool index. |
| Operand C | 15–0 (16 bits) | Typically source 2 or additional data. |
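
The layout above can be turned into a packing routine on the compiler side. The following is a minimal sketch, assuming a helper named `encode` (a hypothetical name); only the bit positions are taken from the layout table.

```kotlin
// Hypothetical helper: packs the five fields into one 64-bit instruction word.
// Field positions follow the layout table; the function name is illustrative.
fun encode(opcode: Int, subOp: Int, a: Int, b: Int, c: Int): Long {
    require(opcode in 0..0xFF && subOp in 0..0xFF) { "opcode and subOp must fit in 8 bits" }
    require(a in 0..0xFFFF && b in 0..0xFFFF && c in 0..0xFFFF) { "operands must fit in 16 bits" }
    return (opcode.toLong() shl 56) or   // bits 63-56
        (subOp.toLong() shl 48) or       // bits 55-48
        (a.toLong() shl 32) or           // bits 47-32
        (b.toLong() shl 16) or           // bits 31-16
        c.toLong()                       // bits 15-0
}
```

Opcodes of 0x80 and above set the sign bit of the resulting `Long`. That is harmless as long as the decoder uses the unsigned shift `ushr` rather than `shr`.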

Each operand can carry flags to modify its interpretation:

  • Global flag [G]: Read from global memory (gMem) instead of the local frame
  • Constant flag [K]: The operand is a constant pool index, not a register

Arithmetic and comparison opcodes use the SubOp field to encode how operand C should be interpreted:

| SubOp | Constant | Meaning |
| --- | --- | --- |
| REG_REG | 0x00 | Default: C is a register index |
| REG_IMM | 0x01 | C is a 16-bit unsigned immediate (0–65,535) |
| REG_POOL | 0x02 | C is a constant pool index |

This allows the compiler to bake constant operands directly into instructions, eliminating the preceding LDI/LDC instruction. For example, x + 10 emits a single IADD [REG_IMM] dest, x, 10 instead of LDI tmp, 10 followed by IADD dest, x, tmp.

The optimization applies to: IADD-IMOD, DADD-DMOD, IEQ-IGE, DEQ-DGE, AND, OR, BAND, BOR, BXOR, SHL, SHR, USHR.

For double-precision operations, constant operands always use REG_POOL, because a 64-bit double does not fit in a 16-bit immediate.
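
As a rough sketch, a handler could resolve operand C for the three modes as follows. The SubOp values come from the table; `resolveIntC`, `pMem`, `bp`, and `pool` are assumed names for a helper, the primitive register array, the frame base pointer, and the constant pool, and pool entries are assumed to be boxed values.

```kotlin
// SubOp constants from the operand-mode table.
const val REG_REG = 0x00
const val REG_IMM = 0x01
const val REG_POOL = 0x02

// Hypothetical helper: resolves operand C of an integer instruction to its value.
fun resolveIntC(subOp: Int, opC: Int, pMem: LongArray, bp: Int, pool: Array<Any>): Long =
    when (subOp) {
        REG_REG -> pMem[bp + opC]                   // C names a register in the frame
        REG_IMM -> opC.toLong()                     // C is the value itself (0..65535)
        REG_POOL -> (pool[opC] as Number).toLong()  // C indexes the constant pool
        else -> throw IllegalArgumentException("unknown SubOp: $subOp")
    }
```

Under these assumptions, `x + 10` resolves its right-hand side without touching a register, and `x + 100000` (too large for 16 bits) falls back to a pool lookup.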

| Opcode | Syntax | Description |
| --- | --- | --- |
| IADD | IADD A, B, C | Integer add: pMem[A] = pMem[B] + pMem[C] |
| ISUB | ISUB A, B, C | Integer subtract: pMem[A] = pMem[B] - pMem[C] |
| IMUL | IMUL A, B, C | Integer multiply |
| IDIV | IDIV A, B, C | Integer divide (throws on division by zero) |
| IMOD | IMOD A, B, C | Integer modulo |
| INEG | INEG A, B | Integer negate: pMem[A] = -pMem[B] |
| DADD | DADD A, B, C | Double add (operands decoded via longBitsToDouble) |
| DSUB | DSUB A, B, C | Double subtract |
| DMUL | DMUL A, B, C | Double multiply |
| DDIV | DDIV A, B, C | Double divide |
| DMOD | DMOD A, B, C | Double modulo |
| DNEG | DNEG A, B | Double negate |
| AND | AND A, B, C | Logical AND (boolean) |
| OR | OR A, B, C | Logical OR (boolean) |
| NOT | NOT A, B | Logical NOT: pMem[A] = (pMem[B] == 0) ? 1 : 0 |
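
The note that double operands are "decoded via longBitsToDouble" implies that doubles share the primitive register array with integers as raw bit patterns. A minimal sketch of a DADD handler under that assumption follows; the function name and signature are illustrative, not the VM's actual code.

```kotlin
// Assumes pMem is a LongArray in which doubles are stored as raw IEEE 754 bits.
fun dadd(pMem: LongArray, a: Int, b: Int, c: Int) {
    val lhs = Double.fromBits(pMem[b])  // Kotlin's counterpart of Double.longBitsToDouble
    val rhs = Double.fromBits(pMem[c])
    pMem[a] = (lhs + rhs).toRawBits()   // re-encode the result for storage
}
```

Storing raw bits keeps every primitive in one `LongArray`, so no boxing is needed on the arithmetic path.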

| Opcode | Syntax | Description |
| --- | --- | --- |
| IEQ | IEQ A, B, C | Integer equals: pMem[A] = (pMem[B] == pMem[C]) ? 1 : 0 |
| INE | INE A, B, C | Integer not-equals |
| ILT | ILT A, B, C | Integer less-than |
| ILE | ILE A, B, C | Integer less-than-or-equal |
| IGT | IGT A, B, C | Integer greater-than |
| IGE | IGE A, B, C | Integer greater-than-or-equal |
| DEQ | DEQ A, B, C | Double equals |
| DNE | DNE A, B, C | Double not-equals |
| DLT | DLT A, B, C | Double less-than |
| DLE | DLE A, B, C | Double less-than-or-equal |
| DGT | DGT A, B, C | Double greater-than |
| DGE | DGE A, B, C | Double greater-than-or-equal |
| SEQ | SEQ A, B, C | String value equals: pMem[A] = rMem[B].equals(rMem[C]) ? 1 : 0 |
| SNE | SNE A, B, C | String not-equals |

| Opcode | Syntax | Description |
| --- | --- | --- |
| MOV | MOV A, B | Copy primitive: pMem[A] = pMem[B] |
| MOVR | MOVR A, B | Copy reference: rMem[A] = rMem[B] |
| LDC | LDC A, PoolIdx | Load constant from the constant pool into pMem[A] or rMem[A] |
| LDI | LDI A, Imm | Load immediate (small integer that fits in 16 bits) |
| KILL_REF | KILL_REF A | Null out rMem[A] to enable garbage collection |

| Opcode | Syntax | Description |
| --- | --- | --- |
| JMP | JMP target | Unconditional jump: pc = target |
| JIF | JIF A, target | Jump if false: if (pMem[A] == 0) pc = target |
| JIT | JIT A, target | Jump if true: if (pMem[A] != 0) pc = target |
| CALL | CALL [subOp] funcId, primArgStart, refArgStart | Push frame, slide bp and bpRef, jump to function. subOp indicates return type (0=REF, 1=PRIM, 2=VOID). |
| RET | RET [subOp] C | Returns from function. SubOp encodes the value type: VOID (0x20), INT (0x21), DBL (0x22), BOOL (0x23), REF (0x24). Operand C holds the source register. |

| Opcode | Syntax | Description |
| --- | --- | --- |
| SCALL | SCALL [subOp] funcId, primArgStart, refArgStart | System call via FFI. subOp determines result type: primitive (1) or reference (0). Result overwrites the first argument register (primArgStart or refArgStart). |

| Opcode | Syntax | Description |
| --- | --- | --- |
| NEW_OBJ | NEW_OBJ A | Creates a new empty NoxObject and stores it in rMem[A]. |
| OBJ_SET | OBJ_SET A, keyId, val | Sets property pool[keyId] on object rMem[A] to val. |
| CAST_STRUCT | CAST_STRUCT [SubOp] A, B, typeId | Validates rMem[B] against TypeDescriptor at pool[typeId], storing result in rMem[A]. If SubOp=1, validates array of structs. |

| Opcode | Syntax | Description |
| --- | --- | --- |
| HMOD | HMOD [SubOp] A, key, val | Host Modify: modify a property on a host object |
| HACC | HACC [SubOp] A, B, key | Host Access: read a property from a host object |
| AGET_IDX | AGET_IDX [SubOp] A, B, C | Get element at index C from collection B, store in A |
| AGET_PATH | AGET_PATH [SubOp] A, B, path | Traverse a cached static path on object B, store in A |
| ASET_IDX | ASET_IDX [SubOp] A, B, C | Set element at index B in collection A to value C |
| SCONCAT | SCONCAT A, B, C | String concat: rMem[A] = rMem[B] + rMem[C] |

| Opcode | Syntax | Description |
| --- | --- | --- |
| YIELD | YIELD [subOp] A | Send intermediate output via RuntimeContext.yield(). Uses the same type tags as RET: INT (0x21), DBL (0x22), BOOL (0x23), REF (0x24). |

| Opcode | Syntax | Description |
| --- | --- | --- |
| IINC | IINC A | Integer increment: pMem[A] = pMem[A] + 1 |
| IDEC | IDEC A | Integer decrement: pMem[A] = pMem[A] - 1 |
| IINCN | IINCN A, B | Integer increment by N: pMem[A] = pMem[A] + pMem[B] |
| IDECN | IDECN A, B | Integer decrement by N: pMem[A] = pMem[A] - pMem[B] |
| DINC | DINC A | Double increment by 1.0 |
| DDEC | DDEC A | Double decrement by 1.0 |
| DINCN | DINCN A, B | Double increment by N |
| DDECN | DDECN A, B | Double decrement by N |

These enable single-instruction compilation of i++, i--, i += N, and i -= N.

| Opcode | Syntax | Description |
| --- | --- | --- |
| BAND | BAND A, B, C | Bitwise AND: pMem[A] = pMem[B] & pMem[C] |
| BOR | BOR A, B, C | Bitwise OR: pMem[A] = pMem[B] \| pMem[C] |
| BXOR | BXOR A, B, C | Bitwise XOR: pMem[A] = pMem[B] ^ pMem[C] |
| BNOT | BNOT A, B | Bitwise NOT: pMem[A] = ~pMem[B] |
| SHL | SHL A, B, C | Shift left: pMem[A] = pMem[B] << pMem[C] |
| SHR | SHR A, B, C | Arithmetic shift right: pMem[A] = pMem[B] >> pMem[C] |
| USHR | USHR A, B, C | Unsigned shift right: pMem[A] = pMem[B] >>> pMem[C] |
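
The two right shifts differ only in how they treat the sign bit. The sketch below shows how they might map onto Kotlin's `shr` and `ushr`, assuming 64-bit integer registers held in a `LongArray`; the handler names are illustrative.

```kotlin
// Assumes 64-bit integer registers; Kotlin's shift operators take an Int count.
fun execShr(pMem: LongArray, a: Int, b: Int, c: Int) {
    pMem[a] = pMem[b] shr pMem[c].toInt()   // arithmetic: the sign bit is replicated
}

fun execUshr(pMem: LongArray, a: Int, b: Int, c: Int) {
    pMem[a] = pMem[b] ushr pMem[c].toInt()  // unsigned: zeros are shifted in from the left
}
```

For a register holding -8 and a shift count of 1, `execShr` stores -4, while `execUshr` stores a large positive value because the sign bit becomes zero.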

| Opcode | Syntax | Description |
| --- | --- | --- |
| THROW | THROW A | Throws an exception with the message from register A |
| KILL | KILL | Terminates execution (used for resource guard exceptions) |

NOTE: KILL terminates execution of the current thread. It is not intended to be written by the programmer; the compiler inserts it at the end of a resource guard catch block whose purpose is to stop execution.

The 16-bit operand fields cannot hold large values (strings, big numbers, doubles). These are stored in a separate Constant Pool, an array generated at compile time.

Index 0: "user_name" (String)
Index 1: 3.14159265358979 (Double)
Index 2: "status" (String)
Index 3: 100000 (Large Integer)
Index 4: TypeDescriptor("ApiConfig") (Struct Schema)

If the script references "user_name" in 50 different places, the constant pool stores it once. All 50 instructions reference the same pool index.

Impact: Each repeated constant costs one 16-bit pool index per use instead of a full copy, which keeps the bytecode small.
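
One way a compiler could achieve this de-duplication is to intern constants while emitting code. The following is a sketch only: `ConstantPoolBuilder` and its methods are hypothetical names, and the real compiler may organise its pool differently.

```kotlin
// Hypothetical compile-time pool builder: each distinct constant gets one slot.
class ConstantPoolBuilder {
    private val indexOf = HashMap<Any, Int>()
    private val entries = ArrayList<Any>()

    // Returns the existing index for a known constant, or appends it and
    // returns the new index.
    fun intern(value: Any): Int =
        indexOf.getOrPut(value) {
            require(entries.size <= 0xFFFF) { "pool index must fit in a 16-bit operand" }
            entries.add(value)
            entries.size - 1
        }

    // Freezes the pool into the array the VM reads at run time.
    fun build(): Array<Any> = entries.toTypedArray()
}
```

Interning "user_name" fifty times yields the same index every time, so the finished pool holds the string once.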

LDC R3, #5 // Load constant pool entry 5 into register 3
Execution:
1. Read pool index (5)
2. Look up ConstantPool[5] -> "hello world"
3. Write to rMem[bp + 3] = "hello world"
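
The three execution steps might look like this as a handler. This is a sketch under assumptions: primitives are taken to live in a `LongArray` named `pMem`, references in `rMem`, large integers in the pool are assumed to be boxed as `Long`, and `execLdc` is a hypothetical name.

```kotlin
// Sketch of an LDC handler; bp is the base of the current frame.
fun execLdc(opA: Int, poolIdx: Int, pool: Array<Any>, pMem: LongArray, rMem: Array<Any?>, bp: Int) {
    when (val k = pool[poolIdx]) {                   // steps 1-2: read index, look up entry
        is Long -> pMem[bp + opA] = k                // large integer goes to primitive memory
        is Double -> pMem[bp + opA] = k.toRawBits()  // double is stored as raw bits
        else -> rMem[bp + opA] = k                   // step 3: strings and other references
    }
}
```

Dispatching on the runtime type of the pool entry is one possible design. A compiler that knows the constant's type statically could instead encode it in the SubOp field and skip the type check.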

The VM decodes instructions using bitwise operations for maximum speed:

val inst = bytecode[pc]
val opcode = ((inst ushr 56) and 0xFF).toInt()
val subOp = ((inst ushr 48) and 0xFF).toInt()
val opA = ((inst ushr 32) and 0xFFFF).toInt()
val opB = ((inst ushr 16) and 0xFFFF).toInt()
val opC = (inst and 0xFFFF).toInt()

This is a single array read followed by four shifts and five masks, which are among the cheapest operations a CPU can perform.

The VM’s core is a while loop that dispatches on the opcode with a when expression (Kotlin's equivalent of switch):

while (running) {
    val inst = bytecode[pc++]
    val opcode = ((inst ushr 56) and 0xFF).toInt()
    // Watchdog: instruction counter
    if (++instructionCount > MAX_INSTRUCTIONS) {
        throw QuotaExceededException()
    }
    when (opcode) {
        IADD -> { /* ... */ }
        CALL -> { /* ... */ }
        SCALL -> { /* ... */ }
        HMOD -> { /* ... */ }
        // ...
    }
}

The JVM’s JIT compiler aggressively optimizes this pattern, inlining handler code and eliminating bounds checks where safe.
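
For experimentation, the loop can be reduced to a self-contained toy that runs a three-instruction program. Everything beyond the loop shape is an assumption made for illustration: the opcode numbers, the `OP_HALT` opcode (not part of the instruction set above), the placement of the LDI immediate in operand B, the quota value, and the names `pack` and `execute`.

```kotlin
// Hypothetical opcode numbers; the real assignments are not given here.
const val OP_LDI = 1
const val OP_IADD = 2
const val OP_HALT = 3  // illustrative stop instruction, not in the real set
const val MAX_INSTRUCTIONS = 1_000_000L

class QuotaExceededException : RuntimeException("instruction quota exceeded")

// Packs fields using the 8/8/16/16/16 layout.
fun pack(op: Int, sub: Int, a: Int, b: Int, c: Int): Long =
    (op.toLong() shl 56) or (sub.toLong() shl 48) or
        (a.toLong() shl 32) or (b.toLong() shl 16) or c.toLong()

fun execute(bytecode: LongArray, pMem: LongArray) {
    var pc = 0
    var instructionCount = 0L
    var running = true
    while (running) {
        val inst = bytecode[pc++]
        val opcode = ((inst ushr 56) and 0xFF).toInt()
        val subOp = ((inst ushr 48) and 0xFF).toInt()
        val opA = ((inst ushr 32) and 0xFFFF).toInt()
        val opB = ((inst ushr 16) and 0xFFFF).toInt()
        val opC = (inst and 0xFFFF).toInt()
        // Watchdog: abort runaway programs.
        if (++instructionCount > MAX_INSTRUCTIONS) throw QuotaExceededException()
        when (opcode) {
            OP_LDI -> pMem[opA] = opB.toLong()
            // SubOp 1 (REG_IMM) treats C as an immediate, otherwise as a register.
            OP_IADD -> pMem[opA] = pMem[opB] + (if (subOp == 1) opC.toLong() else pMem[opC])
            OP_HALT -> running = false
            else -> throw IllegalStateException("unknown opcode $opcode at pc=${pc - 1}")
        }
    }
}
```

Running `LDI r1, 5`, then `IADD [REG_IMM] r0, r1, 10`, then the halt instruction leaves 15 in register 0. When the opcode constants are dense compile-time integers, a `when` like this one can compile to a JVM `tableswitch`, which is a constant-time jump.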