Core Concepts

Early Development Notice

All MVP.Express projects are currently in active development (pre-1.0.0) and should not be used in production environments. APIs may change without notice, and breaking changes are expected until each project reaches version 1.0.0. We welcome early adopters and contributors, but please use at your own risk.

This guide introduces the three foundational technologies that power the MYRA Stack. Understanding these concepts will help you get the most out of our libraries and make informed architectural decisions.

Table of Contents

  1. Foreign Function & Memory API (FFM)
  2. Zero-Copy I/O
  3. Linux io_uring
  4. How MVP.Express Combines These

Foreign Function & Memory API (FFM)

What is FFM?

The Foreign Function & Memory (FFM) API is a modern Java feature (finalized in Java 22) that enables:

  1. Direct memory access outside the JVM heap (off-heap memory)
  2. Native function calls without JNI boilerplate
  3. Structured memory layouts with type-safe access

Previously, Java developers had to choose between:

  • JNI (complex, error-prone, requires native compilation)
  • Unsafe (internal API, no guarantees, may break)
  • ByteBuffer (limited, lacks structured access)

FFM provides a clean, safe, and performant alternative.

Why Off-Heap Memory Matters

Java’s garbage collector is excellent for general workloads, but for high-performance scenarios:

| Challenge            | Impact                              | FFM Solution                     |
|----------------------|-------------------------------------|----------------------------------|
| GC pauses            | Unpredictable latency spikes        | Off-heap data avoids GC entirely |
| Object headers       | 12-16 bytes overhead per object     | Direct memory has zero overhead  |
| Memory fragmentation | Inefficient memory usage            | Contiguous allocation            |
| Cache locality       | Poor performance for large datasets | Controlled memory layout         |

FFM Key Concepts

1. Memory Segments

A MemorySegment represents a contiguous region of memory:

// Allocate 1KB of off-heap memory
try (Arena arena = Arena.ofConfined()) {
    MemorySegment segment = arena.allocate(1024);
    
    // Write directly to memory
    segment.set(ValueLayout.JAVA_LONG, 0, 42L);
    segment.set(ValueLayout.JAVA_DOUBLE, 8, 3.14159);
    
    // Read back
    long value = segment.get(ValueLayout.JAVA_LONG, 0);
    // Memory automatically freed when arena closes
}

2. Arenas (Lifecycle Management)

Arenas control when memory is released:

| Arena Type           | Lifecycle      | Thread Safety | Use Case            |
|----------------------|----------------|---------------|---------------------|
| Arena.ofConfined()   | Explicit close | Single thread | Request-scoped data |
| Arena.ofShared()     | Explicit close | Multi-thread  | Shared buffers      |
| Arena.ofAuto()       | GC-managed     | Multi-thread  | Long-lived pools    |
| Arena.global()       | Never freed    | Multi-thread  | Static data         |
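
The lifecycle and thread-safety differences show up directly in code. A short sketch using only the JDK 22+ FFM API (no MVP.Express code):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaLifecycles {
    public static void main(String[] args) throws InterruptedException {
        // Confined arena: only the creating thread may touch its segments,
        // and close() frees the memory deterministically.
        try (Arena confined = Arena.ofConfined()) {
            MemorySegment local = confined.allocate(ValueLayout.JAVA_LONG);
            local.set(ValueLayout.JAVA_LONG, 0, 7L);
        } // memory released here; any further access would throw

        // Shared arena: segments may be read and written from any thread.
        try (Arena shared = Arena.ofShared()) {
            MemorySegment seg = shared.allocate(ValueLayout.JAVA_LONG);
            Thread writer = new Thread(() -> seg.set(ValueLayout.JAVA_LONG, 0, 42L));
            writer.start();
            writer.join();
            System.out.println(seg.get(ValueLayout.JAVA_LONG, 0)); // prints 42
        }
    }
}
```

Attempting the cross-thread write with a confined arena instead would fail with a `WrongThreadException`, which is exactly the guarantee that makes confined arenas cheap for request-scoped data.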

3. Memory Layouts

Define structured data with compile-time safety:

// Define a C-like struct
MemoryLayout orderLayout = MemoryLayout.structLayout(
    ValueLayout.JAVA_LONG.withName("orderId"),
    ValueLayout.JAVA_LONG.withName("timestamp"),
    ValueLayout.JAVA_INT.withName("quantity"),
    ValueLayout.JAVA_INT.withName("price"),
    ValueLayout.JAVA_BYTE.withName("side")
);

// Create type-safe accessors
VarHandle orderIdHandle = orderLayout.varHandle(
    PathElement.groupElement("orderId")
);
VarHandle quantityHandle = orderLayout.varHandle(
    PathElement.groupElement("quantity")
);

// Access fields by name (here 'segment' is allocated inside an Arena
// with arena.allocate(orderLayout), so it is sized for the struct)
orderIdHandle.set(segment, 0L, 12345L);
int qty = (int) quantityHandle.get(segment, 0L);

4. Native Function Calls (Downcalls)

Call C library functions directly:

// Get a handle to the native 'strlen' function
Linker linker = Linker.nativeLinker();
MethodHandle strlen = linker.downcallHandle(
    linker.defaultLookup().find("strlen").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS)
);

// Call it with a native string
try (Arena arena = Arena.ofConfined()) {
    MemorySegment str = arena.allocateFrom("Hello, FFM!");
    long len = (long) strlen.invokeExact(str);  // Returns 11
}

MVP.Express FFM Usage

Roray FFM Utils builds on these primitives to provide:

  • MemorySegmentPool - GC-free buffer pooling with metrics
  • Utf8View - Zero-allocation string comparisons
  • DowncallFactory - Simplified native function binding
  • BinaryReader/BinaryWriter - Efficient structured I/O

Zero-Copy I/O

What is Zero-Copy?

Zero-copy means transferring data without copying it between buffers. In traditional I/O:

Application → JVM Heap → Direct Buffer → Kernel Buffer → Network
            (copy 1)      (copy 2)         (copy 3)

With zero-copy:

Application Buffer → Kernel → Network
                  (no copies)

Why Copies Are Expensive

Each memory copy has costs:

| Cost Type        | Impact                           |
|------------------|----------------------------------|
| CPU cycles       | ~1 cycle per byte copied         |
| Memory bandwidth | Saturates memory bus             |
| Cache pollution  | Evicts useful data from L1/L2/L3 |
| Latency          | Adds microseconds per copy       |

For a 64KB message with 3 copies:

  • 64KB × 3 copies = 192KB of memory traffic
  • At 50GB/s memory bandwidth = ~4μs overhead

Zero-Copy Techniques in MVP.Express

1. Flyweight Pattern (MyraCodec)

Instead of deserializing into objects, access data in-place:

// Traditional approach (allocates objects)
Order order = deserialize(buffer);  // Creates Order, String, etc.
long id = order.getId();
String symbol = order.getSymbol();

// Flyweight approach (zero allocation)
OrderFlyweight flyweight = new OrderFlyweight();
flyweight.wrap(segment, offset);
long id = flyweight.getOrderId();        // Direct memory read
Utf8View symbol = flyweight.getSymbol(); // Returns view, not String

The flyweight wraps the binary data and provides accessors that read directly from the underlying memory.
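
A stripped-down flyweight over plain FFM shows the pattern. The offsets follow the order layout defined earlier (orderId at 0, timestamp at 8, quantity at 16); the class and method names here are illustrative, not MyraCodec-generated code:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Illustrative flyweight: it holds no data of its own, only a window
// into the wrapped segment. Accessors read memory directly.
final class OrderView {
    private MemorySegment segment;
    private long offset;

    OrderView wrap(MemorySegment segment, long offset) {
        this.segment = segment;
        this.offset = offset;
        return this;
    }

    long orderId()  { return segment.get(ValueLayout.JAVA_LONG, offset); }
    int  quantity() { return segment.get(ValueLayout.JAVA_INT,  offset + 16); }
}

public class FlyweightDemo {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buf = arena.allocate(32, 8); // 8-byte aligned
            buf.set(ValueLayout.JAVA_LONG, 0, 12345L); // orderId
            buf.set(ValueLayout.JAVA_INT, 16, 100);    // quantity

            // One reusable view, rewrapped per message: no per-message allocation
            OrderView view = new OrderView().wrap(buf, 0);
            System.out.println(view.orderId() + " x " + view.quantity()); // 12345 x 100
        }
    }
}
```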

2. Buffer Pools (Roray FFM Utils)

Pre-allocate buffers and reuse them:

// Create a pool of 256 buffers, 4KB each
MemorySegmentPool pool = new MemorySegmentPool(4096, 256, 512);

// Acquire a buffer (no allocation after warmup)
MemorySegment buffer = pool.acquire();
try {
    // Use buffer for I/O
    processData(buffer);
} finally {
    // Return to pool (no deallocation)
    pool.release(buffer);
}
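
The idea behind the pool can be sketched in a few lines. This is a single-threaded simplification, not the Roray implementation (it omits exhaustion handling, thread safety, and metrics):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.ArrayDeque;

// Simplified pool sketch (not the Roray implementation): every segment is
// allocated once up front, so acquire/release on the hot path only move
// references and never allocate.
final class SimpleSegmentPool {
    private final ArrayDeque<MemorySegment> free = new ArrayDeque<>();

    SimpleSegmentPool(Arena arena, long segmentSize, int count) {
        for (int i = 0; i < count; i++) {
            free.push(arena.allocate(segmentSize, 8)); // 8-byte aligned
        }
    }

    MemorySegment acquire()         { return free.pop(); }  // throws if exhausted
    void release(MemorySegment seg) { free.push(seg); }
    int available()                 { return free.size(); }
}

public class PoolDemo {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofShared()) {
            SimpleSegmentPool pool = new SimpleSegmentPool(arena, 4096, 4);
            MemorySegment buf = pool.acquire();   // no allocation here
            System.out.println(pool.available()); // prints 3
            pool.release(buf);
            System.out.println(pool.available()); // prints 4
        }
    }
}
```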

3. Registered Buffers (MyraTransport)

Pre-register buffers with the kernel to eliminate address validation:

// Without registered buffers:
// Kernel must validate buffer address on every I/O operation

// With registered buffers:
// Buffers validated once at registration, kernel uses index
RegisteredBufferPool pool = new RegisteredBufferPoolImpl(config);
backend.registerBufferPool(pool);

// I/O operations use buffer index instead of address
backend.receive(registeredBuffer, token);  // ~1.7x faster

4. View-Based String Handling

Compare strings without allocation:

Utf8View symbolView = flyweight.getSymbol();

// Zero-allocation comparison
if (symbolView.equalsString("AAPL")) {
    // Match found - no String objects created
}

// Only allocate when you really need a String
String symbol = symbolView.toString();  // Allocation happens here
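
The technique underneath a view-based `equalsString` can be shown with plain FFM (Utf8View's real API may differ; `regionEqualsAscii` is a hypothetical helper): compare the UTF-8 bytes in place against an ASCII literal, allocating nothing.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

public class ViewCompareDemo {
    // Byte-by-byte in-place comparison against an ASCII literal:
    // no String, char[], or byte[] is created on the hot path.
    static boolean regionEqualsAscii(MemorySegment seg, long off, int len, String literal) {
        if (len != literal.length()) return false;
        for (int i = 0; i < len; i++) {
            if (seg.get(ValueLayout.JAVA_BYTE, off + i) != (byte) literal.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            byte[] symbol = "AAPL".getBytes(StandardCharsets.UTF_8);
            MemorySegment seg = arena.allocateFrom(ValueLayout.JAVA_BYTE, symbol);

            System.out.println(regionEqualsAscii(seg, 0, 4, "AAPL")); // true
            System.out.println(regionEqualsAscii(seg, 0, 4, "MSFT")); // false
        }
    }
}
```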

Zero-Copy Best Practices

  1. Pool everything - Buffers, flyweights, views
  2. Avoid toString() - Use views for comparisons
  3. Size buffers appropriately - Match typical message sizes
  4. Align data - Memory alignment improves access speed
  5. Batch operations - Amortize any unavoidable copies

Linux io_uring

What is io_uring?

io_uring is a Linux kernel interface (5.1+) for asynchronous I/O that provides:

  • Batched system calls - Submit multiple operations in one syscall
  • True async I/O - Operations complete independently
  • Zero-copy receives - Kernel writes directly to user buffers
  • SQPOLL mode - Kernel thread polls for submissions (no syscalls)

Traditional I/O vs io_uring

Traditional (epoll + non-blocking):

For each operation:
  1. epoll_wait()     → syscall
  2. read()/write()   → syscall
  3. Handle EAGAIN    → retry

io_uring:

Queue N operations:
  1. io_uring_submit() → single syscall
  2. io_uring_wait()   → single syscall (gets M completions)

With SQPOLL mode, even the submit syscall is eliminated.

io_uring Architecture

┌────────────────────────────────────────────────────────┐
│                    User Space                          │
│  ┌──────────────┐        ┌──────────────┐             │
│  │ Submission   │        │ Completion   │             │
│  │ Queue (SQ)   │        │ Queue (CQ)   │             │
│  │              │        │              │             │
│  │ [SQE][SQE]   │        │ [CQE][CQE]   │             │
│  │ [SQE][SQE]   │        │ [CQE][CQE]   │             │
│  └──────┬───────┘        └──────▲───────┘             │
│         │                       │                      │
├─────────┼───────────────────────┼──────────────────────┤
│         ▼       Kernel          │                      │
│  ┌──────────────────────────────┴───────┐              │
│  │          io_uring Instance           │              │
│  │  ┌─────────────────────────────────┐ │              │
│  │  │  Registered Buffers (optional)  │ │              │
│  │  │  [buf0][buf1][buf2][buf3]...    │ │              │
│  │  └─────────────────────────────────┘ │              │
│  │                                      │              │
│  │  ┌─────────────────────────────────┐ │              │
│  │  │  SQPOLL Thread (optional)       │ │              │
│  │  │  Polls SQ without syscalls      │ │              │
│  │  └─────────────────────────────────┘ │              │
│  └──────────────────────────────────────┘              │
└────────────────────────────────────────────────────────┘

Key io_uring Features Used by MVP.Express

1. Registered Buffers

Pre-validated memory regions eliminate per-operation address checks:

// Register buffers with kernel at startup
backend.registerBufferPool(pool);

// All subsequent I/O uses buffer indices
// Kernel skips address validation → 1.7x throughput improvement

2. Batch Submission

Submit multiple operations with one syscall:

// Queue multiple operations
backend.receive(buffer1, token1);
backend.receive(buffer2, token2);
backend.send(buffer3, token3);

// Single syscall submits all
int submitted = backend.submitBatch();  // Returns 3

3. Multi-Shot Receive

Keep a receive operation active across multiple completions:

// Traditional: resubmit after each receive
while (running) {
    backend.receive(buffer, token);    // Submit
    backend.submitBatch();             // Syscall
    int n = backend.waitForCompletion(...);  // Get result
}

// Multi-shot: submit once, receive many
backend.receiveMultishot(buffer, token);  // Submit once
backend.submitBatch();                     // One syscall
while (running) {
    // Each completion automatically rearms the receive
    int n = backend.waitForCompletion(...);
}

4. SQPOLL Mode

Dedicated kernel thread polls for submissions:

TransportConfig config = TransportConfig.builder()
    .sqPollEnabled(true)          // Enable SQPOLL
    .sqPollCpuAffinity(3)         // Pin to CPU 3
    .sqPollIdleTimeout(500)       // Sleep after 500μs idle
    .build();

// With SQPOLL:
// - No syscall for io_uring_submit()
// - Kernel thread continuously polls SQ
// - Sub-microsecond submission latency

5. Zero-Copy Send

Avoid user→kernel copy for large payloads:

// Regular send: copies data to kernel
backend.send(buffer, length, token);

// Zero-copy send: kernel reads directly from user buffer
backend.sendZeroCopy(buffer, length, token);
// Note: Two completions - send complete + notification
// Buffer must not be modified until notification received

io_uring Performance Benefits

| Metric          | Traditional (epoll) | io_uring   | Improvement  |
|-----------------|---------------------|------------|--------------|
| Syscalls per op | 2-3                 | 0.1-0.5    | 4-30x fewer  |
| p50 latency     | 25-50μs             | 5-15μs     | 3-5x lower   |
| p99 latency     | 100-500μs           | 20-50μs    | 5-10x lower  |
| Throughput      | 200-500K msg/s      | 1-2M msg/s | 3-5x higher  |

How MVP.Express Combines These

The MVP.Express stack integrates FFM, zero-copy, and io_uring at every layer:

┌─────────────────────────────────────────────────────────────┐
│                     Your Application                         │
│  • Works with typed flyweights and views                    │
│  • No manual memory management                              │
│  • Zero-GC hot path                                         │
└────────────────────────┬────────────────────────────────────┘
┌────────────────────────▼────────────────────────────────────┐
│                      MyraCodec                               │
│  • Schema-driven code generation                            │
│  • Flyweight accessors (FFM MemorySegment)                  │
│  • Zero-copy encode/decode                                  │
└────────────────────────┬────────────────────────────────────┘
┌────────────────────────▼────────────────────────────────────┐
│                    MyraTransport                             │
│  • io_uring backend with registered buffers                 │
│  • Batch submission and multi-shot receive                  │
│  • SQPOLL for minimum latency                               │
└────────────────────────┬────────────────────────────────────┘
┌────────────────────────▼────────────────────────────────────┐
│                   Roray FFM Utils                            │
│  • MemorySegmentPool for buffer management                  │
│  • Utf8View for zero-alloc string handling                  │
│  • DowncallFactory for native bindings                      │
└─────────────────────────────────────────────────────────────┘

Data Flow Example

A message arriving from the network:

  1. MyraTransport receives data into a registered buffer (zero-copy from kernel)
  2. Roray FFM Utils provides the MemorySegment wrapping the buffer
  3. MyraCodec flyweight wraps the segment for structured access
  4. Application reads fields via flyweight (direct memory reads, no allocation)
  5. Application uses Utf8View.equalsString() for routing (no String allocation)
  6. Response written via flyweight directly to send buffer
  7. MyraTransport sends via io_uring (zero-copy to kernel with SEND_ZC)

Result: End-to-end processing with zero heap allocations, minimal syscalls, and no data copies.
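
The shape of this hot path can be illustrated with plain JDK FFM standing in for the library pieces. Everything below is illustrative, not real MVP.Express API: the receive buffer is simulated, the field reads play the flyweight's role, and the byte comparison plays Utf8View's role.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;

// Illustrative hot path (plain FFM; names are not real MVP.Express APIs):
// a pre-allocated receive buffer is read in place, one field is extracted,
// and the symbol is routed by comparing bytes directly, with zero heap
// allocation after setup.
public class HotPathSketch {
    static boolean symbolEquals(MemorySegment seg, long off, String ascii) {
        for (int i = 0; i < ascii.length(); i++) {
            if (seg.get(ValueLayout.JAVA_BYTE, off + i) != (byte) ascii.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofShared()) {
            // 1. Stand-in for a registered receive buffer
            MemorySegment rx = arena.allocate(64, 8);

            // Simulate the kernel filling it: an orderId, then a 4-byte symbol
            rx.set(ValueLayout.JAVA_LONG, 0, 42L);
            byte[] sym = "AAPL".getBytes(StandardCharsets.US_ASCII);
            MemorySegment.copy(sym, 0, rx, ValueLayout.JAVA_BYTE, 8, sym.length);

            // 2-5. Read fields and route without deserializing or allocating
            long orderId = rx.get(ValueLayout.JAVA_LONG, 0);
            boolean isApple = symbolEquals(rx, 8, "AAPL");

            System.out.println(orderId + " " + isApple); // 42 true
        }
    }
}
```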


Next Steps

Now that you understand the core concepts: