Core Concepts

Early Development Notice

All MVP.Express projects are currently in active development (pre-1.0.0) and should not be used in production environments. APIs may change without notice, and breaking changes are expected until each project reaches version 1.0.0. We welcome early adopters and contributors, but use these projects at your own risk.

This article introduces the three foundational technologies that power the MYRA Stack. Understanding these concepts will help you get the most out of our libraries and make informed architectural decisions.

Table of Contents

  1. Foreign Function & Memory API (FFM)
  2. Zero-Copy I/O
  3. Linux io_uring
  4. How MVP.Express Combines These

Foreign Function & Memory API (FFM)

What is FFM?

The Foreign Function & Memory (FFM) API is a modern Java feature (finalized in Java 22) that enables:

  1. Direct memory access outside the JVM heap (off-heap memory)
  2. Native function calls without JNI boilerplate
  3. Structured memory layouts with type-safe access

Previously, Java developers had to choose between:

  • JNI (complex, error-prone, requires native compilation)
  • Unsafe (internal API, no guarantees, may break)
  • ByteBuffer (limited, lacks structured access)

FFM provides a clean, safe, and performant alternative.
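To make the contrast concrete, here is a minimal sketch (the class name FfmVsByteBuffer is ours, purely illustrative) comparing a direct ByteBuffer with the equivalent MemorySegment code. Note that the segment is released deterministically when its arena closes, while a direct ByteBuffer's native memory is freed only when the GC eventually collects it:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

public class FfmVsByteBuffer {

    // Old approach: raw offsets on a direct buffer, freed whenever GC decides.
    static long viaByteBuffer() {
        ByteBuffer buf = ByteBuffer.allocateDirect(16);
        buf.putLong(0, 42L);
        return buf.getLong(0);
    }

    // FFM approach: typed layout access, deterministic release at arena close.
    static long viaMemorySegment() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocate(16);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);
            return seg.get(ValueLayout.JAVA_LONG, 0);
        } // memory freed here, not at some future GC cycle
    }

    public static void main(String[] args) {
        System.out.println(viaByteBuffer() + " " + viaMemorySegment());
    }
}
```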

Why Off-Heap Memory Matters

Java’s garbage collector is excellent for general workloads, but for high-performance scenarios:

| Challenge | Impact | FFM Solution |
| --- | --- | --- |
| GC pauses | Unpredictable latency spikes | Off-heap data avoids GC entirely |
| Object headers | 12-16 bytes overhead per object | Direct memory has zero overhead |
| Memory fragmentation | Inefficient memory usage | Contiguous allocation |
| Cache locality | Poor performance for large datasets | Controlled memory layout |

FFM Key Concepts

1. Memory Segments

A MemorySegment represents a contiguous region of memory:

// Allocate 1KB of off-heap memory
try (Arena arena = Arena.ofConfined()) {
    MemorySegment segment = arena.allocate(1024);
    
    // Write directly to memory
    segment.set(ValueLayout.JAVA_LONG, 0, 42L);
    segment.set(ValueLayout.JAVA_DOUBLE, 8, 3.14159);
    
    // Read back
    long value = segment.get(ValueLayout.JAVA_LONG, 0);
    // Memory automatically freed when arena closes
}

2. Arenas (Lifecycle Management)

Arenas control when memory is released:

| Arena Type | Lifecycle | Thread Safety | Use Case |
| --- | --- | --- | --- |
| Arena.ofConfined() | Explicit close | Single thread | Request-scoped data |
| Arena.ofShared() | Explicit close | Multi-thread | Shared buffers |
| Arena.ofAuto() | GC-managed | Multi-thread | Long-lived pools |
| Arena.global() | Never freed | Multi-thread | Static data |
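The confinement rules are easy to demonstrate. In this sketch (ArenaScopes is our own illustrative class), a segment from a shared arena is read from another thread, while the same cross-thread access on a confined arena's segment throws WrongThreadException:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaScopes {

    // A shared arena's segments can be accessed from any thread.
    static long readFromAnotherThread() throws InterruptedException {
        try (Arena shared = Arena.ofShared()) {
            MemorySegment seg = shared.allocate(ValueLayout.JAVA_LONG);
            seg.set(ValueLayout.JAVA_LONG, 0, 7L);

            long[] result = new long[1];
            Thread reader = new Thread(() -> result[0] = seg.get(ValueLayout.JAVA_LONG, 0));
            reader.start();
            reader.join();
            return result[0];
        }
    }

    // A confined arena's segments belong to the creating thread;
    // any other thread touching them gets WrongThreadException.
    static boolean confinedRejectsOtherThread() throws InterruptedException {
        try (Arena confined = Arena.ofConfined()) {
            MemorySegment seg = confined.allocate(ValueLayout.JAVA_LONG);
            boolean[] threw = new boolean[1];
            Thread t = new Thread(() -> {
                try {
                    seg.get(ValueLayout.JAVA_LONG, 0);
                } catch (WrongThreadException e) {
                    threw[0] = true;
                }
            });
            t.start();
            t.join();
            return threw[0];
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readFromAnotherThread());      // 7
        System.out.println(confinedRejectsOtherThread()); // true
    }
}
```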

3. Memory Layouts

Define structured data with compile-time safety:

// Define a C-like struct
MemoryLayout orderLayout = MemoryLayout.structLayout(
    ValueLayout.JAVA_LONG.withName("orderId"),
    ValueLayout.JAVA_LONG.withName("timestamp"),
    ValueLayout.JAVA_INT.withName("quantity"),
    ValueLayout.JAVA_INT.withName("price"),
    ValueLayout.JAVA_BYTE.withName("side")
);

// Create type-safe accessors
VarHandle orderIdHandle = orderLayout.varHandle(
    PathElement.groupElement("orderId")
);
VarHandle quantityHandle = orderLayout.varHandle(
    PathElement.groupElement("quantity")
);

// Access fields by name (given a segment allocated with orderLayout's size)
orderIdHandle.set(segment, 0L, 12345L);
int qty = (int) quantityHandle.get(segment, 0L);

4. Native Function Calls (Downcalls)

Call C library functions directly:

// Get a handle to the native 'strlen' function
Linker linker = Linker.nativeLinker();
MethodHandle strlen = linker.downcallHandle(
    linker.defaultLookup().find("strlen").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS)
);

// Call it with a native string
try (Arena arena = Arena.ofConfined()) {
    MemorySegment str = arena.allocateFrom("Hello, FFM!");
    long len = (long) strlen.invokeExact(str);  // Returns 11
}

MVP.Express FFM Usage

Roray FFM Utils builds on these primitives to provide:

  • MemorySegmentPool - GC-free buffer pooling with metrics
  • Utf8View - Zero-allocation string comparisons
  • DowncallFactory - Simplified native function binding
  • BinaryReader/BinaryWriter - Efficient structured I/O

Zero-Copy I/O

What is Zero-Copy?

Zero-copy means transferring data without copying it between buffers. In traditional I/O:

Application → JVM Heap → Direct Buffer → Kernel Buffer → Network
            (copy 1)      (copy 2)         (copy 3)

With zero-copy:

Application Buffer → Kernel → Network
                  (no copies)

Why Copies Are Expensive

Each memory copy has costs:

| Cost Type | Impact |
| --- | --- |
| CPU cycles | ~1 cycle per byte copied |
| Memory bandwidth | Saturates memory bus |
| Cache pollution | Evicts useful data from L1/L2/L3 |
| Latency | Adds microseconds per copy |

For a 64KB message with 3 copies:

  • 64KB × 3 copies = 192KB of memory traffic
  • At 50GB/s memory bandwidth = ~4μs overhead
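That estimate is plain arithmetic, sketched below (CopyCost is an illustrative class name, not part of any MVP.Express library):

```java
public class CopyCost {
    // Memory-bus time consumed by copying a message N times at a given bandwidth.
    static double copyOverheadMicros(long messageBytes, int copies, double bytesPerSec) {
        return messageBytes * (double) copies / bytesPerSec * 1e6;
    }

    public static void main(String[] args) {
        long message = 64 * 1024; // 64KB message
        double micros = copyOverheadMicros(message, 3, 50e9);
        System.out.printf("%d bytes of traffic, %.2f us%n", message * 3, micros);
        // ~3.9 us of bus time: the "~4μs overhead" quoted above
    }
}
```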

Zero-Copy Techniques in MVP.Express

1. Flyweight Pattern (MyraCodec)

Instead of deserializing into objects, access data in-place:

// Traditional approach (allocates objects)
Order order = deserialize(buffer);  // Creates Order, String, etc.
long id = order.getId();
String symbol = order.getSymbol();

// Flyweight approach (zero allocation)
OrderFlyweight flyweight = new OrderFlyweight();
flyweight.wrap(segment, offset);
long id = flyweight.getOrderId();        // Direct memory read
Utf8View symbol = flyweight.getSymbol(); // Returns view, not String

The flyweight wraps the binary data and provides accessors that read directly from the underlying memory.
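A minimal hand-written version of the pattern looks like this. The real OrderFlyweight is generated by MyraCodec, so treat OrderView, its field offsets (which follow the orderId/timestamp/quantity struct defined earlier), and its API as illustrative only:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hand-written sketch of a flyweight: wrap() stores a reference to the
// underlying memory; accessors read fields in place, allocating nothing.
public class OrderView {
    private MemorySegment segment;
    private long offset;

    public void wrap(MemorySegment segment, long offset) {
        this.segment = segment; // no copy: just remember where the data lives
        this.offset = offset;
    }

    // Offsets follow the earlier struct: orderId at 0, timestamp at 8, quantity at 16.
    public long orderId()  { return segment.get(ValueLayout.JAVA_LONG, offset); }
    public int  quantity() { return segment.get(ValueLayout.JAVA_INT,  offset + 16); }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment msg = arena.allocate(32);
            msg.set(ValueLayout.JAVA_LONG, 0, 12345L); // orderId
            msg.set(ValueLayout.JAVA_INT, 16, 7);      // quantity

            OrderView view = new OrderView();
            view.wrap(msg, 0); // reusable: one view can wrap message after message
            System.out.println(view.orderId() + " x" + view.quantity());
        }
    }
}
```

The same OrderView instance can be rewrapped over each incoming message, so the per-message allocation count stays at zero.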

2. Buffer Pools (Roray FFM Utils)

Pre-allocate buffers and reuse them:

// Create a pool of 256 buffers, 4KB each
MemorySegmentPool pool = new MemorySegmentPool(4096, 256, 512);

// Acquire a buffer (no allocation after warmup)
MemorySegment buffer = pool.acquire();
try {
    // Use buffer for I/O
    processData(buffer);
} finally {
    // Return to pool (no deallocation)
    pool.release(buffer);
}

3. Registered Buffers (MyraTransport)

Pre-register buffers with the kernel to eliminate address validation:

TransportConfig config = TransportConfig.builder()
    .registeredBuffers(TransportConfig.RegisteredBuffersConfig.builder()
        .numBuffers(256)
        .bufferSize(4096)
        .build())
    .build();

IoUringBackend backend = new IoUringBackend();
backend.initialize(config);

RegisteredBufferPool pool = TransportFactory.createBufferPool(config);
backend.registerBufferPool(pool);

RegisteredBuffer buffer = pool.acquire();
backend.receive(buffer, token);

4. View-Based String Handling

Compare strings without allocation:

Utf8View symbolView = flyweight.getSymbol();

// Zero-allocation comparison
if (symbolView.equalsString("AAPL")) {
    // Match found - no String objects created
}

// Only allocate when you really need a String
String symbol = symbolView.toString();  // Allocation happens here
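How can a comparison avoid allocation? By checking bytes in place. The sketch below (ByteView.equalsAscii, our own helper, not the actual Utf8View API) shows the core trick for ASCII data:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Illustrative view-style comparison: bytes are compared where they sit
// in memory, so no String or byte[] is ever created.
public class ByteView {
    static boolean equalsAscii(MemorySegment seg, long offset, int len, String expected) {
        if (len != expected.length()) return false;
        for (int i = 0; i < len; i++) {
            if (seg.get(ValueLayout.JAVA_BYTE, offset + i) != (byte) expected.charAt(i)) {
                return false;
            }
        }
        return true; // matched with zero allocation
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment seg = arena.allocateFrom("AAPL"); // NUL-terminated UTF-8
            System.out.println(equalsAscii(seg, 0, 4, "AAPL"));
        }
    }
}
```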

Zero-Copy Best Practices

  1. Pool everything - Buffers, flyweights, views
  2. Avoid toString() - Use views for comparisons
  3. Size buffers appropriately - Match typical message sizes
  4. Align data - Memory alignment improves access speed
  5. Batch operations - Amortize any unavoidable copies

Linux io_uring

What is io_uring?

io_uring is a Linux kernel interface (5.1+) for asynchronous I/O that provides:

  • Batched system calls - Submit multiple operations in one syscall
  • True async I/O - Operations complete independently
  • Zero-copy receives - Kernel writes directly to user buffers
  • SQPOLL mode - Kernel thread polls for submissions (no syscalls)

Traditional I/O vs io_uring

Traditional (epoll + non-blocking):

For each operation:
  1. epoll_wait()     → syscall
  2. read()/write()   → syscall
  3. Handle EAGAIN    → retry

io_uring:

Queue N operations:
  1. io_uring_submit() → single syscall
  2. io_uring_wait()   → single syscall (gets M completions)

With SQPOLL mode, even the submit syscall is eliminated.

io_uring Architecture

User Space                                  Kernel
+---------------------------+              +-----------------------------+
| Submission Queue (SQ)     |   submit     | io_uring Instance           |
| [SQE][SQE][SQE][SQE]      | -----------> |                             |
|                           |              | Registered Buffers (opt.)   |
| Completion Queue (CQ)     |   complete   | [buf0][buf1][buf2][buf3]... |
| [CQE][CQE][CQE][CQE]      | <----------- |                             |
+---------------------------+              | SQPOLL Thread (optional):   |
                                           | polls SQ without syscalls   |
                                           +-----------------------------+

Key io_uring Features Used by MVP.Express

1. Registered Buffers

Pre-validated memory regions eliminate per-operation address checks:

IoUringBackend backend = ...;

// Register buffers with kernel at startup
backend.registerBufferPool(pool);

// All subsequent I/O uses registered buffers
// Kernel skips address validation → 1.7x throughput improvement

2. Batch Submission

Submit multiple operations with one syscall:

IoUringBackend backend = ...;
RegisteredBuffer buffer1 = ...;
RegisteredBuffer buffer2 = ...;
RegisteredBuffer buffer3 = ...;

// Queue multiple operations
backend.receive(buffer1, token1);
backend.receive(buffer2, token2);
backend.send(buffer3, token3);

// Single syscall submits all
int submitted = backend.submitBatch(); // Returns 3

3. Multi-Shot Receive

Keep a receive operation active across multiple completions:

IoUringBackend backend = ...;
RegisteredBuffer buffer = ...;

// Traditional: resubmit after each receive
while (running) {
    backend.receive(buffer, token);      // Queue
    backend.submitBatch();               // Syscall
    int n = backend.waitForCompletion(1000, (tok, res) -> {}); // Handle
}

// Multi-shot: submit once, receive many
backend.receiveMultishot(buffer, token); // Queue once
backend.submitBatch();                   // One syscall
while (running) {
    int n = backend.waitForCompletion(1000, (tok, res) -> {});
}

4. SQPOLL Mode

Dedicated kernel thread polls for submissions:

TransportConfig config = TransportConfig.builder()
    .sqPollEnabled(true)           // Enable SQPOLL
    .sqPollCpuAffinity(3)          // Pin to CPU 3
    .sqPollIdleTimeout(500)        // 500 microseconds idle
    .build();

// With SQPOLL:
// - No syscall for io_uring_submit()
// - Kernel thread continuously polls SQ
// - Sub-microsecond submission latency

5. Zero-Copy Send

Avoid user→kernel copy for large payloads:

IoUringBackend backend = ...;
RegisteredBuffer buffer = ...;

// Regular send: copies data to kernel
backend.send(buffer, token);

// Zero-copy send: kernel reads directly from user buffer
backend.sendZeroCopy(buffer, token);
// Note: Two completions - send complete + notification
// Buffer must not be modified until notification received

io_uring Performance Benefits

| Metric | Traditional (epoll) | io_uring | Improvement |
| --- | --- | --- | --- |
| Syscalls per op | 2-3 | 0.1-0.5 | 4-30x fewer |
| p50 latency | 25-50μs | 5-15μs | 3-5x lower |
| p99 latency | 100-500μs | 20-50μs | 5-10x lower |
| Throughput | 200-500K msg/s | 1-2M msg/s | 3-5x higher |

How MVP.Express Combines These

The MVP.Express stack integrates FFM, zero-copy, and io_uring at every layer:

Your Application
  • Works with typed flyweights and views
  • No manual memory management
  • Zero-GC hot path
MyraCodec
  • Schema-driven code generation
  • Flyweight accessors (FFM MemorySegment)
  • Zero-copy encode/decode
MyraTransport
  • io_uring backend with registered buffers
  • Batch submission and multi-shot receive
  • SQPOLL for minimum latency
Roray FFM Utils
  • MemorySegmentPool for buffer management
  • Utf8View for zero-alloc string handling
  • DowncallFactory for native bindings

Data Flow Example

A message arriving from the network:

  1. MyraTransport receives data into a registered buffer (zero-copy from kernel)
  2. Roray FFM Utils provides the MemorySegment wrapping the buffer
  3. MyraCodec flyweight wraps the segment for structured access
  4. Application reads fields via flyweight (direct memory reads, no allocation)
  5. Application uses Utf8View.equalsString() for routing (no String allocation)
  6. Response written via flyweight directly to send buffer
  7. MyraTransport sends via io_uring (zero-copy to kernel with SEND_ZC)

Result: End-to-end processing with zero heap allocations, minimal syscalls, and no data copies.


Next Steps

Now that you understand the core concepts: