Core Concepts
This guide introduces the three foundational technologies that power the MYRA Stack. Understanding these concepts will help you get the most out of our libraries and make informed architectural decisions.
Table of Contents

- Foreign Function & Memory API (FFM)
- Zero-Copy I/O
- Linux io_uring
- How MVP.Express Combines These
- Next Steps
Foreign Function & Memory API (FFM)
What is FFM?
The Foreign Function & Memory (FFM) API is a modern Java feature (finalized in Java 22) that enables:
- Direct memory access outside the JVM heap (off-heap memory)
- Native function calls without JNI boilerplate
- Structured memory layouts with type-safe access
Previously, Java developers had to choose between:
- JNI (complex, error-prone, requires native compilation)
- Unsafe (internal API, no guarantees, may break)
- ByteBuffer (limited, lacks structured access)
FFM provides a clean, safe, and performant alternative.
Why Off-Heap Memory Matters
Java’s garbage collector is excellent for general workloads, but for high-performance scenarios:
| Challenge | Impact | FFM Solution |
|---|---|---|
| GC pauses | Unpredictable latency spikes | Off-heap data avoids GC entirely |
| Object headers | 12-16 bytes overhead per object | Direct memory has zero overhead |
| Memory fragmentation | Inefficient memory usage | Contiguous allocation |
| Cache locality | Poor performance for large datasets | Controlled memory layout |
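To make the "zero overhead" and "contiguous allocation" rows concrete, here is a minimal sketch using only standard FFM API (the layout calls are explained in the next section): one million fixed-size records in a single contiguous off-heap block, with no per-element object headers.

```java
// One contiguous off-heap array of {long, int} records:
// no object headers, no fragmentation, cache-friendly iteration
MemoryLayout record = MemoryLayout.structLayout(
    ValueLayout.JAVA_LONG.withName("id"),
    ValueLayout.JAVA_INT.withName("qty"),
    MemoryLayout.paddingLayout(4));          // pad each record to 16 bytes

try (Arena arena = Arena.ofConfined()) {
    MemorySegment records = arena.allocate(
        MemoryLayout.sequenceLayout(1_000_000, record)); // 16 MB, outside the GC heap
}
```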
FFM Key Concepts
1. Memory Segments
A MemorySegment represents a contiguous region of memory:
```java
// Allocate 1KB of off-heap memory
try (Arena arena = Arena.ofConfined()) {
    MemorySegment segment = arena.allocate(1024);
    segment.set(ValueLayout.JAVA_INT, 0, 42);         // write an int at offset 0
    int value = segment.get(ValueLayout.JAVA_INT, 0); // read it back
} // memory is freed deterministically when the arena closes
```
2. Arenas (Lifecycle Management)
Arenas control when memory is released:
| Arena Type | Lifecycle | Thread Safety | Use Case |
|---|---|---|---|
| `Arena.ofConfined()` | Explicit close | Single thread | Request-scoped data |
| `Arena.ofShared()` | Explicit close | Multi-thread | Shared buffers |
| `Arena.ofAuto()` | GC-managed | Multi-thread | Long-lived pools |
| `Arena.global()` | Never freed | Multi-thread | Static data |
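For example, a confined arena ties request-scoped memory to a try-with-resources block, while an automatic arena hands the lifetime back to the GC (standard FFM API):

```java
// Request-scoped: freed deterministically, single-threaded access only
try (Arena arena = Arena.ofConfined()) {
    MemorySegment request = arena.allocate(4096);
    // ... parse and handle the request ...
} // freed here

// GC-managed: freed some time after it becomes unreachable; safe to share
MemorySegment buffer = Arena.ofAuto().allocate(1 << 20);
```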
3. Memory Layouts
Define structured data with compile-time safety:
```java
// Define a C-like struct: { long orderId; int quantity; }
MemoryLayout orderLayout = MemoryLayout.structLayout(
    ValueLayout.JAVA_LONG.withName("orderId"),
    ValueLayout.JAVA_INT.withName("quantity"),
    MemoryLayout.paddingLayout(4));  // pad to 8-byte alignment

// Create type-safe accessors
VarHandle orderIdHandle = orderLayout.varHandle(
    MemoryLayout.PathElement.groupElement("orderId"));
VarHandle quantityHandle = orderLayout.varHandle(
    MemoryLayout.PathElement.groupElement("quantity"));

// Access fields by name ('segment' is the off-heap segment from the previous example)
orderIdHandle.set(segment, 0L, 12345L);
int qty = (int) quantityHandle.get(segment, 0L);
```
4. Native Function Calls (Downcalls)
Call C library functions directly:
```java
// Get a handle to the native 'strlen' function
Linker linker = Linker.nativeLinker();
MethodHandle strlen = linker.downcallHandle(
    linker.defaultLookup().find("strlen").orElseThrow(),
    FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

// Call it with a native string
try (Arena arena = Arena.ofConfined()) {
    MemorySegment cString = arena.allocateFrom("Hello"); // NUL-terminated copy
    long len = (long) strlen.invokeExact(cString);       // 5
} catch (Throwable t) {                                  // invokeExact declares Throwable
    throw new RuntimeException(t);
}
```
MVP.Express FFM Usage
Roray FFM Utils builds on these primitives to provide:
- `MemorySegmentPool` - GC-free buffer pooling with metrics
- `Utf8View` - Zero-allocation string comparisons
- `DowncallFactory` - Simplified native function binding
- `BinaryReader`/`BinaryWriter` - Efficient structured I/O
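As a rough sketch of how these combine, here is a hypothetical write-then-read round trip; the constructor shapes and method names (`acquire`, `release`, `writeLong`, `readInt`) are illustrative assumptions, not the library's confirmed API:

```java
// Hypothetical usage sketch; method names are assumptions for illustration
MemorySegmentPool pool = new MemorySegmentPool(256, 4096);
MemorySegment buffer = pool.acquire();
try {
    BinaryWriter writer = new BinaryWriter(buffer);
    writer.writeLong(12345L); // orderId
    writer.writeInt(100);     // quantity

    BinaryReader reader = new BinaryReader(buffer);
    long orderId = reader.readLong();
    int quantity = reader.readInt();
} finally {
    pool.release(buffer);
}
```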
Zero-Copy I/O
What is Zero-Copy?
Zero-copy means transferring data without copying it between buffers. In traditional I/O:
Application → JVM Heap → Direct Buffer → Kernel Buffer → Network
(copy 1) (copy 2) (copy 3)
With zero-copy:
Application Buffer → Kernel → Network
(no copies)
Why Copies Are Expensive
Each memory copy has costs:
| Cost Type | Impact |
|---|---|
| CPU cycles | ~1 cycle per byte copied |
| Memory bandwidth | Saturates memory bus |
| Cache pollution | Evicts useful data from L1/L2/L3 |
| Latency | Adds microseconds per copy |
For a 64KB message with 3 copies:
- 64KB × 3 copies = 192KB of memory traffic
- At 50GB/s memory bandwidth = ~4μs overhead
Zero-Copy Techniques in MVP.Express
1. Flyweight Pattern (MyraCodec)
Instead of deserializing into objects, access data in-place:
```java
// Traditional approach (allocates objects; 'decode' is illustrative)
Order order = decode(buffer);      // creates Order, String, etc.
long id = order.getId();
String symbol = order.getSymbol();
```

```java
// Flyweight approach (zero allocation; accessor names illustrative)
OrderFlyweight flyweight = new OrderFlyweight();
flyweight.wrap(segment, 0);           // point the flyweight at the binary data
long id = flyweight.orderId();        // direct memory read
Utf8View symbol = flyweight.symbol(); // returns a view, not a String
```
The flyweight wraps the binary data and provides accessors that read directly from the underlying memory.
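Internally, a flyweight is little more than a segment reference plus fixed field offsets. A minimal sketch, reusing the field offsets from the layout example earlier; the class shape is illustrative, not MyraCodec's generated code:

```java
// Minimal flyweight sketch: reads fields straight out of a MemorySegment
final class OrderFlyweightSketch {
    private static final long ORDER_ID_OFFSET = 0; // long at offset 0
    private static final long QUANTITY_OFFSET = 8; // int at offset 8

    private MemorySegment segment;
    private long base;

    void wrap(MemorySegment segment, long offset) { // reusable: no per-message allocation
        this.segment = segment;
        this.base = offset;
    }

    long orderId() { // direct off-heap read, no deserialization step
        return segment.get(ValueLayout.JAVA_LONG, base + ORDER_ID_OFFSET);
    }

    int quantity() {
        return segment.get(ValueLayout.JAVA_INT, base + QUANTITY_OFFSET);
    }
}
```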
2. Buffer Pools (Roray FFM Utils)
Pre-allocate buffers and reuse them:
```java
// Create a pool of 256 buffers, 4KB each (constructor shape illustrative)
MemorySegmentPool pool = new MemorySegmentPool(256, 4096);

// Acquire a buffer (no allocation after warmup)
MemorySegment buffer = pool.acquire();
try {
    // ... fill and send the buffer ...
} finally {
    pool.release(buffer); // always return the buffer to the pool
}
```
3. Registered Buffers (MyraTransport)
Pre-register buffers with the kernel to eliminate address validation:
```java
// Without registered buffers:
//   the kernel must validate the buffer address on every I/O operation
// With registered buffers:
//   buffers are validated once at registration; the kernel then uses an index

RegisteredBufferPool pool = new RegisteredBufferPool(256, 4096); // shape illustrative
backend.registerBuffers(pool);                                   // method name illustrative

// I/O operations use a buffer index instead of an address
backend.sendFixed(connection, bufferIndex, length); // ~1.7x faster; name illustrative
```
4. View-Based String Handling
Compare strings without allocation:
```java
Utf8View symbolView = flyweight.symbol();

// Zero-allocation comparison
if (symbolView.equalsString("AAPL")) {
    // ... route the order without materializing a String ...
}

// Only allocate when you really need a String
String symbol = symbolView.toString(); // allocation happens here
```
Zero-Copy Best Practices
- Pool everything - Buffers, flyweights, views
- Avoid toString() - Use views for comparisons
- Size buffers appropriately - Match typical message sizes
- Align data - Memory alignment improves access speed (see the sketch after this list)
- Batch operations - Amortize any unavoidable copies
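For the alignment point above, FFM supports aligned allocation directly. A minimal sketch using the standard API, with an illustrative 64-byte cache-line alignment:

```java
try (Arena arena = Arena.ofConfined()) {
    // Allocate 4KB aligned to a 64-byte cache-line boundary
    MemorySegment buffer = arena.allocate(4096, 64);

    // Aligned accesses never straddle cache lines
    buffer.set(ValueLayout.JAVA_LONG, 0, 42L);
}
```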
Linux io_uring
What is io_uring?
io_uring is a Linux kernel interface (5.1+) for asynchronous I/O that provides:
- Batched system calls - Submit multiple operations in one syscall
- True async I/O - Operations complete independently
- Zero-copy receives - Kernel writes directly to user buffers
- SQPOLL mode - Kernel thread polls for submissions (no syscalls)
Traditional I/O vs io_uring
Traditional (epoll + non-blocking):
For each operation:
1. epoll_wait() → syscall
2. read()/write() → syscall
3. Handle EAGAIN → retry
io_uring:
Queue N operations:
1. io_uring_submit() → single syscall
2. io_uring_wait() → single syscall (gets M completions)
With SQPOLL mode, even the submit syscall is eliminated.
io_uring Architecture
┌────────────────────────────────────────────────────────┐
│ User Space │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Submission │ │ Completion │ │
│ │ Queue (SQ) │ │ Queue (CQ) │ │
│ │ │ │ │ │
│ │ [SQE][SQE] │ │ [CQE][CQE] │ │
│ │ [SQE][SQE] │ │ [CQE][CQE] │ │
│ └──────┬───────┘ └──────▲───────┘ │
│ │ │ │
├─────────┼───────────────────────┼──────────────────────┤
│ ▼ Kernel │ │
│ ┌──────────────────────────────┴───────┐ │
│ │ io_uring Instance │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ Registered Buffers (optional) │ │ │
│ │ │ [buf0][buf1][buf2][buf3]... │ │ │
│ │ └─────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ SQPOLL Thread (optional) │ │ │
│ │ │ Polls SQ without syscalls │ │ │
│ │ └─────────────────────────────────┘ │ │
│ └──────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
Key io_uring Features Used by MVP.Express
1. Registered Buffers
Pre-validated memory regions eliminate per-operation address checks:
```java
// Register buffers with the kernel at startup (method name illustrative)
backend.registerBuffers(pool);

// All subsequent I/O uses buffer indices;
// the kernel skips address validation → ~1.7x throughput improvement
```
2. Batch Submission
Submit multiple operations with one syscall:
```java
// Queue multiple operations (method names illustrative)
backend.queueSend(conn1, buffer1);
backend.queueSend(conn2, buffer2);
backend.queueReceive(conn3, buffer3);

// Single syscall submits all
int submitted = backend.submit(); // returns 3
```
3. Multi-Shot Receive
Keep a receive operation active across multiple completions:
```java
// Traditional: resubmit after each receive (method names illustrative)
while (running) {
    backend.queueReceive(conn, buffer);
    backend.submit();
    process(backend.waitForCompletion());
}

// Multi-shot: submit once, receive many
backend.queueMultiShotReceive(conn); // submit once
backend.submit();                    // one syscall
while (running) {
    process(backend.waitForCompletion()); // completions keep arriving
}
```
4. SQPOLL Mode
Dedicated kernel thread polls for submissions:
```java
// Enable SQPOLL (builder shape illustrative)
TransportConfig config = TransportConfig.builder()
    .sqPoll(true)
    .sqPollIdleMillis(1000)
    .build();

// With SQPOLL:
// - no syscall for io_uring_submit()
// - a kernel thread continuously polls the SQ
// - sub-microsecond submission latency
```
5. Zero-Copy Send
Avoid user→kernel copy for large payloads:
```java
// Regular send: copies data to the kernel
backend.send(conn, buffer);

// Zero-copy send: the kernel reads directly from the user buffer
backend.sendZeroCopy(conn, buffer); // method names illustrative

// Note: two completions arrive - send complete + notification.
// The buffer must not be modified until the notification is received.
```
io_uring Performance Benefits
| Metric | Traditional (epoll) | io_uring | Improvement |
|---|---|---|---|
| Syscalls per op | 2-3 | 0.1-0.5 | 4-30x fewer |
| p50 latency | 25-50μs | 5-15μs | 3-5x lower |
| p99 latency | 100-500μs | 20-50μs | 5-10x lower |
| Throughput | 200-500K msg/s | 1-2M msg/s | 3-5x higher |
How MVP.Express Combines These
The MVP.Express stack integrates FFM, zero-copy, and io_uring at every layer:
┌─────────────────────────────────────────────────────────────┐
│ Your Application │
│ • Works with typed flyweights and views │
│ • No manual memory management │
│ • Zero-GC hot path │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ MyraCodec │
│ • Schema-driven code generation │
│ • Flyweight accessors (FFM MemorySegment) │
│ • Zero-copy encode/decode │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ MyraTransport │
│ • io_uring backend with registered buffers │
│ • Batch submission and multi-shot receive │
│ • SQPOLL for minimum latency │
└────────────────────────┬────────────────────────────────────┘
│
┌────────────────────────▼────────────────────────────────────┐
│ Roray FFM Utils │
│ • MemorySegmentPool for buffer management │
│ • Utf8View for zero-alloc string handling │
│ • DowncallFactory for native bindings │
└─────────────────────────────────────────────────────────────┘
Data Flow Example
A message arriving from the network:
1. MyraTransport receives data into a registered buffer (zero-copy from the kernel)
2. Roray FFM Utils provides the `MemorySegment` wrapping the buffer
3. A MyraCodec flyweight wraps the segment for structured access
4. The application reads fields via the flyweight (direct memory reads, no allocation)
5. The application uses `Utf8View.equalsString()` for routing (no String allocation)
6. The response is written via a flyweight directly into a send buffer
7. MyraTransport sends via io_uring (zero-copy to the kernel with SEND_ZC)
Result: End-to-end processing with zero heap allocations, minimal syscalls, and no data copies.
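Putting the steps together, here is a hedged sketch of a hot-path handler; `order`, `response`, `pool`, `backend`, and `conn` are assumed instance fields, and every method name follows the illustrative sketches earlier in this guide rather than a confirmed API:

```java
// Illustrative hot-path handler; all names are assumptions from the
// sketches above, not confirmed library API
void onReceive(MemorySegment receiveBuffer) {
    order.wrap(receiveBuffer, 0);                  // flyweight: zero-allocation decode

    if (order.symbol().equalsString("AAPL")) {     // view: no String allocation
        MemorySegment sendBuffer = pool.acquire(); // pooled: no GC pressure
        response.wrap(sendBuffer, 0);              // encode in place
        response.orderId(order.orderId());
        backend.sendZeroCopy(conn, sendBuffer);    // SEND_ZC: no user→kernel copy
        // release sendBuffer back to the pool once the ZC notification arrives
    }
}
```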
Next Steps
Now that you understand the core concepts:
- Roray FFM Utils Guide - Deep dive into memory management
- MyraCodec Guide - Schema design and code generation
- MyraTransport Guide - io_uring configuration and tuning
- Benchmarks - See the performance benefits in action