Class IoUringBackend

java.lang.Object
express.mvp.myra.transport.iouring.IoUringBackend
All Implemented Interfaces:
TransportBackend, AutoCloseable

public final class IoUringBackend extends Object implements TransportBackend
io_uring-based transport backend for maximum I/O performance on Linux.

This backend leverages Linux's io_uring interface (introduced in kernel 5.1) to achieve significantly higher throughput and lower latency compared to traditional NIO. It implements the TransportBackend interface with io_uring-specific optimizations.

Performance Characteristics

  • 1.7x throughput improvement vs NIO with registered buffers (pre-validated memory regions eliminate per-operation address validation overhead)
  • 100x syscall reduction with batch submission (multiple operations submitted in a single io_uring_submit() call)
  • 2-5μs end-to-end latency vs 50-100μs for NIO (measured ping-pong)
  • SQPOLL mode (optional): a kernel thread polls the submission queue, eliminating submit syscalls entirely for an additional 2-5x improvement
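The syscall-reduction figure follows from simple arithmetic: if up to N operations share one submit call, the number of submit syscalls drops by roughly a factor of N. The sketch below is an illustrative model only (the class name and the batch size of 100 are assumptions, not part of this library); it counts syscalls and performs no I/O.

```java
// Illustrative model only: counts submit syscalls, it does not perform I/O.
final class SyscallBatchingModel {

    /** Syscalls needed when every operation is submitted on its own. */
    static long unbatchedSyscalls(long operations) {
        return operations;
    }

    /** Syscalls needed when up to batchSize operations share one submit call. */
    static long batchedSyscalls(long operations, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (operations + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        long ops = 1_000_000;
        int batch = 100; // assumed batch size
        long ratio = unbatchedSyscalls(ops) / batchedSyscalls(ops, batch);
        System.out.println(ratio); // prints 100
    }
}
```

The reduction actually observed depends on how many operations are queued between calls to submitBatch().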

Architecture

The backend manages:

  • Ring Memory: Shared memory region containing submission and completion queues
  • Buffer Pool: Pre-registered memory buffers for zero-copy I/O
  • Fixed Files: Pre-registered file descriptors for faster fd lookup
  • Pre-allocated Structures: Timespec, sockaddr, iovec to avoid hot-path allocations

Advanced Features

Usage Example

TransportConfig config = TransportConfig.builder()
    .backendType(BackendType.IO_URING)
    .registeredBuffers(RegisteredBuffersConfig.builder()
        .numBuffers(256)
        .bufferSize(65536)
        .build())
    .sqPollEnabled(true)
    .sqPollCpuAffinity(3)
    .build();

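// bufferPool, token, and handler are assumed to be created by the caller.
IoUringBackend backend = new IoUringBackend();
backend.initialize(config);                // required before any other operation
backend.registerBufferPool(bufferPool);    // pre-registers buffers for zero-copy I/O
backend.connect(new InetSocketAddress("localhost", 8080), token); // queues an asynchronous connect
backend.submitBatch();                     // submits all queued operations in one call
backend.waitForCompletion(1000, handler);  // waits up to 1000 ms for at least one completion
backend.close();                           // releases the ring and all other resources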

Requirements

  • Linux kernel 5.1+ (5.6+ recommended, 6.0+ for zero-copy send)
  • liburing installed (liburing.so, typically from liburing-dev package)
  • Java 21+ with --enable-native-access=ALL-UNNAMED
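An application may want to check the kernel requirement before selecting this backend. The following is a minimal sketch, assuming the JVM reports the kernel release through the os.version system property (as it does on Linux); the class and method names are hypothetical and not part of this library.

```java
// Hypothetical helper: parses a kernel release string and compares major.minor.
final class KernelVersionCheck {

    /** Returns true if a release such as "5.15.0-91-generic" is at least major.minor. */
    static boolean isAtLeast(String release, int major, int minor) {
        String[] parts = release.split("[.\\-+]");
        if (parts.length < 2) {
            return false;
        }
        try {
            int maj = Integer.parseInt(parts[0]);
            int min = Integer.parseInt(parts[1]);
            return maj > major || (maj == major && min >= minor);
        } catch (NumberFormatException e) {
            return false; // unparseable release string: treat as unsupported
        }
    }

    public static void main(String[] args) {
        boolean linux = System.getProperty("os.name", "").toLowerCase().contains("linux");
        String release = System.getProperty("os.version", "");
        System.out.println("io_uring baseline (5.1+): " + (linux && isAtLeast(release, 5, 1)));
        System.out.println("zero-copy send (6.0+):    " + (linux && isAtLeast(release, 6, 0)));
    }
}
```

A version check is only a first filter: distributions backport features and administrators can disable io_uring, so initialization failures still need to be handled.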
  • Constructor Details

    • IoUringBackend

      public IoUringBackend()
      Creates a new io_uring backend with default configuration.

      Allocates the io_uring structure and pre-allocated hot-path structures. The backend must be initialized via initialize(TransportConfig) before use.

      This constructor creates a backend that owns the io_uring ring and is responsible for cleanup when closed.

  • Method Details

    • initialize

      public void initialize(TransportConfig config)
      Description copied from interface: TransportBackend
      Initializes the backend with the given configuration.

      This is called once during transport creation. Implementations should allocate resources, initialize the I/O subsystem, and prepare for operations.

      Specified by:
      initialize in interface TransportBackend
      Parameters:
      config - the transport configuration
    • registerBufferPool

      public void registerBufferPool(RegisteredBufferPool pool)
      Description copied from interface: TransportBackend
      Registers a buffer pool with this backend for zero-copy I/O.

      For io_uring, this calls io_uring_register_buffers() to pre-register memory regions with the kernel. For other backends, this may be a no-op.

      Specified by:
      registerBufferPool in interface TransportBackend
      Parameters:
      pool - the buffer pool to register
    • initBufferRing

      public boolean initBufferRing(int nentries, int bufSize, short bgid)
      Initialize a buffer ring for kernel-managed buffer selection.

      Buffer rings enable the most efficient multishot receive pattern where the kernel automatically selects buffers from a pre-registered pool.

      Parameters:
      nentries - number of buffer ring entries (must be power of 2)
      bufSize - size of each buffer in the ring
      bgid - buffer group ID (unique per io_uring instance)
      Returns:
      true if buffer ring was successfully initialized
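Because nentries must be a power of two, callers may want to validate or round up a requested size before calling initBufferRing. The helper below is a sketch with hypothetical names; it also computes the memory the ring's buffers would occupy.

```java
// Hypothetical sizing helper for buffer ring parameters.
final class BufferRingSizing {

    /** True if n is a positive power of two. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Smallest power of two that is >= n (n must not exceed 2^30). */
    static int nextPowerOfTwo(int n) {
        if (n <= 1) {
            return 1;
        }
        if (n > (1 << 30)) {
            throw new IllegalArgumentException("no int power of two >= " + n);
        }
        return Integer.highestOneBit(n - 1) << 1;
    }

    /** Total bytes occupied by nentries buffers of bufSize bytes each. */
    static long totalBytes(int nentries, int bufSize) {
        return (long) nentries * bufSize;
    }

    public static void main(String[] args) {
        int requested = 300;
        int nentries = nextPowerOfTwo(requested);
        System.out.println(nentries);                   // prints 512
        System.out.println(totalBytes(nentries, 8192)); // prints 4194304
    }
}
```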
    • initBufferRing

      public boolean initBufferRing()
      Initialize the buffer ring with default parameters: 256 entries of 8 KB each, with buffer group ID 0.
      Returns:
      true if buffer ring was successfully initialized
    • isBufferRingEnabled

      public boolean isBufferRingEnabled()
      Check if buffer ring is enabled and active.
      Returns:
      true if buffer ring is available for use
    • getBufferGroupId

      public short getBufferGroupId()
      Get buffer group ID for multishot receive operations.
      Returns:
      buffer group ID or -1 if not enabled
    • getBufferRingBuffer

      public MemorySegment getBufferRingBuffer(int bufferId)
      Get a buffer from the buffer ring by ID. Used after receiving a CQE with IORING_CQE_F_BUFFER set.
      Parameters:
      bufferId - buffer ID from CQE
      Returns:
      memory segment for the buffer, or null if invalid
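The buffer ID passed to getBufferRingBuffer comes from the CQE flags word. In the Linux io_uring ABI, IORING_CQE_F_BUFFER is bit 0 and the selected buffer ID sits in the upper 16 bits (IORING_CQE_BUFFER_SHIFT is 16). The sketch below decodes a raw flags value; the class itself is hypothetical, and this library may already expose an equivalent helper.

```java
// Decodes the buffer ID from a raw CQE flags word (constants from the Linux io_uring ABI).
final class CqeBufferFlags {
    static final int IORING_CQE_F_BUFFER = 1 << 0;
    static final int IORING_CQE_BUFFER_SHIFT = 16;

    /** True if the kernel selected a buffer for this completion. */
    static boolean hasBuffer(int cqeFlags) {
        return (cqeFlags & IORING_CQE_F_BUFFER) != 0;
    }

    /** Returns the selected buffer ID, or -1 if no buffer was selected. */
    static int bufferId(int cqeFlags) {
        if (!hasBuffer(cqeFlags)) {
            return -1;
        }
        return cqeFlags >>> IORING_CQE_BUFFER_SHIFT; // unsigned shift keeps IDs >= 32768 positive
    }

    public static void main(String[] args) {
        int flags = (42 << IORING_CQE_BUFFER_SHIFT) | IORING_CQE_F_BUFFER;
        System.out.println(bufferId(flags)); // prints 42
    }
}
```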
    • recycleBufferRingBuffer

      public void recycleBufferRingBuffer(int bufferId)
      Recycle a buffer back to the buffer ring after processing.
      Parameters:
      bufferId - buffer ID to recycle
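The get/recycle pair matters because a buffer ring has a fixed number of entries: each receive consumes one, and only recycling returns it. The following pure-Java model (hypothetical, no kernel involved) illustrates the lifecycle and what happens when buffers are not recycled; on a real ring the kernel fails the receive with -ENOBUFS when no buffer is available.

```java
import java.util.ArrayDeque;

// Hypothetical in-memory model of buffer ring ownership; it does not touch io_uring.
final class BufferRingModel {
    private final ArrayDeque<Integer> available = new ArrayDeque<>();

    BufferRingModel(int nentries) {
        for (int id = 0; id < nentries; id++) {
            available.addLast(id);
        }
    }

    /** Models the kernel picking the next buffer; returns -1 when the ring is exhausted. */
    int select() {
        Integer id = available.pollFirst();
        return id == null ? -1 : id;
    }

    /** Models the application handing a processed buffer back to the ring. */
    void recycle(int bufferId) {
        available.addLast(bufferId);
    }

    int availableCount() {
        return available.size();
    }

    public static void main(String[] args) {
        BufferRingModel ring = new BufferRingModel(2);
        int a = ring.select();             // buffer 0 now belongs to the application
        int b = ring.select();             // buffer 1 now belongs to the application
        System.out.println(ring.select()); // prints -1: ring exhausted
        ring.recycle(a);                   // processing done, hand buffer 0 back
        System.out.println(ring.select()); // prints 0
    }
}
```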
    • submitMultishotRecvWithBufferRing

      public boolean submitMultishotRecvWithBufferRing(long token)
      Submit a multishot receive with buffer ring selection.

      This is the most efficient receive pattern: the kernel automatically selects buffers from the ring and continues receiving without resubmission.

      Parameters:
      token - user token for completion tracking
      Returns:
      true if submission was successful
    • submitLinkedEcho

      public boolean submitLinkedEcho(MemorySegment recvBuffer, int recvLen, long recvToken, long sendToken)
      Submit a linked recv+send echo pattern.

      This creates two linked SQEs:

      1. recv - receive data into buffer
      2. send - send the same data back (linked, executes after recv)

      The send only executes after recv completes successfully. If recv fails, the linked send is cancelled automatically.

      Parameters:
      recvBuffer - buffer for receiving data
      recvLen - max bytes to receive
      recvToken - token for recv completion
      sendToken - token for send completion
      Returns:
      true if both SQEs were submitted successfully
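The link semantics described above can be modelled without a kernel: operations run in order, and once one fails, every later operation in the chain completes with -ECANCELED (errno 125 on Linux) without running. The class below is a hypothetical simulation for reasoning about completion handling, not this library's implementation.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

// Hypothetical simulation of an io_uring link chain; each op returns a CQE-style result.
final class LinkedChainModel {
    static final int ECANCELED = 125; // Linux errno value

    /** Runs ops in order; after the first negative result the rest report -ECANCELED. */
    static int[] run(IntSupplier... ops) {
        int[] results = new int[ops.length];
        boolean broken = false;
        for (int i = 0; i < ops.length; i++) {
            if (broken) {
                results[i] = -ECANCELED; // never executed
                continue;
            }
            results[i] = ops[i].getAsInt();
            if (results[i] < 0) {
                broken = true;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // recv returns 64 bytes, then the linked send echoes them
        System.out.println(Arrays.toString(run(() -> 64, () -> 64)));   // prints [64, 64]
        // recv fails with -ECONNRESET (104), so the send is cancelled
        System.out.println(Arrays.toString(run(() -> -104, () -> 64))); // prints [-104, -125]
    }
}
```

A completion handler for sendToken should therefore treat -ECANCELED as "the preceding recv failed" rather than as a send error.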
    • submitLinkedEchoSkipRecvCqe

      public boolean submitLinkedEchoSkipRecvCqe(MemorySegment recvBuffer, int recvLen, long sendToken)
      Submit linked recv+send with skip on success for efficient echo.

      Like submitLinkedEcho, but sets the IOSQE_CQE_SKIP_SUCCESS flag on the recv SQE so that a successful recv posts no CQE. On success, only the send completion is generated.

      Parameters:
      recvBuffer - buffer for receiving/sending data
      recvLen - max bytes to receive
      sendToken - token for completion (only send generates CQE)
      Returns:
      true if submission was successful
    • submitLinkedRequestResponse

      public boolean submitLinkedRequestResponse(MemorySegment sendBuffer, int sendLen, MemorySegment recvBuffer, int recvLen, long sendToken, long recvToken)
      Submit linked send+recv for request-response pattern.

      Useful for RPC clients that send a request and expect a response. The recv only starts after the send completes.

      Parameters:
      sendBuffer - buffer containing request data
      sendLen - bytes to send
      recvBuffer - buffer for response
      recvLen - max response bytes
      sendToken - token for send completion
      recvToken - token for recv completion
      Returns:
      true if submission was successful
    • isUsingFixedFile

      public boolean isUsingFixedFile()
      Check if fixed file optimization is active for this backend. Fixed files eliminate the fd lookup overhead in each operation.
      Returns:
      true if using fixed file index
    • getFixedFileIndex

      public int getFixedFileIndex()
      Get the fixed file index (if using fixed files).
      Returns:
      fixed file index or -1 if not using fixed files
    • submitBatchRecv

      public int submitBatchRecv(MemorySegment[] buffers, int[] lengths, long[] tokens, int count)
      Submit a batch of receive operations for maximum throughput.

      This submits multiple recv SQEs in a single call, which is more efficient than submitting them one at a time. Useful for high-throughput scenarios where you want to have multiple outstanding receives.

      Parameters:
      buffers - array of receive buffers
      lengths - array of buffer lengths
      tokens - array of user tokens for tracking
      count - number of operations to submit
      Returns:
      number of operations successfully queued
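Since the return value can be smaller than count (for example when the submission queue is full), callers need a policy for the remainder. The sketch below retries the unqueued tail; BatchSubmitter is a hypothetical stand-in that mirrors only the lengths/tokens/count part of the signature, and a real caller would typically flush the queue with submitBatch() between attempts.

```java
import java.util.Arrays;

// Hypothetical retry policy for partially queued batches.
final class BatchRetry {

    /** Stand-in for a batch submit call: returns how many of the first count ops were queued. */
    interface BatchSubmitter {
        int submit(int[] lengths, long[] tokens, int count);
    }

    /** Tries to queue every operation, giving up after maxAttempts or when no progress is made. */
    static int submitAll(BatchSubmitter submitter, int[] lengths, long[] tokens, int maxAttempts) {
        int done = 0;
        for (int attempt = 0; attempt < maxAttempts && done < tokens.length; attempt++) {
            int remaining = tokens.length - done;
            // Copying keeps the sketch simple; a hot path would avoid these allocations.
            int queued = submitter.submit(
                    Arrays.copyOfRange(lengths, done, tokens.length),
                    Arrays.copyOfRange(tokens, done, tokens.length),
                    remaining);
            if (queued <= 0) {
                break; // no progress: let the caller drain completions first
            }
            done += queued;
        }
        return done;
    }

    public static void main(String[] args) {
        BatchSubmitter atMostFour = (lengths, tokens, count) -> Math.min(count, 4);
        int queued = submitAll(atMostFour, new int[10], new long[10], 3);
        System.out.println(queued); // prints 10
    }
}
```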
    • submitBatchSend

      public int submitBatchSend(MemorySegment[] buffers, int[] lengths, long[] tokens, int count)
      Submit a batch of send operations for maximum throughput.
      Parameters:
      buffers - array of send buffers
      lengths - array of data lengths
      tokens - array of user tokens for tracking
      count - number of operations to submit
      Returns:
      number of operations successfully queued
    • submitBatchRecvRegistered

      public int submitBatchRecvRegistered(short[] bufferIndices, int[] lengths, long[] tokens, int count)
      Submit a batch of recv operations using registered buffers.

      Uses pre-registered buffer indices for zero-copy receive. More efficient than regular recv as it avoids buffer address translation.

      Parameters:
      bufferIndices - indices of registered buffers
      lengths - max lengths for each recv
      tokens - user tokens for tracking
      count - number of operations
      Returns:
      number of operations successfully queued
    • forceSubmit

      public int forceSubmit()
      Force a single io_uring_submit syscall after queuing operations.

      This is useful when you've queued multiple operations and want to submit them all at once for maximum batching efficiency.

      Returns:
      number of SQEs submitted to kernel
    • connect

      public void connect(SocketAddress remoteAddress, long token)
      Description copied from interface: TransportBackend
      Connects to a remote endpoint asynchronously.
      Specified by:
      connect in interface TransportBackend
      Parameters:
      remoteAddress - the remote address to connect to
      token - the token identifying the operation
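The token is an opaque 64-bit value that comes back with the completion, so it is a convenient place to carry routing information. One common convention, shown here purely as an assumption (the library does not prescribe a token layout), packs an operation type into the top 8 bits and a connection ID into the remaining 56.

```java
// Hypothetical token layout: [8-bit op type][56-bit connection ID].
final class OpToken {
    static final int OP_CONNECT = 1;
    static final int OP_ACCEPT = 2;
    static final int OP_SEND = 3;
    static final int OP_RECV = 4;

    private static final int ID_BITS = 56;
    private static final long ID_MASK = (1L << ID_BITS) - 1;

    static long encode(int opType, long connectionId) {
        if (opType < 0 || opType > 0xFF) {
            throw new IllegalArgumentException("opType must fit in 8 bits");
        }
        if (connectionId < 0 || connectionId > ID_MASK) {
            throw new IllegalArgumentException("connectionId must fit in 56 bits");
        }
        return ((long) opType << ID_BITS) | connectionId;
    }

    static int opType(long token) {
        return (int) (token >>> ID_BITS); // unsigned shift: op types >= 128 stay positive
    }

    static long connectionId(long token) {
        return token & ID_MASK;
    }

    public static void main(String[] args) {
        long token = encode(OP_CONNECT, 12345L);
        System.out.println(opType(token));       // prints 1
        System.out.println(connectionId(token)); // prints 12345
    }
}
```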
    • bind

      public void bind(SocketAddress localAddress)
      Description copied from interface: TransportBackend
      Binds to a local address for accepting connections (server mode).
      Specified by:
      bind in interface TransportBackend
      Parameters:
      localAddress - the local address to bind to
    • accept

      public void accept(long token)
      Description copied from interface: TransportBackend
      Accepts an incoming connection (server mode).
      Specified by:
      accept in interface TransportBackend
      Parameters:
      token - the token identifying the operation
    • send

      public void send(RegisteredBuffer buffer, long token)
      Description copied from interface: TransportBackend
      Sends data using a registered buffer (zero-copy).

      For io_uring, this uses io_uring_prep_send_fixed() with the buffer index. For other backends, this may fall back to a regular send.

      Specified by:
      send in interface TransportBackend
      Parameters:
      buffer - the registered buffer containing data to send
      token - the token identifying the operation
    • send

      public void send(MemorySegment data, int length, long token)
      Description copied from interface: TransportBackend
      Sends data using a raw memory segment.

      This is a fallback for non-registered buffers. Less efficient than TransportBackend.send(RegisteredBuffer, long) but more flexible.

      Specified by:
      send in interface TransportBackend
      Parameters:
      data - the memory segment containing data to send
      length - the number of bytes to send
      token - the token identifying the operation
    • sendZeroCopy

      public void sendZeroCopy(RegisteredBuffer buffer, long token)
      Send data using zero-copy mechanism (SEND_ZC).

      Zero-copy send avoids copying data from user-space to kernel-space, providing significant performance improvements for large buffers (typically 2KB+).

      IMPORTANT:

      • You will receive TWO completions: one carrying the send result, and a later notification signaling that the kernel no longer references the buffer
      • The buffer must NOT be modified until the notification completion is received
      • Check LibUring.isZeroCopyNotification(MemorySegment) in your completion handler
      Parameters:
      buffer - the registered buffer to send
      token - user data token for completion tracking
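The two-completion rule can be enforced with a small tracker that releases a buffer only when the notification arrives. In the Linux io_uring ABI the notification CQE carries the IORING_CQE_F_NOTIF flag (bit 3) and the same user data as the send. The class below is a hypothetical sketch that works on raw flag values; in real code the check would go through LibUring.isZeroCopyNotification(MemorySegment).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker: a zero-copy send buffer stays busy until its notification CQE arrives.
final class ZeroCopySendTracker {
    static final int IORING_CQE_F_NOTIF = 1 << 3; // Linux io_uring ABI

    private final Set<Long> inFlight = new HashSet<>();

    /** Call when a zero-copy send is queued for the given token. */
    void onSubmit(long token) {
        inFlight.add(token);
    }

    /** Returns true when the buffer tied to token may be modified or reused. */
    boolean onCompletion(long token, int cqeFlags) {
        if ((cqeFlags & IORING_CQE_F_NOTIF) == 0) {
            return false; // send result only: the kernel may still reference the buffer
        }
        return inFlight.remove(token);
    }

    boolean isBufferBusy(long token) {
        return inFlight.contains(token);
    }

    public static void main(String[] args) {
        ZeroCopySendTracker tracker = new ZeroCopySendTracker();
        tracker.onSubmit(7L);
        System.out.println(tracker.onCompletion(7L, 1 << 1));             // prints false (send result)
        System.out.println(tracker.onCompletion(7L, IORING_CQE_F_NOTIF)); // prints true (notification)
    }
}
```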
    • sendZeroCopy

      public void sendZeroCopy(MemorySegment data, int length, long token)
      Send data using zero-copy mechanism (SEND_ZC).
      Parameters:
      data - buffer containing data to send
      length - number of bytes to send
      token - user data token for completion tracking
    • receive

      public void receive(RegisteredBuffer buffer, long token)
      Description copied from interface: TransportBackend
      Receives data into a registered buffer (zero-copy).

      For io_uring, this uses io_uring_prep_recv_fixed() with the buffer index.

      Specified by:
      receive in interface TransportBackend
      Parameters:
      buffer - the registered buffer to receive into
      token - the token identifying the operation
    • receive

      public void receive(MemorySegment data, int maxLength, long token)
      Description copied from interface: TransportBackend
      Receives data into a raw memory segment.
      Specified by:
      receive in interface TransportBackend
      Parameters:
      data - the memory segment to receive into
      maxLength - the maximum number of bytes to receive
      token - the token identifying the operation
    • receiveMultishot

      public void receiveMultishot(RegisteredBuffer buffer, long token)
      Start a persistent multi-shot receive operation.

      Multi-shot receive keeps the SQE active and generates multiple CQEs until the operation is cancelled or an error occurs. This is ideal for persistent receive loops as it eliminates the need to resubmit after each receive.

      IMPORTANT:

      • Check CQE flags for IORING_CQE_F_MORE - if set, more completions are coming
      • If IORING_CQE_F_MORE is NOT set, the operation has terminated and must be resubmitted
      • Use LibUring.hasMoreCompletions(MemorySegment) to check in completion handler
      • Requires Linux 6.0+ (the kernel release that added multishot receive, IORING_RECV_MULTISHOT)
      Parameters:
      buffer - the registered buffer for receiving data
      token - user data token for completion tracking
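The resubmission rule reduces to a single bit test on the CQE flags: IORING_CQE_F_MORE is bit 1 in the Linux io_uring ABI. The sketch below expresses that decision on a raw flags value; the class and enum are hypothetical, and in a real handler the check would go through LibUring.hasMoreCompletions(MemorySegment).

```java
// Hypothetical decision helper for multishot receive completions.
final class MultishotRecvState {
    static final int IORING_CQE_F_MORE = 1 << 1; // Linux io_uring ABI

    enum Action { KEEP_WAITING, RESUBMIT }

    /** Decides what to do after a multishot completion, based only on the CQE flags. */
    static Action next(int cqeFlags) {
        return (cqeFlags & IORING_CQE_F_MORE) != 0
                ? Action.KEEP_WAITING // the SQE is still armed, more CQEs will follow
                : Action.RESUBMIT;    // the multishot terminated and must be queued again
    }

    public static void main(String[] args) {
        System.out.println(next(IORING_CQE_F_MORE)); // prints KEEP_WAITING
        System.out.println(next(0));                 // prints RESUBMIT
    }
}
```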
    • receiveMultishot

      public void receiveMultishot(MemorySegment data, int maxLength, long token)
      Start a persistent multi-shot receive operation.
      Parameters:
      data - buffer for receiving data
      maxLength - maximum bytes to receive per completion
      token - user data token for completion tracking
    • submitBatch

      public int submitBatch()
      Description copied from interface: TransportBackend
      Submits all queued operations in a batch.

      For io_uring, this calls io_uring_submit() to submit all queued operations with a single syscall. This is the key to achieving 100x syscall reduction.

      For NIO, this is typically a no-op since NIO doesn't support batching.

      Specified by:
      submitBatch in interface TransportBackend
      Returns:
      the number of operations submitted
    • poll

      public int poll(CompletionHandler handler)
      Description copied from interface: TransportBackend
      Polls for completions without blocking.

      For io_uring, this calls io_uring_peek_cqe() to check for completed operations. For NIO, this may use a Selector with 0 timeout.

      Specified by:
      poll in interface TransportBackend
      Parameters:
      handler - the handler to call for each completion
      Returns:
      the number of operations completed
    • waitForCompletion

      public int waitForCompletion(long timeoutMillis, CompletionHandler handler)
      Description copied from interface: TransportBackend
      Waits for at least one operation to complete.

      This blocks until at least one queued operation completes or timeoutMillis elapses; a timeout of 0 blocks indefinitely.

      Specified by:
      waitForCompletion in interface TransportBackend
      Parameters:
      timeoutMillis - the maximum time to wait in milliseconds (0 = forever)
      handler - the handler to call for each completion
      Returns:
      the number of operations completed
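poll and waitForCompletion can be combined into a spin-then-block loop: poll a few times for low latency, then fall back to a blocking wait so an idle thread does not burn CPU. The sketch below uses hypothetical Completions and Handler interfaces that only mirror the shape of the two methods; the real CompletionHandler signature is not shown in this page and may differ.

```java
// Hypothetical spin-then-block strategy over poll/waitForCompletion-shaped methods.
final class SpinThenWait {

    /** Assumed handler shape; the library's CompletionHandler may differ. */
    interface Handler {
        void onCompletion(long token, int result);
    }

    /** Stand-in for the two completion methods of a backend. */
    interface Completions {
        int poll(Handler handler);
        int waitForCompletion(long timeoutMillis, Handler handler);
    }

    /** Polls up to maxSpins times, then blocks for at most timeoutMillis. */
    static int drainOnce(Completions source, Handler handler, int maxSpins, long timeoutMillis) {
        for (int i = 0; i < maxSpins; i++) {
            int completed = source.poll(handler);
            if (completed > 0) {
                return completed; // fast path: no blocking call needed
            }
            Thread.onSpinWait();  // hint to the CPU that this is a busy-wait loop
        }
        return source.waitForCompletion(timeoutMillis, handler);
    }
}
```

Pass a non-zero timeout if the loop must stay responsive to shutdown, since 0 means wait forever.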
    • supportsRegisteredBuffers

      public boolean supportsRegisteredBuffers()
      Description copied from interface: TransportBackend
      Returns whether this backend supports registered buffers.
      Specified by:
      supportsRegisteredBuffers in interface TransportBackend
      Returns:
      true if registered buffers are supported
    • supportsBatchSubmission

      public boolean supportsBatchSubmission()
      Description copied from interface: TransportBackend
      Returns whether this backend supports batch submission.
      Specified by:
      supportsBatchSubmission in interface TransportBackend
      Returns:
      true if batch submission is supported
    • supportsTLS

      public boolean supportsTLS()
      Description copied from interface: TransportBackend
      Returns whether this backend supports TLS/SSL.
      Specified by:
      supportsTLS in interface TransportBackend
      Returns:
      true if TLS is supported
    • getBackendType

      public String getBackendType()
      Description copied from interface: TransportBackend
      Returns the backend type identifier.
      Specified by:
      getBackendType in interface TransportBackend
      Returns:
      the backend type (e.g., "io_uring", "nio", "xdp")
    • getStats

      public BackendStats getStats()
      Description copied from interface: TransportBackend
      Returns statistics about backend operations.
      Specified by:
      getStats in interface TransportBackend
      Returns:
      backend statistics including operation counts, errors, etc.
    • close

      public void close()
      Description copied from interface: TransportBackend
      Closes the backend and releases all resources.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface TransportBackend
    • createFromAccepted

      public TransportBackend createFromAccepted(int handle)
      Description copied from interface: TransportBackend
      Creates a new backend instance from an accepted connection handle.

      For io_uring, the handle is the file descriptor. For NIO, the handle is an index or reference to the accepted channel.

      Specified by:
      createFromAccepted in interface TransportBackend
      Parameters:
      handle - the handle returned by the accept completion
      Returns:
      a new TransportBackend for the accepted connection