Class LibUring

java.lang.Object
express.mvp.myra.transport.iouring.LibUring

public final class LibUring extends Object
FFM (Foreign Function & Memory) bindings to the Linux liburing native library.

This class exposes the liburing API, and through it the Linux io_uring system calls, via Java's Foreign Function & Memory API (JEP 454, finalized in Java 22). io_uring is an asynchronous I/O interface introduced in Linux kernel 5.1 that avoids much of the per-operation syscall overhead of traditional epoll/select-based I/O mechanisms.

io_uring Architecture Overview

io_uring uses two ring buffers shared between user-space and kernel-space:

  • Submission Queue (SQ): User-space writes I/O requests (SQEs - Submission Queue Entries) here. Each SQE describes an I/O operation (read, write, accept, connect, etc.)
  • Completion Queue (CQ): Kernel writes completion results (CQEs - Completion Queue Entries) here. Each CQE contains the result of a completed operation.

This shared memory design enables:

  • Zero-copy I/O: Data stays in place, no copying between kernel/user-space
  • Batched submissions: Multiple operations submitted with a single syscall (100x syscall reduction)
  • SQPOLL mode: Kernel thread polls SQ, eliminating submit syscalls entirely (further 2-5x improvement)
  • Registered buffers: Pre-registered memory regions skip address validation (1.7x throughput improvement)
  • Fixed files: Pre-registered file descriptors skip fd lookup overhead

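Memory Layouts

This class defines memory layouts matching the Linux kernel ABI:

  • IO_URING_SQE_LAYOUT: struct io_uring_sqe (64 bytes)
  • IO_URING_CQE_LAYOUT: struct io_uring_cqe (16 bytes)
  • IO_URING_PARAMS_LAYOUT, IO_URING_SQ_LAYOUT, IO_URING_CQ_LAYOUT, IO_URING_LAYOUT: ring setup and queue structures
  • KERNEL_TIMESPEC_LAYOUT, IOVEC_LAYOUT: timeout and buffer registration arguments
  • IO_URING_BUF_LAYOUT, IO_URING_BUF_RING_LAYOUT, IO_URING_BUF_REG_LAYOUT: buffer ring structures

Key Operations

  • Setup and teardown: queueInit, queueInitParams, queueExit
  • Submission: getSqe, the prep* methods, submit
  • Completion: waitCqe, waitCqeTimeout, peekCqe, cqeSeen
  • Registration: registerBuffers, registerFiles, registerBufferRing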

Advanced Features

  • Buffer Rings (Linux 5.19+): Kernel-managed buffer selection for multishot recv
  • Linked Operations: Chain SQEs for ordered execution (e.g., recv→send echo)
  • CQE Skip on Success: Reduce CQE overhead for linked operations

Requirements

  • Linux kernel 5.1+ (5.6+ recommended for full features, 6.0+ for zero-copy send)
  • liburing shared library installed (liburing.so, typically from liburing-dev package)
  • Java 21+ with FFM enabled (--enable-native-access=ALL-UNNAMED); on Java 21 the FFM API is still a preview feature and additionally needs --enable-preview

Performance Characteristics

  • Latency: 2-5μs end-to-end vs 50-100μs for traditional NIO
  • Throughput: 1.7x improvement with registered buffers
  • Syscall reduction: 100x fewer syscalls with batch submission
  • Field Details

    • IO_URING_SQE_LAYOUT

      public static final StructLayout IO_URING_SQE_LAYOUT
      struct io_uring_sqe - Submission Queue Entry (64 bytes).

      Layout with offsets:

        0: opcode (1)
        1: flags (1)
        2: ioprio (2)
        4: fd (4)
        8: off (8)
       16: addr (8)
       24: len (4)
       28: op_flags (4)
       32: user_data (8)
       40: buf_index / buf_group (2, union)
       42: personality (2)
       44: splice_fd_in (4)
       48: addr3 (8)
       56: __pad2 (8)
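
      As a hedged illustration, the leading fields of the listing above can be described with an FFM StructLayout. This is only a sketch, not the class's actual definition: the name SqeLayoutSketch and the helper offsetOf are hypothetical, everything after buf_index is collapsed into padding for brevity, and it assumes the finalized FFM API (Java 22+).

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch; not the actual IO_URING_SQE_LAYOUT definition.
final class SqeLayoutSketch {
    static final StructLayout SQE = MemoryLayout.structLayout(
            ValueLayout.JAVA_BYTE.withName("opcode"),     // offset 0
            ValueLayout.JAVA_BYTE.withName("flags"),      // offset 1
            ValueLayout.JAVA_SHORT.withName("ioprio"),    // offset 2
            ValueLayout.JAVA_INT.withName("fd"),          // offset 4
            ValueLayout.JAVA_LONG.withName("off"),        // offset 8
            ValueLayout.JAVA_LONG.withName("addr"),       // offset 16
            ValueLayout.JAVA_INT.withName("len"),         // offset 24
            ValueLayout.JAVA_INT.withName("op_flags"),    // offset 28
            ValueLayout.JAVA_LONG.withName("user_data"),  // offset 32
            ValueLayout.JAVA_SHORT.withName("buf_index"), // offset 40
            MemoryLayout.paddingLayout(22)                // bytes 42..63, fields elided
    ).withName("io_uring_sqe");

    // Byte offset of a named field, computed from the layout itself.
    static long offsetOf(String field) {
        return SQE.byteOffset(PathElement.groupElement(field));
    }

    public static void main(String[] args) {
        System.out.println("size = " + SQE.byteSize());
        System.out.println("user_data @ " + offsetOf("user_data"));
    }
}
```

      Deriving offsets from the layout rather than hard-coding them keeps accessors in step with the struct definition.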
      
    • IO_URING_CQE_LAYOUT

      public static final StructLayout IO_URING_CQE_LAYOUT
      struct io_uring_cqe - Completion Queue Entry (16 bytes).
      struct io_uring_cqe {
          __u64 user_data;  // user_data of the submitted SQE, passed back unchanged
          __s32 res;        // result code for this event
          __u32 flags;      // IORING_CQE_F_* flags
      };
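
      The struct above can be read field by field at fixed offsets. The following is a minimal sketch under stated assumptions: CqeDecodeSketch, Cqe, decode and roundTrip are hypothetical names, the CQE is a fake one built in freshly allocated native memory rather than one produced by the kernel, and the finalized FFM API (Java 22+) is assumed.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical sketch: reads the three CQE fields at their struct offsets.
final class CqeDecodeSketch {
    static final long USER_DATA_OFFSET = 0;  // __u64 user_data
    static final long RES_OFFSET = 8;        // __s32 res
    static final long FLAGS_OFFSET = 12;     // __u32 flags

    record Cqe(long userData, int res, int flags) {
        boolean failed() { return res < 0; }        // negative res carries -errno
        int errno() { return failed() ? -res : 0; }
    }

    static Cqe decode(MemorySegment cqe) {
        return new Cqe(
                cqe.get(ValueLayout.JAVA_LONG, USER_DATA_OFFSET),
                cqe.get(ValueLayout.JAVA_INT, RES_OFFSET),
                cqe.get(ValueLayout.JAVA_INT, FLAGS_OFFSET));
    }

    // Builds a fake 16-byte CQE in native memory and decodes it again.
    static Cqe roundTrip(long userData, int res, int flags) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cqe = arena.allocate(16, 8);
            cqe.set(ValueLayout.JAVA_LONG, USER_DATA_OFFSET, userData);
            cqe.set(ValueLayout.JAVA_INT, RES_OFFSET, res);
            cqe.set(ValueLayout.JAVA_INT, FLAGS_OFFSET, flags);
            return decode(cqe);
        }
    }
}
```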
      
    • IORING_SETUP_SQPOLL

      public static final int IORING_SETUP_SQPOLL
    • IORING_SETUP_SQ_AFF

      public static final int IORING_SETUP_SQ_AFF
    • IORING_SETUP_CQSIZE

      public static final int IORING_SETUP_CQSIZE
    • IORING_SETUP_COOP_TASKRUN

      public static final int IORING_SETUP_COOP_TASKRUN
    • IORING_SETUP_SINGLE_ISSUER

      public static final int IORING_SETUP_SINGLE_ISSUER
    • IORING_OP_NOP

      public static final byte IORING_OP_NOP
    • IORING_OP_READV

      public static final byte IORING_OP_READV
    • IORING_OP_WRITEV

      public static final byte IORING_OP_WRITEV
    • IORING_OP_ACCEPT

      public static final byte IORING_OP_ACCEPT
    • IORING_OP_CONNECT

      public static final byte IORING_OP_CONNECT
    • IORING_OP_SEND

      public static final byte IORING_OP_SEND
    • IORING_OP_RECV

      public static final byte IORING_OP_RECV
    • IORING_OP_SEND_ZC

      public static final byte IORING_OP_SEND_ZC
    • IORING_OP_RECV_MULTISHOT

      public static final byte IORING_OP_RECV_MULTISHOT
    • IORING_RECV_MULTISHOT

      public static final int IORING_RECV_MULTISHOT
    • IOSQE_FIXED_FILE

      public static final int IOSQE_FIXED_FILE
    • IOSQE_IO_DRAIN

      public static final int IOSQE_IO_DRAIN
    • IOSQE_BUFFER_SELECT

      public static final int IOSQE_BUFFER_SELECT
    • IOSQE_CQE_SKIP_SUCCESS

      public static final int IOSQE_CQE_SKIP_SUCCESS
    • IORING_CQE_F_BUFFER

      public static final int IORING_CQE_F_BUFFER
    • IORING_CQE_F_MORE

      public static final int IORING_CQE_F_MORE
    • IORING_CQE_F_NOTIF

      public static final int IORING_CQE_F_NOTIF
    • IORING_REGISTER_PBUF_RING

      public static final int IORING_REGISTER_PBUF_RING
    • IORING_UNREGISTER_PBUF_RING

      public static final int IORING_UNREGISTER_PBUF_RING
    • IOU_PBUF_RING_MMAP

      public static final int IOU_PBUF_RING_MMAP
    • IO_URING_PARAMS_LAYOUT

      public static final StructLayout IO_URING_PARAMS_LAYOUT
      Memory layout for io_uring_params.
    • IO_URING_SQ_LAYOUT

      public static final StructLayout IO_URING_SQ_LAYOUT
      Memory layout for io_uring_sq (Submission Queue).
    • IO_URING_CQ_LAYOUT

      public static final StructLayout IO_URING_CQ_LAYOUT
      Memory layout for io_uring_cq (Completion Queue).
    • IO_URING_LAYOUT

      public static final StructLayout IO_URING_LAYOUT
      Memory layout for io_uring structure.
    • KERNEL_TIMESPEC_LAYOUT

      public static final StructLayout KERNEL_TIMESPEC_LAYOUT
      Memory layout for __kernel_timespec (for timeout operations).
      See Also:
      • LinuxLayouts.TIMESPEC
    • IOVEC_LAYOUT

      public static final StructLayout IOVEC_LAYOUT
      Memory layout for iovec structure (for buffer registration).
      See Also:
      • LinuxLayouts.IOVEC
    • IO_URING_BUF_LAYOUT

      public static final StructLayout IO_URING_BUF_LAYOUT
      Memory layout for io_uring_buf (single buffer in a ring). Each entry represents one buffer that can be selected by the kernel.
    • IO_URING_BUF_RING_LAYOUT

      public static final StructLayout IO_URING_BUF_RING_LAYOUT
      Memory layout for io_uring_buf_ring (buffer ring header). The ring uses a tail pointer to track which buffers are available.
    • IO_URING_BUF_REG_LAYOUT

      public static final StructLayout IO_URING_BUF_REG_LAYOUT
      Memory layout for io_uring_buf_reg (buffer ring registration). Used with IORING_REGISTER_PBUF_RING.
    • AF_INET

      public static final int AF_INET
      AF_INET - IPv4 Internet protocols.
    • SOCK_STREAM

      public static final int SOCK_STREAM
      SOCK_STREAM - Sequenced, reliable, connection-based byte streams.
    • SOCK_NONBLOCK

      public static final int SOCK_NONBLOCK
      SOCK_NONBLOCK - Set O_NONBLOCK on the new socket.
    • SOL_SOCKET

      public static final int SOL_SOCKET
      SOL_SOCKET - Socket level for setsockopt.
    • SO_REUSEADDR

      public static final int SO_REUSEADDR
      SO_REUSEADDR - Allow reuse of local addresses.
    • SO_REUSEPORT

      public static final int SO_REUSEPORT
      SO_REUSEPORT - Allow reuse of local port.
    • IPPROTO_TCP

      public static final int IPPROTO_TCP
      IPPROTO_TCP - TCP protocol.
    • F_GETFL

      public static final int F_GETFL
      F_GETFL - Get file status flags.
    • F_SETFL

      public static final int F_SETFL
      F_SETFL - Set file status flags.
    • O_NONBLOCK

      public static final int O_NONBLOCK
      O_NONBLOCK - Non-blocking I/O mode.
  • Method Details

    • queueInit

      public static int queueInit(int entries, MemorySegment ring, int flags)
      Initialize an io_uring instance.
      Parameters:
      entries - number of submission queue entries (must be power of 2)
      ring - pointer to io_uring structure
      flags - initialization flags
      Returns:
      0 on success, negative errno on failure
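
      Because entries is documented as a power of 2, callers may want to normalize a requested queue depth before calling queueInit. A minimal sketch, assuming hypothetical helper names (RingSizeSketch, roundUpToPowerOfTwo) that are not part of this class:

```java
// Hypothetical helper for choosing a valid 'entries' value.
final class RingSizeSketch {
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Rounds n up to the next power of two; rejects values that would overflow.
    static int roundUpToPowerOfTwo(int n) {
        if (n <= 0 || n > (1 << 30)) {
            throw new IllegalArgumentException("entries out of range: " + n);
        }
        return isPowerOfTwo(n) ? n : Integer.highestOneBit(n) << 1;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToPowerOfTwo(100));
    }
}
```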
    • queueInitParams

      public static int queueInitParams(int entries, MemorySegment ring, MemorySegment params)
      Initialize io_uring with parameters.
      Parameters:
      entries - number of queue entries
      ring - pointer to io_uring structure
      params - pointer to io_uring_params structure
      Returns:
      0 on success, negative errno on failure
    • queueExit

      public static void queueExit(MemorySegment ring)
      Tear down an io_uring instance.
    • registerBuffers

      public static int registerBuffers(MemorySegment ring, MemorySegment iovecs, int nrIovecs)
      Register buffers for zero-copy I/O.
      Parameters:
      ring - pointer to io_uring structure
      iovecs - array of iovec structures
      nrIovecs - number of iovecs
      Returns:
      0 on success, negative errno on failure
    • unregisterBuffers

      public static int unregisterBuffers(MemorySegment ring)
      Unregister previously registered buffers.
    • registerFiles

      public static int registerFiles(MemorySegment ring, MemorySegment files, int nr_files)
      Register file descriptors for use as fixed files.
    • unregisterFiles

      public static int unregisterFiles(MemorySegment ring)
      Unregister previously registered files.
    • registerFilesUpdate

      public static int registerFilesUpdate(MemorySegment ring, int off, MemorySegment files, int nr_files)
      Update a range of registered files, starting at index off.
    • getSqe

      public static MemorySegment getSqe(MemorySegment ring)
      Get a submission queue entry.
      Returns:
      pointer to SQE, or null if queue is full
    • submit

      public static int submit(MemorySegment ring)
      Submit queued operations to the kernel.
      Returns:
      number of submitted operations, or negative errno
    • waitCqe

      public static int waitCqe(MemorySegment ring, MemorySegment cqePtr)
      Wait for a completion queue entry.
      Parameters:
      ring - pointer to io_uring structure
      cqePtr - pointer to receive CQE pointer
      Returns:
      0 on success, negative errno on failure
    • waitCqeTimeout

      public static int waitCqeTimeout(MemorySegment ring, MemorySegment cqePtr, MemorySegment ts)
      Wait for a completion (with timeout).
      Parameters:
      ring - pointer to io_uring structure
      cqePtr - pointer to store CQE pointer
      ts - pointer to __kernel_timespec structure (or null for no timeout)
      Returns:
      0 on success, negative errno on failure
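
      The ts argument points at a __kernel_timespec, which holds two 64-bit fields: tv_sec followed by tv_nsec. The sketch below shows one way a Duration could be encoded into that 16-byte shape. TimespecSketch and encode are hypothetical names, and a direct ByteBuffer stands in for native memory so that the example runs without native access.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Duration;

// Hypothetical sketch: encodes a Duration as { int64 tv_sec; int64 tv_nsec; }.
final class TimespecSketch {
    static final int SIZE = 16;

    static ByteBuffer encode(Duration timeout) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("negative timeout");
        }
        ByteBuffer ts = ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
        ts.putLong(0, timeout.getSeconds()); // tv_sec: whole seconds
        ts.putLong(8, timeout.getNano());    // tv_nsec: remaining nanoseconds
        return ts;
    }
}
```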
    • peekCqe

      public static int peekCqe(MemorySegment ring, MemorySegment cqePtr)
      Peek for a completion without blocking.
      Returns:
      0 if CQE available, -EAGAIN if not
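
      Most methods in this class report failure as a negative errno, and peekCqe uses -EAGAIN to signal an empty completion queue. A small helper can make that convention explicit. This is a sketch under assumptions: ReturnCodeSketch, check and wouldBlock are hypothetical names, and EAGAIN is taken to be 11, its value on x86-64 and AArch64 Linux.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for the "negative errno" return convention.
final class ReturnCodeSketch {
    static final int EAGAIN = 11; // assumption: Linux value on x86-64 / AArch64

    // Passes non-negative results through; turns -errno into an exception.
    static int check(int rc, String op) {
        if (rc < 0) {
            throw new UncheckedIOException(new IOException(op + " failed, errno=" + (-rc)));
        }
        return rc;
    }

    // True when a non-blocking call found nothing to return.
    static boolean wouldBlock(int rc) {
        return rc == -EAGAIN;
    }
}
```

      A polling loop would typically test wouldBlock first and only pass other results to check.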
    • cqeSeen

      public static void cqeSeen(MemorySegment ring, MemorySegment cqe)
      Mark a CQE as seen (consumed).
    • prepConnect

      public static void prepConnect(MemorySegment sqe, int fd, MemorySegment addr, int addrlen)
      Prepare a connect operation.
    • prepAccept

      public static void prepAccept(MemorySegment sqe, int fd, MemorySegment addr, MemorySegment addrlen, int flags)
      Prepare an accept operation.
    • prepSend

      public static void prepSend(MemorySegment sqe, int fd, MemorySegment buf, long len, int flags)
      Prepare a send operation.
    • prepRecv

      public static void prepRecv(MemorySegment sqe, int fd, MemorySegment buf, long len, int flags)
      Prepare a recv operation.
    • setSqeFlags

      public static void setSqeFlags(MemorySegment sqe, int flags)
      Set flags for the SQE (e.g. IOSQE_FIXED_FILE).
    • prepSendFixedFile

      public static void prepSendFixedFile(MemorySegment sqe, int fileIndex, MemorySegment buf, long len, int msgFlags)
      Prepare a send operation using a registered file index.
    • prepRecvFixedFile

      public static void prepRecvFixedFile(MemorySegment sqe, int fileIndex, MemorySegment buf, long len, int msgFlags)
      Prepare a recv operation using a registered file index.
    • sqeSetOpcode

      public static void sqeSetOpcode(MemorySegment sqe, byte opcode)
      Set SQE opcode field.
    • sqeSetFlags

      public static void sqeSetFlags(MemorySegment sqe, byte flags)
      Set SQE flags field.
    • sqeSetFd

      public static void sqeSetFd(MemorySegment sqe, int fd)
      Set SQE fd field.
    • sqeSetAddr

      public static void sqeSetAddr(MemorySegment sqe, long addr)
      Set SQE addr field (buffer address).
    • sqeSetLen

      public static void sqeSetLen(MemorySegment sqe, int len)
      Set SQE len field (buffer length).
    • sqeSetUserData

      public static void sqeSetUserData(MemorySegment sqe, long userData)
      Set SQE user_data field (for request tracking).
    • sqeSetBufIndex

      public static void sqeSetBufIndex(MemorySegment sqe, short bufIndex)
      Set SQE buf_index field (for fixed buffers).
    • sqeSetOpFlags

      public static void sqeSetOpFlags(MemorySegment sqe, int opFlags)
      Set SQE op_flags field (operation-specific flags like msg_flags).
    • sqeGetFlags

      public static byte sqeGetFlags(MemorySegment sqe)
      Get SQE flags field.
    • cqeGetUserData

      public static long cqeGetUserData(MemorySegment cqe)
      Get CQE user_data field.
    • cqeGetRes

      public static int cqeGetRes(MemorySegment cqe)
      Get CQE res field (result/bytes transferred).
    • cqeGetFlags

      public static int cqeGetFlags(MemorySegment cqe)
      Get CQE flags field.
    • prepSendFixed

      public static void prepSendFixed(MemorySegment sqe, int fd, short bufIndex, int len, int flags)
      Prepare a send operation with fixed buffer.
      Parameters:
      sqe - submission queue entry
      fd - socket file descriptor
      bufIndex - index of registered buffer
      len - number of bytes to send
      flags - send flags
    • prepRecvFixed

      public static void prepRecvFixed(MemorySegment sqe, int fd, short bufIndex, int len, int flags)
      Prepare a recv operation with fixed buffer.
      Parameters:
      sqe - submission queue entry
      fd - socket file descriptor
      bufIndex - index of registered buffer
      len - maximum bytes to receive
      flags - recv flags
    • prepSendZc

      public static void prepSendZc(MemorySegment sqe, int fd, MemorySegment buf, long len, int flags)
      Prepare a zero-copy send operation.

      Zero-copy send avoids copying data from user-space to kernel-space, providing significant performance improvements for large buffers.

      IMPORTANT: When using zero-copy send:

      • You will receive TWO completions: one for send completion, one for notification (IORING_CQE_F_NOTIF)
      • The buffer must NOT be modified until the NOTIF completion is received
      • Check CQE flags for IORING_CQE_F_NOTIF to distinguish notification from send completion
      Parameters:
      sqe - submission queue entry
      fd - socket file descriptor
      buf - buffer to send (must remain valid until NOTIF completion)
      len - number of bytes to send
      flags - send flags (MSG_* constants)
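
      The two-completion rule can be tracked with a small amount of state keyed by user_data. The following is a hedged sketch, not this library's implementation: ZeroCopySendTracker and its methods are hypothetical, and the flag value 1 << 3 is taken from the kernel UAPI header on the assumption that IORING_CQE_F_NOTIF in this class matches it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: keeps a send buffer pinned until its NOTIF CQE arrives.
final class ZeroCopySendTracker {
    static final int IORING_CQE_F_NOTIF = 1 << 3; // assumed kernel UAPI value

    private final Set<Long> pinned = new HashSet<>();

    // Call when the zero-copy send SQE is submitted.
    void onSubmit(long userData) {
        pinned.add(userData);
    }

    // Returns true once the buffer tied to userData may be modified or reused.
    boolean onCqe(long userData, int cqeFlags) {
        if ((cqeFlags & IORING_CQE_F_NOTIF) != 0) {
            pinned.remove(userData); // notification: kernel is done with the buffer
            return true;
        }
        return false;                // send result only: buffer is still in use
    }

    boolean isPinned(long userData) {
        return pinned.contains(userData);
    }
}
```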
    • prepSendZcFixed

      public static void prepSendZcFixed(MemorySegment sqe, int fd, short bufIndex, int len, int flags)
      Prepare a zero-copy send operation with fixed buffer.
      Parameters:
      sqe - submission queue entry
      fd - socket file descriptor
      bufIndex - index of registered buffer
      len - number of bytes to send
      flags - send flags
    • isZeroCopyNotification

      public static boolean isZeroCopyNotification(MemorySegment cqe)
      Check if a CQE is a zero-copy notification (not actual completion).
      Parameters:
      cqe - completion queue entry
      Returns:
      true if this is a notification CQE
    • prepRecvMultishot

      public static void prepRecvMultishot(MemorySegment sqe, int fd, MemorySegment buf, long len, int flags)
      Prepare a multi-shot receive operation.

      Multi-shot receive keeps the SQE active and generates multiple CQEs until the operation is cancelled or an error occurs. This is ideal for persistent receive loops as it eliminates the need to resubmit after each receive.

      IMPORTANT:

      • Check CQE flags for IORING_CQE_F_MORE - if set, more completions are coming
      • If IORING_CQE_F_MORE is NOT set, the operation has terminated and must be resubmitted
      • Works best with buffer rings (IOSQE_BUFFER_SELECT) for automatic buffer management
      • Requires Linux 6.0+
      Parameters:
      sqe - submission queue entry
      fd - socket file descriptor
      buf - buffer for receiving data
      len - maximum bytes to receive
      flags - recv flags (MSG_* constants)
    • hasMoreCompletions

      public static boolean hasMoreCompletions(MemorySegment cqe)
      Check if a multishot receive is still active (more completions coming).
      Parameters:
      cqe - completion queue entry
      Returns:
      true if more completions are expected
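
      The decision a completion handler makes for a multishot receive reduces to one bit test. A minimal sketch, assuming the kernel UAPI value 1 << 1 for IORING_CQE_F_MORE; MultishotRearmSketch, Action and afterCompletion are hypothetical names, not part of this class:

```java
// Hypothetical sketch of the re-arm decision for a multishot receive.
final class MultishotRearmSketch {
    static final int IORING_CQE_F_MORE = 1 << 1; // assumed kernel UAPI value

    enum Action { KEEP_WAITING, RESUBMIT }

    static boolean hasMore(int cqeFlags) {
        return (cqeFlags & IORING_CQE_F_MORE) != 0;
    }

    // MORE set: the original SQE is still armed. MORE clear: submit a new one.
    static Action afterCompletion(int cqeFlags) {
        return hasMore(cqeFlags) ? Action.KEEP_WAITING : Action.RESUBMIT;
    }
}
```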
    • isBufferRingSupported

      public static boolean isBufferRingSupported()
      Check if buffer ring registration is supported. Requires liburing with io_uring_register_buf_ring (Linux 5.19+).
      Returns:
      true if buffer ring is available
    • registerBufferRing

      public static int registerBufferRing(MemorySegment ring, MemorySegment bufRing, int nentries, short bgid)
      Register a provided buffer ring with io_uring.

      Buffer rings allow the kernel to automatically select buffers for receive operations, eliminating the need to specify buffers in each SQE. This is essential for efficient multishot receive.

      Usage pattern:

      1. Allocate buffer ring memory (header + buffers)
      2. Initialize the ring with bufferRingInit()
      3. Add buffers with bufferRingAdd()
      4. Call bufferRingAdvance() to make buffers visible to kernel
      5. Register with registerBufferRing()
      6. Use IOSQE_BUFFER_SELECT flag in recv operations
      Parameters:
      ring - io_uring instance
      bufRing - buffer ring memory (header at start)
      nentries - number of buffer entries (power of 2)
      bgid - buffer group ID (unique per ring)
      Returns:
      0 on success, negative errno on failure
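
      Steps 1 to 4 of the usage pattern can be modelled in plain memory to show how mask, idx and the tail interact. This is a sketch under stated assumptions, not the library's code: BufferRingSketch is hypothetical, a direct ByteBuffer stands in for the ring memory, and the entry layout (addr at +0, len at +8, bid at +12, 16 bytes per entry, tail at byte 14 of the header) is taken from the kernel's struct io_uring_buf and io_uring_buf_ring. A real implementation would also publish the tail with release semantics.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical in-memory model of a provided buffer ring.
final class BufferRingSketch {
    static final int ENTRY_SIZE = 16;  // sizeof(struct io_uring_buf)
    static final int TAIL_OFFSET = 14; // tail overlays the resv field of entry 0

    final ByteBuffer ring;
    final int mask;

    BufferRingSketch(int nentries) {
        if (nentries <= 0 || Integer.bitCount(nentries) != 1) {
            throw new IllegalArgumentException("nentries must be a power of 2");
        }
        // Step 1: allocate the ring memory.
        ring = ByteBuffer.allocateDirect(nentries * ENTRY_SIZE).order(ByteOrder.nativeOrder());
        mask = nentries - 1;
        // Step 2: initialize the header (tail = 0).
        ring.putShort(TAIL_OFFSET, (short) 0);
    }

    int tail() {
        return ring.getShort(TAIL_OFFSET) & 0xFFFF;
    }

    // Step 3: fill the slot at (tail + idx) & mask; not yet visible to the kernel.
    void add(long addr, int len, short bid, int idx) {
        int slot = ((tail() + idx) & mask) * ENTRY_SIZE;
        ring.putLong(slot, addr);
        ring.putInt(slot + 8, len);
        ring.putShort(slot + 12, bid);
    }

    // Step 4: publish 'count' newly added buffers by moving the tail.
    void advance(int count) {
        ring.putShort(TAIL_OFFSET, (short) (tail() + count));
    }
}
```

      Writing entry 0 touches only bytes 0 to 13, so it does not clobber the tail stored at byte 14.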
    • unregisterBufferRing

      public static int unregisterBufferRing(MemorySegment ring, int bgid)
      Unregister a buffer ring.
      Parameters:
      ring - io_uring instance
      bgid - buffer group ID
      Returns:
      0 on success, negative errno on failure
    • bufferRingInit

      public static void bufferRingInit(MemorySegment bufRing)
      Initialize a buffer ring header. Call this before adding buffers.
      Parameters:
      bufRing - pointer to buffer ring memory
    • bufferRingAdd

      public static void bufferRingAdd(MemorySegment bufRing, long buf, int bufLen, short bid, int mask, int idx)
      Add a buffer to the buffer ring.
      Parameters:
      bufRing - pointer to buffer ring
      buf - buffer address
      bufLen - buffer length
      bid - buffer ID (used to identify buffer on completion)
      mask - ring mask (nentries - 1)
      idx - current index in ring
    • bufferRingAdvance

      public static void bufferRingAdvance(MemorySegment bufRing, int count)
      Advance the buffer ring tail to make buffers visible to the kernel. Must be called after adding buffers with bufferRingAdd().
      Parameters:
      bufRing - pointer to buffer ring
      count - number of buffers added
    • cqeGetBufferId

      public static int cqeGetBufferId(MemorySegment cqe)
      Get the buffer ID from a CQE with IORING_CQE_F_BUFFER flag set.
      Parameters:
      cqe - completion queue entry
      Returns:
      buffer ID (bid) or -1 if no buffer flag
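
      In the kernel ABI the buffer ID occupies the upper 16 bits of the CQE flags word. The sketch below shows the extraction described above; CqeBufferIdSketch and bufferId are hypothetical names, and the constants are the kernel UAPI values, assumed to match those used by this class.

```java
// Hypothetical sketch of buffer ID extraction from CQE flags.
final class CqeBufferIdSketch {
    static final int IORING_CQE_F_BUFFER = 1 << 0;  // assumed kernel UAPI value
    static final int IORING_CQE_BUFFER_SHIFT = 16;  // bid lives in the upper 16 bits

    static int bufferId(int cqeFlags) {
        if ((cqeFlags & IORING_CQE_F_BUFFER) == 0) {
            return -1; // no buffer was selected for this completion
        }
        return cqeFlags >>> IORING_CQE_BUFFER_SHIFT; // unsigned shift keeps bid in 0..65535
    }
}
```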
    • cqeHasBuffer

      public static boolean cqeHasBuffer(MemorySegment cqe)
      Check if CQE has buffer ID set (IORING_CQE_F_BUFFER).
      Parameters:
      cqe - completion queue entry
      Returns:
      true if buffer was selected from ring
    • prepRecvMultishotBufferSelect

      public static void prepRecvMultishotBufferSelect(MemorySegment sqe, int fd, short bgid, int flags)
      Prepare a multishot receive with buffer ring selection.

      This is the most efficient receive pattern:

      • Multishot: keeps receiving without resubmitting SQEs
      • Buffer select: kernel picks buffers from the ring
      Parameters:
      sqe - submission queue entry
      fd - socket file descriptor
      bgid - buffer group ID (matching registered buffer ring)
      flags - recv flags (MSG_* constants)
    • sqeSetLink

      public static void sqeSetLink(MemorySegment sqe)
      Set the IOSQE_IO_LINK flag on an SQE to link it to the next SQE.

      Linked operations are executed sequentially. If a linked operation fails, subsequent linked operations are cancelled (unless IOSQE_IO_HARDLINK is used).

      Example: Echo pattern

      sqe1 = getSqe()   // recv
      prepRecv(sqe1, ...)
      sqeSetLink(sqe1)  // link to next
      
      sqe2 = getSqe()   // send (executes after recv completes)
      prepSend(sqe2, ...)
      
      Parameters:
      sqe - submission queue entry to link
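
      Linking is a matter of OR-ing one bit into the SQE flags byte while leaving bits that are already set intact. A hedged sketch of that bit arithmetic follows. SqeLinkFlagsSketch, withFlag and isLinked are hypothetical names, and the bit positions are the kernel UAPI values, assumed to match this class's IOSQE_* constants.

```java
// Hypothetical sketch of SQE flag manipulation for linked operations.
final class SqeLinkFlagsSketch {
    static final int IOSQE_FIXED_FILE = 1 << 0;       // assumed kernel UAPI values
    static final int IOSQE_IO_LINK = 1 << 2;
    static final int IOSQE_IO_HARDLINK = 1 << 3;
    static final int IOSQE_CQE_SKIP_SUCCESS = 1 << 6;

    // Adds a flag without clearing the ones already present.
    static byte withFlag(byte current, int flag) {
        return (byte) (current | flag);
    }

    // True if the SQE is chained to the next one, softly or hard.
    static boolean isLinked(byte flags) {
        return (flags & (IOSQE_IO_LINK | IOSQE_IO_HARDLINK)) != 0;
    }
}
```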
    • sqeSetHardLink

      public static void sqeSetHardLink(MemorySegment sqe)
      Set the IOSQE_IO_HARDLINK flag on an SQE. Unlike IOSQE_IO_LINK, the chain continues even if this operation fails.
      Parameters:
      sqe - submission queue entry to hard-link
    • sqeSetCqeSkipSuccess

      public static void sqeSetCqeSkipSuccess(MemorySegment sqe)
      Set the IOSQE_CQE_SKIP_SUCCESS flag so that no CQE is generated if the operation succeeds. Useful for linked operations where only the final result matters. Requires Linux 5.17+.
      Parameters:
      sqe - submission queue entry
    • isAvailable

      public static boolean isAvailable()
      Check if io_uring is available on this system.
    • nativeSocket

      public static int nativeSocket(int domain, int type, int protocol)
      Create a socket.
      Parameters:
      domain - AF_INET for IPv4
      type - SOCK_STREAM for TCP
      protocol - 0 or IPPROTO_TCP
      Returns:
      file descriptor on success, negative on failure
    • nativeConnect

      public static int nativeConnect(int sockfd, MemorySegment addr, int addrlen)
      Connect a socket to a remote address.
      Parameters:
      sockfd - socket file descriptor
      addr - pointer to sockaddr structure
      addrlen - size of sockaddr structure
      Returns:
      0 on success, -1 on failure (check errno)
    • nativeBind

      public static int nativeBind(int sockfd, MemorySegment addr, int addrlen)
      Bind a socket to a local address.
    • nativeListen

      public static int nativeListen(int sockfd, int backlog)
      Listen for connections on a socket.
    • nativeAccept

      public static int nativeAccept(int sockfd, MemorySegment addr, MemorySegment addrlen)
      Accept a connection on a socket.
    • nativeSetsockopt

      public static int nativeSetsockopt(int sockfd, int level, int optname, MemorySegment optval, int optlen)
      Set socket options.
    • nativeFcntl

      public static int nativeFcntl(int fd, int cmd, int arg)
      Manipulate a file descriptor.
    • nativeClose

      public static int nativeClose(int fd)
      Close a file descriptor.
    • nativeInetPton

      public static int nativeInetPton(int af, MemorySegment src, MemorySegment dst)
      Convert IPv4/IPv6 address from text to binary form.
    • nativeHtons

      public static short nativeHtons(short hostshort)
      Convert values between host and network byte order (16-bit).
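
      To illustrate what the address helpers produce, the sketch below builds an IPv4 sockaddr_in by hand: sin_family in host byte order, sin_port in network byte order, then the four address bytes. It is an illustration only. SockaddrInSketch and its methods are hypothetical, a direct ByteBuffer stands in for native memory, and the 16-byte layout and AF_INET = 2 are the usual Linux values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a Linux IPv4 sockaddr_in (16 bytes).
final class SockaddrInSketch {
    static final short AF_INET = 2;
    static final int SIZE = 16;

    // Pure-Java equivalent of htons(): swap bytes only on little-endian hosts.
    static short htons(short host) {
        return ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN
                ? Short.reverseBytes(host)
                : host;
    }

    static ByteBuffer sockaddrIn(byte[] ipv4, int port) {
        if (ipv4.length != 4 || port < 0 || port > 65535) {
            throw new IllegalArgumentException("bad address or port");
        }
        ByteBuffer sa = ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
        sa.putShort(0, AF_INET);              // sin_family, host byte order
        sa.putShort(2, htons((short) port));  // sin_port, network byte order
        for (int i = 0; i < 4; i++) {
            sa.put(4 + i, ipv4[i]);           // sin_addr, already in network order
        }
        return sa;                            // bytes 8..15 stay zero (sin_zero)
    }
}
```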
    • createNonBlockingSocket

      public static int createNonBlockingSocket()
      Helper: Create and configure a non-blocking TCP socket.