Discover Your Terminal Mode using Java FFM (Foreign Function and Memory API)

By Salathiel G. at 28 Apr 2024

Discover your terminal mode using the Java Foreign Function and Memory API (FFM). In this article we will call a function from the termios library, available on your operating system. We will use FFM to call that function, passing it a primitive and a pointer to a struct as parameters, then reading its return value and the modifications the C function made to the struct we passed in.

Requirements:

  • Java 22
  • [Optional] access to termios header files on your system

These are the steps we will be taking:

  • Discover termios library, the function we will be calling and the data structure we will use
  • Explore C and Java in-memory data representation and consequences
  • Describe native data structures with FFM
  • Memory allocation with FFM
  • Loading a library with FFM
  • Locating a function in a library with FFM
  • Tell Java how to translate Java values into native ones
  • Call native functions with parameters
  • Get return value
  • Read memory modified by the native code

DISCLOSURE: I tested these on Ubuntu 22.04 and OpenJDK 22.

# About termios

termios is a C library with functions that expose the general terminal interface used to control asynchronous communication ports. It is available on POSIX systems such as Linux and macOS. The function we will call, tcgetattr, reads the current attributes of a terminal and has the following C-signature:

int tcgetattr(int fd, struct termios *termios_p);

It is a function:

  • named tcgetattr,
  • it accepts two parameters:
    • a C int (integer) for the first parameter (fd),
    • a C struct termios* (pointer/reference to an object of type termios), as a second parameter (termios_p),
  • and returning a C int to indicate success (0) or error (otherwise).

NOTE: In the C programming language, arguments are passed by value: a function works on copies of its parameters, even for structs. If you want a function's changes to be visible to the caller, you pass the address of the object instead, also known as a pointer. After the function is called, you can read the object's fields to see the up-to-date values. Moreover, there is no exception mechanism as in Java. Good practice is for functions to validate their inputs as best they can and report failure through the return value: zero on success, non-zero otherwise (tcgetattr returns -1 and sets errno). And if something goes really wrong, such as an invalid memory access, the program simply crashes without much information for debugging.

On Linux and macOS, you can run man termios to read all about it. For the C header files, search /usr/include/ (on Linux). Otherwise, look it up online.

Here is the termios C-struct, showing the fields this article reads:

struct termios {
  tcflag_t c_iflag;
  tcflag_t c_oflag;
  tcflag_t c_cflag;
  tcflag_t c_lflag;
  cc_t c_line;
  cc_t c_cc[NCCS];
};

NOTE: C has a feature called type aliasing (typedef). tcflag_t is an alias for unsigned int and cc_t for unsigned char; what matters for us is that they have the same sizes as int and char. Moreover, rather than reading constants from memory at run time, C headers commonly define them with preprocessor directives such as #define. The preprocessor then replaces every reference to the name (NCCS here) with the actual value before compilation.

Thus, that termios C-struct could really come down to:

struct termios {
  int c_iflag;
  int c_oflag;
  int c_cflag;
  int c_lflag;
  char c_line;
  char c_cc[32];
};

Now, even a Java developer can understand this as it is similar to POJOs. A structure with some integer fields, a character field and another, which is an array of 32 characters.

# In-Memory Data Representation in C versus Java

For Java to interface with native code, it is important to know what data layout that native code expects and exposes to us; otherwise we cannot read from it correctly because of format discrepancies.

| Language \ Type | char    | int     |
|-----------------|---------|---------|
| C               | 1 byte  | 4 bytes |
| Java            | 2 bytes | 4 bytes |

The above table compares the in-memory representation of two types available in both C and Java. We chose C because that is the language the termios library is written in, and Java because that is the language from which we will be calling the termios function.

NOTE: There might be more differences between Java and the programming language of your favourite library. C and Rust even have types whose size in bytes is platform-dependent: size_t in C, usize in Rust. Be mindful of those.
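To check those sizes on your own machine, the sketch below asks FFM for them. It assumes Java 22, where Linker.canonicalLayouts() maps C type names to layouts for the current platform; the class name TypeSizes is made up for this illustration.

```java
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.ValueLayout;
import java.util.Map;

final class TypeSizes {
    /** Size in bytes of the named C type, as the native linker sees it on this platform. */
    static long cSize(String cTypeName) {
        Map<String, MemoryLayout> canonical = Linker.nativeLinker().canonicalLayouts();
        return canonical.get(cTypeName).byteSize();
    }

    public static void main(String[] args) {
        System.out.println("Java char: " + ValueLayout.JAVA_CHAR.byteSize() + " bytes");
        System.out.println("C char:    " + cSize("char") + " byte(s)");
        System.out.println("C int:     " + cSize("int") + " bytes");
        // These two are the platform-dependent ones.
        System.out.println("C long:    " + cSize("long") + " bytes");
        System.out.println("C size_t:  " + cSize("size_t") + " bytes");
    }
}
```

The Java char versus C char mismatch is the reason the layouts in this article use a byte wherever the C struct uses a char.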

# FFM Memory Layouts

A memory layout is an FFM abstraction describing how a data structure is arranged in memory. For the termios C-struct here, we translate it to Java as:

import java.lang.foreign.MemoryLayout;

import static java.lang.foreign.MemoryLayout.sequenceLayout;
import static java.lang.foreign.MemoryLayout.structLayout;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;
import static java.lang.foreign.ValueLayout.JAVA_INT;

static final MemoryLayout MODE_LAYOUT = structLayout(
  JAVA_INT.withName("c_iflag"),
  JAVA_INT.withName("c_oflag"),
  JAVA_INT.withName("c_cflag"),
  JAVA_INT.withName("c_lflag"),
  JAVA_BYTE.withName("c_line"),
  sequenceLayout(Term.NCCS, JAVA_BYTE).withName("c_cc")
);

interface Term {
  int NCCS = 32;
}

  • java.lang.foreign.MemoryLayout.structLayout static method is the factory method to describe a composite memory layout, where contiguous memory regions hold fields of possibly different types and sizes,
  • java.lang.foreign.MemoryLayout.sequenceLayout static method helps with arrays, where all elements have the same layout,
  • java.lang.foreign.ValueLayout.JAVA_BYTE is a layout for a single byte (the size of a C char, which is what termios expects),
  • java.lang.foreign.ValueLayout.JAVA_INT is a layout for a 4-byte integer.

NOTE: Sequences represent a contiguous region of memory. They are a great fit for arrays, so that accessing an array element is a constant-time operation: O(1).

Calling .withName as we did is optional and will only serve a later purpose. Remember that layouts are descriptions only; they do not allocate or modify any memory.
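Since a layout is only a description, we can question it without touching memory. The sketch below prints the size and a few offsets of the layout above. It also builds a second layout under an assumption worth checking against your own headers: on Linux, glibc's struct termios ends with two extra speed_t fields (c_ispeed and c_ospeed), and native code writes to the whole struct it knows about, so a segment handed to it should be at least that large. The names TermiosLayouts, ARTICLE, GLIBC and offsetOf are made up for this illustration.

```java
import java.lang.foreign.StructLayout;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.MemoryLayout.paddingLayout;
import static java.lang.foreign.MemoryLayout.sequenceLayout;
import static java.lang.foreign.MemoryLayout.structLayout;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;
import static java.lang.foreign.ValueLayout.JAVA_INT;

final class TermiosLayouts {
    static final int NCCS = 32;

    // The layout used in this article: 4 ints, 1 byte, 32 bytes.
    static final StructLayout ARTICLE = structLayout(
            JAVA_INT.withName("c_iflag"),
            JAVA_INT.withName("c_oflag"),
            JAVA_INT.withName("c_cflag"),
            JAVA_INT.withName("c_lflag"),
            JAVA_BYTE.withName("c_line"),
            sequenceLayout(NCCS, JAVA_BYTE).withName("c_cc"));

    // ASSUMPTION: glibc on Linux, where two 4-byte speed_t fields follow c_cc.
    // FFM does not insert padding for us, so the 3 bytes a C compiler would add
    // to realign the next int must be declared explicitly.
    static final StructLayout GLIBC = structLayout(
            JAVA_INT.withName("c_iflag"),
            JAVA_INT.withName("c_oflag"),
            JAVA_INT.withName("c_cflag"),
            JAVA_INT.withName("c_lflag"),
            JAVA_BYTE.withName("c_line"),
            sequenceLayout(NCCS, JAVA_BYTE).withName("c_cc"),
            paddingLayout(3),
            JAVA_INT.withName("c_ispeed"),
            JAVA_INT.withName("c_ospeed"));

    /** Byte offset of a named field from the start of the struct. */
    static long offsetOf(StructLayout layout, String field) {
        return layout.byteOffset(groupElement(field));
    }

    public static void main(String[] args) {
        System.out.println("article layout size: " + ARTICLE.byteSize());
        System.out.println("c_line offset:       " + offsetOf(ARTICLE, "c_line"));
        System.out.println("c_cc offset:         " + offsetOf(ARTICLE, "c_cc"));
        System.out.println("glibc layout size:   " + GLIBC.byteSize());
        System.out.println("c_ispeed offset:     " + offsetOf(GLIBC, "c_ispeed"));
    }
}
```

Comparing byteSize() with sizeof(struct termios) from a small C program is a cheap way to confirm that a layout matches the native definition on your platform.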

# FFM Arenas

java.lang.foreign.Arena is the entry point for allocating memory with a bounded lifetime. Memory allocated through an arena is managed by that arena. Arena instances are Java resources, meaning they implement java.lang.AutoCloseable. As such, they can be used in a try-with-resources block, or have their close() method called explicitly. Closing a confined or shared arena frees all the memory it manages, and any later access to one of its segments throws an IllegalStateException.

import java.lang.foreign.Arena;

void main() {
  try (var arena = Arena.ofConfined()) {
    // TODO: Do something great with arena
  }
}

There are currently four factory methods to obtain an Arena: Arena.ofAuto(), Arena.global(), Arena.ofShared() and Arena.ofConfined(). It is out of the scope of this article to discuss all of them. All memory allocated from these arenas is zero-initialized and native (that is, off the Java heap).
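The following sketch illustrates the lifetime of memory tied to a confined arena: the segment is usable inside the try block and rejected once the arena is closed. The class and method names are made up for this illustration.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

import static java.lang.foreign.ValueLayout.JAVA_INT;

final class ArenaLifetime {
    /** Returns true if reading a segment after its arena was closed is rejected. */
    static boolean accessAfterCloseIsRejected() {
        MemorySegment segment;
        try (Arena arena = Arena.ofConfined()) {
            segment = arena.allocate(JAVA_INT);
            // Memory handed out by an arena starts zeroed.
            if (segment.get(JAVA_INT, 0) != 0) {
                throw new AssertionError("expected zero-initialized memory");
            }
            segment.set(JAVA_INT, 0, 42);
        } // arena.close() runs here and releases the native memory

        try {
            segment.get(JAVA_INT, 0);
            return false; // not expected: the segment is no longer alive
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("access after close rejected: " + accessAfterCloseIsRejected());
    }
}
```

This is why all the work with a segment in this article happens inside the try-with-resources block.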

# FFM Memory Segments

With our arena in place, let us now allocate some memory space to store our termios C-struct instance:

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// static memory layout here, removed for simplicity
void main() {
    try (var arena = Arena.ofConfined()) {
        final MemorySegment MODE_SEGMENT = arena.allocate(MODE_LAYOUT);
    }
}

That allocation operation actually reserves some memory space for our termios C-struct instance. How much space? Well, the layout we composed previously determines that. The allocation also gives us two valuable pieces of information:

  • the address of the first byte of our termios C-struct instance
  • the size of it in memory

Both are wrapped in the returned java.lang.foreign.MemorySegment and more:

  • MODE_SEGMENT.address() will give you that address,
  • MODE_SEGMENT.byteSize() will return the byte size of the memory segment,
  • Some other methods let us do more with it, like segment.get(...) and segment.set(...) to read from and write to it.
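As a small illustration of these methods, the sketch below allocates a raw segment of 49 bytes (the size of the layout in this article), writes an int at byte offset 4 and reads it back. The class name, method name and the value written are arbitrary choices for this example.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

import static java.lang.foreign.ValueLayout.JAVA_INT;

final class SegmentBasics {
    // 4 ints + 1 byte + 32 bytes, as in the article's layout.
    static final long TERMIOS_BYTES = 49;

    /** Writes the value at byte offset 4 (the c_oflag slot) and reads it back. */
    static int writeThenRead(int value) {
        try (Arena arena = Arena.ofConfined()) {
            // allocate(byteSize, byteAlignment): 4-byte alignment suits the int fields.
            MemorySegment segment = arena.allocate(TERMIOS_BYTES, 4);
            segment.set(JAVA_INT, 4, value);
            return segment.get(JAVA_INT, 4);
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(TERMIOS_BYTES, 4);
            System.out.println("address:  0x" + Long.toHexString(segment.address()));
            System.out.println("byteSize: " + segment.byteSize());
            System.out.println("native:   " + segment.isNative());
        }
        System.out.println("round trip: " + writeThenRead(5));
    }
}
```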

# FFM Linker and Symbol Lookup

java.lang.foreign.Linker is the interface with the contract between the JVM and your operating system, in both directions:

  • a down-call is when Java calls a native function,
  • an up-call is when native code calls a Java method, through an up-call stub.

It is a sealed interface that only permits jdk.internal.foreign.abi.AbstractLinker which, in turn, is a sealed class permitting only the JDK implementations for the various supported platforms. As of Java 22, those implementations are:

  • jdk.internal.foreign.abi.aarch64.windows.WindowsAArch64Linker
  • jdk.internal.foreign.abi.aarch64.linux.LinuxAArch64Linker
  • jdk.internal.foreign.abi.aarch64.macos.MacOsAArch64Linker
  • jdk.internal.foreign.abi.ppc64.linux.LinuxPPC64leLinker
  • jdk.internal.foreign.abi.ppc64.linux.LinuxPPC64Linker
  • jdk.internal.foreign.abi.ppc64.aix.AixPPC64Linker
  • jdk.internal.foreign.abi.x64.windows.Windowsx64Linker
  • jdk.internal.foreign.abi.x64.sysv.SysVx64Linker
  • jdk.internal.foreign.abi.riscv64.linux.LinuxRISCV64Linker
  • jdk.internal.foreign.abi.s390.linux.LinuxS390Linker
  • jdk.internal.foreign.abi.fallback.FallbackLinker

To obtain a linker:

import java.lang.foreign.Linker;

// If it cannot find one for your platform, it throws an UnsupportedOperationException
static final Linker LINKER = Linker.nativeLinker();

With our linker, we now need to locate our function. For that, we need a symbol lookup, provided by java.lang.foreign.SymbolLookup.

If we are loading a custom library (.dll on Microsoft Windows, .so on Linux systems), we can use the static method java.lang.foreign.SymbolLookup.libraryLookup("path-to-library", arenaInstance);.

But Java automatically makes a number of system libraries available through the linker's default lookup. The Javadoc describes it as:

Each Linker is responsible for choosing libraries that are widely recognized as useful on the OS and processor combination supported by the Linker. Accordingly, the precise set of symbols exposed by the symbol lookup is unspecified; it varies from one Linker to another.

SOURCE: Java Doc.

On Linux, these libraries typically include libc, libm and libdl. The termios functions are part of libc, so they are covered by the default lookup. So to get the symbol lookup:

import java.lang.foreign.SymbolLookup;

// Static linker removed to reduce noise
static final SymbolLookup SYMBOL_LOOKUP = LINKER.defaultLookup();

# FFM Locating Our Function

SYMBOL_LOOKUP.find("tcgetattr"); will look up our C-written tcgetattr function and return a java.util.Optional holding a zero-length memory segment whose address is that of the function, or an empty Optional if the symbol is not found.
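A quick way to see both outcomes is to probe a few names, as in the sketch below. Which symbols are actually found depends on your platform; the class name and the deliberately bogus symbol name are invented for this example.

```java
import java.lang.foreign.Linker;
import java.lang.foreign.SymbolLookup;

final class FindSymbol {
    static final SymbolLookup LOOKUP = Linker.nativeLinker().defaultLookup();

    /** True if the default lookup knows a symbol with that name. */
    static boolean isAvailable(String name) {
        return LOOKUP.find(name).isPresent();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"tcgetattr", "tcsetattr", "no_such_symbol_for_this_article"}) {
            // The address is only printed when the Optional is not empty.
            String where = LOOKUP.find(name)
                    .map(segment -> "0x" + Long.toHexString(segment.address()))
                    .orElse("not found");
            System.out.println(name + ": " + where);
        }
    }
}
```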

# FFM Down-Call

Before we call our function, we need Java to know the signature of the function (binaries do not always carry all necessary information for Java to discover it on its own.) For tcgetattr, this will be:

import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

final FunctionDescriptor signature = FunctionDescriptor.of(
        ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS);
//      int                   (int fd,              struct termios *termios_p)

This function descriptor describes a function that:

  1. returns an int,
  2. accepts an int as its first parameter,
  3. accepts a pointer (memory address) as its second parameter.

Next, we make a method handle that binds our located function address with its signature:

import java.lang.invoke.MethodHandle;

final MethodHandle tcgetattr = SYMBOL_LOOKUP.find("tcgetattr")
        .map(memorySegment -> LINKER.downcallHandle(memorySegment, signature))
        .orElseThrow();
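The same three steps (find the symbol, describe the signature, bind them into a method handle) work for any libc function. As a warm-up, here is a sketch using isatty, a POSIX function with the C-signature int isatty(int fd) that returns 1 when the file descriptor refers to a terminal and 0 otherwise. It assumes a POSIX system such as Linux or macOS; the class and method names are made up for this illustration.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.JAVA_INT;

final class IsATty {
    /** True if the given file descriptor refers to a terminal. */
    static boolean isTerminal(int fd) {
        Linker linker = Linker.nativeLinker();
        // Step 1: locate the function (assumes libc is part of the default lookup).
        MemorySegment address = linker.defaultLookup().find("isatty").orElseThrow();
        // Step 2: describe its signature: int isatty(int fd).
        FunctionDescriptor signature = FunctionDescriptor.of(JAVA_INT, JAVA_INT);
        // Step 3: bind address and signature into a method handle.
        MethodHandle isatty = linker.downcallHandle(address, signature);
        try {
            int result = (int) isatty.invokeExact(fd);
            return result == 1;
        } catch (Throwable t) {
            throw new IllegalStateException("isatty call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println("stdin is a terminal:  " + isTerminal(0));
        System.out.println("stdout is a terminal: " + isTerminal(1));
    }
}
```

Such a check is handy before calling tcgetattr, which fails when the file descriptor is not a terminal, for example when the output is piped to another program.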

Finally, we can call tcgetattr native function, using the invoke method on our method handle:

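import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.MemoryLayout.sequenceLayout;
import static java.lang.foreign.MemoryLayout.structLayout;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;
import static java.lang.foreign.ValueLayout.JAVA_INT;

static final MemoryLayout MODE_LAYOUT = structLayout(
        JAVA_INT.withName("c_iflag"),
        JAVA_INT.withName("c_oflag"),
        JAVA_INT.withName("c_cflag"),
        JAVA_INT.withName("c_lflag"),
        JAVA_BYTE.withName("c_line"),
        sequenceLayout(Term.NCCS, JAVA_BYTE).withName("c_cc"));
static final Linker LINKER = Linker.nativeLinker();
static final SymbolLookup SYMBOL_LOOKUP = LINKER.defaultLookup();

// MethodHandle.invoke declares "throws Throwable", so main has to propagate it
void main() throws Throwable {
    try (var arena = Arena.ofConfined()) {
        final var MODE_SEGMENT = arena.allocate(MODE_LAYOUT);
        final FunctionDescriptor signature = FunctionDescriptor.of(
                ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS);
        final MethodHandle tcgetattr = SYMBOL_LOOKUP.find("tcgetattr")
                .map(memorySegment -> LINKER.downcallHandle(memorySegment, signature))
                .orElseThrow();
        final Object status = tcgetattr.invoke(Term.STD_OUT_FD, MODE_SEGMENT);
        // TODO: explore return value and changes to our MODE_SEGMENT
    }
}

interface Term {
    int NCCS = 32;
    int STD_OUT_FD = 1; // file descriptor of the standard output
}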

NOTE: We called our function with two values:

  • Term.STD_OUT_FD which is the file descriptor of the standard output,
  • MODE_SEGMENT, which is the pointer (or memory address) of our C-struct termios instance.

0 is the file descriptor for standard input, 1 for standard output and 2 for standard error.

# FFM Return Value and Reading Memory Segment

That status variable is the return value of the C-written tcgetattr. It is an integer (boxed, since invoke returns an Object), so we can cast it and check its value. Remember that 0 means all went well and anything else means there was an error. Also recall that our memory segment was zero-initialized. Now we will read that memory and find that some fields are no longer zero.
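When the status is not 0, C code would look at errno to learn why. FFM can capture that value right after the call through a linker option. The sketch below is one way to do it, assuming Java 22 on a POSIX system; the names CheckedTcgetattr, Result and query are made up, and the 128-byte buffer is simply an assumption chosen to be comfortably larger than struct termios on Linux and macOS.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_INT;

final class CheckedTcgetattr {
    /** Return value of tcgetattr plus the errno captured right after the call. */
    record Result(int status, int errno) {}

    static Result query(int fd) {
        Linker linker = Linker.nativeLinker();
        StructLayout captureLayout = Linker.Option.captureStateLayout();
        VarHandle errnoHandle = captureLayout.varHandle(groupElement("errno"));
        // With captureCallState, the handle takes an extra leading segment
        // in which the runtime stores errno after the native call.
        MethodHandle tcgetattr = linker.downcallHandle(
                linker.defaultLookup().find("tcgetattr").orElseThrow(),
                FunctionDescriptor.of(JAVA_INT, JAVA_INT, ADDRESS),
                Linker.Option.captureCallState("errno"));
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment capture = arena.allocate(captureLayout);
            // ASSUMPTION: 128 bytes is more than enough room for struct termios.
            MemorySegment termios = arena.allocate(128, 8);
            int status = (int) tcgetattr.invokeExact(capture, fd, termios);
            int errno = (int) errnoHandle.get(capture, 0L);
            return new Result(status, errno);
        } catch (Throwable t) {
            throw new IllegalStateException("tcgetattr call failed", t);
        }
    }

    public static void main(String[] args) {
        Result result = query(1);
        if (result.status() == 0) {
            System.out.println("tcgetattr succeeded");
        } else {
            // errno is only meaningful when the call reported a failure.
            System.out.println("tcgetattr failed, errno = " + result.errno());
        }
    }
}
```

Running it with the output redirected to a file is an easy way to provoke the failure branch, since the file descriptor is then not a terminal.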

To read value from our memory segment, we could compute the bytes offset of each field and read from there:

MODE_SEGMENT.get(JAVA_INT, 0); // read c_iflag
MODE_SEGMENT.get(JAVA_INT, 4); // then read c_oflag

But that is error-prone. To address that, a memory layout can produce a java.lang.invoke.VarHandle that navigates a memory segment along a path described in the layout. Here is reading all the fields of our in-memory termios C-object:

final int c_iflag = (int) MODE_LAYOUT.varHandle(groupElement("c_iflag")).get(MODE_SEGMENT, 0);
final int c_oflag = (int) MODE_LAYOUT.varHandle(groupElement("c_oflag")).get(MODE_SEGMENT, 0);
final int c_cflag = (int) MODE_LAYOUT.varHandle(groupElement("c_cflag")).get(MODE_SEGMENT, 0);
final int c_lflag = (int) MODE_LAYOUT.varHandle(groupElement("c_lflag")).get(MODE_SEGMENT, 0);
final int c_line = (int) MODE_LAYOUT.varHandle(groupElement("c_line")).get(MODE_SEGMENT, 0);
final List<Byte> c_cc = MODE_SEGMENT.asSlice(MODE_LAYOUT.byteOffset(groupElement("c_cc")), Term.NCCS) // exactly NCCS bytes
  .elements(JAVA_BYTE)
  .map(elementSegment -> elementSegment.get(JAVA_BYTE, 0))
  .toList();

System.out.println(STR."c_iflag = \{c_iflag}");
System.out.println(STR."c_oflag = \{c_oflag}");
System.out.println(STR."c_cflag = \{c_cflag}");
System.out.println(STR."c_lflag = \{c_lflag}");
System.out.println(STR."c_line = \{c_line}");
System.out.println(STR."c_cc = \{c_cc}");

My command (using Java 22; the only preview features used here are string templates and implicit classes) and its output:

$ java --source 22 --enable-preview --enable-native-access=ALL-UNNAMED src/termios.java

c_iflag = 17664
c_oflag = 5
c_cflag = 191
c_lflag = 2619
c_line = 0
c_cc = [3, 28, 127, 21, 4, 0, 1, 0, 17, 19, 26, 0, 18, 15, 23, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In MODE_LAYOUT.varHandle(groupElement("c_iflag")).get(MODE_SEGMENT, 0):

  • MODE_LAYOUT.varHandle(groupElement("c_iflag")) builds the var handle,
  • .get(MODE_SEGMENT, 0) applies it to a segment to read a value,
  • 0 here is the base offset, in bytes, at which the layout starts within the segment; our struct sits at the very beginning of MODE_SEGMENT, hence 0.
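To see what the second argument of get does, the sketch below stores three copies of a tiny struct back to back in one segment and uses that argument as a base offset to pick one of them. The two-int struct PAIR is hypothetical and unrelated to termios; it only keeps the example short.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.invoke.VarHandle;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.MemoryLayout.structLayout;
import static java.lang.foreign.ValueLayout.JAVA_INT;

final class BaseOffset {
    // Hypothetical struct used only for this illustration.
    static final StructLayout PAIR = structLayout(
            JAVA_INT.withName("first"),
            JAVA_INT.withName("second"));
    static final VarHandle SECOND = PAIR.varHandle(groupElement("second"));

    /** Fills three consecutive PAIR structs, then reads "second" from the n-th one. */
    static int secondOfNthPair(int n) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment pairs = arena.allocate(PAIR.byteSize() * 3, PAIR.byteAlignment());
            for (int i = 0; i < 3; i++) {
                long base = i * PAIR.byteSize(); // where the i-th struct starts
                SECOND.set(pairs, base, 100 + i);
            }
            return (int) SECOND.get(pairs, n * PAIR.byteSize());
        }
    }

    public static void main(String[] args) {
        System.out.println(secondOfNthPair(0)); // 100
        System.out.println(secondOfNthPair(2)); // 102
    }
}
```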

NOTE: The var handle needs a path to locate the layout to read. A var handle can only target a value layout (you can read an int, byte, long or ADDRESS, but not a whole array or struct). That is why we did not read c_cc using a single var handle. Although we could have read its entries individually using either:

  • layout.varHandle(groupElement("c_cc"), sequenceElement(i)).get(segment, 0), or
  • layout.varHandle(groupElement("c_cc"), sequenceElement()).get(segment, 0, i).

Where i is the index of the element you want to be reading for the c_cc array, from the memory segment.
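Both forms can be tried without any native call, on a segment we fill ourselves. In the sketch below, the class name and the values written are arbitrary; the layout is the one from this article.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.invoke.VarHandle;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.MemoryLayout.PathElement.sequenceElement;
import static java.lang.foreign.MemoryLayout.sequenceLayout;
import static java.lang.foreign.MemoryLayout.structLayout;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;
import static java.lang.foreign.ValueLayout.JAVA_INT;

final class CcElements {
    static final int NCCS = 32;
    static final StructLayout LAYOUT = structLayout(
            JAVA_INT.withName("c_iflag"),
            JAVA_INT.withName("c_oflag"),
            JAVA_INT.withName("c_cflag"),
            JAVA_INT.withName("c_lflag"),
            JAVA_BYTE.withName("c_line"),
            sequenceLayout(NCCS, JAVA_BYTE).withName("c_cc"));

    // Open path: the element index is supplied on each access.
    static final VarHandle CC_AT = LAYOUT.varHandle(groupElement("c_cc"), sequenceElement());
    // Fixed path: this handle always targets c_cc[2].
    static final VarHandle CC_2 = LAYOUT.varHandle(groupElement("c_cc"), sequenceElement(2));

    /** Fills c_cc with 1, 2, 3, ... and reads c_cc[2] through both handles. */
    static byte[] readThirdEntryTwice() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(LAYOUT);
            for (long i = 0; i < NCCS; i++) {
                CC_AT.set(segment, 0L, i, (byte) (i + 1));
            }
            byte viaFixedIndex = (byte) CC_2.get(segment, 0L);
            byte viaOpenIndex = (byte) CC_AT.get(segment, 0L, 2L);
            return new byte[] {viaFixedIndex, viaOpenIndex};
        }
    }

    public static void main(String[] args) {
        byte[] values = readThirdEntryTwice();
        System.out.println(values[0] + " " + values[1]); // 3 3
    }
}
```

The open form is usually the more convenient one, since a single handle can be created once and reused for every index.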

What we did for c_cc reading is:

  1. To get the address for that array (that means skipping some bytes to its offset - thus the offset computation MODE_LAYOUT.byteOffset(groupElement("c_cc"))),
  2. Then have Java interpret it as a sequence of byte elements - .elements(JAVA_BYTE) - which yields a stream of one-byte memory segments, one per element,
  3. And, finally, we read the byte value from each byte segment, and aggregate all the values in a list.
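If a plain byte[] is enough, the same result can be had more directly by slicing the array region and copying it to the heap with toArray. This is an alternative sketch, not what the article's code does; the class name and the sample values written into the segment are invented.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.MemoryLayout.sequenceLayout;
import static java.lang.foreign.MemoryLayout.structLayout;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;
import static java.lang.foreign.ValueLayout.JAVA_INT;

final class CcAsArray {
    static final int NCCS = 32;
    static final StructLayout LAYOUT = structLayout(
            JAVA_INT.withName("c_iflag"),
            JAVA_INT.withName("c_oflag"),
            JAVA_INT.withName("c_cflag"),
            JAVA_INT.withName("c_lflag"),
            JAVA_BYTE.withName("c_line"),
            sequenceLayout(NCCS, JAVA_BYTE).withName("c_cc"));
    static final long CC_OFFSET = LAYOUT.byteOffset(groupElement("c_cc"));

    /** Copies the c_cc region of a termios segment into a Java array. */
    static byte[] readCc(MemorySegment termios) {
        // Slice exactly NCCS bytes starting at c_cc, then copy them to the heap.
        return termios.asSlice(CC_OFFSET, NCCS).toArray(JAVA_BYTE);
    }

    /** Fills two sample entries by hand, standing in for what tcgetattr would write. */
    static byte[] sample() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment termios = arena.allocate(LAYOUT);
            termios.set(JAVA_BYTE, CC_OFFSET, (byte) 3);
            termios.set(JAVA_BYTE, CC_OFFSET + 1, (byte) 28);
            return readCc(termios); // the copy outlives the arena
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(sample()));
    }
}
```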

That is how we instantiate native objects, call native functions, pass them parameters, read their return values and, finally, read memory modified by those native functions. This is possible with core Java since Java 22, using the Foreign Function and Memory API, aka FFM. No external dependencies needed.

In the next article, we will explore the meaning of these termios fields and do more:

  • get our terminal size (rows and columns),
  • change our terminal mode (like vim or nano, or something worse).

Until then, follow me on X to get in touch or continue the discussion.