Discover Your Terminal Mode using Java FFM (Foreign Function and Memory API)
By Salathiel G. at 28 Apr 2024
Discover your terminal mode and size using the Java Foreign Function and Memory API (FFM). In this article, we will call a function from the termios library, which ships with your operating system. We will use FFM to call that function, passing it a primitive and an object as parameters, then read both its return value and the modifications that the C-written function made to the object we passed in.
Requirements:
- Java 22
- [Optional] access to termios header files on your system
These are the steps we will be taking:
- Discover the termios library, the function we will be calling and the data structure we will use
- Explore C and Java in-memory data representation and consequences
- Describe native data structures with FFM
- Memory allocation with FFM
- Loading a library with FFM
- Locating a function in a library with FFM
- Tell Java how to translate Java values into native ones
- Call native functions with parameters
- Get return value
- Read memory modified by the native code
DISCLOSURE: I tested these on Ubuntu 22.04 and OpenJDK 22.
# About termios
termios is a C-written library whose functions expose the general terminal interface used to control asynchronous communication ports. It is available on POSIX systems such as Linux and macOS. The function we will call, `tcgetattr`, has the following C-signature:
int tcgetattr(int fd, struct termios *termios_p);
It is a function:
- named `tcgetattr`,
- accepting two parameters:
  - a C `int` (integer) as the first parameter (`fd`),
  - a C `struct termios*` (pointer/reference to an object of type `termios`) as the second parameter (`termios_p`),
- and returning a C `int` to indicate success (`0`) or error (otherwise).
NOTE: In the C programming language, functions receive copies of their arguments, even for structs. If you want a function's changes to be visible in the caller's scope, you need to pass the object by reference, also known as a pointer. After the function returns, you can read your object's fields to see the up-to-date values. Moreover, there is no exception mechanism as in Java. Good practice is for functions to validate their inputs on a best-effort basis: if something is wrong, return a non-zero int; otherwise, proceed and return zero. And if something goes really wrong, the program may simply crash without much information for debugging.
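Since C functions such as `tcgetattr` report failure through their return value, Java callers often translate that convention into an exception. Here is a minimal sketch of the idea; the class `NativeStatus` and the helper `requireSuccess` are hypothetical names for illustration, not part of FFM or termios:

```java
final class NativeStatus {

    /** Hypothetical helper: rejects any non-zero C-style status code. */
    static int requireSuccess(String function, int status) {
        if (status != 0) {
            throw new IllegalStateException(function + " failed with status " + status);
        }
        return status;
    }

    public static void main(String[] args) {
        // A zero status passes through untouched
        System.out.println(requireSuccess("tcgetattr", 0));
        // Any other status is turned into an exception
        try {
            requireSuccess("tcgetattr", -1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether to throw or to keep returning the raw status is a design choice; throwing simply makes a forgotten check harder to miss.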
On Linux and macOS, you can run `man termios` to read all about it. For C-header files, search `/usr/include/` (on Linux). Otherwise, look it up online.
Here is the `termios` C-struct:
struct termios {
    tcflag_t c_iflag;
    tcflag_t c_oflag;
    tcflag_t c_cflag;
    tcflag_t c_lflag;
    cc_t c_line;
    cc_t c_cc[NCCS];
};
NOTE: C has a feature called type aliasing (`typedef`). On Linux, `tcflag_t` is an alias for `unsigned int` and `cc_t` for `unsigned char`. Moreover, C encourages named constants over magic numbers and, to avoid the overhead of reading them from memory, it supports preprocessor directives, one of which (`#define`) lets you define compile-time values. The preprocessor then replaces every reference to the name (here, `NCCS`) with the actual value.
Thus, ignoring signedness, that `termios` C-struct really comes down to:
struct termios {
    int c_iflag;
    int c_oflag;
    int c_cflag;
    int c_lflag;
    char c_line;
    char c_cc[32];
};
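Now, even a Java developer can understand this, as it is similar to a POJO: a structure with some integer fields, a character field and another field that is an array of 32 characters. Keep in mind that this is a simplified view: on glibc-based Linux, for instance, the header also declares two trailing `speed_t` fields (`c_ispeed` and `c_ospeed`), so check your system's header for the exact definition.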
# In-Memory Data Representation in C versus Java
For Java to interface with native code, it is important to know the memory layout that the native code expects and exposes to us; otherwise, we cannot exchange data with it because of format discrepancies.
| Language \ Type | char | int |
|---|---|---|
| C | 1 byte | 4 bytes |
| Java | 2 bytes | 4 bytes |
The table above compares the in-memory size of two types available in both C and Java. We chose C because that is the language the termios library is written in, and Java because that is the language from which we will be calling the termios function.
NOTE: There might be more differences between Java and the programming language of your favourite library. C and Rust even have types whose size in bytes is platform-dependent: `size_t` in C, `usize` in Rust. Be mindful of those.
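The sizes in the table can be checked from Java itself. The following is a small sketch, with `TypeSizes` being a hypothetical class name, that prints the size of Java's own types next to the FFM value layouts that match C's `char` and `int` on common platforms:

```java
import java.lang.foreign.ValueLayout;

final class TypeSizes {

    static int javaCharBytes() {
        return Character.BYTES; // a Java char is a 16-bit UTF-16 code unit
    }

    static long nativeCharBytes() {
        return ValueLayout.JAVA_BYTE.byteSize(); // the layout to use for a C char
    }

    static long intBytes() {
        return ValueLayout.JAVA_INT.byteSize(); // same width as a C int on common platforms
    }

    public static void main(String[] args) {
        System.out.println("Java char  : " + javaCharBytes() + " bytes");
        System.out.println("JAVA_BYTE  : " + nativeCharBytes() + " byte");
        System.out.println("JAVA_INT   : " + intBytes() + " bytes");
        // Pointer width depends on the platform, much like size_t
        System.out.println("ADDRESS    : " + ValueLayout.ADDRESS.byteSize() + " bytes");
    }
}
```

This is why a C `char` is mapped to `JAVA_BYTE` rather than `JAVA_CHAR` in the rest of the article.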
# FFM Memory Layouts
The memory layout is an FFM abstraction describing how a data structure is arranged in memory.
For the `termios` C-struct here, we will translate it to Java as:
import java.lang.foreign.MemoryLayout;

import static java.lang.foreign.MemoryLayout.sequenceLayout;
import static java.lang.foreign.MemoryLayout.structLayout;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;
import static java.lang.foreign.ValueLayout.JAVA_INT;

static final MemoryLayout MODE_LAYOUT = structLayout(
    JAVA_INT.withName("c_iflag"),
    JAVA_INT.withName("c_oflag"),
    JAVA_INT.withName("c_cflag"),
    JAVA_INT.withName("c_lflag"),
    JAVA_BYTE.withName("c_line"),
    sequenceLayout(Term.NCCS, JAVA_BYTE).withName("c_cc")
);

interface Term {
    int NCCS = 32;
}
- the `java.lang.foreign.MemoryLayout.structLayout` static method is the factory method to describe a composite memory layout, where contiguous memory regions hold different fields and types,
- the `java.lang.foreign.MemoryLayout.sequenceLayout` static method helps with arrays, where all elements have the same size,
- `java.lang.foreign.ValueLayout.JAVA_BYTE` is a layout for a single byte (this is what `termios` expects),
- `java.lang.foreign.ValueLayout.JAVA_INT` is a layout for a 4-byte integer.
NOTE: Sequences represent a contiguous space in memory. They are a great fit for arrays, so that accessing an array element is a constant-time operation: O(1).
Calling `.withName` as we did is optional and will only serve a later purpose.
Remember that layouts are only descriptions: they neither allocate nor modify any actual memory.
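Because a layout is only a description, it can be queried without allocating anything. This sketch rebuilds the same layout in a hypothetical class named `LayoutInspection` and asks it for its total size and for the byte offset of a few named fields:

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.ValueLayout;

final class LayoutInspection {
    static final int NCCS = 32;

    // Same shape as the MODE_LAYOUT described in this article
    static final MemoryLayout MODE_LAYOUT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("c_iflag"),
            ValueLayout.JAVA_INT.withName("c_oflag"),
            ValueLayout.JAVA_INT.withName("c_cflag"),
            ValueLayout.JAVA_INT.withName("c_lflag"),
            ValueLayout.JAVA_BYTE.withName("c_line"),
            MemoryLayout.sequenceLayout(NCCS, ValueLayout.JAVA_BYTE).withName("c_cc"));

    /** Byte offset of a named field, relative to the start of the struct. */
    static long offsetOf(String field) {
        return MODE_LAYOUT.byteOffset(PathElement.groupElement(field));
    }

    public static void main(String[] args) {
        // 4 ints (16 bytes) + 1 byte + 32 bytes = 49 bytes
        System.out.println("size    = " + MODE_LAYOUT.byteSize());
        System.out.println("c_oflag @ " + offsetOf("c_oflag")); // 4
        System.out.println("c_line  @ " + offsetOf("c_line"));  // 16
        System.out.println("c_cc    @ " + offsetOf("c_cc"));    // 17
    }
}
```

The field names passed to `offsetOf` are the ones given through `.withName`, which is the later purpose those names serve.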
# FFM Arenas
`java.lang.foreign.Arena` is the entry point for scoping reserved memory.
Memory allocated through an arena is managed by that arena.
Arena instances are Java resources, meaning they implement `java.lang.AutoCloseable`.
As such, they can be used in a try-with-resources block, or have their `arena.close()` method called explicitly.
Closing a confined or shared arena invalidates every segment allocated from it and releases the memory it manages.
import java.lang.foreign.Arena;

void main() {
    try (var arena = Arena.ofConfined()) {
        // TODO: Do something great with arena
    }
}
There are currently four factory methods to obtain an `Arena`: `Arena.ofAuto()`, `Arena.global()`, `Arena.ofShared()` and `Arena.ofConfined()`. But it is out of the scope of this article to discuss all of them.
All memory allocated from these arenas is zero-initialized.
The allocated memory is native memory (that is, off the Java heap).
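To make the arena lifecycle concrete, here is a minimal sketch (the class name `ArenaLifecycle` is hypothetical) that checks two things with a confined arena: freshly allocated memory reads as zeros, and a segment can no longer be accessed once its arena has been closed:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class ArenaLifecycle {

    /** Returns true if every byte of a freshly allocated segment is zero. */
    static boolean freshMemoryIsZeroed() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(16, 4);
            for (long i = 0; i < segment.byteSize(); i++) {
                if (segment.get(ValueLayout.JAVA_BYTE, i) != 0) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns true if reading a segment after closing its arena is rejected. */
    static boolean accessFailsAfterClose() {
        Arena arena = Arena.ofConfined();
        MemorySegment segment = arena.allocate(16, 4);
        segment.set(ValueLayout.JAVA_INT, 0, 42);
        arena.close(); // the segment is no longer alive from this point on
        try {
            segment.get(ValueLayout.JAVA_INT, 0);
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("zero-initialized    : " + freshMemoryIsZeroed());
        System.out.println("rejected after close: " + accessFailsAfterClose());
    }
}
```

The try-with-resources form is usually preferable; the explicit `close()` call is only used here to keep a reference to the segment past the arena's lifetime.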
# FFM Memory Segments
With our arena in place, let us now allocate some memory space to store our `termios` C-struct instance:
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// static memory layout here, removed for simplicity

void main() {
    try (var arena = Arena.ofConfined()) {
        final MemorySegment MODE_SEGMENT = arena.allocate(MODE_LAYOUT);
    }
}
That allocation operation actually reserves some memory space for our `termios` C-struct instance. How much space? Well, the layout we composed previously determines that. The allocation also gives us two valuable pieces of information:
- the address of the first byte of our `termios` C-struct instance,
- its size in memory.
Both are wrapped in the returned `java.lang.foreign.MemorySegment`, and more:
- `MODE_SEGMENT.address()` will give you that address,
- `MODE_SEGMENT.byteSize()` will return the byte size of the memory segment,
- some other methods, like `segment.get(...)` and `segment.set(...)`, let us read from and write to it.
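As a quick illustration of those accessors, the sketch below (with a hypothetical class name, `SegmentReadWrite`) allocates room for a single `int`, writes a value at offset 0 and reads it back:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

final class SegmentReadWrite {

    /** Writes the value into native memory, then reads it back. */
    static int roundTrip(int value) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
            segment.set(ValueLayout.JAVA_INT, 0, value); // offset is in bytes
            return segment.get(ValueLayout.JAVA_INT, 0);
        }
    }

    /** Size of a segment allocated from the JAVA_INT layout. */
    static long intSegmentSize() {
        try (Arena arena = Arena.ofConfined()) {
            return arena.allocate(ValueLayout.JAVA_INT).byteSize();
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_INT);
            // The address differs from run to run
            System.out.println("address = 0x" + Long.toHexString(segment.address()));
            System.out.println("size    = " + segment.byteSize());
        }
        System.out.println("value   = " + roundTrip(2024));
    }
}
```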
# FFM Linker and Symbol Lookup
`java.lang.foreign.Linker` is the interface that defines the contract between the JVM and your operating system, in both directions:
- a down-call is when Java calls a native function,
- an up-call is when native code calls a Java method (through an up-call stub).
It is a sealed interface that only permits `jdk.internal.foreign.abi.AbstractLinker` which, in turn, is a sealed class permitting only the JDK implementations for the various supported platforms. As of Java 22, those implementations are:
jdk.internal.foreign.abi.aarch64.windows.WindowsAArch64Linker
jdk.internal.foreign.abi.aarch64.linux.LinuxAArch64Linker
jdk.internal.foreign.abi.aarch64.macos.MacOsAArch64Linker
jdk.internal.foreign.abi.ppc64.linux.LinuxPPC64leLinker
jdk.internal.foreign.abi.ppc64.linux.LinuxPPC64Linker
jdk.internal.foreign.abi.ppc64.aix.AixPPC64Linker
jdk.internal.foreign.abi.x64.windows.Windowsx64Linker
jdk.internal.foreign.abi.x64.sysv.SysVx64Linker
jdk.internal.foreign.abi.riscv64.linux.LinuxRISCV64Linker
jdk.internal.foreign.abi.s390.linux.LinuxS390Linker
jdk.internal.foreign.abi.fallback.FallbackLinker
To obtain a linker:
import java.lang.foreign.Linker;
// If it cannot find one for your platform, it throws an UnsupportedOperationException
static final Linker LINKER = Linker.nativeLinker();
With our linker, we now need to locate our function. For that, we need a symbol lookup, with the help of `java.lang.foreign.SymbolLookup`.
If we are loading a custom library (`.dll` on Microsoft Windows, `.so` on Linux systems), we can use the static method `java.lang.foreign.SymbolLookup.libraryLookup("path-to-library", arenaInstance)`.
But Java will automatically load a number of system libraries at the JVM startup. They describe it as:
Each Linker is responsible for choosing libraries that are widely recognized as useful on the OS and processor combination supported by the Linker. Accordingly, the precise set of symbols exposed by the symbol lookup is unspecified; it varies from one Linker to another.
SOURCE: Java Doc.
On Linux, these libraries typically include `libc`, `libm` and `libdl`. Our termios functions are in the default libraries loaded by the JVM on startup. So, to get the symbol lookup:
import java.lang.foreign.SymbolLookup;
// Static linker removed to reduce noise
static final SymbolLookup SYMBOL_LOOKUP = LINKER.defaultLookup();
# FFM Locating Our Function
`SYMBOL_LOOKUP.find("tcgetattr");` will find our C-written `tcgetattr` function and return a `java.util.Optional` of the memory segment where that function is located, or an empty optional if it is not found.
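The `Optional` result makes it easy to probe for a symbol before building anything on top of it. A small sketch, assuming a hypothetical class `SymbolSearch` and a deliberately made-up symbol name for the negative case:

```java
import java.lang.foreign.Linker;
import java.lang.foreign.SymbolLookup;

final class SymbolSearch {
    static final SymbolLookup LOOKUP = Linker.nativeLinker().defaultLookup();

    /** Returns true if the default lookup exposes a symbol with that name. */
    static boolean isAvailable(String name) {
        return LOOKUP.find(name).isPresent();
    }

    public static void main(String[] args) {
        // Expected to be present on Linux, where termios lives in libc
        System.out.println("tcgetattr: " + isAvailable("tcgetattr"));
        // A made-up name, used only to show the empty Optional case
        System.out.println("bogus    : " + isAvailable("no_such_function_in_this_sketch"));
    }
}
```

Since the set of symbols in the default lookup is unspecified, checking availability this way is safer than assuming a function is there.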
# FFM Down-Call
Before we call our function, Java needs to know its signature (binaries do not always carry all the information necessary for Java to discover it on its own). For `tcgetattr`, this will be:
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

final FunctionDescriptor signature = FunctionDescriptor.of(
    ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS);
// int tcgetattr(int fd, struct termios *termios_p)
This function descriptor describes a function that:
- returns an `int`,
- accepts an `int` as its first parameter,
- accepts a pointer (memory address) as its second parameter.
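A descriptor can be inspected before it is handed to the linker, which is a cheap way to double-check that it matches the C signature. The sketch below uses a hypothetical class named `SignatureInspection`:

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

final class SignatureInspection {

    // int tcgetattr(int fd, struct termios *termios_p)
    static final FunctionDescriptor TCGETATTR = FunctionDescriptor.of(
            ValueLayout.JAVA_INT,   // return value
            ValueLayout.JAVA_INT,   // fd
            ValueLayout.ADDRESS);   // termios_p

    static int parameterCount() {
        return TCGETATTR.argumentLayouts().size();
    }

    static boolean returnsInt() {
        // returnLayout() is empty for functions returning void
        return TCGETATTR.returnLayout()
                .map(ValueLayout.JAVA_INT::equals)
                .orElse(false);
    }

    public static void main(String[] args) {
        System.out.println("parameters : " + parameterCount());
        System.out.println("returns int: " + returnsInt());
        // The Java-side method type that a down-call handle will carry
        System.out.println("method type: " + TCGETATTR.toMethodType());
    }
}
```

Note that the first argument to `FunctionDescriptor.of` is the return layout; a `void` function would use `FunctionDescriptor.ofVoid` instead.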
Next, we make a method handle that binds our located function address with its signature:
import java.lang.invoke.MethodHandle;

final MethodHandle tcgetattr = SYMBOL_LOOKUP.find("tcgetattr")
    .map(memorySegment -> LINKER.downcallHandle(memorySegment, signature))
    .orElseThrow();
Finally, we can call the `tcgetattr` native function, using the `invoke` method on our method handle:
static final MemoryLayout MODE_LAYOUT = structLayout(
    JAVA_INT.withName("c_iflag"),
    JAVA_INT.withName("c_oflag"),
    JAVA_INT.withName("c_cflag"),
    JAVA_INT.withName("c_lflag"),
    JAVA_BYTE.withName("c_line"),
    sequenceLayout(Term.NCCS, JAVA_BYTE).withName("c_cc"));
static final Linker LINKER = Linker.nativeLinker();
static final SymbolLookup SYMBOL_LOOKUP = LINKER.defaultLookup();

// MethodHandle.invoke declares Throwable, so main has to propagate it
void main() throws Throwable {
    try (var arena = Arena.ofConfined()) {
        final var MODE_SEGMENT = arena.allocate(MODE_LAYOUT);
        final FunctionDescriptor signature = FunctionDescriptor.of(
            ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.ADDRESS);
        final MethodHandle tcgetattr = SYMBOL_LOOKUP.find("tcgetattr")
            .map(memorySegment -> LINKER.downcallHandle(memorySegment, signature))
            .orElseThrow();
        final Object status = tcgetattr.invoke(Term.STD_OUT_FD, MODE_SEGMENT);
        // TODO: explore return value and changes to our MODE_SEGMENT
    }
}

interface Term {
    int NCCS = 32;
    int STD_OUT_FD = 1; // 0 is stdin, 1 is stdout, 2 is stderr
}
NOTE: We called our function with two values:
- `Term.STD_OUT_FD`, which is the file descriptor of the standard output,
- `MODE_SEGMENT`, which is the pointer (or memory address) of our C-struct `termios` instance.
`0` is the file descriptor for standard input, `1` for standard output and `2` for standard error.
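The same down-call recipe works for simpler functions, which makes it easy to experiment with file descriptors. The sketch below is not from the original walkthrough: it calls `isatty`, a standard C function (`int isatty(int fd)`) assumed here to be exposed by the default lookup like `tcgetattr` is, to tell whether each standard descriptor is attached to a terminal. The class name `TerminalCheck` is hypothetical:

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

final class TerminalCheck {
    static final Linker LINKER = Linker.nativeLinker();

    // int isatty(int fd)
    static final FunctionDescriptor ISATTY_SIGNATURE =
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT);

    // Assumption: isatty is reachable through the default lookup (it lives in libc)
    static final MethodHandle ISATTY = LINKER.defaultLookup().find("isatty")
            .map(address -> LINKER.downcallHandle(address, ISATTY_SIGNATURE))
            .orElseThrow();

    /** Returns true if the file descriptor refers to a terminal. */
    static boolean isTerminal(int fd) {
        try {
            // isatty returns 1 for a terminal and 0 otherwise
            return (int) ISATTY.invokeExact(fd) == 1;
        } catch (Throwable t) {
            throw new IllegalStateException("isatty down-call failed", t);
        }
    }

    public static void main(String[] args) {
        for (int fd = 0; fd <= 2; fd++) {
            System.out.println("fd " + fd + " is a terminal: " + isTerminal(fd));
        }
    }
}
```

A check like this can explain a non-zero status from `tcgetattr`: when the chosen descriptor is redirected to a file or a pipe, there are no terminal attributes to read.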
# FFM Return Value and Reading Memory Segment
That `status` variable is the return value of the C-written `tcgetattr`. It is an integer, so we can cast it and check its value. Remember that `0` means all went well and anything else means there was an error. Also, recall that our memory segment was zero-initialized. Now we will read that memory and find that some fields are no longer zero.
To read values from our memory segment, we could compute the byte offset of each field and read from there:
MODE_SEGMENT.get(JAVA_INT, 0) // read c_iflag
MODE_SEGMENT.get(JAVA_INT, 4) // then read c_oflag
But that is error-prone. To address that, FFM relies on another abstraction, `java.lang.invoke.VarHandle`, which lets us navigate memory segments by paths described in the corresponding memory layout. Here is how to read all the fields of our in-memory `termios` C-object:
final int c_iflag = (int) MODE_LAYOUT.varHandle(groupElement("c_iflag")).get(MODE_SEGMENT, 0);
final int c_oflag = (int) MODE_LAYOUT.varHandle(groupElement("c_oflag")).get(MODE_SEGMENT, 0);
final int c_cflag = (int) MODE_LAYOUT.varHandle(groupElement("c_cflag")).get(MODE_SEGMENT, 0);
final int c_lflag = (int) MODE_LAYOUT.varHandle(groupElement("c_lflag")).get(MODE_SEGMENT, 0);
final int c_line = (int) MODE_LAYOUT.varHandle(groupElement("c_line")).get(MODE_SEGMENT, 0);
final List<Byte> c_cc = MODE_SEGMENT.asSlice(MODE_LAYOUT.byteOffset(groupElement("c_cc")))
    .elements(JAVA_BYTE)
    .map(elementSegment -> elementSegment.get(JAVA_BYTE, 0))
    .toList();
System.out.println(STR."c_iflag = \{c_iflag}");
System.out.println(STR."c_oflag = \{c_oflag}");
System.out.println(STR."c_cflag = \{c_cflag}");
System.out.println(STR."c_lflag = \{c_lflag}");
System.out.println(STR."c_line = \{c_line}");
System.out.println(STR."c_cc = \{c_cc}");
My command (using Java 22, the preview features here are only the string template and the implicit classes) & output:
$ java --source 22 --enable-preview --enable-native-access=ALL-UNNAMED src/termios.java
c_iflag = 17664
c_oflag = 5
c_cflag = 191
c_lflag = 2619
c_line = 0
c_cc = [3, 28, 127, 21, 4, 0, 1, 0, 17, 19, 26, 0, 18, 15, 23, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
In `MODE_LAYOUT.varHandle(groupElement("c_iflag")).get(MODE_SEGMENT, 0)`:
- `MODE_LAYOUT.varHandle(groupElement("c_iflag"))` builds the var handle,
- `.get(MODE_SEGMENT, 0)` applies it to a segment to read a value,
- `0` here is the base offset, in bytes, at which the layout starts within the segment. Our struct begins at the very start of the segment, hence zero.
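A minimal sketch of what that trailing `long` coordinate does in Java 22: it is a base offset, in bytes, added to the offset computed from the layout path. The example uses a hypothetical two-field struct (`PAIR`) stored twice, back to back, in one segment; all names are illustrative:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

final class BaseOffsetDemo {

    // Hypothetical struct: two consecutive ints, 8 bytes in total
    static final MemoryLayout PAIR = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("first"),
            ValueLayout.JAVA_INT.withName("second"));

    static final VarHandle SECOND = PAIR.varHandle(PathElement.groupElement("second"));

    /** Reads the "second" field of two pairs stored back to back. */
    static int[] readSecondOfEachPair() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment twoPairs = arena.allocate(2 * PAIR.byteSize(), PAIR.byteAlignment());
            // Fill the four ints with 10, 11, 12, 13
            for (int i = 0; i < 4; i++) {
                twoPairs.set(ValueLayout.JAVA_INT, i * 4L, 10 + i);
            }
            int inFirstPair = (int) SECOND.get(twoPairs, 0L);               // byte 0 + 4
            int inSecondPair = (int) SECOND.get(twoPairs, PAIR.byteSize()); // byte 8 + 4
            return new int[] {inFirstPair, inSecondPair};
        }
    }

    public static void main(String[] args) {
        int[] values = readSecondOfEachPair();
        System.out.println(values[0] + ", " + values[1]); // prints "11, 13"
    }
}
```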
NOTE: The var handle needs a path to locate the layout to read, and that path must end at a value layout: you can read an `int`, `byte`, `long` or `ADDRESS`, but not a whole array or struct. That is why we did not read `c_cc` with a single var handle. We could, however, have read its entries individually using either:
- `layout.varHandle(groupElement("c_cc"), sequenceElement(i)).get(segment, 0)`, or
- `layout.varHandle(groupElement("c_cc"), sequenceElement()).get(segment, 0, i)`,
where `i` is the index of the element you want to read from the `c_cc` array in the memory `segment`.
What we did to read `c_cc` is:
- get the address of that array (that means skipping some bytes up to its offset, hence the offset computation `MODE_LAYOUT.byteOffset(groupElement("c_cc"))`),
- then have Java interpret it as an array of byte elements with `.elements(JAVA_BYTE)`, which results in a stream of memory segments, one for each byte in the slice,
- and, finally, read the byte value from each one-byte segment and aggregate all the values in a list.
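For comparison, here is a sketch of two other ways to read `c_cc` with Java 22: element by element through a var handle with an open `sequenceElement()` path, and in bulk through a slice. It does not call any native function; the segment is filled by hand with sample values, and the class name `ControlCharsReader` is hypothetical:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemoryLayout.PathElement;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.VarHandle;

final class ControlCharsReader {
    static final int NCCS = 32;

    // Same shape as the MODE_LAYOUT described in this article
    static final MemoryLayout MODE_LAYOUT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("c_iflag"),
            ValueLayout.JAVA_INT.withName("c_oflag"),
            ValueLayout.JAVA_INT.withName("c_cflag"),
            ValueLayout.JAVA_INT.withName("c_lflag"),
            ValueLayout.JAVA_BYTE.withName("c_line"),
            MemoryLayout.sequenceLayout(NCCS, ValueLayout.JAVA_BYTE).withName("c_cc"));

    // Open sequence path: the element index becomes an extra long coordinate
    static final VarHandle C_CC = MODE_LAYOUT.varHandle(
            PathElement.groupElement("c_cc"), PathElement.sequenceElement());

    /** Reads c_cc one element at a time through the var handle. */
    static byte[] readOneByOne(MemorySegment mode) {
        byte[] result = new byte[NCCS];
        for (int i = 0; i < NCCS; i++) {
            result[i] = (byte) C_CC.get(mode, 0L, (long) i);
        }
        return result;
    }

    /** Reads c_cc in one go by slicing the segment at the field's offset. */
    static byte[] readInBulk(MemorySegment mode) {
        long offset = MODE_LAYOUT.byteOffset(PathElement.groupElement("c_cc"));
        return mode.asSlice(offset, NCCS).toArray(ValueLayout.JAVA_BYTE);
    }

    /** Fills two sample entries by hand, then reads the array back both ways. */
    static byte[][] demo() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment mode = arena.allocate(MODE_LAYOUT);
            C_CC.set(mode, 0L, 0L, (byte) 3);  // sample value at index 0
            C_CC.set(mode, 0L, 1L, (byte) 28); // sample value at index 1
            return new byte[][] {readOneByOne(mode), readInBulk(mode)};
        }
    }

    public static void main(String[] args) {
        byte[][] both = demo();
        System.out.println(java.util.Arrays.toString(both[0]));
        System.out.println(java.util.Arrays.equals(both[0], both[1])); // prints "true"
    }
}
```

The bulk version copies the bytes into a Java array, so the result stays usable after the arena is closed.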
That is how we instantiate native objects, call native functions, pass them parameters, read their return values and, finally, read the memory modified by those native functions. This is possible with core Java since Java 22, using the Foreign Function and Memory API, aka FFM. No external dependencies needed.
In the next article, we will explore the meaning of these `termios` fields and do more:
- get our terminal size (rows and columns),
- change our terminal mode (like vim or nano, or something worse).
Until then, follow me on X to get in touch or continue the discussion.