abu@software-lab.de

Native C Calls

(c) Software Lab. Alexander Burger

This document describes how to call C functions in shared object files (libraries) from PicoLisp, using the built-in native function - possibly with the help of the struct and lisp functions.


Overview

native calls a C function in a shared library. It tries to

  1. find a library by name
  2. find a function by name in the library
  3. convert the function's argument(s) from Lisp to C data structures
  4. call the function's C code
  5. convert the function's return value(s) from C to Lisp data structures

The direct return value of native is the Lisp representation of the C function's return value. Further values, returned by reference from the C function, are available in Lisp variables (symbol values).

struct is a helper function, which can be used to manipulate C data structures in memory. It may take a scalar (a numeric representation of a C value) to convert it to a Lisp item, or (more typically) a pointer to a memory area to build and extract data structures. lisp allows you to install callback functions, callable from C code, written in Lisp.

%@ is a convenience function, simplifying the most common use case of native.

In combination, these functions can interface PicoLisp to almost any C function.

The above steps are fully dynamic; native doesn't have (and doesn't require) a priori knowledge about the library, the function or the involved data. There is no need to write any glue code, interfaces or include files. All functions can even be called interactively from the REPL.


Syntax

The arguments to native are

  1. a library
  2. a function
  3. a return value specification
  4. optional arguments

The simplest form is a call to a function without return value and without arguments. If we assume a library "lib.so", containing a function with the prototype

void fun(void);

then we can call it as

(native "lib.so" "fun")


Libraries

The first argument to native specifies the library. It is either the name of a library (a symbol), or the handle of a previously found library (a number).

As a special case, a transient symbol "@" can be passed for the library name. It then refers to the current main program (instead of an external library), and can be used for standard functions like "malloc" or "free". Because this is needed so often,

(%@ "fun" ...)

can be used instead of

(native "@" "fun" ...)

native uses dlopen(3) internally to find and open the library, and to obtain the handle. If the name contains a slash ('/'), then it is interpreted as a (relative or absolute) pathname. Otherwise, the dynamic linker searches for the library according to the system's environment and directories. See the man page of dlopen(3) for further details.

If called with a symbolic argument, native automatically caches the handle of the found library in the value of that symbol. The most natural way is to pass the library name as a transient symbol ("lib.so" above): The initial value of a transient symbol is that symbol itself, so that native receives the library name upon the first call. After successfully finding and opening the library, native stores the handle of that library in the value of the passed symbol ("lib.so"). As native evaluates its arguments in the normal way, subsequent calls within the same transient scope will receive the numeric value (the handle), and don't need to open and search the library again.


Functions

The same rule applies to the second argument, the function. When called with a symbol, native stores the function handle in its value, so that subsequent calls evaluate to that handle, and native can directly jump to the function.

native uses dlsym(3) internally to obtain the function pointer. See the man page of dlsym(3) for further details.
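The lookup described above can be sketched in plain C with the same two calls. This is only an illustration, not PicoLisp's actual implementation: the helper name dynamic_strlen is hypothetical, and passing NULL to dlopen(3) (which yields a handle for the main program) is assumed here to play a role comparable to the "@" library name.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Type of the function we expect to find under the name "strlen" */
typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: find "strlen" at runtime and call it.
   Returns the string length, or -1 if the lookup fails. */
long dynamic_strlen(const char *s) {
    void *handle = dlopen(NULL, RTLD_LAZY);   /* step 1: find the library */
    void *sym;
    long n;

    if (handle == NULL)
        return -1;
    sym = dlsym(handle, "strlen");            /* step 2: find the function */
    if (sym == NULL) {
        dlclose(handle);
        return -1;
    }
    n = (long)((strlen_fn)sym)(s);            /* step 4: call the C code */
    dlclose(handle);
    return n;
}
```

On some systems the program must be linked with -ldl. Steps 3 and 5 (argument and result conversion) have no counterpart here, because the caller is already C code.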

In most cases a program will call more than one function from a given library. If we keep the code within the same transient scope (i.e. in the same source file), each library will be opened - and each function searched - only once.

(native "lib.so" "fun1")
(native "lib.so" "fun2")
(native "lib.so" "fun3")

After "fun1" was called, "lib.so" will be open, and won't be re-opened for "fun2" and "fun3". Consider the definition of helper functions:

(de fun1 ()
   (native "lib.so" "fun1") )

(de fun2 ()
   (native "lib.so" "fun2") )

(de fun3 ()
   (native "lib.so" "fun3") )

After any one of fun1, fun2 or fun3 was called, the symbol "lib.so" will hold the library handle. And each function "fun1", "fun2" and "fun3" will be searched only when called the first time.

Note that the function handle points to a structure in memory, which is automatically allocated. This implies that a memory leak may occur if the transient symbol holding the function handle goes out of scope (e.g. by repeatedly (re)loading the library after executing its functions).

Warning: Avoid putting more than one library into a single transient scope if there is a chance that two different functions with the same name will be called in two different libraries. Because of the function handle caching, the second call would otherwise (wrongly) go to the first function.
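The effect behind this warning can be modeled in a few lines of C. The sketch is a deliberately simplified model, not the way native is implemented: all names (resolve, call_cached, the two stand-in functions) are hypothetical, and a single cache slot keyed by the function name alone stands in for the transient symbol holding the function handle.

```c
#include <stddef.h>
#include <string.h>

typedef int (*fun_t)(void);

/* Stand-ins for two functions of the same name in two libraries */
static int fun_in_lib_a(void) { return 1; }
static int fun_in_lib_b(void) { return 2; }

/* Hypothetical resolver: pretends to search the given library */
static fun_t resolve(const char *lib, const char *name) {
    (void)name;
    return strcmp(lib, "a.so") == 0 ? fun_in_lib_a : fun_in_lib_b;
}

/* One cache slot per function name; the library is not part of the key */
static const char *cached_name = NULL;
static fun_t cached_fn = NULL;

int call_cached(const char *lib, const char *name) {
    if (cached_fn == NULL || strcmp(cached_name, name) != 0) {
        cached_name = name;
        cached_fn = resolve(lib, name);   /* searched only the first time */
    }
    return cached_fn();
}
```

In this model, a call for "fun" in "a.so" followed by a call for "fun" in "b.so" runs the function from "a.so" both times, because the second call finds the cached handle and never searches "b.so".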


Return Value

The (optional) third argument to native specifies the return value. A C function can return many types of values, like integer or floating point numbers, string pointers, or pointers to structures which in turn consist of those types, and even other structures or pointers to structures. native tries to cover most of them.

As described in the result specification, the third argument should consist of a pattern which tells native how to extract the proper value.

Primitive Types

In the simplest case, the result specification is NIL like in the examples so far. This means that either the C function returns void, or that we are not interested in the value. The return value of native will be NIL in that case.

If the result specification is one of the symbols B, I or N, an integer number is returned, by interpreting the result as a char (8 bit unsigned byte), int (32 bit signed integer), or long number (64 bit signed integer), respectively. Other types (signed or unsigned numbers of different sizes) can be produced from these types with logical and arithmetic operations if necessary.

If the result specification is the symbol C, the result is interpreted as a 16 bit number, and a single-char transient symbol (string) is returned.

A specification of S tells native to interpret the result as a pointer to a C string (null terminated), and to return a transient symbol (string).

If the result specification is a number, it will be used as a scale to convert a returned double (if the number is positive) or float (if the number is negative) to a scaled fixpoint number.

Examples for function calls, with their corresponding C prototypes:

(native "lib.so" "fun" 'I)             # int fun(void);
(native "lib.so" "fun" 'N)             # long fun(void);
(native "lib.so" "fun" 'P)             # void *fun(void);
(native "lib.so" "fun" 'S)             # char *fun(void);
(native "lib.so" "fun" 1.0)            # double fun(void);
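To make the numeric scale more concrete, here is a small C sketch of what a conversion between a double and a scaled fixpoint number amounts to. It is an assumption-laden illustration: the function names are hypothetical, the scale is passed as a plain integer (e.g. 100 for two decimal digits), and rounding to the nearest integer is a choice of this sketch, not a documented property of native.

```c
/* Convert a C double to a fixpoint integer with the given scale.
   Rounding to nearest is an assumption of this sketch. */
long to_fixpoint(double d, long scale) {
    double scaled = d * (double)scale;
    return (long)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* The reverse direction: a scaled integer back to a double */
double from_fixpoint(long n, long scale) {
    return (double)n / (double)scale;
}
```

With a scale of 100, the double 0.11 corresponds to the integer 11, and the integer 456 corresponds to the double 4.56.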

Arrays and Structures

If the result specification is a list, it means that the C function returned a pointer to an array, or an arbitrary memory structure. The specification list should then consist of either the above primitive specifications (symbols or numbers), or of cons pairs of a primitive specification and a repeat count, to denote arrays of the given type.

Examples for function calls, with their corresponding pseudo C prototypes:

(native "lib.so" "fun" '(I . 8))       # int *fun(void);  // 8 integers
(native "lib.so" "fun" '(B . 16))      # unsigned char *fun(void);  // 16 bytes

(native "lib.so" "fun" '(I I))         # struct {int i; int j;} *fun(void);
(native "lib.so" "fun" '(I . 4))       # struct {int i[4];} *fun(void);

(native "lib.so" "fun" '(I (B . 4)))   # struct {
                                       #    int i;
                                       #    unsigned char c[4];
                                       # } *fun(void);

(native "lib.so" "fun"                 # struct {
   '(((B . 4) I) (S . 12) (N . 8)) )   #    struct {unsigned char c[4]; int i;}
                                       #    char *names[12];
                                       #    long num[8];
                                       # } *fun(void);

If a returned structure has an element which is a pointer to some other structure (i.e. not an embedded structure like in the last example above), this pointer must be first obtained with an N pattern, which can then be passed to struct for further extraction.
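As a rough model of how a list specification maps onto memory, the following C sketch decodes the structure that corresponds to the pattern (I (B . 4)) field by field. The names rec and decode_rec are hypothetical, and the code only illustrates the memory layout; it makes no claim about how native walks the structure internally.

```c
#include <stddef.h>
#include <string.h>

/* The layout described by the specification (I (B . 4)) */
struct rec {
    int i;
    unsigned char c[4];
};

/* Hypothetical decoder: read the fields from raw memory at p */
void decode_rec(const void *p, int *i, unsigned char c[4]) {
    const unsigned char *bytes = p;

    memcpy(i, bytes + offsetof(struct rec, i), sizeof(int));  /* I */
    memcpy(c, bytes + offsetof(struct rec, c), 4);            /* (B . 4) */
}
```

The byte array follows the int directly, so the order of the items in the specification is the order of the fields in memory.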


Arguments

The (optional) fourth and following arguments to native specify the arguments to the C function.

Primitive Types

Integer arguments (up to 64 bits, signed or unsigned char, short, int or long) can be passed as they are, as numbers.

(native "lib.so" "fun" NIL 123)        # void fun(int);
(native "lib.so" "fun" NIL 1 2 3)      # void fun(int, long, short);

String arguments can be specified as symbols. native allocates memory for each string on the stack, passes the pointer to the C function, and cleans up the stack when done.

(native "lib.so" "fun" NIL "abc")      # void fun(char*);
(native "lib.so" "fun" NIL 3 "def")    # void fun(int, char*);

Note that the allocated string memory is released after the return value is extracted. This allows a C function to return the argument string pointer, perhaps after modifying the data in-place, and receive the new string as the return value (with the S specification).

(native "lib.so" "fun" 'S "abc")       # char *fun(char*);

Also note that specifying NIL as an argument passes an empty string ("", which also reads as NIL in PicoLisp) to the C function. Physically, this is a pointer to a NULL-byte, and is not a NULL-pointer. Be sure to pass 0 (the number zero) if a NULL-pointer is desired.
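The difference matters because a C function can tell the two cases apart. A minimal sketch of the receiving side (the function name classify is hypothetical):

```c
#include <stddef.h>

/* Returns 0 for a NULL-pointer, 1 for an empty string, 2 otherwise */
int classify(const char *s) {
    if (s == NULL)
        return 0;   /* what passing the number 0 would produce */
    if (*s == '\0')
        return 1;   /* what passing NIL or "" would produce */
    return 2;
}
```

A function that treats NULL as "no value" but dereferences any other pointer behaves quite differently in the two cases.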

Floating point arguments are specified as cons pairs, where the value is in the CAR, and the CDR holds the fixpoint scale. If the scale is positive, the number is passed as a double, otherwise as a float.

(native "lib.so" "fun" NIL             # void fun(double, float);
   (12.3 . 1.0) (4.56 . -1.0) )

Arrays and Structures

Composite arguments are specified as nested list structures. native allocates memory for each array or structure (with malloc(3)), passes the pointer to the C function, and releases the memory (with free(3)) when done.

This implies that such an argument can be both an input and an output value to a C function (pass by reference).

The CAR of the argument specification can be NIL (then it is an input-only argument). Otherwise, it should be a variable which receives the returned structure data.

The CADR of the argument specification must be a cons pair with the total size of the structure in its CAR. The CDR is ignored for input-only arguments, and should contain a result specification for the output value to be stored in the variable.

For example, a minimal case is a function that takes an integer reference, and stores the number '123' in that location:

void fun(int *i) {
   *i = 123;
}

We call native with a variable X in the CAR of the argument specification, a size of 4 (i.e. sizeof(int)), and I for the result specification. The stored value is then available in the variable X:

: (native "lib.so" "fun" NIL '(X (4 . I)))
-> NIL
: X
-> 123

The rest (CDDR) of the argument specification may contain initialization data, if the C function expects input values in the structure. It should be a list of initialization items, optionally with a fill-byte value in the CDR of the last cell.

If there are no initialization items and just the final fill-byte, then the whole buffer is filled with that byte. For example, to pass a buffer of 20 bytes, initialized to zero:

: (native "lib.so" "fun" NIL '(NIL (20) . 0))

A buffer of 20 bytes, with the first 4 bytes initialized to 1, 2, 3, and 4, and the rest filled with zero:

: (native "lib.so" "fun" NIL '(NIL (20) 1 2 3 4 . 0))

and the same, where the buffer contents are returned as a list of bytes in the variable X:

: (native "lib.so" "fun" NIL '(X (20 B . 20) 1 2 3 4 . 0))

For a more extensive example, let's use the following definitions:

typedef struct value {
   int x, y;
   double a, b, c;
   int z;
   char nm[4];
} value;

void fun(value *val) {
   printf("%d %d\n", val->x, val->y);
   val->x = 3;
   val->y = 4;
   strcpy(val->nm, "OK");
}

We call this function with a structure of 40 bytes, requesting the returned data in V, with two integers (I . 2), three doubles (100 . 3) with a scale of 2 (1.0 = 100), another integer I and four characters (C . 4). If the structure gets initialized with two integers 7 and 6, three doubles 0.11, 0.22 and 0.33, and another integer 5 while the rest of the 40 bytes is cleared to zero

: (native "lib.so" "fun" NIL
   '(V (40 (I . 2) (100 . 3) I (C . 4)) -7 -6 (100 11 22 33) -5 . 0) )

then it will print the integers 7 and 6, and V will contain the returned list

((3 4) (11 22 33) 5 ("O" "K" NIL NIL))

i.e. the original integer values 7 and 6 replaced with 3 and 4.
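The size of 40 bytes follows from the layout of the structure: two ints (8 bytes), three doubles (24 bytes), one int (4 bytes) and four chars (4 bytes). The following C sketch checks this, assuming a common ABI with 4-byte int and 8-byte double; the helper names are hypothetical, and fun_quiet is merely the function from the text without the printf.

```c
#include <stddef.h>
#include <string.h>

typedef struct value {
    int x, y;
    double a, b, c;
    int z;
    char nm[4];
} value;

/* Like fun in the text, but without the printf */
void fun_quiet(value *val) {
    val->x = 3;
    val->y = 4;
    strcpy(val->nm, "OK");
}

/* The total size passed in the CAR of the size specification */
size_t value_size(void) { return sizeof(value); }

/* The offset where the four characters (C . 4) start */
size_t value_nm_offset(void) { return offsetof(value, nm); }
```

On a platform with different sizes or alignment rules, the total size and the result specification would have to be adjusted accordingly.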

Note that the allocated structure memory is released after the return value is extracted. This allows a C function to return the argument structure pointer, perhaps after modifying the data in-place, and receive the new structure as the return value - instead of (or even in addition to) the direct return via the argument reference.


Memory Management

The preceding Arguments section mentions that native implicitly allocates and releases memory for strings, arrays and structures.

Technically, this mimics automatic variables in C.

For a simple example, let's assume that we want to call read(2) directly, to fetch a 4-byte integer from a given file descriptor. This could be done with the following C function:

int read4bytes(int fd) {
   char buf[4];

   read(fd, buf, 4);
   return *(int*)buf;
}

buf is an automatic variable, allocated on the stack, which disappears when the function returns. A corresponding native call would be:

(%@ "read" 'I Fd '(Buf (4 . I)) 4)

The structure argument (Buf (4 . I)) says that a space of 4 bytes should be allocated and passed to read, then an integer I returned in the variable Buf (the return value of native itself is the integer returned by read). The memory space is released after that.

(Note that we can call %@ here, as read resides in the main program.)

Instead of a single integer, we might want a list of four bytes to be returned from native:

(%@ "read" 'I Fd '(Buf (4 B . 4)) 4)

The difference is that we wrote (B . 4) (a list of 4 bytes) instead of I (a single integer) for the result specification (see the Arrays and Structures section).

Let's see what happens if we extend this example. We'll write the four bytes to another file descriptor, after reading them from the first one:

void copy4bytes(int fd1, int fd2) {
   char buf[4];

   read(fd1, buf, 4);
   write(fd2, buf, 4);
}

Again, buf is an automatic variable. It is passed to both read and write. A direct translation would be:

(%@ "read" 'I Fd '(Buf (4 B . 4)) 4)
(%@ "write" 'I Fd2 (cons NIL (4) Buf) 4)

This works as expected. read returns a list of four bytes in Buf. The call to cons builds the structure

(NIL (4) 1 2 3 4)

i.e. no return variable, a four-byte memory area, filled with the four bytes (assuming that read returned 1, 2, 3 and 4). Then this structure is passed to write.

But: This solution induces quite some overhead. The four-byte buffer is allocated before the call to read and released after that, then allocated and released again for write. Also, the bytes are converted to a list to be stored in Buf, then that list is extended for the structure argument to write, and converted again back to the raw byte array. The data in the list itself are never used.

If the above operation is to be used more than once, it is better to allocate the buffer manually, use it for both reading and writing, and then release it. This also avoids all intermediate list conversions.

(let Buf (%@ "malloc" 'P 4)  # Allocate memory
   (%@ "read" 'I Fd Buf 4)   # (Possibly repeat this several times)
   (%@ "write" 'I Fd2 Buf 4)
   (%@ "free" NIL Buf) )     # Release memory

To allocate such a buffer locally on the stack (just like a C function would do), buf can be used. Equivalent to the above is:

(buf Buf 4  # Allocate local memory
   (%@ "read" 'I Fd Buf 4)
   (%@ "write" 'I Fd2 Buf 4) )
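For comparison, the manual malloc/free variant corresponds to the following C function. It is a sketch with a hypothetical name (copy4bytes_heap) and, unlike the examples in the text, it checks the return values of read and write.

```c
#include <stdlib.h>
#include <unistd.h>

/* Heap-based variant of copy4bytes.
   Returns 0 on success, -1 on any failure. */
int copy4bytes_heap(int fd1, int fd2) {
    char *buf = malloc(4);                 /* Allocate memory */
    int ok;

    if (buf == NULL)
        return -1;
    ok = read(fd1, buf, 4) == 4            /* Same buffer for reading ... */
        && write(fd2, buf, 4) == 4;        /* ... and for writing */
    free(buf);                             /* Release memory */
    return ok ? 0 : -1;
}
```

As in the Lisp version, the bytes are never converted to any other representation between the two calls.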

Fast Fourier Transform

For a more typical example, we might call the Fast Fourier Transform using the library from the FFTW package. With the example code for calculating Complex One-Dimensional DFTs:

#include <fftw3.h>
...
{
   fftw_complex *in, *out;
   fftw_plan p;
   ...
   in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
   out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
   p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
   ...
   fftw_execute(p); /* repeat as needed */
   ...
   fftw_destroy_plan(p);
   fftw_free(in); fftw_free(out);
}

we can build the following equivalent:

(load "@lib/math.l")

(de FFTW_FORWARD . -1)
(de FFTW_ESTIMATE . 64)

(de fft (Lst)
   (let
      (Len (length Lst)
         In (native "libfftw3.so" "fftw_malloc" 'P (* Len 16))
         Out (native "libfftw3.so" "fftw_malloc" 'P (* Len 16))
         P (native "libfftw3.so" "fftw_plan_dft_1d" 'N
            Len In Out FFTW_FORWARD FFTW_ESTIMATE ) )
      (struct In NIL (cons 1.0 (apply append Lst)))
      (native "libfftw3.so" "fftw_execute" NIL P)
      (prog1
         (struct Out (make (do Len (link (1.0 . 2)))))
         (native "libfftw3.so" "fftw_destroy_plan" NIL P)
         (native "libfftw3.so" "fftw_free" NIL Out)
         (native "libfftw3.so" "fftw_free" NIL In) ) ) )

This assumes that the argument list Lst is passed as a list of complex numbers, each as a list of two numbers for the real and imaginary part, like

(fft '((1.0 0) (1.0 0) (1.0 0) (1.0 0) (0 0) (0 0) (0 0) (0 0)))

The above translation to Lisp is quite straightforward. After the two buffers are allocated, and a plan is created, struct is called to store the argument list in the In structure as a list of double numbers (according to the 1.0 initialization item). Then fftw_execute is called, and struct is called again to retrieve the result from Out and return it from fft via the prog1. Finally, all memory is released.

Constant Data

If such allocated data (strings, arrays or structures passed to native) are constant during the lifetime of a program, it makes sense to allocate them only once, before their first use. A typical candidate is the format string of a printf call. Consider a function which prints a floating point number in scientific notation:

(load "@lib/math.l")

: (de prf (Flt)
   (%@ "printf" NIL "%e^J" (cons Flt 1.0)) )
-> prf

: (prf (exp 12.3))
2.196960e+05


Callbacks

Sometimes it is necessary to do the reverse: Call Lisp code from C code.

This mechanism uses the Lisp-level function lisp. No C source code access is required.

lisp returns a function pointer, which can be passed to C functions via native. When this function pointer is dereferenced and called from the C code, the corresponding Lisp function is invoked. At most five numeric arguments and a numeric return value can be used; other data types must be handled by the Lisp function with struct and memory management operations.

Callbacks are often used in user interface libraries, to handle key-, mouse- and other events. Examples can be found in "@lib/openGl.l". The following function mouseFunc takes a Lisp function, installs it under the tag mouseFunc (any other tag would be all right too) as a callback, and passes the resulting function pointer to the OpenGL glutMouseFunc() function, to set it as a callback for the current window:

(de mouseFunc (Fun)
   (native `*GlutLib "glutMouseFunc" NIL (lisp 'mouseFunc Fun)) )

(The global *GlutLib holds the library "/usr/lib/libglut.so". The backquote (`) is important here, so that the transient symbol with the library name (and not the global *GlutLib) is evaluated by native, resulting in the proper library handle at runtime.)

A program using OpenGL may then use mouseFunc to install a function

(mouseFunc
   '((Btn State X Y)
      (do-something-with Btn State X Y) ) )

so that future clicks into the window will pass the button, state and coordinates to that function.
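From the point of view of the C library, a callback is just a function pointer that is stored and called later. The following sketch models that side. All names are hypothetical, and the callback type with five long arguments is an assumption made for illustration; it is not the real prototype of glutMouseFunc().

```c
#include <stddef.h>

/* Assumed callback shape: up to five numeric arguments, numeric result */
typedef long (*callback_t)(long, long, long, long, long);

static callback_t mouse_cb = NULL;

/* What a registration function conceptually does: store the pointer */
void set_mouse_callback(callback_t cb) {
    mouse_cb = cb;
}

/* Later, when an event arrives, the library calls through the pointer */
long simulate_click(long btn, long state, long x, long y) {
    if (mouse_cb == NULL)
        return -1;                 /* no callback installed */
    return mouse_cb(btn, state, x, y, 0);
}

/* Stand-in for the function pointer that lisp would return */
long sample_handler(long btn, long state, long x, long y, long unused) {
    (void)unused;
    return btn + state + x + y;
}
```

In the PicoLisp case, the pointer passed to the registration function is the one returned by lisp, and calling through it invokes the Lisp function instead of a C function like sample_handler.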