A Lisp compiler to RISC-V written in Lisp

11th October 2024

This is a simple experimental Lisp compiler, written in uLisp, that will compile a Lisp function into RISC-V machine code. You can run the compiler on the RISC-V core of a Raspberry Pi Pico 2 (or another RP2350-based board):

RaspberryPiPico2.jpg

It's based on my earlier project A Lisp compiler to ARM written in Lisp.

Introduction

When I added the facility of executing machine code to uLisp I had in mind the eventual goal of being able to compile uLisp functions into machine code, and this is a first step in that direction.

The nice thing about compiling Lisp is that you don't have to write a tokeniser or parser, because Lisp programs are already in a consistent structure that can be processed by another Lisp program.

The compiler program is written in the subset of Common Lisp supported by uLisp, and will run on the RISC-V core of a RP2350-based board; I used a Raspberry Pi Pico 2. You can also run it using Common Lisp on a laptop or desktop computer, and display the code it generates, but of course you won't be able to run the RISC-V machine code because Common Lisp doesn't have uLisp's defcode command.

I got my initial inspiration for this compiler from Peter Norvig's book "Paradigms of Artificial Intelligence Programming" [1].

Resources

To use the compiler you first need to load the RISC-V assembler from: RISC-V assembler in uLisp.

Get the full source of the compiler here: Lisp compiler for RISC-V.

Or from GitHub here: https://github.com/technoblogy/lisp-arm-compiler.

For information about setting up uLisp on a Raspberry Pi Pico 2 see: Raspberry Pi Pico 2.

Using the compiler

To run the compiler you simply call compile on a Lisp function; for example:

(compile 'fibonacci)

The function will be compiled into a machine code function, replacing the original Lisp code, so that calling fibonacci will now execute the RISC-V machine-code version.

You can also display the code generated for an expression by calling comp on the expression; for example:

(pprint (comp '(* 13 17)))

(:integer
  ($li 'a0 13)
  ($addi 'sp 'sp -4)
  ($sw 'a0 0 '(sp))
  ($li 'a0 17)
  ($lw 'a1 0 '(sp))
  ($addi 'sp 'sp 4)
  ($mul 'a0 'a1 'a0))

The :integer prefix shows that the result is an integer; see below.

For examples of several simple Lisp programs that it will successfully compile see Examples below. These also give a comparison of the speed of the Lisp and machine-code versions.

Specification

The compiler understands the following Lisp objects:

Defining variables and functions: defun, setq

Symbols: nil, t

List functions: car, cdr

Arithmetic functions: +, -, *, /, mod, 1+, 1-

Arithmetic comparisons: =, <, <=, >, >=, /=

Conditionals: if, and, or

Tail-call optimisation

Although the compiler doesn't include any iteration constructs, it does provide tail-call optimisation which can make recursive programs as efficient as iterative ones. Consider this recursive program to add two positive numbers:

(defun add (a b)
  (if (= b 0) a
    (add (+ a 1) (- b 1))))

On a system without tail-call optimisation, evaluating:

(add 10000 10000)

will probably fail, because it requires 10000 stack frames to store the intermediate results. This compiler recognises that the recursive call to add can be replaced by a jump to the start of the program, and so it has no problem evaluating it. For a more sensible example see factor below.

How the compiler works

Register usage

To avoid needing to keep track of register usage the compiler makes use of the stack to pass values to an expression, and store the value returned by an expression.

The following table shows how the RISC-V registers are used within the compiler:

Registers Use
a0 a1 a2 a3 Used to pass the parameters to the main function's arguments.
a0 Contains the value returned by the main function.
a4 a5 a6 a7 Contain copies of the function arguments within the function.
a0 a1 Used to pass the arguments to each operator.
a0 Used to return the value from each operator.
s0 to s11 Local variables.

Compiling an expression

The following steps show the sequence of compiling an expression, such as:

(* x 13)
  • Code is generated to evaluate each of the arguments, in this case x and 13, and each result is pushed onto the stack, apart from the last which is left in a0.
  • The first value is popped from the stack into register a1.
  • The function, in this case *, is then evaluated for a1 and a0, with the result in a0.

This stack-based approach ensures that a more complex expression, such as:

(* (- x 1) (+ x 13))

will also compile into correct code, without conflicts between registers.

Calling the function recursively

The compiler supports calling a function recursively from within the function itself. Because the registers corresponding to the parameters and local variables would be overwritten by the recursive call they are stored on the stack around the function call.

There are several recursive functions in the examples below.

Types

For boolean operations I decided to represent nil as 0, and t as 1. A problem I hadn't anticipated was that I would need to keep track of what type of object each function returned, integer or boolean. For example, consider the problem of compiling the statement:

(and x y)

If x has the value 0 and y has the value 7 this should return 7. However, if x has the value nil and y has the value 7 this should return nil. Representing nil as zero leads to an ambiguity.

I solved this by returning a type, :integer or :boolean, with each compiled expression, according to the following rules:

  • Predicates, and t or nil, always return a :boolean.
  • Arithmetic operations always return an :integer.
  • An if form requires a :boolean test form and returns an :integer.
  • A progn or let block returns the type of its last expression.

An item with an ambiguous type returns the type nil.

Running the examples

I used the following simple examples to test the compiler. Before compiling a new function you might want to remove the previous one from memory using makunbound to free up the code memory before compiling the next function; for example:

(makunbound 'fibonacci)

Alternatively you could increase the amount of memory available for machine code by editing the directive such as:

#define CODESIZE 256 

before uploading uLisp to your board.

Examples

The following examples take integer arguments and return an integer result.

Factor

This function takes a simple approach to finding the least prime factor of a number:

(defun factor (n d)
  (if (> (* d d) n) n
   (if (= 0 (mod n d)) d
     (factor n (1+ d)))))

It should be called with n equal to the number to be factorized, and d=2. It takes advantage of the compiler's tail-call optimisation, which makes it as efficient as an iterative solution. If the number is prime, factor will print the number itself.

To find the least prime factor of 2146654199 (46327 x 46337):

Lisp version:

> (time (factor 2146654199 2))
46327
Time: 5.4 s

Compiled version:

> (time (factor 2146654199 2))
46327
Time: 19 ms

You can use the above function as the basis for a simple recursive routine to factorize a number into a list of its prime factors:

(defun factorize (n)
  (let ((f (factor n 2)))
    (if (= n f) (list n) (cons f (factorize (/ n f))))))

For example:

> (factorize 731731731)
(3 17 43 333667)

Takeuchi function

This is a version of the highly-recursive benchmark I use for comparing versions of Lisp [2]:

(defun tak (x y z)
  (if (>= y x) z
    (tak
     (tak (1- x) y z)
     (tak (1- y) z x)
     (tak (1- z) x y))))

Lisp version:

> (time (tak 18 12 6))
7
Time: 4.1 s

Compiled version

> (time (tak 18 12 6))
7
Time: 16 ms

Factorial

This is a recursive implementation of the factorial function:

(defun fact (n)
  (if (<= n 1) 1
    (* n (fact (- n 1)))))

Lisp version:

> (time (fact 12))
479001600
Time: 1 ms

Compiled version

> (time (fact 12))
479001600
Time: 0 ms

Fibonacci

This is a recursive implementation of the Fibonacci series:

(defun fibonacci (n)
  (if (< n 3) 1
    (+ (fibonacci (- n 1)) (fibonacci (- n 2)))))

Lisp version:

> (time (fibonacci 27))
196418 Time: 50.5 s

Compiled version

> (time (fibonacci 27))
196418
Time: 80 ms

Greatest Common Divisor

A recursive algorithm to calculate the greatest common divisor of two integers.

(defun gcd (a b)
  (if (= b 0) a
   (gcd b (mod a b))))

Lisp version:

> (time (gcd 2032460032 2056252672))
256
Time: 1 ms

Compiled version

> (time (gcd 2032460032 2056252672))
256
Time: 0 ms

Hofstadter Q sequence

This is one of several recursive sequences described in Douglas Hofstadter's book "Gödel, Escher, Bach: an Eternal Golden Braid". It is defined as follows:

(defun q (n)
  (if (<= n 2) 1
    (+
     (q (- n (q (- n 1))))
     (q (- n (q (- n 2)))))))

It is related to the Fibonacci sequence, except that in this case the two preceding terms specify how far to go back in the sequence to find the two terms to be summed.

Lisp version:

> (time (q 21))
12
Time: 8.6 s

Compiled version

> (time (q 21))
12
Time: 25 ms

Two-dimensional recursive function Q2

This function Q2 is my two-dimensional extension of the Hofstadter Q sequence [3]:

(defun q2 (x y)
  (if (or (< x 1) (< y 1)) 1
    (+ (q2 (- x (q2 (1- x) y)) y)
       (q2 x (- y (q2 x (1- y)))))))

Lisp version:

> (time (q2 7 8))
31
Time: 13.8 s

Compiled version

> (time (q2 7 8))
31
Time: 50 ms

Number of combinations - nCr

This is a simple but very inefficient way of recursively calculating nCr, based on Pascal's Triangle:

(defun ncr (n r)
  (if (or (= r 0) (= r n)) 1
    (+ (ncr (1- n) (1- r)) (ncr (1- n) r))))

For example, to calculate the number of possible poker hands from a pack of cards:

Lisp version:

> (time (ncr 52 5))
2598960
Time: 615.5 s

Compiled version

> (time (ncr 52 5))
2598960
Time: 1.7 s

List examples

Any of the arguments to a machine-code function can be a list, in which case the address of the list is passed to the routine in the corresponding parameter. You can then use the functions car and cdr to process the elements in the list.

Dot product

This recursive function calculates the dot product of two vectors:

(defun dot-product (a b)
  (if (and a b)
      (+ (* (car a) (car b)) (dot-product (cdr a) (cdr b)))
0))

It can handle two vectors of arbitrary length provided they are the same length.

For example, to calculate the following dot product:

(987 654 321) • (963 852 741) = 987 × 963 + 654 × 852 + 321 × 741 = 1745550

Lisp version:

> (time (dot-product '(987 654 321) '(963 852 741)))
1745550
Time: 0 ms

Compiled version

> (time (dot-product '(987 654 321) '(963 852 741)))
1745550
Time: 0 ms

Compiler source

Here's a description of the source of the compiler.

Invoking the compiler

To compile a Lisp function you simply give the command compile followed by the name of the function; for example, to compile the fibonacci function:

(compile 'fibonacci)

Here's the definition of the command compile:

(defun compile (name)
  (if (eq (car (eval name)) 'lambda)
    (eval (comp (cons 'defun (cons name (cdr (eval name))))))
 "Not a Lisp function"))

Main compiler function

The main function comp returns the compiled code for an expression or form, as a list of assembler instructions prefixed by the type, :integer or :boolean:

(defun comp (x &optional env tail)
  (cond
   ((null x) (type-code :boolean '(($li 'a0 0))))
   ((eq x t) (type-code :boolean '(($li 'a0 1))))
   ((symbolp x) (comp-symbol x env tail))
   ((atom x) (type-code :integer (list (list '$li ''a0 x))))
   (t (let ((fn (first x)) (args (rest x)))
        (case fn
          (defun (setq *label-num* 0)
                 (setq env (mapcar #'(lambda (x y) (cons x y)) (second args) *locals*))
                 (comp-defun (first args) (second args) (cddr args) env))
          (progn (comp-progn args env tail))
          (if    (comp-if (first args) (second args) (third args) env tail))
          (setq  (comp-setq args env tail))
          (t     (comp-funcall fn args env tail)))))))

The function comp takes the item x to be compiled, the current environment env associating each local variable with a register, and a flag tail which is true if the item has no successors.

Each of the different types of form are handled by separate functions such as comp-defun, comp-if, and comp-progn.

Utilities

The compiler uses the following utility functions:

The functions push-regs and pop-regs generate instructions to push a list of registers to the stack, and pop a list of registers from the stack:

(defun push-regs (&rest regs)
  (let ((n -4))
  (append
   (list (list '$addi ''sp ''sp (* -4 (length regs))))
   (mapcar #'(lambda (reg) (list '$sw (list 'quote reg) (incf n 4) ''(sp))) regs))))

(defun pop-regs (&rest regs)
  (let ((n (* 4 (length regs))))
  (append
   (mapcar #'(lambda (reg) (list '$lw (list 'quote reg) (decf n 4) ''(sp))) regs)
   (list (list '$addi ''sp ''sp (* 4 (length regs)))))))

The function mappend applies a function to the elements of a list, and the results, which should be lists, are appended together:

(defun mappend (fn lst)
  (apply #'append (mapcar fn lst)))

The function type-code adds a code type label to the front of a list of assembler instructions, and the functions code-type and code return the code type label, and the code list, respectively:

(defun type-code (type code) (cons type code))

(defun code-type (type-code) (car type-code))

(defun code (type-code) (cdr type-code))

The function checktype gives an error if the value returned is not the correct type:

(defun checktype (fn type check)
  (unless (or (null type) (null check) (eq type check))
    (error "Argument to '~a' must be ~a not ~a~%" fn check type)))

The lists *params* and *locals* list the registers available for use in the compiler:

(defvar *params* '(a0 a1 a2 a3))

(defvar *locals* '(a4 a5 s0 s1 a6 a7 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11))

Finally, gen-label generates a new label for use in branches and jumps:

(defvar *label-num* 0)

(defun gen-label ()
  (read-from-string (format nil "lab~d" (incf *label-num*))))

The remaining functions handle the compiling of specific types of Lisp form:

Symbols

The environment is represented by an association list giving the register associated with each variable, such as:

((y . r5) (x . r4))

The function comp-symbol looks up a symbol in the association list and returns the appropriate register:

(defun comp-symbol (x env)
  (let ((reg (cdr (assoc x env))))
    (type-code nil (list (list '$mv ''a0 (list 'quote reg))))))

Assignment

The function comp-setq handles assignment to a variable:

(defun comp-setq (args env tail)
  (let ((value (comp (second args) env tail))
        (reg (cdr (assoc (first args) env))))
    (type-code 
     (code-type value) 
     (append (code value) (list (list '$mv (list 'quote reg) ''a0))))))

Function definition

The definition of the function being compiled is handled by comp-defun:

(defun comp-defun (name args body env)
  (setq *used-params* (subseq *locals* 0 (length args)))
  (append 
   (list 'defcode name args)
   (list name)
   (apply #'append 
          (mapcar #'(lambda (x y) (list (list '$mv (list 'quote x) (list 'quote y))))
                  *used-params* *params*))
   (code (comp-progn body env t))))

Progn special form

The function comp-progn compiles a progn form:

(defun comp-progn (exps env tail)
  (let* ((len (1- (length exps)))
         (nlast (subseq exps 0 len))
         (last1 (nth len exps))
         (start (mappend #'(lambda (x) (append (code (comp x env t)))) nlast))
         (end (comp last1 env tail)))
    (type-code (code-type end) (append start (code end)))))

It compiles code to evaluate each expression in the body of the progn, discarding all but the last results, and returns the type of the last form as the type of the whole block.

If special form

The function comp-if compiles the code for an if special form:

(defun comp-if (pred then else env tail)
  (let ((lab1 (gen-label))
        (lab2 (gen-label))
        (test (comp pred env nil)))
    (checktype 'if (car test) :boolean)
    (type-code :integer
               (append
                (code test) (list (list '$beqz ''a0 lab1))
                (code (comp then env t)) (list (list '$j lab2) lab1)
                (code (comp else env tail)) (list lab2)
                (when tail '(($ret)))))))

Function calls

Finally, comp-funcall compiles code for function calls to the built-in functions, or a recursive call to the main function:

(defun comp-funcall (f args env tail)
  (let ((test (assoc f '((< . $slt) (> . $sgt))))
        (teste (assoc f '((= . $seqz) (/= . $snez))))
        (testn (assoc f '((>= . $slt) (<= . $sgt))))
        (logical (assoc f '((and . $and) (or . $or))))
        (arith1 (assoc f '((1+ . 1) (1- . -1))))
        (arith (assoc f '((+ . $add) (- . $sub) (* . $mul) (/ . $div) (mod . $rem)))))
    (cond
     ((or test teste testn)
      (type-code :boolean
                   (append
                    (comp-args f args 2 :integer env)
                    (pop-regs 'a1)
                    (cond
                     (test (list (list (cdr test) ''a0 ''a1 ''a0)))
                     (teste (list '($sub 'a0 'a1 'a0) (list (cdr teste) ''a0 ''a0)))
                     (testn (list (list (cdr testn) ''a0 ''a1 ''a0) '($xori 'a0 'a0 1))))
                    (when tail '(($ret))))))
     (logical 
      (type-code :boolean
                 (append
                  (comp-args f args 2 :boolean env)
                  (pop-regs 'a1)
                  (list (list (cdr logical) ''a0 ''a0 ''a1))
                  (when tail '(($ret))))))
     (arith1
      (type-code :integer
                 (append
                  (comp-args f args 1 :integer env)
                  (list (list '$addi ''a0 ''a0 (cdr arith1)))
                  (when tail '(($ret))))))
     (arith
      (type-code :integer 
                 (append
                  (comp-args f args 2 :integer env)
                  (pop-regs 'a1)
                  (list (list (cdr arith) ''a0 ''a1 ''a0))
                  (when tail '(($ret))))))
     ((member f '(car cdr))
      (type-code :integer
                 (append
                  (comp-args f args 1 :integer env)
                  (if (eq f 'cdr) (list '($lw 'a0 4 '(a0)))
                    (list '($lw 'a0 0 '(a0)) '($lw 'a0 4 '(a0))))
                  (when tail '(($ret))))))
     (t ; function call
      (type-code :integer 
                 (append
                  (comp-args f args nil :integer env)
                  (when (> (length args) 1)
                    (append
                     (list (list '$mv (list 'quote (nth (1- (length args)) *params*)) ''a0))
                     (apply #'pop-regs (subseq *params* 0 (1- (length args))))))
                  (cond
                   (tail (list (list '$j f)))
                   (t (append
                       (apply #'push-regs (cons 'ra (reverse *used-params*)))
                       (list (list '$jal f))
                       (apply 'pop-regs (append *used-params* (list 'ra))))))))))))

The arithmetic comparisons take advantage of the RISC-V instructions such as slt (Set if less than), which set the destination register to 0 if the comparison is false, and to 1 if it's true; this provides the required boolean result without needing a branch.

The function comp-funcall uses the routine comp-args to generate code to compile each of the arguments to a function:

(defun comp-args (fn args n type env)
  (unless (or (null n) (= (length args) n))
    (error "Incorrect number of arguments to '~a'" fn))
  (let ((n (length args)))
    (mappend #'(lambda (y)
                 (let ((c (comp y env nil)))
                   (decf n)
                   (checktype fn type (code-type c))
                   (if (zerop n) (code c) (append (code c) (push-regs 'a0)))))
             args)))

Appendix

The following example shows the code generated by a simple function, rec, a recursive function related to the factorial function:

(defun rec (n)
  (1+ (* n (if (= n 0) 0 (rec (1- n))))))

Compiling this gives the following RISC-V machine code:

> (compiler 'rec)
0000      rec
0000 872a ($mv 'a4 'a0)
0002 853a ($mv 'a0 'a4)
0004 1171 ($addi 'sp 'sp -4)
0006 c02a ($sw 'a0 0 '(sp))
0008 853a ($mv 'a0 'a4)
000a 1171 ($addi 'sp 'sp -4)
000c c02a ($sw 'a0 0 '(sp))
000e 4501 ($li 'a0 0)
0010 4582 ($lw 'a1 0 '(sp))
0012 0111 ($addi 'sp 'sp 4)
0014 8533 ($sub 'a0 'a1 'a0)
0016 40a5 
0018 3513 ($seqz 'a0 'a0)
001a 0015 
001c c119 ($beqz 'a0 lab1)
001e 4501 ($li 'a0 0)
0020 a819 ($j lab2)
0022      lab1
0022 853a ($mv 'a0 'a4)
0024 157d ($addi 'a0 'a0 -1)
0026 1161 ($addi 'sp 'sp -8)
0028 c006 ($sw 'ra 0 '(sp))
002a c23a ($sw 'a4 4 '(sp))
002c f0ef ($jal rec)
002e fd5f 
0030 4712 ($lw 'a4 4 '(sp))
0032 4082 ($lw 'ra 0 '(sp))
0034 0121 ($addi 'sp 'sp 8)
0036      lab2
0036 4582 ($lw 'a1 0 '(sp))
0038 0111 ($addi 'sp 'sp 4)
003a 8533 ($mul 'a0 'a1 'a0)
003c 02a5 
003e 0505 ($addi 'a0 'a0 1)
0040 8082 ($ret)

Trying it out:

> (rec 12)
1302061345

This example demonstrates how the RISC-V Lisp assembler takes advantage of 16-bit compressed instructions where possible, instead of the equivalent full 32-bit instructions.


  1. ^ Norvig, Peter "Paradigms of Artificial Intelligence Programming" Morgan Kaufmann Publishers, Inc, San Francisco, 1992, pp 784-833, available as a PDF paip-lisp on GitHub.
  2. ^ Benchmarks - Takeuchi function.
  3. ^ Benchmarks - Two-dimensional recursive function Q2.