RISC-V assembler overview

The RISC-V version of uLisp lets you write machine-code functions in RISC-V assembler and call them from Lisp like any other function. The only boards it currently supports are the Sipeed MAiX RISC-V boards.

The RISC-V uLisp assembler has the following features:

  • You can create multiple named machine-code functions, limited only by the amount of code memory available.
  • Machine-code functions are created with a defcode special form, which has a similar syntax to defun.
  • You can include labels in your assembler listing simply by including them as symbols in the body of the defcode form. The defcode form creates these as local variables.
  • The defcode form automatically does a two-pass assembly to resolve forward references, used in branches and memory references.
  • The defcode form generates an assembler listing, showing the mnemonics and the machine-code generated from them.
  • The machine-code functions are saved with save-image, and restored with load-image.

The assembler itself is written in Lisp to make it easy to extend it or add new instructions. For example, you could add support for RISC-V floating-point instructions.

Get the assembler here: RISC-V assembler in uLisp.

To add the assembler to uLisp, select all of the assembler source, copy it, paste it into the field at the top of the Arduino IDE Serial Monitor window, and press Return. Alternatively, you could load it from an SD card.

For a summary of the RISC-V assembler instructions see RISC-V assembler instructions.

For some more complex examples see RISC-V assembler examples.

Saving an image

Once you have loaded the assembler, you can save the uLisp image to an SD card using:

(save-image)

In future you can then simply reload it using:

(load-image)

The defcode form

The assembler uses a special defcode form to generate machine-code functions.

defcode special form

Syntax: (defcode name (parameters) form*)

The defcode form is similar in syntax to defun. It creates a named machine-code function from a series of 16-bit integers given in the body of the form. These are written into RAM, and can be executed by calling the function in the same way as a normal Lisp function.

For example:

(defcode mul13 (x) #x45b5 #x0533 #x02b5 #x8082)

creates a machine-code routine called mul13 that takes one parameter. It consists of three instructions, occupying four 16-bit words, that multiply its single integer argument by 13. For example:

> (mul13 10)
130

If you specify the machine code instructions as constants, as in the above example, you don't need to load the RISC‑V assembler.

Calling convention

Functions defined with defcode can take up to four parameters. These are passed to the machine-code routine in the registers a0 to a3 respectively. Within the body of the defcode form, the symbols used for the four parameters can be used as synonyms for the corresponding registers a0 to a3.

If a parameter is an integer its value is passed in the corresponding register; otherwise the address of the parameter is passed in the corresponding register. For examples showing how to access a list in a machine-code routine see RISC-V assembler examples - List examples.

The machine-code function should return the result back to uLisp in a0. This is returned as an integer.

Assembler

Although you can supply machine-code instructions as hexadecimal op-codes, the assembler is more convenient as it allows you to write machine-code functions in RISC-V mnemonics. It is written in uLisp.

Assembler syntax

Where possible the syntax is very similar to RISC-V assembler syntax, with the following differences:

  • The mnemonics are prefixed by '$' (because some mnemonics such as and, or, and not are already in use as Lisp functions).
  • Registers are represented as symbols, prefixed with a quote. Constants are just numbers.

Assembler instructions are just Lisp functions, so you can see the code they generate:

> ($li 'a1 13)
17845

The assembler includes a function x16 to print a 16-bit value in hexadecimal, so you can see the result in hexadecimal by writing:

> (x16 ($li 'a1 13))
#x45b5
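
The value #x45b5 is the standard 16-bit compressed encoding of li a1,13 (c.li in the RISC-V C extension). As a cross-check that can be run on a desktop machine, here is a minimal Python sketch that rebuilds the same word from its bit fields. The function name c_li and the REGS table are illustrative only and are not part of uLisp or its assembler:

```python
# Register numbers follow the standard RISC-V ABI names (a0 = x10 ... a3 = x13).
REGS = {"a0": 10, "a1": 11, "a2": 12, "a3": 13}

def c_li(rd: str, imm: int) -> int:
    """Encode the compressed c.li instruction: rd = 6-bit signed immediate."""
    if not -32 <= imm <= 31:
        raise ValueError("c.li immediate must fit in 6 signed bits")
    imm &= 0x3F                      # 6-bit two's-complement form
    return (0b010 << 13              # funct3 for c.li
            | (imm >> 5) << 12       # imm[5]
            | REGS[rd] << 7          # destination register
            | (imm & 0x1F) << 2      # imm[4:0]
            | 0b01)                  # opcode bits for compressed quadrant 1

word = c_li("a1", 13)
print(word, hex(word))               # 17845 0x45b5
```

The decimal result 17845 and the hexadecimal #x45b5 shown above are the same word.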

The following table shows typical RISC-V assembler formats, and the equivalent in this Lisp assembler:

Examples            RISC-V assembler    uLisp assembler
Registers           mv a1,a2            ($mv 'a1 'a2)
Immediate           li a0,2             ($li 'a0 2)
Load                ld a0,8(sp)         ($ld 'a0 8 '(sp))
Branch              ble a0,a1,label     ($ble 'a0 'a1 label)
Jump to subroutine  jal label           ($jal label)

Simple example

Here's a simple example consisting of three RISC-V instructions that multiplies its parameter by 13 and returns the result:

(defcode mul13 (x)
  ($li 'a1 13)
  ($mul 'a0 'a0 'a1)
  ($ret))

Evaluating this generates an assembler listing as follows:

0000 45b5 ($li 'a1 13)
0002 0533 ($mul 'a0 'a0 'a1)
0004 02b5 
0006 8082 ($ret)

We can then call the function as follows:

> (mul13 11)
143

The result is the number returned in the a0 register.
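
In the listing above, $li and $ret each occupy one 16-bit word, while $mul occupies two (0533 at offset 0002 and 02b5 at offset 0004). This is consistent with mul being a full 32-bit instruction stored low half first. The following Python sketch, intended as a desktop-side check rather than as part of uLisp, encodes mul a0,a0,a1 in the standard R-type format and splits it into halves; the helper name r_type is illustrative:

```python
def r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the fields of a 32-bit RISC-V R-type instruction."""
    return (funct7 << 25 | rs2 << 20 | rs1 << 15
            | funct3 << 12 | rd << 7 | opcode)

A0, A1 = 10, 11                      # a0 = x10, a1 = x11

# mul rd, rs1, rs2: opcode 0110011, funct3 000, funct7 0000001
mul = r_type(0b0000001, A1, A0, 0b000, A0, 0b0110011)

# Split into two 16-bit words, low half first, as in the listing.
halves = [mul & 0xFFFF, mul >> 16]
print(f"{mul:08x}", [f"{h:04x}" for h in halves])
# 02b50533 ['0533', '02b5']
```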

Note that functions written using defcode can't be relied upon to have a fixed position in memory and so should be position independent, and use only relative branches and memory references within the machine-code function.

Labels

You can include symbols in the body of the defcode form to create labels. The defcode assembler automatically creates these as local variables, and then does a two-pass assembly to resolve forward references. The assembler can then access these variables to calculate the offsets in branches and pc-relative addressing.
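
The idea behind two-pass assembly can be sketched in a few lines of Python. This is a simplified model, not the uLisp implementation: it assumes every instruction has a known, fixed size, and the tuple layout (size, mnemonic, target label) is invented for the illustration. The first pass only records label addresses; the second pass uses them to compute pc-relative offsets, which is what makes forward references work.

```python
def assemble(body):
    """body: label strings mixed with (size_in_bytes, mnemonic, target_or_None)."""
    # Pass 1: walk the body and record the address of every label.
    labels, pc = {}, 0
    for item in body:
        if isinstance(item, str):
            labels[item] = pc
        else:
            pc += item[0]
    # Pass 2: all labels are now known, so offsets can be calculated.
    listing, pc = [], 0
    for item in body:
        if isinstance(item, str):
            continue
        size, mnemonic, target = item
        offset = labels[target] - pc if target else None
        listing.append((pc, mnemonic, offset))
        pc += size
    return labels, listing

# Hypothetical fragment with a forward reference to the label done.
FORWARD = [(2, "beqz", "done"), (2, "mv", None), "done", (2, "ret", None)]
labels, listing = assemble(FORWARD)
print(labels, listing[0])   # {'done': 4} (0, 'beqz', 4)
```

In a single pass the address of done would not yet be known when the branch is reached; recording it first lets the second pass fill in the offset of +4.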

Because the Arduino Serial Monitor removes all line-break characters, uLisp treats a comment starting with a semicolon as ending at the next open parenthesis rather than at the end of the line. As a result you can't put such a comment immediately before a label, because the label would be taken as part of the comment. You can use bracketing comments instead:

#| This is a comment |#

For example, here's a simple routine to calculate the Greatest Common Divisor, which uses two labels:

; Greatest Common Divisor
(defcode gcd (a b)
  swap
  ($mv 'a2 'a1)
  ($mv 'a1 'a0)
  again
  ($mv 'a0 'a2)
  ($sub 'a2 'a2 'a1)
  ($bltz 'a2 swap)
  ($bnez 'a2 again)
  ($ret))

Evaluating this form generates the following assembler listing:

0000      swap
0000 862e ($mv 'a2 'a1)
0002 85aa ($mv 'a1 'a0)
0004      again
0004 8532 ($mv 'a0 'a2)
0006 8e0d ($sub 'a2 'a2 'a1)
0008 4ce3 ($bltz 'a2 swap)
000a fe06 
000c fe65 ($bnez 'a2 again)
000e 8082 ($ret)

For example, to find the GCD of 3287 and 3460:

> (gcd 3287 3460)
173
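
To follow the control flow of the routine, it can help to model the registers in a high-level language. The Python sketch below mirrors the listing instruction by instruction, with the labels shown as comments. It assumes both arguments are positive integers and ignores register width; the function name gcd_model is illustrative.

```python
def gcd_model(a0: int, a1: int) -> int:
    """Register-level model of the gcd routine, by repeated subtraction."""
    while True:
        # swap:
        a2 = a1                  # ($mv 'a2 'a1)
        a1 = a0                  # ($mv 'a1 'a0)
        while True:
            # again:
            a0 = a2              # ($mv 'a0 'a2)
            a2 = a2 - a1         # ($sub 'a2 'a2 'a1)
            if a2 < 0:           # ($bltz 'a2 swap)
                break
            if a2 != 0:          # ($bnez 'a2 again)
                continue
            return a0            # ($ret)

print(gcd_model(3287, 3460))     # 173
```

Each time the subtraction goes negative, the two values are exchanged and the process continues; the routine returns when the difference reaches zero.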

For more examples see RISC-V assembler examples.