AVR assembler overview

The AVR version of uLisp allows you to generate machine-code functions, integrated with Lisp, written in AVR assembler. It is currently supported on the Curiosity Nano AVR128DA48 or AVR128DB48 boards, and on boards based on the ATmega1284P.

It has the following features:

  • You can create multiple named machine-code functions, limited only by the amount of code memory available.
  • Machine-code functions are created with a defcode special form, which has a similar syntax to defun.
  • You can include labels in your assembler listing simply by including them as symbols in the body of the defcode form. The defcode form creates these as local variables.
  • The defcode form automatically does a two-pass assembly to resolve forward references, used in branches, relative jumps, and memory references.
  • The defcode form generates an assembler listing, showing the mnemonics and the machine-code generated from them.
  • The machine-code functions are saved with save-image, and restored with load-image.

The assembler itself is written in Lisp to make it easy to extend it or add new instructions. For example, you could write assembler macros in Lisp.

Get the assembler here: AVR assembler in uLisp.

To add it to uLisp, select all of the assembler source and copy it, paste it into the field at the top of the Arduino IDE Serial Monitor window, and press Return.

For details of the AVR instruction set see Microchip's AVR Instruction Set Manual.

For examples of using the assembler see AVR assembler examples.

Saving an image

Once you have loaded the assembler, you can save the uLisp image using:

(save-image)

In future you can then simply reload it using:

(load-image)

How it works

The AVR assembler works slightly differently from the ARM and RISC-V assemblers provided for uLisp. Because AVR processors can't execute programs in RAM, the AVR assembler copies the assembled machine code from RAM to flash, so that it can be executed in flash. This takes advantage of two utilities developed for AVR processors: the Flash Writing extension to Spence Konde's DxCore for the AVR128DA48 and AVR128DB48 boards, and the Optiboot Flasher extension for the ATmega1284P.

Although writing to flash causes wear, the datasheet gives the byte endurance as 100k erase/write cycles. The flash is only erased and rewritten when you evaluate a defcode form, so you’d have to evaluate a defcode 28 times a day for 10 years (100,000 cycles / 3,650 days ≈ 27.4 a day) to wear it out.

An added benefit of enabling flash writing on these boards is that the flash can be used for saving the Lisp workspace, using save-image. On the AVR128DA48, AVR128DB48, and ATmega1284P boards this allows the entire workspace to be saved.

Enabling flash writing on a Curiosity Nano AVR128DA48 or AVR128DB48 board

To use the AVR assembler on the Curiosity Nano AVR128DA48 or AVR128DB48 boards you need to upload uLisp AVR Version 3.6 or later using Spence Konde's DxCore with the Flash Writing option set correctly; for full details see AVR DA and DB series boards.

Enabling flash writing on an ATmega1284P board

To use the AVR assembler on ATmega1284P boards you need to install the Optiboot Flasher bootloader using MCUdude's MightyCore before uploading uLisp AVR Version 3.6 or later.

The defcode form

The assembler uses a special defcode form to generate machine-code functions.

defcode special form

Syntax: (defcode name (parameters) form*)

The defcode form is similar in syntax to defun. It creates a named machine-code function from a series of 16-bit integers given in the body of the form. These are assembled in RAM and then copied to flash (see How it works), and can be executed by calling the function in the same way as a normal Lisp function.

For example:

(defcode swap (x) #x2789 #x2798 #x2789 #x9508)

creates a machine-code routine called swap, with one parameter, consisting of four instructions, which swaps the high and low bytes of its integer argument. For example:

> (format t "~x" (swap #x1234))
3412

Calling convention

Functions defined with defcode can take up to four integer parameters. These are passed to the machine-code routine in the following registers:

Parameter    1     2     3     4
Low byte     r24   r22   r20   r18
High byte    r25   r23   r21   r19

If a parameter is an integer its value is passed in the specified registers; otherwise the address of the parameter is passed in the specified registers.

The machine-code function should return the result back to uLisp in r24 (low byte) and r25 (high byte). This is returned as an integer.
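
As a minimal sketch of this convention, the following routine takes two parameters and returns their 16-bit difference. The name sub16 is made up for this illustration; the mnemonic syntax is described under Assembler syntax, and $sub and $sbc are the same instructions used in the GCD routine in the Labels section. The first parameter arrives in r25:r24 and the second in r23:r22, so subtracting in place leaves the result in r25:r24, where uLisp expects it:

```lisp
; Hypothetical example: returns x minus y
(defcode sub16 (x y)
  ($sub 'r24 'r22)   ; low byte: r24 = r24 - r22
  ($sbc 'r25 'r23)   ; high byte: r25 = r25 - r23 - borrow
  ($ret))
```

With the assembler loaded, calling it should behave like this:

```lisp
> (sub16 100 58)
42
```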

Call-clobbered registers

The best registers to use in assembler functions are r0, r18–r27, r30, and r31. These are call clobbered; a function may use them without restoring the contents.

Call-saved registers

If you use r1, r2–r17, r28, or r29 you must restore their original contents. Note that by convention r1 always contains zero.
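
A sketch of saving and restoring a call-saved register, assuming $pop takes a register in the same way as the $push shown under Assembler syntax. The routine name swap2 is hypothetical; it swaps the two bytes of its argument using r16 as a temporary, so r16 is pushed on entry and popped before returning:

```lisp
; Hypothetical example: byte swap using r16 as a temporary
(defcode swap2 (x)
  ($push 'r16)       ; save the call-saved register
  ($mov 'r16 'r24)   ; temp = low byte
  ($mov 'r24 'r25)   ; low byte = old high byte
  ($mov 'r25 'r16)   ; high byte = temp
  ($pop 'r16)        ; restore r16 before returning
  ($ret))
```

In practice a call-clobbered register such as r18 would avoid the push and pop altogether; r16 is used here only to show the save and restore pattern.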

Relative, not absolute

Because the absolute addresses may change, programs should use only branches and relative jumps and calls; so $rjmp not $jmp, and $rcall not $call.

Assembler

Although you can supply machine-code instructions as hexadecimal op-codes, the assembler is more convenient as it allows you to write machine-code functions in mnemonics. It is written in uLisp.

Assembler syntax

Where possible the syntax is very similar to AVR assembler syntax, with the following differences:

  • The mnemonics are prefixed by '$' (because some mnemonics such as push and pop are already in use as Lisp functions).
  • Registers are represented as symbols, prefixed with a quote. Constants are just numbers.

The byte registers are r0 to r31. XL, XH, YL, YH, ZL, and ZH are synonyms for r26 to r31 respectively.

The word registers are X=r27:r26, Y=r29:r28, and Z=r31:r30.

Assembler instructions are just Lisp functions, so you can see the code they generate:

> ($eor 'r24 'r25)
10121

You can use format to print the value in hexadecimal; for example:

>  (format t "~x" ($eor 'r24 'r25))
2789
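
Because each mnemonic returns its opcode as an integer, you can check the hexadecimal op-codes given in the first swap definition against their mnemonics. A small sketch, using only the standard uLisp dolist, list, and format functions:

```lisp
; Print the opcodes of the three eor instructions in hexadecimal
(dolist (op (list ($eor 'r24 'r25) ($eor 'r25 'r24) ($eor 'r24 'r25)))
  (format t "~x " op))
```

This should print 2789 2798 2789, matching the first three values in the hexadecimal version of swap.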

The following table shows typical AVR assembler formats, and the equivalent in this Lisp assembler:

Examples                          AVR assembler    uLisp assembler
Push and pop                      push r12         ($push 'r12)
Registers                         mov r27, r25     ($mov 'r27 'r25)
Immediate                         ldi r22, 3       ($ldi 'r22 3)
Load indirect                     ld r28, Z        ($ld 'r28 'z)
Load indirect with increment      ld r30, X+       ($ld 'r30 'x+)
Load indirect with decrement      ld r30, -X       ($ld 'r30 '-x)
Load indirect with displacement   ldd r29, Z+3     ($ldd 'r29 'z 3)
Jump relative                     rjmp label       ($rjmp label)
Branch                            brcc label       ($br 'cc label)
Set flag                          sei              ($se 'i)
Clear flag                        clc              ($cl 'c)

Note that, to reduce the amount of memory required by the assembler, the branch instructions are all specified as $br with a first parameter giving the two-letter condition code, followed by the label. The set flag and clear flag instructions work in the same way, using $se and $cl with the flag letter as the parameter.

Simple example

Here's a neat example consisting of four instructions which swaps the high and low bytes of its integer argument. The argument is passed in r25 (high byte) and r24 (low byte) and returned in the same two registers:

(defcode swap (x)
  ($eor 'r24 'r25)
  ($eor 'r25 'r24)
  ($eor 'r24 'r25)
  ($ret))

Evaluating this generates an assembler listing as follows:

0000 89 27 ($eor 'r24 'r25)
0002 98 27 ($eor 'r25 'r24)
0004 89 27 ($eor 'r24 'r25)
0006 08 95 ($ret)

We can then call the function as follows:

> (format t "~x" (swap #x1234))
3412

Note that functions written using defcode can't be relied upon to have a fixed position in memory and so should be position independent, and use only relative branches and memory references within the machine-code function.

Labels

You can include symbols in the body of the defcode form to create labels. The defcode assembler automatically creates these as local variables, and then does a two-pass assembly to resolve forward references. The assembler can then access these variables to calculate the offsets in branches and pc-relative addressing.

Note that you can't put a comment starting with a semicolon immediately before a label. Because the Arduino Serial Monitor removes all line break characters, uLisp treats such a comment as continuing until the next open parenthesis, so the comment would swallow the label. You can use bracketing comments instead:

#| This is a comment |#

For example, here's a simple routine to calculate the Greatest Common Divisor, which uses two labels:

; Greatest Common Divisor
(defcode gcd (x y)
  swap
  ($movw 'r30 'r22)
  ($movw 'r22 'r24)
  again
  ($movw 'r24 'r30)
  ($sub 'r30 'r22)
  ($sbc 'r31 'r23)
  ($br 'cs swap)
  ($br 'ne again)
  ($ret))

Evaluating this form generates the following assembler listing:

0000       swap
0000 fb 01 ($movw 'r30 'r22)
0002 bc 01 ($movw 'r22 'r24)
0004       again
0004 cf 01 ($movw 'r24 'r30)
0006 e6 1b ($sub 'r30 'r22)
0008 f7 0b ($sbc 'r31 'r23)
000a d0 f3 ($br 'cs swap)
000c d9 f7 ($br 'ne again)
000e 08 95 ($ret)

For example, to find the GCD of 3287 and 3460:

> (gcd 3287 3460)
173
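
Both branches in the GCD routine jump backwards. As a sketch of a forward reference, which the second assembly pass resolves, the following hypothetical routine umax returns the larger of two non-negative integers. It uses only instructions that appear elsewhere on this page; the label done is defined after the branch that refers to it. The comment on the line before the label is a bracketing comment, placed ahead of the instruction, so that no semicolon comment comes immediately before the label:

```lisp
; Hypothetical example: larger of two non-negative integers
(defcode umax (x y)
  ($movw 'r30 'r24)   ; Z = x
  ($sub 'r30 'r22)    ; Z = x - y, low byte
  ($sbc 'r31 'r23)    ; high byte with borrow
  ($br 'cc done)      ; carry clear means x >= y so keep x
  #| otherwise return y |#
  ($movw 'r24 'r22)
  done
  ($ret))
```

With the assembler loaded, it should behave like this:

```lisp
> (umax 3 7)
7
> (umax 9 4)
9
```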