uLisp - ARM assembler Thumb-2 extensions


This page describes functions to add support to the uLisp ARM assembler for additional Thumb-2 instructions, including the it (if-then-else) conditional execution instruction, instructions using the imm12 immediate format, and signed and unsigned division.

Introduction

The original ARM instruction set consisted of a small number of 32-bit instructions.

ARM subsequently released an alternative instruction set called Thumb (later referred to as Thumb-1), in which most instructions are 16 bits long, giving more compact code. This is the instruction set supported by the original uLisp ARM assembler.

A subsequent release called Thumb-2 extended the Thumb architecture by adding:

A substantial number of 32-bit Thumb instructions, covering most of the functionality of the original ARM instruction set but without its condition field.
Several additional 16-bit Thumb instructions. One of these, the it (if-then-else) instruction, provides an efficient alternative mechanism for conditional execution.
An imm12 immediate format, which allows many commonly used 32-bit constants to be encoded as the immediate operand of a 32-bit instruction.

I've implemented support for what I think will be the most useful Thumb-2 instructions for assembler programming in uLisp, and this document describes them, with examples of how you might want to use them.

Loading the Thumb-2 extensions

To add the Thumb-2 extensions, load the standard assembler file first and then the extensions file, because some of the extensions redefine instructions from the main file with extended versions.

Get the standard assembler file here: ARM assembler in uLisp.

Get the extensions here: ARM Thumb-2 extensions.

Note that these extensions require a board with an M4, M33, or later ARM processor. They won't work on boards with an M0 or M0+ ARM processor, such as the ATSAMD21 used in the Arduino Zero, the RP2040 used in the Raspberry Pi Pico, or the nRF51822 used in the BBC Micro:bit v1.

The it instruction

The it instruction deserves a bit of explanation, because it is unlike any of the other instructions. It provides an if … then … else construct that lets you conditionally execute up to four instructions following a condition, without needing any branch instructions.

The it instruction takes two arguments:

An optional sequence of zero to three 't' or 'e' characters, representing "then" or "else".
A two-letter condition code. These are the same as the branch instruction suffixes.

There is an implicit "then", so the first instruction after the it instruction is executed if the condition code is true.

Up to three more instructions after the it instruction are executed according to the sequence of 't' and 'e' characters. A 't' (then) means that the instruction is executed if the condition was true, and an 'e' (else) means that the instruction is executed if it was false.

For example:

($it 'eet 'eq)
(... executed if Z flag set ...)
(... executed if Z flag clear ...)
(... executed if Z flag clear ...)
(... executed if Z flag set ...)
(... normal execution ...)
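To make the then/else pattern concrete, here is a small Python sketch (a host-side model, not part of uLisp) that expands an it mask such as eet into which of the following instructions execute for a given condition result. The function name it_pattern is hypothetical.

```python
def it_pattern(mask: str, condition_true: bool) -> list[bool]:
    """Return, for each instruction in the it block, whether it executes.

    mask is the optional sequence of up to three 't'/'e' characters, e.g. "eet".
    The first instruction has an implicit 't', so the block holds len(mask) + 1
    instructions.
    """
    if len(mask) > 3 or any(c not in "te" for c in mask):
        raise ValueError("mask must be up to three 't' or 'e' characters")
    pattern = []
    for c in "t" + mask:
        # a 't' instruction runs when the condition holds, an 'e' one when it doesn't
        pattern.append(condition_true if c == "t" else not condition_true)
    return pattern


# ($it 'eet 'eq) with the Z flag set, then with it clear
print(it_pattern("eet", True))   # [True, False, False, True]
print(it_pattern("eet", False))  # [False, True, True, False]
```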

Note that 16-bit instructions in the it instruction block, other than cmp, cmn and tst, do not set the condition code flags.

Alternative conditional branch format

In the original uLisp ARM assembler, conditional branches had a format such as:

($bgt label)

For consistency with the it instruction the Thumb-2 extensions file adds support for an alternative format where the condition code is a separate two-letter symbol, such as:

($b 'gt label)
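The two-letter condition codes test the N, Z, C and V flags in the standard ARM way. As a host-side illustration, this Python sketch models the flags set by a 32-bit cmp and the test each code applies; the names cmp_flags, CONDITIONS and condition_holds are hypothetical, and the al (always) code is left out.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000

def cmp_flags(a: int, b: int) -> dict:
    """Flags set by a 32-bit cmp a, b (that is, a - b with the result discarded)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "n": bool(result & SIGN),
        "z": result == 0,
        "c": a >= b,  # carry set means no borrow occurred
        # overflow: operands differ in sign and the result's sign differs from a
        "v": bool((a ^ b) & (a ^ result) & SIGN),
    }

CONDITIONS = {
    "eq": lambda f: f["z"],
    "ne": lambda f: not f["z"],
    "cs": lambda f: f["c"], "hs": lambda f: f["c"],
    "cc": lambda f: not f["c"], "lo": lambda f: not f["c"],
    "mi": lambda f: f["n"],
    "pl": lambda f: not f["n"],
    "vs": lambda f: f["v"],
    "vc": lambda f: not f["v"],
    "hi": lambda f: f["c"] and not f["z"],       # unsigned greater than
    "ls": lambda f: not f["c"] or f["z"],        # unsigned less than or equal
    "ge": lambda f: f["n"] == f["v"],            # signed comparisons from here on
    "lt": lambda f: f["n"] != f["v"],
    "gt": lambda f: not f["z"] and f["n"] == f["v"],
    "le": lambda f: f["z"] or f["n"] != f["v"],
}

def condition_holds(cond: str, a: int, b: int) -> bool:
    """Would ($b 'cond label) be taken after ($cmp a b)?"""
    return CONDITIONS[cond](cmp_flags(a, b))


print(condition_holds("lt", -3, 0))  # True
print(condition_holds("gt", 5, 3))   # True
```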

The imm12 immediate constant format

Several 32-bit Thumb-2 instructions support a 12-bit immediate constant format, called imm12, that is designed to allow you to encode many commonly occurring formats of 32-bit constants in just 12 bits. These are:

An 8-bit constant.
A constant that can be produced by shifting an 8-bit value left by any number of bits.
A replicated halfword constant of the form #x00XY00XY.
A replicated halfword constant of the form #xXY00XY00.
A replicated byte constant of the form #xXYXYXYXY.
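The list above can be turned into a quick host-side check of whether a constant is usable as an imm12 operand. This Python sketch tests each listed form in turn; the function is_imm12 is hypothetical (not part of the uLisp assembler), and it only decides which values are encodable, not how the assembler encodes them.

```python
def is_imm12(value: int) -> bool:
    """Check whether a 32-bit constant has one of the listed imm12 forms."""
    v = value & 0xFFFFFFFF
    if v <= 0xFF:                                  # plain 8-bit constant
        return True
    xy = v & 0xFF
    if v == xy * 0x00010001:                       # #x00XY00XY
        return True
    if v == ((v >> 8) & 0xFF) * 0x01000100:        # #xXY00XY00
        return True
    if v == xy * 0x01010101:                       # #xXYXYXYXY
        return True
    # an 8-bit value shifted left: all the set bits fit in one 8-bit window
    lowest = (v & -v).bit_length() - 1
    return (v >> lowest) <= 0xFF


# the masks used by stretch later on are all encodable; 0x12345678 is not
for c in (0xFF00, 0x00FF, 0x0F0F0F0F, 0x33333333, 0x55555555, 0x12345678):
    print(hex(c), is_imm12(c))
```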

Instructions that allow you to use an imm12 constant are mov, and, eor, orr, bic, mvn, and tst.

Logical immediate instructions

The logical instructions in the original Thumb-1 instruction set only provided register-to-register versions, so to perform an operation with an immediate argument you had to load the number into a register first. The Thumb-2 extensions include 32-bit immediate versions of all the logical instructions, which often allow a more compact solution to a problem.

The Thumb-2 logical immediate instructions all offer a version that doesn't affect the condition codes. For simplicity I've only implemented the versions that do affect the condition codes, so they are consistent with the Thumb-1 register-to-register versions.

Examples

It's not obvious what some of these Thumb-2 instructions might be useful for, so the following examples demonstrate some possible applications:

Implementing the Lisp ash function – it

To demonstrate a use of the it instruction, here is how you might implement the Lisp ash function in ARM assembler both without and with the it instruction.

The ash function gives a left shift if its second argument is positive, and a right shift if it's negative. We can implement this using two branches:

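(defcode my-ash (x y)
  ($cmp 'r1 0)
  ($b 'lt lab1)    ; negative shift count: go to the right-shift code
  ($lsl 'r0 'r1)   ; otherwise shift left
  ($b lab2)
  lab1
  ($neg 'r1 'r1)   ; make the shift count positive
  ($asr 'r0 'r1)   ; and shift right
  lab2
  ($bx 'lr))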

The it instruction gives a more elegant implementation with no branches:

(defcode my-ash (x y)
  ($cmp 'r1 0)
  ($it 'te 'lt)
  ($neg 'r1 'r1) ; then
  ($asr 'r0 'r1) ; then
  ($lsl 'r0 'r1) ; else
  ($bx 'lr))

For example:

> (format t "~b" (my-ash #b10001101 -2))
100011

> (format t "~b" (my-ash #b10001101 2))
1000110100
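To check results like these off the board, here is a Python model of the branch-free my-ash routine (a sketch; the helper names are hypothetical). It assumes 32-bit registers and the ARM rule that register-specified shifts use only the bottom byte of the shift register, so a shift of 32 or more gives 0 for lsl and fills with the sign bit for asr.

```python
MASK = 0xFFFFFFFF

def to_signed(v: int) -> int:
    """Interpret the low 32 bits of v as a two's complement integer."""
    v &= MASK
    return v - (1 << 32) if v & 0x80000000 else v

def lsl(v: int, r: int) -> int:
    n = r & 0xFF                       # only the bottom byte of the shift register counts
    return (v << n) & MASK

def asr(v: int, r: int) -> int:
    n = r & 0xFF
    return (to_signed(v) >> n) & MASK  # Python's >> on a negative int copies the sign

def my_ash(x: int, y: int) -> int:
    if to_signed(y) < 0:               # ($cmp 'r1 0) followed by the lt condition
        return to_signed(asr(x, -y))   # then: ($neg 'r1 'r1) and ($asr 'r0 'r1)
    return to_signed(lsl(x, y))        # else: ($lsl 'r0 'r1)


print(format(my_ash(0b10001101, -2), "b"))  # 100011
print(format(my_ash(0b10001101, 2), "b"))   # 1000110100
```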

Stretch - the imm12 immediate instructions

The following example seems almost perfectly designed to show off the imm12 immediate constant format in many of the Thumb-2 instructions. It takes a 16-bit number and stretches it to 32 bits, filling the gaps with zeroes, and works without needing iteration [1]. If we call the bits a to p the operation can be represented as:

abcdefghijklmnop -> 0a0b0c0d0e0f0g0h0i0j0k0l0m0n0o0p

Here's the function stretch:

(defcode stretch (x)
  ($and 'r1 'r0 #xff00)
  ($lsl 'r1 8)
  ($and 'r0 'r0 #x00ff)
  ($orr 'r0 'r1)
  ($lsl 'r1 'r0 4)
  ($orr 'r0 'r1)
  ($and 'r0 'r0 #x0f0f0f0f)
  ($lsl 'r1 'r0 2)
  ($orr 'r0 'r1)
  ($and 'r0 'r0 #x33333333)
  ($lsl 'r1 'r0 1)
  ($orr 'r0 'r1)
  ($and 'r0 'r0 #x55555555)
  ($bx 'lr))

For example:

> (format t "~b" (stretch #b1111001111001111))
1010101000001010101000001010101
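As a cross-check of the mask-and-shift sequence, this Python sketch mirrors stretch step by step and compares it with a naive bit-by-bit loop over every 16-bit input. Both functions are host-side models written for illustration, not uLisp code.

```python
def stretch(x: int) -> int:
    """Mirror of the stretch routine: move bit i of a 16-bit x to bit 2*i."""
    r1 = (x & 0xFF00) << 8                 # ($and 'r1 'r0 #xff00) ($lsl 'r1 8)
    r0 = (x & 0x00FF) | r1                 # ($and 'r0 'r0 #x00ff) ($orr 'r0 'r1)
    r0 = (r0 | (r0 << 4)) & 0x0F0F0F0F     # shift by 4, OR, mask
    r0 = (r0 | (r0 << 2)) & 0x33333333     # shift by 2, OR, mask
    r0 = (r0 | (r0 << 1)) & 0x55555555     # shift by 1, OR, mask
    return r0

def stretch_naive(x: int) -> int:
    """Reference version that moves one bit at a time."""
    result = 0
    for i in range(16):
        if x & (1 << i):
            result |= 1 << (2 * i)
    return result


assert all(stretch(x) == stretch_naive(x) for x in range(1 << 16))
print(format(stretch(0b1111001111001111), "b"))  # 1010101000001010101000001010101
```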

You can use stretch to create two other useful functions. The first, double, repeats each bit in a 16-bit number twice to create a 32-bit number, so:

abcdefghijklmnop -> aabbccddeeffgghhiijjkkllmmnnoopp

This has practical applications in image processing, such as creating double-sized versions of font bitmaps.

> (format t "~b" (double #b1111001111001111))
11111111000011111111000011111111

The function interleave shuffles two 16-bit numbers together, taking alternate bits from each number:

ghijklmnopqrstuv 0123456789abcdef -> g0h1i2j3k4l5m6n7o8p9qarbsctduevf

For example:

> (format t "~b" (interleave #b1111111100000000 #b0000000011111111))
10101010101010100101010101010101
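The definitions of double and interleave aren't listed here. One way they could be built from stretch is to OR a stretched value with a copy shifted left one place; this Python sketch models that idea. It is a guess at the construction, not the author's code, and it includes its own Python version of stretch so the sketch is self-contained.

```python
def stretch(x: int) -> int:
    """Spread the 16 bits of x out to the even bit positions."""
    x = ((x & 0xFF00) << 8) | (x & 0x00FF)
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x

def double(x: int) -> int:
    s = stretch(x)
    return s | (s << 1)                  # each bit now fills both positions of its pair

def interleave(a: int, b: int) -> int:
    # bits of a go to the odd positions, bits of b to the even positions
    return (stretch(a) << 1) | stretch(b)


print(format(double(0b1111001111001111), "b"))
# 11111111000011111111000011111111
print(format(interleave(0b1111111100000000, 0b0000000011111111), "b"))
# 10101010101010100101010101010101
```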

Population count – cbz

The ARM processor doesn't have an instruction to calculate the number of '1' bits in a word, also called the population count, but the following iterative solution is efficient, because it loops once per '1' bit rather than once per bit, and it takes advantage of the Thumb-2 cbz (Compare and Branch if Zero) instruction:

(defcode popcount (x)
  ($mov 'r1 0)
  loop
  ($cbz 'r0 return)
  ($add 'r1 1)
  ($sub 'r2 'r0 1)
  ($and 'r0 'r2)
  ($b loop)
  return
  ($mov 'r0 'r1)
  ($bx 'lr))

For example:

> (popcount #b10101010101010101010)
10

It relies on the fact that the bitwise AND of x with x − 1 differs from x only in zeroing the least significant non-zero bit: subtracting 1 changes the rightmost string of 0s to 1s, and changes the rightmost 1 to a 0. If x originally had n bits that were 1, then after n iterations x will be reduced to zero.
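The same x AND (x − 1) idea can be modelled in Python to see it working; this is a host-side sketch of the popcount routine (the function name mirrors the uLisp one but is otherwise hypothetical), with comments mapping each step to the assembler.

```python
MASK = 0xFFFFFFFF

def popcount(x: int) -> int:
    """Count the 1 bits in a 32-bit value by clearing the lowest set bit each time round."""
    x &= MASK
    count = 0
    while x != 0:          # ($cbz 'r0 return) exits once no 1 bits remain
        count += 1         # ($add 'r1 1)
        x &= x - 1         # ($sub 'r2 'r0 1) then ($and 'r0 'r2)
    return count


x = 0b101100
print(format(x, "b"), "->", format(x & (x - 1), "b"))  # 101100 -> 101000
print(popcount(0b10101010101010101010))                # 10
```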

Reverse bits – rbit

The Thumb-2 rbit instruction reverses the order of bits in a 32-bit number. The reverse-bits operation could be useful when transforming bitmap images, or when interfacing between protocols that work MSB first and LSB first:

(defcode reverse-bits (n)
  ($rbit 'r0 'r0)
  ($bx 'lr))

For example:

> (format t "~b" (reverse-bits #b10110011100011110000111110000011))
11000001111100001111000111001101
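For testing on a desktop, rbit is easy to model in Python by reversing the 32-character binary string. This is an illustrative sketch (the rbit function here is a hypothetical helper), and the last line shows one way to reverse only a byte, as you might for an LSB-first protocol: reverse the whole word and keep the top 8 bits.

```python
def rbit(x: int) -> int:
    """Reverse the order of the 32 bits in x, as the rbit instruction does."""
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)


value = 0b10110011100011110000111110000011
print(format(rbit(value), "b"))               # 11000001111100001111000111001101

# reversing just the low byte: reverse the word, then take the top byte
print(format(rbit(0b00000011) >> 24, "08b"))  # 11000000
```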

Remainder - sdiv and mls

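The signed division instruction sdiv, in conjunction with the multiply and subtract instruction mls, provides an efficient way of implementing the remainder function. Because sdiv rounds the quotient toward zero, the result has the same sign as the dividend, like the Lisp rem function: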

(defcode my-rem (x y)
  ($sdiv 'r2 'r0 'r1)
  ($mls 'r0 'r2 'r1 'r0)
  ($bx 'lr))

For example:

> (my-rem 12345 7)
4
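Python's // rounds toward minus infinity rather than toward zero, so a host-side model of my-rem has to truncate explicitly. This sketch uses hypothetical helpers sdiv and mls that mirror the instructions; it assumes divide-by-zero trapping is disabled, in which case Cortex-M sdiv returns 0 and my-rem returns x unchanged, and it ignores 32-bit overflow.

```python
def sdiv(a: int, b: int) -> int:
    """Signed division rounding toward zero; 0 for a zero divisor (trapping assumed off)."""
    if b == 0:
        return 0
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def mls(rn: int, rm: int, ra: int) -> int:
    """Multiply and subtract: ra - rn * rm."""
    return ra - rn * rm

def my_rem(x: int, y: int) -> int:
    q = sdiv(x, y)            # ($sdiv 'r2 'r0 'r1)
    return mls(q, y, x)       # ($mls 'r0 'r2 'r1 'r0)


print(my_rem(12345, 7))   # 4
print(my_rem(-7, 2))      # -1: the sign follows the dividend
```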

Integer square root - clz

The Thumb-2 clz instruction counts the number of leading zeros in a register. It provides an easy way of getting upper and lower bounds for the integer part of the square root of a number. These are useful for applications such as finding prime numbers, where the upper bound gives the largest factor you need to test. If a more accurate result is needed, these bounds can be used as the starting point for Newton's method, or a binary search.

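The algorithm takes advantage of the fact that the binary representation of a number's integer square root is about half as long as that of the number itself: if x has n significant bits, its integer square root has exactly n/2 bits, rounded up. The upper bound is therefore the largest number with that many bits. The routine below computes 33 minus the number of leading zeros, which is n + 1, and shifts it right by one to get n/2 rounded up.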

Here is the upper bound routine, upper-sqrt:

(defcode upper-sqrt (x)
  ($mov 'r1 33)
  ($mov 'r2 1)
  ($clz 'r0 'r0)
  ($sub 'r0 'r1 'r0)
  ($lsr 'r0 'r0 1)
  ($lsl 'r2 'r0)
  ($sub 'r0 'r2 1)
  ($bx 'lr))

It's equivalent to this Lisp function (assuming you defined clz):

(defun upper-sqrt (x) (1- (ash 1 (truncate (- 33 (clz x)) 2))))

For example:

> (upper-sqrt 9)
3

> (upper-sqrt 1000000)
1023

> (upper-sqrt 1600000000)
65535

To get the lower bound of the integer square root you could use the following Lisp function, lower-sqrt, which gives the smallest number with the same number of bits as the upper bound:

(defun lower-sqrt (x) (1- (truncate (+ (upper-sqrt x) 3) 2)))
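To confirm that the two bounds really bracket the integer square root, here is a Python sketch of upper-sqrt and lower-sqrt checked against math.isqrt. The clz helper is an assumption standing in for the clz instruction (via int.bit_length), and the check stays within uLisp's positive 32-bit integer range.

```python
import math

def clz(x: int) -> int:
    """Count leading zeros in a 32-bit value (32 for zero)."""
    return 32 - (x & 0xFFFFFFFF).bit_length()

def upper_sqrt(x: int) -> int:
    # 33 - clz is the bit length plus one; halving it rounds the half-length up,
    # and the bound is the largest number with that many bits
    return (1 << ((33 - clz(x)) >> 1)) - 1

def lower_sqrt(x: int) -> int:
    return (upper_sqrt(x) + 3) // 2 - 1


for x in (9, 1000000, 1600000000):
    print(x, lower_sqrt(x), math.isqrt(x), upper_sqrt(x))
# prints: 9 2 3 3 / 1000000 512 1000 1023 / 1600000000 32768 40000 65535

for x in list(range(100000)) + [2**31 - 1]:
    assert lower_sqrt(x) <= math.isqrt(x) <= upper_sqrt(x)
```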

Summary of the extensions

Here's a summary of the extensions defined in the ARM Thumb-2 extensions file:

Group	Operation	Example	Flags	Action	Notes
Move	Immediate	($mov 'r8 imm12)	NZ	r8 = imm12	See notes about imm12
Basic bit manipulation	Bit field clear	($bfc 'r0 'lsb 'width)			Clear width bits, starting at bit lsb
	Count leading zeros	($clz 'r0 'r1)		r0 = number of leading zeros in r1	
	Reverse bits	($rbit 'r0 'r1)		r0 = r1 with its bit order reversed	
Comparison and branch	Compare and branch if non-zero	($cbnz 'r0 label)			
	Compare and branch if zero	($cbz 'r0 label)			
	If-Then	($it 'xyz 'cond)			
Multiply/divide	Multiply and add	($mla 'r0 'r1 'r2 'r3)		r0 = r1 * r2 + r3	Works for signed or unsigned
	Multiply and subtract	($mls 'r0 'r1 'r2 'r3)		r0 = r3 - r1 * r2	Works for signed or unsigned
	Signed divide	($sdiv 'r0 'r1 'r2)		r0 = r1 / r2	Signed
	Unsigned divide	($udiv 'r0 'r1 'r2)		r0 = r1 / r2	Unsigned
Logical immediate	AND immediate	($and 'r0 'r1 imm12)	NZC	r0 = r1 AND imm12	See notes about imm12
	EOR immediate	($eor 'r0 'r1 imm12)	NZ	r0 = r1 EOR imm12	See notes about imm12
	OR immediate	($orr 'r0 'r1 imm12)	NZ	r0 = r1 OR imm12	See notes about imm12
	Bit clear immediate	($bic 'r0 imm12)	NZ	r0 = r0 AND NOT imm12	See notes about imm12
	Move NOT immediate	($mvn 'r0 imm12)	NZ	r0 = NOT imm12	See notes about imm12
	Test bits immediate	($tst 'r0 imm12)	NZC	Flags: r0 AND imm12	See notes about imm12

[1] Warren, Henry S., Jr. (2013) [2002]. Hacker's Delight (2nd ed.). Addison-Wesley / Pearson Education, Inc., p. 141.
