uLisp - Arbitrary-precision arithmetic extension

Arbitrary-precision arithmetic extension

This is a small arbitrary-precision extension for uLisp, for calculating with large integers. It is designed to work on any 32-bit uLisp platform. To add these functions:

Uncomment #define extensions at the start of the main uLisp source file.
Compile and upload uLisp with the file Bignums.ino included in the same folder as the uLisp source file for your platform.

The bignum functions will then be added to the built-in functions in uLisp, together with their documentation.

Here is the Bignums Extension: Bignums.ino.

Or get it from GitHub at: https://github.com/technoblogy/ulisp-bignums.

For information about the Extensions File feature of uLisp see: Adding your own functions.

Introduction

I’ve been trying to think of a good example to test the new Extensions File feature introduced in uLisp Release 4.4, and thought it would be interesting to add an arbitrary precision package.

It's similar in function to the uLisp example program Infinite precision arithmetic, but with the following differences:

It's written in C, so faster.
Numbers are packed as efficiently as possible, 32 bits per Lisp object.
It supports a wider range of functions, including division, modulus, comparisons, and bitwise logical functions.

It was inspired by Kokke's tiny-bignum-c ^[1] library on GitHub; his clearly-written routines made me hopeful that this was a project I could complete in a weekend. My library follows the logic of his routines, but rewritten to use uLisp lists rather than C arrays for storing the bignums.

My aim was to make the functions simple to understand, rather than optimising the routines. In particular, I know there are clever ways to improve the bignum multiplication and division; this is left as an exercise for the reader. One of the slowest operations is using $bignum-string to convert a bignum to decimal, because it needs to perform repeated divisions.

Even if you're not interested in arbitrary-precision arithmetic, but would like to make your own extensions to uLisp, this example should be useful in showing:

How to interface C routines with uLisp.
How to build a list within a C function.
How to construct a uLisp string from within a C routine.
How to call the garbage collector from C.

I've included detailed comments to explain non-obvious features.

Approach

This set of uLisp functions use uLisp lists to represent bignums, and takes advantage of the list handling and garbage collection to allow you to work with arbitrarily large numbers, limited only by the amount of Lisp workspace.

The bignum versions of the uLisp functions have the same names as their integer equivalents, but with a '$' prefix. The library provides the following functions:

Function	Description
($bignum int)	Converts an integer to a bignum.
($integer bignum)	Converts a bignum to an integer.
($bignum-string bignum [base])	Converts a bignum to a string; the base is 10 (default) or 16.
($string-bignum bignum [base])	Converts a string to a bignum; the case is 10 (default) or 16.
($zerop bignum)	Tests whether a bignum is zero, allowing for trailing zeros.
($+ bignum1 bignum2)	Adds two bignums and returns the sum as a new bignum.
($- bignum1 bignum2)	Subtracts two bignums and returns the difference as a new bignum.
($* bignum1 bignum2)	Multiplies two bignums and returns the product as a new bignum.
($/ bignum1 bignum2)	Divides two bignums and returns the quotient as a new bignum.
($mod bignum1 bignum2)	Divides two bignums and returns the remainder as a new bignum.
($= bignum1 bignum2)	Returns t if the two bignums are equal.
($< bignum1 bignum2)	Returns t if bignum1 is less than bignum2.
($> bignum1 bignum2)	Returns t if bignum1 is greater than bignum2.
($logand bignum1 bignum2)	Returns the logical AND of two bignums.
($logior bignum1 bignum1)	Returns the logical inclusive OR of two bignums.
($logxor bignum1 bignum2)	Returns the logical exclusive OR of two bignums.
($ash bignum shift)	Returns bignum shifted by shift bits; positive means left.

Examples

In the following examples I've followed the convention that variables representing bignums are prefixed with '$', although this isn't required. The timings were all obtained on an ATSAMD51 board (Adafruit ItsyBitsy M4 Express).

Factorial

The following routine $fact returns the factorial of an integer as a bignum:

(defun $fact (n)
  (let (($result ($bignum 1)))
  (dotimes (i n $result)
    (setq $result ($* $result ($bignum (1+ i)))))))

For example, to find factorial 100 and return it as string representing a decimal number:

> (time ($bignum-string ($fact 100)))
"933262154439441526816992388562667004907159682643816214685929638952175999932299156089414639
76156518286253697920827223758251185210916864000000000000000000000000"
Time: 233 ms

Exponent

The following routine $expt finds x^y for integer x and y:

(defun $expt (x y)
  (let (($e ($bignum 1))
        ($f ($bignum x)))
    (loop
     (when (zerop y) (return $e))
     (when (oddp y) (setq $e ($* $e $f)))
     (setq $f ($* $f $f) y (ash y -1)))))

For example, to find 7¹⁶⁰:

> (time ($bignum-string ($expt 7 160)))
"164318477493817185791700041055654480634183741959952349706976467123320756556228789187756432
3818254449486910838997871467298047369612896001"
Time: 153 ms

Square root

The following routine $sqrt calculates the square root of a bignum using the Newton–Raphson method:

(defun $sqrt ($a)
  (let* (($1 ($bignum 1))
         ($high $a)
         ($low ($bignum 0))
         ($mid ($+ ($ash $high -1) $1)))
    (loop
     (unless ($> $high $low) (return))
     (if ($> ($* $mid $mid) $a)
         (setq $high ($- $mid $1))
       (setq $low $mid))
     (setq $mid ($+ ($+ $low ($ash ($- $high $low) -1)) $1)))
    $low))

For example:

> (time ($bignum-string ($sqrt ($string-bignum "152415787532388367501905199875019052100"))))
"12345678901234567890"
Time: 32 ms

Mersenne primes

The following example mersenne prints the first 16 Mersenne primes, which are primes of the form 2ⁿ-1:

(defun mersenne ()
  (dolist (m '(2 3 5 7 13 17 19 31 61 89 107 127 521 607 1279 2203))
    (let (($p ($- ($expt 2 m) ($bignum 1))))
      (format t "~4a ~a~%" m ($bignum-string $p)))))

Here's the output (formatted to fit the browser window):

> (time (mersenne))
2    3
3    7
5    31
7    127
13   8191
17   131071
19   524287
31   2147483647
61   2305843009213693951
89   618970019642690137449562111
107  162259276829213363391578010288127
127  170141183460469231731687303715884105727
521  68647976601306097149819007990813932172694353001433054093944634591855431833976560521225
     59640661454554977296311391480858037121987999716643812574028291115057151
607  53113799281676709868958820655246862732959311772703192319944413820040355986085224273916
     25022652292856688893294862465010153465793376527072394095199787665873519438312708353932
     19031728127
1279 10407932194664399081925240327364085538615262247266704805319112350403608059673360298012
     23944173232418484242161395428100779138356624832346490813990660567732076292412950938922
     03457731833496615835504729594205476898112116936771475484788669625013844382602917323488
     85311160828538416585028255604666224831890918801847068222203140521026698435488732958028
     878050869736186900714720710555703168729087
2203 14759799152141802350848986227373817363120661453331697751477712164785702978780789493774
     07337049389289382748507531496480477281264838760259191814463365330269540496961201113430
     15690239609398909022625932693502528140961498349938822283144859860183431853623092377264
     13902094902318364468996082107954829637630942366309454108327937699053999824571863229447
     29636418890623372171723742105636440368218459649632948538696905872650486914434637457507
     28044182367681351785209934866084717257940842231667809767022401199028017047489448742692
     47421088235368084850725022405194525875428753499765585726702296339625752126374778977855
     01552646522609988869914013540483809865681250419497686697771007
nil
Time: 19.7 s

Factorising

Here’s a function that uses the arbitrary-precision arithmetic extension to factorise large numbers, using Pollard’s rho algorithm ^[2]:

(defun $gcd (a b)
  (let (temp)
    (loop
     (when ($zerop b) (return a))
     (setq temp b b ($mod a b) a temp))))

(defun $pollard-rho (n)
  (let* (($1 ($bignum 1))
         (x ($bignum 2))
         (y ($bignum 2))
         (d ($bignum 1))
         ($g (lambda (x) ($mod ($+ ($* x x) $1) n))))
    (loop
     (unless ($= d $1) (return))
     (setq x ($g x))
     (setq y ($g ($g y)))
     (setq d ($gcd (if ($> x y) ($- x y) ($- y x)) n)))
    (if ($= d n) nil d)))

For example, it takes 10 minutes on an ATSAMD51 board to find the smallest factor of the ninth Fermat number ^[3], which has 155 digits and is:

1340780792994259709957402499820584612747936582059239337772356144372176403007354697680187
4298166903427690031858186486050853753882811946569946433649006084097

Here’s the definition of $fermat (it needs $expt defined earlier):

(defun $fermat (n) ($+ ($expt 2 (expt 2 n)) ($bignum 1)))

and here’s the factor:

> (time ($bignum-string ($pollard-rho ($fermat 9))))
"2424833"
Time: 607.9 s

Implementation

Representation

The bignums are represented using standard lists of 32-bit integers. They are packed as efficiently as possible, 32 bits per Lisp object, with the least significant word (LSW) of a bignum stored in the head or car of the list, and the most significant word (MSW) in the tail. For example, the following list represents 3 + 17 x 2³², or 73014444035:

'(3 17)

The macro int_to_bignum() constructs a bignum from an integer:

#define int_to_bignum(x) (cons(number(x), NULL))

Normalising

After some operations, such as a downshift, bignums can end up with leading zeros, so 1 might be represented as:

'(1 0 0 0 0)

The internal function bignum_normalise() removes leading zeros, to ensure a unique representation:

object *bignum_normalise (object *bignum) {
  object *result = bignum;
  object *last = bignum;
  while (bignum != NULL) {
    if (checkinteger(car(bignum)) != 0) last = bignum;
    bignum = cdr(bignum);
  }
  cdr(last) = NULL;
  return result;
}

It scans through the bignum, LSW to MSW, setting last to the last non-zero word found. It then truncates the list at this point.

Upshift bit and downshift bit

The internal functions upshift_bit() and downshift_bit() destructively shift a bignum up or down by a bit. They are used in bignum division, and to implement the Lisp function ash. I avoided the terms "left shift" and "right shift", because I found them confusing given that the bignums are ordered LSW to MSW left to right.

They each scan through the bignum, shifting each word by one bit and adding the carry to the next or previous word as appropriate. If there's a carry upshift_bit() will also extend the length of the bignum by one word:

void upshift_bit (object *bignum) {
  uint32_t now = (uint32_t)checkinteger(car(bignum));
  car(bignum) = number(now << 1);
  while (cdr(bignum) != NULL) {
    uint32_t next = (uint32_t)checkinteger(car(cdr(bignum)));
    car(cdr(bignum)) = number((next << 1) | (now >> 31));
    now = next; bignum = cdr(bignum);
  }
  if (now >> 31 != 0) cdr(bignum) = cons(number(now >> 31), NULL);
}

Addition and subtraction

Addition and subtraction are relatively easy to implement. They scan through the two operands from LSW to MSW, adding or subtracting words, and catering for the carry or borrow as appropriate. Here's add:

object *bignum_add (object *bignum1, object *bignum2) {
  object *result = cons(NULL, NULL);
  object *ptr = result;
  int carry = 0;
  while (!(bignum1 == NULL && bignum2 == NULL)) {
    uint64_t tmp1 = 0, tmp2 = 0, tmp;
    if (bignum1 != NULL) {
      tmp1 = (uint64_t)(uint32_t)checkinteger(first(bignum1));
      bignum1 = cdr(bignum1);
    }
    if (bignum2 != NULL) {
      tmp2 = (uint64_t)(uint32_t)checkinteger(first(bignum2));
      bignum2 = cdr(bignum2);
    }
    tmp = tmp1 + tmp2 + carry;
    carry = (tmp > MAX_VAL);
    cdr(ptr) = cons(number(tmp & MAX_VAL), NULL);
    ptr = cdr(ptr);
  }
  if (carry != 0) {
    cdr(ptr) = cons(number(carry), NULL);
  }
  return cdr(result);
}

The function constructs a new list result for the result of the addition, to avoid affecting the two operands.

Multiplication

Multiplication follows the method of long multiplication as taught in school, except that it multiplies two words at a time, using uint64_t operands:

object *bignum_mul (object *bignum1, object *bignum2, object *env) {
  object *result = int_to_bignum(0);
  object *arg2 = bignum2;
  int i = 0, j;
  while (bignum1 != NULL) {
    bignum2 = arg2; j = 0;
    while (bignum2 != NULL) {
      uint64_t n = (uint64_t)(uint32_t)checkinteger(first(bignum1)) *
        (uint64_t)(uint32_t)checkinteger(first(bignum2));
      object *tmp;
      if (n > MAX_VAL) tmp = cons(number(n), cons(number(n>>(uint64_t)32), NULL));
      else tmp = cons(number(n), NULL);
      for (int m = i + j; m > 0; m--) push(number(0), tmp); // upshift i+j words
      result = bignum_add(result, tmp);
      bignum2 = cdr(bignum2); j++;
      maybe_gc(result, env);
    }
    bignum1 = cdr(bignum1); i++;
  }
  return result;
}

Again, the result is returned in a new bignum, result, and the original operands are preserved.

Comparison

The bignum_cmp() function compares two bignums and returns SMALLER (-1), EQUAL (0), or LARGER (1) depending on the result. It is used in bignum division, and to implement the Lisp comparison functions $<, $=, and $>.

The obvious way to implement bignum_cmp() would be:

If the operands are different lengths the longer one is larger. Otherwise if they are the same length:

Compare successive pairs of corresponding words, from MSW to LSW.
As soon as they are different return the result, otherwise compare the next pair.
If you've reached the LSW without finding a difference return EQUAL.

However, scanning from MSW to LSW is inefficient with Lisp lists so I implemented it in reverse:

Start with state set to EQUAL.
Scan from LSW to MSW until you reach the end of the longer list.
Set b1 and b2 to the current word in each operand, or to zero if we have reached the end of that list.
Set state to LARGER if b1 > b2 and SMALLER if b1 < b2.

Here's the function:

int bignum_cmp (object *bignum1, object *bignum2) {
  int state = EQUAL;
  uint32_t b1, b2;
  while (!(bignum1 == NULL && bignum2 == NULL)) {
    if (bignum1 != NULL) {
      b1 = checkinteger(car(bignum1));
      bignum1 = cdr(bignum1);
    } else b1 = 0;
    if (bignum2 != NULL) {
      b2 = checkinteger(car(bignum2));
      bignum2 = cdr(bignum2);
    } else b2 = 0;
    if (b1 > b2) state = LARGER; else if (b1 < b2) state = SMALLER;
  }
  return state;
}

Division

Division follows the method of long division as taught in school, except that it operates a bit at a time. The procedure is:

Set current to 1 and denom to a copy of the second argument, the denominator.
Normalise the operands by shifting denom and current up, a bit at a time, until it's larger than the numerator, and then shift them back down one bit.

Then repeat until current is zero:

Set result to zero and remainder to the numerator.
If remainder is greater than or equal to denom, subtract denom from it, and set the current bit in result.
Shift current and denom down one bit.

The quotient will be in result and the remainder in remainder. The function bignum_div() returns both of these as a list of two items, so the single function can be used to implement the Lisp functions $* and $mod.

The final version of bignum_div() normalises the operands using word shifts rather than bitwise shifts as it turned out to be more efficient, even though this results in some redundant calls to downshift_bit() later:

object *bignum_div (object *bignum1, object *bignum2, object *env) {
  object *current = int_to_bignum(1);
  object *denom = copylist(bignum2);
  while (bignum_cmp(denom, bignum1) != LARGER) {
    push(number(0), current); push(number(0), denom); // upshift current and denom 1 word
    push(current, GCStack);
    maybe_gc(denom, env);
    pop(GCStack);
  }

  object *result = int_to_bignum(0);
  object *remainder = copylist(bignum1);
  while (!bignum_zerop(current)) {
    if (bignum_cmp(remainder, denom) != SMALLER) {
      remainder = bignum_sub(remainder, denom);
      result = do_operator(result, current, op_ior);
    }
    downshift_bit(current); downshift_bit(denom);
    push(current, GCStack); push(remainder, GCStack); push(denom, GCStack);
    maybe_gc(result, env);
    pop(GCStack); pop(GCStack); pop(GCStack);
  }
  return cons(result, cons(remainder, NULL));
}

Logical operations

The C function do_operator() implements the Lisp bitwise logical operations $logadd, $logior, and $logxor. To avoid having to write three very similar C functions to implement these, do_operator() takes a C function as its third argument, and uses this to determine which operator is implemented. Because C doesn't allow you to pass an operator as an argument, the three operators are formulated as C functions:

uint32_t op_and (uint32_t a, uint32_t b) { return a & b; };
uint32_t op_ior (uint32_t a, uint32_t b) { return a | b; };
uint32_t op_xor (uint32_t a, uint32_t b) { return a ^ b; };

Here's do_operator(). As with bignum_add() the function constructs a new list result for the result of the operation, to avoid affecting the two operands:

object *do_operator (object *bignum1, object *bignum2, uint32_t (*op)(uint32_t, uint32_t)) {
  object *result = cons(NULL, NULL);
  object *ptr = result;
  uint32_t tmp1 = 0, tmp2 = 0;
  while (!(bignum1 == NULL && bignum2 == NULL)) {
    if (bignum1 != NULL) {
      tmp1 = (uint32_t)checkinteger(first(bignum1));
      bignum1 = cdr(bignum1);
    }
    if (bignum2 != NULL) {
      tmp2 = (uint32_t)checkinteger(first(bignum2));
      bignum2 = cdr(bignum2);
    }
    cdr(ptr) = cons(number(op(tmp1, tmp2)), NULL);
    ptr = cdr(ptr);
  }
  return cdr(result);
}

String to bignum and bignum to string

The simplest way to support printing and reading bignums was to provide bignum to string and string to bignum functions. I've made these support base 16 and base 10.

String to bignum is the simpler of these:

Set result to 0.
Read the string a character at a time, multiply result by the base, and add the digit value of the character.
Return result.

Here's fn_Sstringbignum():

object *fn_Sstringbignum (object *args, object *env) {
  (void) env;
  object *string = first(args);
  if (!stringp(string)) error(notastring, string);
  int b = 10;
  args = cdr(args);
  if (args != NULL) b = checkinteger(car(args));
  if (b != 10 && b != 16) error2(PSTR("only base 10 or 16 supported"));
  object *base = int_to_bignum(b);
  object *result = int_to_bignum(0);
  object *form = (object *)string->name;
  while (form != NULL) {
    int chars = form->chars;
    for (int i=(sizeof(int)-1)*8; i>=0; i=i-8) {
      char ch = chars>>i & 0xFF;
      if (!ch) break;
      int d = digitvalue(ch);
      if (d >= b) error(PSTR("illegal character in bignum"), character(ch));
      push(result, GCStack); push(base, GCStack);
      result = bignum_mul(result, base, env);
      pop(GCStack); pop(GCStack);
      result = bignum_add(result, cons(number(d), NULL));
    }
    form = car(form);
  }
  return result;
}

Bignum to string in base 16 is simple; just reverse the list, so it's in the order MSW to LSW, and then print successive words in hexadecimal.

Base 10 is trickier as it requires repeated division by 10, with the remainder each time giving successive digits. The most efficient approach is to use division by 100000000, the largest power of ten that fits in a word, and keep a list of the remainders in the order MSW to LSW. Then print each word of the remainder in decimal. Here's the function fn_Sbignumstring():

object *fn_Sbignumstring (object *args, object *env) {
  (void) env;
  object *bignum = copylist(checkbignum(first(args)));
  int b = 10; uint32_t p;
  args = cdr(args);
  if (args != NULL) b = checkinteger(car(args));
  object *list = NULL;
  if (b == 16) {
    p = 0x10000000;
    while (bignum != NULL) {
      push(car(bignum), list);
      bignum = cdr(bignum);
    }
  } else if (b == 10) {
    p = 100000000;
    object *base = cons(number(p*10), NULL);
    while(!bignum_zerop(bignum)) {
      push(bignum, GCStack); push(base, GCStack); push(list, GCStack);
      object *result = bignum_div(bignum, base, env);
      pop(GCStack); pop(GCStack); pop(GCStack);
      object *remainder = car(second(result));
      bignum = first(result);
      push(remainder, list);
    }
  } else error2(PSTR("only base 10 or 16 supported"));
  bool lead = false;
  object *obj = newstring();
  object *tail = obj;
  while (list != NULL) {
    uint32_t i = car(list)->integer;
    for (uint32_t d=p; d>0; d=d/b) {
      uint32_t j = i/d;
      if (j!=0 || lead || d==1) { 
        char ch = (j<10) ? j+'0' : j+'W';
        lead=true;
        buildstring(ch, &tail);
      }
      i = i - j*d;
    }
    list = cdr(list);
  }
  return obj;
}

Garbage collection

Some of the C functions in this arbitrary-precision arithmetic package build temporary lists to hold bignums used during the calculations. Because garbage collection doesn't occur automatically in C functions, as it does when executing Lisp code, it's possible for a "no room" error to occur even when there's plenty of workspace. The solution is to manually call the garbage collector within the C functions. To reduce the time overhead of garbage collections the package includes a function maybe_gc() which only performs a garbage collection if less than 1/16th of the workspace remains.

Before calling maybe_gc() in a C function it's important to protect temporary lists being used in that function from being garbage collected. You can protect one list by passing it as the first argument to maybe_gc(), or you can push one or more lists on GCStack, popping them afterwards.

For example, this code is included in bignum_div(), the bignum divide routine, to perform a garbage collection with current and denom protected:

push(current, GCStack);
maybe_gc(denom, env);
pop(GCStack);

There's a similar call in bignum_mul().

Functions that call bignum_div() or bignum_mul() also need to protect temporary lists around the call, because a garbage collection may occur in the routine being called.

Further suggestions

The library could be modified to run on 8/16-bit versions of uLisp by making the fundamental building block of bignums uint16_t rather than uint32_t.

This library could be used to implement a function to calculate pi to a large number of digits; for a good summary of the available algorithms see Meet π on Guy Fernando's i4cy.com site.

^ tiny-bignum-c on GitHub.
^ Pollard's rho algorithm on Wikipedia.
^ Fermat numbers on Wikipedia.

Previous: LoRa extension

Next: NeoPixel extension