Objects

16th April 2017: This description has been updated to match uLisp 1.8.

11th August 2017: Note that in the SAM/SAMD version of uLisp each object consists of two 4-byte cells.

All objects in uLisp consist of two 2-byte cells, so every object occupies 4 bytes. There are three fundamental types of object: symbol, number, or cons:

Objects1.gif

  • A symbol contains '2' in the left cell, and the name of the symbol packed in the right cell.
  • A number contains '4' in the left cell, and the 16-bit signed integer value in the right cell.
  • Finally, a cons contains an address pointer in each cell pointing to other objects.

These constants are defined by the type enum; a simplified version is:

enum type {ZERO=0, SYMBOL=2, NUMBER=4};

The low number addresses are never used, so we can uniquely identify the type from the left-hand cell.

The object type is defined by the following typedef:

typedef struct sobject {
  union {
    struct {
      sobject *car;
      sobject *cdr;
    };
    struct {
      unsigned int type;
      union {
        symbol_t name;
        int integer;
      };
    };
  };
} object;

The two pointers in a cons cell are called car and cdr, for historical reasons.

Setting up uLisp's workspace

The space used by uLisp is reserved by the declaration:

object Workspace[WORKSPACESIZE] WORDALIGNED;

The workspacesize depends on the processor, and varies between 317 objects on an ATmega328, and 3328 objects on a MSP430FR5969. The RAM above the workspace is available for the processor stack, and the amount of workspace can be increased at the risk of running out of stack.

Initially all the workspace is allocated to the freelist: a list of objects, linked together by cdr pointers. The last object in the list has a NULL car pointer:

Objects2.gif

The workspace is initialised by initworkspace():

void initworkspace () {
  Freelist = NULL;
  for (int i=WORKSPACESIZE-1; i>=0; i--) {
    object *obj = &Workspace[i];
    car(obj) = NULL;
    cdr(obj) = Freelist;
    Freelist = obj;
    Freespace++;
  }
}

Making objects

When allocating objects, cells are taken from the front of the freelist, and the freelist pointer is updated. For example, after creating the list:

(sq 6)

the freelist will be:

Objects3.gif

An object is allocated by calling myalloc():

object *myalloc () {
  if (Freespace == 0) error(F("No room"));
  object *temp = Freelist;
  Freelist = cdr(Freelist);
  Freespace--;
  return temp;
}

The myalloc() function is called by the routines that make an object of each type.

Number objects are created by number(), which takes an integer value:

object *number (int n) {
  object *ptr = myalloc();
  ptr->type = NUMBER;
  ptr->integer = n;
  return ptr;
}

Symbol objects are created by symbol(), which takes an unsigned integer containing a packed version of the symbol:

object *symbol (symbol_t name) {
  object *ptr = myalloc();
  ptr->type = SYMBOL;
  ptr->name = name;
  return ptr;
}

Finally cons() takes two objects to be pointed to by the car and cdr of the cons:

object *cons (object *arg1, object *arg2) {
  object *ptr = myalloc();
  ptr->car = arg1;
  ptr->cdr = arg2;
  return ptr;
}

Encoding symbols

In uLisp the names of built-in functions can be any length, and may contain any character except space, brackets, or quote. User-defined function names are restricted to three letters or digits.

When the read function encounters a symbol name it first calls builtin() which looks it up in lookup_table[] to see if it matches one of the built-in symbols:

int builtin(char* n){
  int entry = 0;
  while (entry < ENDFUNCTIONS) {
    if(strcmp_P(n, (char*)pgm_read_word(&lookup_table[entry].string)) == 0)
      return entry;
    entry++;
  }
  return ENDFUNCTIONS;
}

The following fragment shows part of the lookup table:

const tbl_entry_t lookup_table[] PROGMEM = {
  { string0, NULL, 0, 127 },
...
  { string93, fn_delay, 1, 1 },
  { string94, fn_shiftout, 4, 4 },
  { string95, fn_shiftin, 3, 3 },
  { string96, fn_note, 0, 3 },
};

If it matches, builtin() returns the index in the lookup-table, which is a constant identifying the function in the functions enum:

enum function { BRA, ... DELAY, SHIFTIN, SHIFTOUT, NOTE, FUNCTIONS };

If a match is found, the lookup table gives the entry point of the C function to implement the function; for example, fn_note().

The strings representing the actual function names are stored in program memory, to avoid using up RAM, and they are referenced in the lookup table. For example:

const char string96[] PROGMEM = "note";

User-defined symbols

If builtin() doesn't find a match in the lookup table it returns the value FUNCTIONS to indicate that no built-in symbol exists. The string is then assumed to be a user-defined symbol.

On processors with plenty of RAM uLisp supports arbitrary symbols; for details see Arbitrary symbol names. On machines with limited RAM the first three characters are encoded into a 16-bit value using radix-40 encoding. This allows you to encode three characters from a 40-character set because 40*40*40 is 64000, which is less than 65536.

The characters are packed by pack40():

int pack40 (char *buffer) {
  return (((toradix40(buffer[0]) * 40) + toradix40(buffer[1])) * 40 + toradix40(buffer[2]));
}

This in turn calls toradix40() to convert the characters in the character set to values between 0 and 39:

int toradix40 (char ch) {
  if (ch == 0) return 0;
  if (ch >= '0' && ch <= '9') return ch-'0'+30;
  ch = ch | 0x20;
  if (ch >= 'a' && ch <= 'z') return ch-'a'+1;
  return -1; // Invalid
}

The corresponding routine fromradix40() converts a number from 0 to 39 to the corresponding character in the character set:

int fromradix40 (int n) {
  if (n >= 1 && n <= 26) return 'a'+n-1;
  if (n >= 30 && n <= 39) return '0'+n-30;
  return 0;
}

The user-defined name with the smallest encoded value is "a", which has a value of 1*40*40 or 1600, so uLisp can safely have 1599 built-in functions without the values overlapping!

Decoding symbols

The reverse procedure is used to decode symbols by the function name(). If the value of the encoded symbol is less than FUNCTIONS the name is looked up in the lookup table; otherwise it is unpacked from radix40 encoding. Here's a simplified version:

char *name (object *obj){
  char *buffer = SymbolTop;
  buffer[3] = '\0';
  if(obj->type != SYMBOL) error(F("Error in name"));
  symbol_t x = obj->name;
  if (x < ENDFUNCTIONS) return lookupbuiltin(x);
  for (int n=2; n>=0; n--) {
    buffer[n] = fromradix40(x % 40);
    x = x / 40;
  }
  return buffer;
}

Symbol table

Most Lisp implementations incorporate a symbol table, where a unique instance of each symbol is stored together with its value. To save RAM uLisp doesn't do this, so when two symbols are encountered their names need to be compared to see if they are the same symbol.