Arbitrary symbol names

As of version 1.8, uLisp optionally supports arbitrary symbol names. This page describes how this feature works.

Symbol table

uLisp now allocates a symbol table in RAM, typically 512 bytes, which is enough for about 50 ten-character symbols. The free space at the top of the symbol table doubles as the input buffer.

If the symbol table is allocated with the minimum size, BUFFERSIZE, it is only just large enough to serve as the input buffer. In that case only short symbols are accepted: entering a symbol of more than three characters, or one containing characters outside the radix 40 set, gives the error "No room for long symbols".

The same error occurs when a larger symbol table becomes full, that is, when adding a new symbol would leave fewer than BUFFERSIZE bytes free for the input buffer.

Reading a symbol

When a symbol is encountered it is put into the input buffer. If it is three characters or fewer, valid40() is called to check whether it can be encoded into a 16-bit word using radix 40 encoding:

boolean valid40 (char *buffer) {
  return (toradix40(buffer[0])>=0 && toradix40(buffer[1])>=0 && toradix40(buffer[2])>=0);
}

If so, it is encoded, and no space is used in the symbol table. If it is more than three characters, or uses characters not in the radix 40 set, it is searched for in the symbol table using longsymbol():

int longsymbol (char *buffer) {
  char *p = SymbolTable;
  int i = 0;
  while (strcmp(p, buffer) != 0) {p = p + strlen(p) + 1; i++; }
  if (p == buffer) {
    // Add to symbol table?
    char *newtop = SymbolTop + strlen(p) + 1;
    if (SYMBOLTABLESIZE-(newtop-SymbolTable) < BUFFERSIZE)
      error(F("No room for long symbols"));
    SymbolTop = newtop;
  }
  if (i > 1535) error(F("Too many long symbols"));
  return i + 64000; // First number unused by radix40
}

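If the symbol is found, its index in the symbol table is added to 64000 to give the symbol identifier. 64000 (40³) is the first number that cannot result from a radix 40 encoding, because three base-40 digits only span the values 0 to 63999. The limit of 1536 long symbols keeps every identifier within 16 bits, since 64000 + 1535 = 65535.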
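The page does not list toradix40() or fromradix40(), so the following is only a minimal sketch of radix 40 packing under an assumed alphabet: 0 means "no character", 1 to 26 are a to z, 27 to 36 are 0 to 9, and 37, 38 and 39 are `-`, `*` and `$`. The real uLisp mapping may differ, and the `_sketch` names are hypothetical. The point it illustrates is that three base-40 digits can never reach 64000.

```c
/* Hypothetical radix 40 alphabet (an assumption, not uLisp's exact mapping):
   0 = no character, 1-26 = a-z, 27-36 = 0-9, 37 = '-', 38 = '*', 39 = '$'. */
static int toradix40_sketch(char ch) {
  if (ch == '\0') return 0;
  if (ch >= 'a' && ch <= 'z') return ch - 'a' + 1;
  if (ch >= '0' && ch <= '9') return ch - '0' + 27;
  if (ch == '-') return 37;
  if (ch == '*') return 38;
  if (ch == '$') return 39;
  return -1;  /* character not in the radix 40 set */
}

/* Pack a name of up to three characters as three base-40 digits.
   Returns -1 if any character is outside the set. */
static long encode40_sketch(const char *s) {
  long x = 0;
  int ended = 0;  /* stop reading once the terminating null is seen */
  for (int n = 0; n < 3; n++) {
    char ch = ended ? '\0' : s[n];
    if (ch == '\0') ended = 1;
    int d = toradix40_sketch(ch);
    if (d < 0) return -1;
    x = x * 40 + d;  /* most significant digit is the first character */
  }
  return x;  /* always in the range 0 to 63999 */
}
```

With this alphabet "car" encodes to 4858, and "$$$" gives the largest possible value, 39×1600 + 39×40 + 39 = 63999.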

If the symbol doesn't already exist in the symbol table, it is left at the top of the table where it was entered, and the top of the table, SymbolTop, is moved just past the new symbol's terminating null, provided that at least BUFFERSIZE bytes would still remain free for the input buffer.
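To illustrate how longsymbol() interns a name in place, here is a self-contained sketch. It uses a small hypothetical table, where TABLESIZE_SKETCH and BUFSIZE_SKETCH stand in for SYMBOLTABLESIZE and BUFFERSIZE, returns -1 where uLisp would call error(), and omits the 1535-symbol limit.

```c
#include <string.h>

#define TABLESIZE_SKETCH 64  /* hypothetical; uLisp typically uses 512 */
#define BUFSIZE_SKETCH 16    /* hypothetical stand-in for BUFFERSIZE */

static char table[TABLESIZE_SKETCH];
static char *top = table;    /* plays the role of SymbolTop */

/* Read a name into the buffer at the top of the table, then search the
   table as longsymbol() does. Returns 64000 + index, or -1 on error. */
static long intern_sketch(const char *name) {
  size_t len = strlen(name);
  if (len + 1 > BUFSIZE_SKETCH) return -1;   /* too long for the buffer */
  char *buffer = top;
  memcpy(buffer, name, len + 1);             /* simulate reading the symbol */
  char *p = table;
  long i = 0;
  /* Always terminates: at worst strcmp matches the buffer itself. */
  while (strcmp(p, buffer) != 0) { p += strlen(p) + 1; i++; }
  if (p == buffer) {                         /* new symbol: keep it in place */
    char *newtop = top + len + 1;
    if (TABLESIZE_SKETCH - (newtop - table) < BUFSIZE_SKETCH) return -1;
    top = newtop;
  }
  return 64000 + i;
}
```

Interning "alpha", then "bravo", then "alpha" again returns 64000, 64001 and 64000, leaving 12 bytes of the table in use. Because the check runs before SymbolTop moves, a rejected symbol simply stays in the free area and is overwritten by the next read.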

Looking up a symbol

The function name(), which returns a pointer to the name of a symbol, has been extended to cater for long symbol names:

char *name (object *obj) {
  char *buffer = SymbolTop;
  buffer[3] = '\0';
  if (obj->type != SYMBOL) error(F("Error in name"));
  symbol_t x = obj->name;
  if (x < ENDFUNCTIONS) return lookupbuiltin(x);
  else if (x >= 64000) return lookupsymbol(x);
  for (int n=2; n>=0; n--) {
    buffer[n] = fromradix40(x % 40);
    x = x / 40;
  }
  return buffer;
}

There are now three cases:
  • If the symbol number is less than ENDFUNCTIONS it's a built-in symbol, and its name is looked up by calling lookupbuiltin().
  • If the symbol number is 64000 or greater its name is looked up in the symbol table by calling lookupsymbol().
  • Otherwise it's a radix 40 encoded symbol, and its three characters are decoded by repeatedly taking the remainder modulo 40 and converting each digit with fromradix40(), last character first.
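The three-way split can be sketched in isolation. This assumes a hypothetical ENDFUNCTIONS value of 200 (the real value depends on the build) and a hypothetical alphabet in which digit 0 is "no character", 1 to 26 are a to z, 27 to 36 are 0 to 9, and 37 to 39 are `-`, `*` and `$`; the `_sketch` names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

#define ENDFUNCTIONS_SKETCH 200  /* hypothetical; depends on the build */

typedef enum { BUILTIN, LONGSYMBOL, RADIX40 } symkind_sketch;

/* Decide which of the three cases in name() applies. */
static symkind_sketch classify_sketch(uint16_t x) {
  if (x < ENDFUNCTIONS_SKETCH) return BUILTIN;
  if (x >= 64000) return LONGSYMBOL;
  return RADIX40;
}

/* Hypothetical alphabet: index = digit value, index 0 = no character. */
static const char alphabet_sketch[] = "\0abcdefghijklmnopqrstuvwxyz0123456789-*$";

/* Decode as name() does: peel off three base-40 digits, last one first. */
static void decode40_sketch(uint16_t x, char buffer[4]) {
  buffer[3] = '\0';
  for (int n = 2; n >= 0; n--) {
    buffer[n] = alphabet_sketch[x % 40];
    x = x / 40;
  }
}
```

In this sketch any non-empty name has a nonzero first digit and so encodes to at least 1600, which means it cannot be mistaken for a built-in as long as ENDFUNCTIONS stays below that. Decoding 4858 yields "car".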

The new routine, lookupsymbol(), steps through the symbol table one null-terminated name at a time until it reaches the symbol with the required index:

char *lookupsymbol (symbol_t name) {
  char *p = SymbolTable;
  int i=name-64000;
  while (i > 0 && p < SymbolTop) {p = p + strlen(p) + 1; i--; }
  if (p == SymbolTop) return NULL; else return p;
}

It returns NULL if the index lies beyond the last symbol in the table.
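lookupsymbol() can be exercised on its own against a hypothetical pre-filled table of three names packed end to end. This sketch mirrors the listing above, with SymbolTable and SymbolTop replaced by local stand-ins.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical contents: three long symbols, each followed by a null. */
static char table[] = "alpha\0bravo\0charlie";
static char *const top = table + sizeof table;  /* just past the last null */

/* Walk forward one name at a time until the requested index is reached. */
static char *lookup_sketch(uint16_t name) {
  char *p = table;
  int i = name - 64000;
  while (i > 0 && p < top) { p += strlen(p) + 1; i--; }
  if (p == top) return NULL;  /* index beyond the last symbol */
  return p;
}
```

Here 64002 finds "charlie" and 64003 returns NULL. Since the walk is linear, the cost of a lookup grows with the number of long symbols stored before the one requested.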

Deleting a symbol

Finally, when removing a function using the Lisp function makunbound, the corresponding symbol, if any, also needs to be deleted from the symbol table.

This is achieved by deletesymbol():

void deletesymbol (symbol_t name) {
  char *p = lookupsymbol(name);
  if (p == NULL) return;
  char *q = p + strlen(p) + 1;
  *p = '\0'; p++;
  while (q < SymbolTop) *(p++) = *(q++);
  SymbolTop = p;
}

This looks up the symbol to be deleted and, if it is found, shuffles all the subsequent symbols down to free up the space. Note that it deliberately leaves the deleted symbol's null end-of-string character in the table as an empty entry, so that the existing long symbols keep their indices and don't need renumbering. There is therefore a slight overhead from creating new symbols and then deleting them using makunbound: each deleted symbol permanently occupies one byte of the table and one of the 1536 available long-symbol numbers.
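The effect of the empty entry can be checked with a small sketch. The table contents, TABLESIZE_SKETCH and the `_sketch` functions are hypothetical; lookup_sketch() and delete_sketch() mirror lookupsymbol() and deletesymbol().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLESIZE_SKETCH 32  /* hypothetical table size */

static char table[TABLESIZE_SKETCH] = "alpha\0bravo\0charlie";
static char *top = table + 20;  /* end of the three packed names */

static char *lookup_sketch(uint16_t name) {
  char *p = table;
  int i = name - 64000;
  while (i > 0 && p < top) { p += strlen(p) + 1; i--; }
  if (p == top) return NULL;
  return p;
}

static void delete_sketch(uint16_t name) {
  char *p = lookup_sketch(name);
  if (p == NULL) return;
  char *q = p + strlen(p) + 1;   /* start of the following symbol */
  *p++ = '\0';                   /* keep one null as an empty placeholder */
  while (q < top) *p++ = *q++;   /* slide the later symbols down */
  top = p;
}
```

Deleting "bravo" (64001) slides "charlie" down five bytes, yet it is still found as 64002 because the empty entry keeps its place. The table shrinks from 20 to 15 bytes, so one byte stays allocated for the deleted symbol.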