Garbage collection

16th April 2017: This description has been updated to match uLisp 1.8.

Garbage collection is an essential part of any Lisp interpreter or compiler. It allows Lisp to manipulate lists and more complex data structures without requiring the programmer to be concerned about the allocation and deallocation of the memory they need.

During the normal running of a Lisp program, objects are left that are no longer accessible by anything else in the environment. These objects are essentially wasted memory, and are called garbage. The garbage collector runs intermittently to gather up these inaccessible blocks of memory, and restore them for use by the program.

Mark and sweep

The type of garbage collector used in uLisp is called mark and sweep. It runs in two stages:

First, markobject() marks all the objects that are still accessible. After marking each object it recursively follows the car and cdr of each cons object to mark all the objects it points to. Here's a simplified version:

void markobject (object *obj) {
  MARK:
  if (obj == NULL) return;
  if (marked(obj)) return;

  object* arg = car(obj);
  unsigned int type = obj->type;
  mark(obj);
  
  if (type >= PAIR || type == ZERO) { // cons
    markobject(arg);
    obj = cdr(obj);
    goto MARK;
  }
}
An object is marked by setting the bottom bit of its car cell, using the C macro mark(x):
#define mark(x)            (car(x) = (object *)(((unsigned int)(car(x))) | 0x0001))

The bottom bit is usually zero, because the workspace is aligned to an even address.

Then sweep() builds a new freelist out of the unmarked objects, and unmarks the marked objects:

void sweep () {
  Freelist = NULL;
  Freespace = 0;
  for (int i=WORKSPACESIZE-1; i>=0; i--) {
    object *obj = &Workspace[i];
    if (!marked(obj)) myfree(obj); else unmark(obj);
  }
}

The function myfree() adds an object to the freelist:

inline void myfree (object *obj) {
  car(obj) = NULL;
  cdr(obj) = Freelist;
  Freelist = obj;
  Freespace++;
}

The macro unmark(x) clears the mark bit:

#define unmark(x)          (car(x) = (object *)(((unsigned int)(car(x))) & 0xFFFE))

and the macro marked(x) returns TRUE if the object is marked:

#define marked(x)          ((((unsigned int)(car(x))) & 0x0001) != 0)

Finally, the actual garbage collection is performed by gc(), which calls markobject() and sweep():

void gc (object *form, object *env) {
  #if defined(printgcs)
  int start = Freespace;
  #endif
  markobject(tee); 
  markobject(GlobalEnv);
  markobject(GCStack);
  markobject(form);
  markobject(env);
  sweep();
  #if defined(printgcs)
  pchar('{'); pint(Freespace - start); pchar('}');
  #endif
}

This first marks the following objects:

  • The language symbols t.
  • The global environment GlobalEnv containing an association list of all the globally-defined variables and functions.
  • The garbage-collection stack, GCStack, used to push temporary structures that need to be protected from garbage collection during evaluation.
  • The form and local environment passed to gc as arguments.

It then calls sweep() to free the unmarked objects.

Invoking garbage collection

The garbage collector is invoked each time eval() is called if the amount of free space is less than 20 objects, by the statement:

if (Freespace < 20) gc(form, env);