Garbage collection

Garbage collection is an essential part of any Lisp interpreter or compiler. It allows Lisp to manipulate lists and more complex data structures without requiring the programmer to be concerned about the allocation and deallocation of the memory they need.

During the normal running of a Lisp program, objects are left that are no longer accessible by anything else in the environment. For example, if you evaluated:

(defvar lst '(1 2 3))
(setq lst '(4 5 6))

the list (1 2 3) is no longer accessible.

These objects are essentially wasted memory, and are called garbage. The garbage collector runs intermittently to gather up these inaccessible blocks of memory, and restore them for use by the program.

Depending on the amount of memory and the speed of the processor, garbage collection typically takes a few milliseconds.

Mark and sweep

The type of garbage collector used in uLisp is called mark and sweep. It runs in two stages:

First, markobject() marks all the objects that are still accessible. After marking each object it recursively follows the car and cdr of each cons object to mark all the objects it points to. Here's a simplified version:

void markobject (object *obj) {
  MARK:
  if (obj == NULL) return;
  if (marked(obj)) return;

  object* arg = car(obj);
  unsigned int type = obj->type;
  mark(obj);
  
  if (type >= PAIR || type == ZERO) { // cons
    markobject(arg);
    obj = cdr(obj);
    goto MARK;
  }
}
An object is marked by setting the bottom bit of its car cell, using the C macro mark(x):
#define mark(x)            (car(x) = (object *)(((unsigned int)(car(x))) | 0x0001))

The bottom bit is usually zero, because the workspace is aligned to an even address.

Then sweep() builds a new freelist out of the unmarked objects, and unmarks the marked objects:

void sweep () {
  Freelist = NULL;
  Freespace = 0;
  for (int i=WORKSPACESIZE-1; i>=0; i--) {
    object *obj = &Workspace[i];
    if (!marked(obj)) myfree(obj); else unmark(obj);
  }
}

The function myfree() adds an object to the freelist:

inline void myfree (object *obj) {
  car(obj) = NULL;
  cdr(obj) = Freelist;
  Freelist = obj;
  Freespace++;
}

The macro unmark(x) clears the mark bit:

#define unmark(x)          (car(x) = (object *)(((unsigned int)(car(x))) & 0xFFFE))

and the macro marked(x) returns TRUE if the object is marked:

#define marked(x)          ((((unsigned int)(car(x))) & 0x0001) != 0)

Finally, the actual garbage collection is performed by gc(), which calls markobject() and sweep():

void gc (object *form, object *env) {
  #if defined(printgcs)
  int start = Freespace;
  #endif
  markobject(tee); 
  markobject(GlobalEnv);
  markobject(GCStack);
  markobject(form);
  markobject(env);
  sweep();
  #if defined(printgcs)
  pchar('{'); pint(Freespace - start); pchar('}');
  #endif
}

This first marks the following objects:

  • The language symbols t.
  • The global environment GlobalEnv containing an association list of all the globally-defined variables and functions.
  • The garbage-collection stack, GCStack, used to push temporary structures that need to be protected from garbage collection during evaluation.
  • The form and local environment passed to gc as arguments.

It then calls sweep() to free the unmarked objects.

Invoking garbage collection

The garbage collector is invoked each time eval() is called if the amount of free space is less than 1/16 of the workspace, by the statement:

if (Freespace <= WORKSPACESIZE>>4) gc(form, env);

You can also force uLisp to do a garbage collection by calling the Lisp gc function. This prints the amount of space reclaimed, and the time taken; for example:

> (gc)
Space: 1473 bytes, Time: 1296 us