Descriptors


Rather than couple the inheritance model of Avail with that of the language in which it is implemented, a layer of indirection is used. Not only does this deal with the ability to have multiple inheritance (without support from the Smalltalk language), but it also allows much more severe discrepancies to be accomodated. For example, some types in Avail have an infinite number of supertypes. Clearly we can't just enumerate these as we would be forced to do in Smalltalk if we were coupled to its inheritance model. The fact that the current virtual machine is a hybrid implementation on Smalltalk and C++ doesn't help us, as C++'s inheritance model also lacks this ability. So how do we avoid sacrificing Avail's power?

Every object in Avail has a four byte header that contains information about the object's size and its layout. The layout information is implemented as a pointer to an instance of a subclass of Descriptor, a Smalltalk (and C++) class. Every message understood by an Avail object is indirected through its Descriptor in a double-dispatch kind of scheme. For example, the method that implements tuple subscripting in the Smalltalk implementation is called "tupleAt:", and it takes an integer as its argument. The method extracts the object's descriptor and sends it the message "Object:tupleAt:", passing as the two arguments the original object (which is a tuple) and the index. Because there are several possible representations for tuples (for example, tuples of bytes have a special packed format), this allows the Descriptor to say how to access the particular implementation of an object. In the case of a packed tuple of bytes, the "Object:tupleAt:" implementation in ByteTupleDescriptor takes care of things. For an ordinary tuple of object pointers, an instance of ObjectTupleDescriptor is used in the descriptor field of the object instead.

Since every access to an object is indirected through its descriptor, we can implement indirection objects trivially via an IndirectionDescriptor. This kind of descriptor just indirects all messages through a forwarding pointer within the indirection object. Indirection objects are used by the garbage collector to move objects around in memory. They are also used to change the representation of objects dynamically, allowing an object to change into a representation that is bigger than the previous representation.

Since many objects can share the same Descriptor, I use a 16-bit index into a table of Descriptors instead of a full 32-bit pointer. This saves 16 bits per object. That leaves 16 bits for object size information, which is plenty for the objects that Avail manipulates. Large objects are factored via the Composite pattern into objects smaller than 64K, but this shouldn't significantly affect performance, as these cases should rarely occur.

Besides only using 16 bits to encode an object's Descriptor, there are other ways space is conserved. The ByteTupleDescriptor class has a subclass SmallByteTupleDescriptor. This subclass has instances that correspond to tuple sizes 1 through 20. Each such descriptor knows what size of tuple it is used in. That means small tuples don't need to reserve space to record their size. Instead, that is effectively encoded in the descriptor field by the choice among those 20 instances.

Because of all the immutable objects present in Avail, it's useful to keep a one-bit sticky reference count in each object. This count distinguishes between the case where there is one reference to the object from another object and the case where it is not known exactly how many references there are to the object (but at some point there were at least 2). Whenever an object with a single reference is asked to perform an operation, say set union, that object can be modified in place to accomplish this. That's because after this operation there would be no references to the original object, so no one could detect that the memory used for an input was being recycled as the output. Like the way SmallByteTupleDescriptor encodes a 20-way choice via multiple descriptor instances, the one-bit reference count is encoded in every object via a 2-way choice between descriptors. That means there are twice as many descriptors as you would otherwise expect, and each descriptor must know whether it represents objects with one reference or objects with many references.

You may be wondering how this fits in with the type system. The answer lies in my choice of names. I used the term Descriptor instead of Type, because they really are distinct concepts. A Descriptor is a factoring of common information from a family of objects. This family of objects is mostly orthogonal to the classification of objects by their types. So two distinct tuples, even though they may have different tuple types, may have the same descriptor. Conversely, two instances of the same tuple type may have two different representations. For example, the tuple <1,2> might have a packed byte representation in one place, using the associated ByteTupleDescriptor, and a pointers-to-object representation in another place, using an ObjectTupleDescriptor. These can both exist at the same time, even though for an Avail programmer they are completely indistinguishable.

User-defined types definitely do not have anything like a one-to-one correspondence with Descriptors. At the moment, there are only two descriptors that identify an object as a user-defined object (an instance of any user-defined type), and these correspond to the two cases from the one-bit reference counting mechanism described above. Some day there may be more descriptors for user-defined types, but probably never one for every type of object that can be constructed. Instead, Avail would specialize based on how many attributes, whether common data could be factored out, whether some integers could be unboxed, or whether some pointers to other objects could be embedded. Maybe some day extremely common user-defined types could have special descriptors automatically associated with them (for space and speed reasons). This would have to happen dynamically, of course, which would make things very complicated, but still possible. These are still rough ideas, so I won't go into more detail about them here.

At the "object semantics" level they represent the same object, but at the "structural" level they represent different structures. This dichotomy is important for understanding the implementation of the Avail virtual machine. Hopefully the above discussion highlights these differences enough to follow when the topic comes up elsewhere in this document.


Back to Virtual Machine

Table of contents