Inheritance with C structures

For my latest project I had to program an unrooted phylogenetic tree. As with standard binary trees, each node has two children, one left, one right. Except for the root, which has three pointers, left child, right child and an extra child.

Writing this, I realise that having a root in an unrooted tree seems counter-intuitive. Truth is, this “tree” is actually a graph with nodes of degree 3. It's just that biologists call them trees. One of these nodes is picked at random and declared root. Unlike in a binary tree, no particular order is implied by the parent-child-relationship! More on unrooted trees which aren't trees but have roots.

Defining a type for nodes in C is straightforward.

typedef struct tree_node {
    struct tree_node *left_branch, *right_branch;
    /* more data */
} tree_node;

Obviously we want to do something with these nodes later, so let's define the following function.

void process(tree_node *node);

The final step is to define the root. If this was C++ we could simply make the root inherit from a node and be done. Unfortunately things are not that easy in plain C. So here are the options.

A lot of these ideas come from the great book 21st Century C by Ben Klemens. Checkout more greatness in his blog.

Option 1

The root could simply have a tree_node member.

typedef struct tree_root {
    struct tree_node node;
    struct tree_node *extra_branch;
    /* even more data */
} tree_root;

This option is easiest and most widely supported by compilers. No additional flags are necessary. But it is not as beautiful as it could be. The above definition means a root has a node, where in fact a root is a node. This also becomes apparent when we try to invoke process.

 tree_root *root = {/* sth. */};
 process(root->node.left_branch);
 process(&(root->node));

The extra parenthesis around root->node are optional. But they make the semantics of that line much more clearer.

Option 2

Another obvious solution is that we could simply copy all data members from the node to the root.

typedef struct tree_root {
    struct tree_node *left_branch, *right_branch;
    /* more data */
    struct tree_node *extra_branch;
    /* even more data */
} tree_root;

But this is neither beautiful nor typesafe: process((tree_node*)root). However, it fixes the other calling problem process(root->left_branch).

Option 2.5

Anonymous structures have been available in C since the C11 standard or when using the Microsoft extensions. Using them, the above example can be rewritten.

typedef struct tree_root {
    struct {
        struct tree_node *left_branch, *right_branch;
        /* more data */
    } ;
    struct tree_node *extra_branch;
    /* even more data */
 } tree_root;

This does not change the way we call process. We have to try harder.

Option 3

We can use unnamed fields to our advantage and make the definition of the tree root much clearer with the Microsoft extensions.

typedef struct tree_root {
    struct tree_node;
    struct tree_node *extra_branch;
    /* even more data */
} tree_root;

In order for this to compile with gcc or clang the -fms-extensions flag has to be used. But this makes the definition already much cleaner. It can be used as follows.

process(root->left_branch);
process((tree_node*)root);

Now there is true inheritance. The member left_branch came from the definition of tree_node. But the cast remains unless your favourite compiler supports the Plan 9 extensions. Then this cast becomes optional. However, these extension are only available in GCC, not in Clang.

Option 4

This will look complicated, but bear with me.

typedef struct tree_root {
    union {
        struct tree_node;
        struct tree_node as_tree_node;
    };
    struct tree_node *extra_branch;
    /* even more data */
} tree_root;

This weird union-struct gives the node part of the root a name that we can use to access it. Both structs as_tree_node and the unnamed one are of the same type. So it is safe to access members of one via the other. Thus root->left_branch is equivalent to root->as_tree_node.left_branch.

process(root->left_branch);
process(&root->as_tree_node);

No casts involved; Just plain member accesses. Full type safety. But the ugly ampersand is back. Don't really know what to do about that, though.