C++ Interfaces and Templates

2020-08-25

Someone at Recurse Center recommended building a key value store as a beginner C++ project. This exposes you to a lot of corners of C++, such as templates (so that you can store whatever you want in your key value store in a type safe way), file access (so that you can persist data) and networking (so that you can make it available over the internet). I got started the same way I would in any language: by building an interface for the core logic that could be plugged into a persistence layer and a networking stack. Separation of concerns is great!

Unfortunately I pretty quickly ran into a problem. I had three files: a key value store data structure header (kvslib.h) with an implementation (kvslib.cpp), and a main.cpp that used this data structure. Let's start with the header file for the key value store.

kvslib.h:

#include <string>
#include <map>

template <typename T>
class KeyValueStoreInterface
{
public:
    virtual T get(std::string key) = 0;
    virtual void put(std::string key, T value) = 0;
    virtual void remove(std::string key) = 0;
};

template <typename T>
class InMemoryKVS : public KeyValueStoreInterface<T> // public, so it can be used through the interface
{
private:
    std::map<std::string, T> internalKeyValue;

public:
    InMemoryKVS();
    virtual T get(std::string key);
    virtual void put(std::string key, T value);
    virtual void remove(std::string key);
};

Here I'm defining a class, KeyValueStoreInterface, that is generic over type T. It has three functions, get, put and remove, which are pretty self explanatory. Importantly these are "virtual" functions. Virtual functions are functions that you expect to be redefined in child classes. Even if you only have a pointer to the parent class, you can call a virtual function on it and the call will be forwarded to the derived class's version. The = 0 at the end makes them pure virtual: the interface provides no implementation of its own, so every concrete subclass has to supply one.
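To make the forwarding concrete, here is a small sketch that is separate from the key value store. Greeter, EnglishGreeter, FrenchGreeter and greetVia are names I made up for illustration. greetVia only knows about the base class, but the call should still land in whichever subclass it was handed.

```cpp
#include <string>

// Hypothetical interface, not part of kvslib.h.
class Greeter
{
public:
    virtual ~Greeter() = default;          // safe deletion through a base pointer
    virtual std::string greet() const = 0; // pure virtual: subclasses must define it
};

class EnglishGreeter : public Greeter
{
public:
    std::string greet() const override { return "hello"; }
};

class FrenchGreeter : public Greeter
{
public:
    std::string greet() const override { return "bonjour"; }
};

// Sees only the base class; the actual function is picked at run time
// based on the real type of the object behind the reference.
std::string greetVia(const Greeter &g)
{
    return g.greet();
}
```

Calling greetVia with an EnglishGreeter and then with a FrenchGreeter goes through the same line of code but reaches two different greet implementations.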

Below that is another class InMemoryKVS that is a subclass of KeyValueStoreInterface. It's basically exactly the same, including the fact that it is generic, except it says that it will have a private member variable internalKeyValue that is a map of strings to T.

kvslib.cpp:

#include "kvslib.h"

template <typename T>
InMemoryKVS<T>::InMemoryKVS()
{
}

template <typename T>
T InMemoryKVS<T>::get(std::string key)
{
    return this->internalKeyValue[key];
}

template <typename T>
void InMemoryKVS<T>::put(std::string key, T value)
{
    // operator[] overwrites an existing key; insert() would silently keep the old value
    this->internalKeyValue[key] = value;
}

template <typename T>
void InMemoryKVS<T>::remove(std::string key)
{
    this->internalKeyValue.erase(key);
}

The implementation file is pretty simple. It only implements functions for InMemoryKVS and they work exactly how you'd expect.

main.cpp:

#include <iostream>
#include "kvslib.h"

int main()
{
    InMemoryKVS<int> x = InMemoryKVS<int>();
    x.put("hello", 5);

    std::cout << x.get("hello") << std::endl;

    x.remove("hello");

    std::cout << x.get("hello") << std::endl;

    InMemoryKVS<std::string> y = InMemoryKVS<std::string>();
    std::string str("world");
    y.put("hello", str);
    std::cout << y.get("foo") << std::endl;
    return 0;
}

Finally I use the InMemoryKVS from main. I make one that stores ints and another that stores strings. Simple!

Unfortunately compiling this results in a pretty inscrutable error:

bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<int>::InMemoryKVS()'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<int>::put(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<int>::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<int>::remove(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<int>::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::InMemoryKVS()'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::put(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:function main: error: undefined reference to 'InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:vtable for InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >: error: undefined reference to 'InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:vtable for InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >: error: undefined reference to 'InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::put(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:vtable for InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >: error: undefined reference to 'InMemoryKVS<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::remove(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:vtable for InMemoryKVS<int>: error: undefined reference to 'InMemoryKVS<int>::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:vtable for InMemoryKVS<int>: error: undefined reference to 'InMemoryKVS<int>::put(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int)'
bazel-out/k8-fastbuild/bin/_objs/kvs/main.pic.o:main.cpp:vtable for InMemoryKVS<int>: error: undefined reference to 'InMemoryKVS<int>::remove(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)'
collect2: error: ld returned 1 exit status

What on earth?

I see a lot of undefined references. But all of those functions are definitely defined. The fact that it specifies that it's looking for the <int> version of the function is an important clue. But ... why would it? Shouldn't the generic version satisfy the case where T is an int?

In order to explain what's happening here we need to understand what C++ generics, also known as templates, actually do.

How do Templates Work?

(This won't be a super in depth explanation, since I'm just learning this myself!)

To tell C++ that a class is generic you use the template keyword, and that's for good reason: generics in C++ behave way more like Mustache or PHP templates than they do like generics in other languages I'm used to. Take this line in my main.cpp file where we create an InMemoryKVS for ints:

InMemoryKVS<int> x = InMemoryKVS<int>();

When the compiler sees this it generates a whole new version of InMemoryKVS with every T replaced by int, and that generated class is the type of x. For example, the type signature of get would become:

int InMemoryKVS<int>::get(std::string key)

With this in mind there's a plausible explanation for the above error message. The int version of InMemoryKVS isn't being generated. But why?
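Here is a rough sketch of that "stamping out" idea on a template small enough to expand by hand. Box and BoxOfInt are hypothetical names of mine, and BoxOfInt is only an approximation of what the compiler produces for Box<int>, not its literal output.

```cpp
#include <string>
#include <type_traits>

// Hypothetical template, small enough to expand by hand.
template <typename T>
class Box
{
public:
    explicit Box(T v) : value(v) {}
    T get() const { return value; }

private:
    T value;
};

// Roughly what gets generated for Box<int>: the same class with
// every T replaced by int.
class BoxOfInt
{
public:
    explicit BoxOfInt(int v) : value(v) {}
    int get() const { return value; }

private:
    int value;
};

// Each instantiation is its own, unrelated class.
static_assert(!std::is_same<Box<int>, Box<std::string>>::value,
              "Box<int> and Box<std::string> are different types");

// The return type of get() follows T in each instantiation.
static_assert(std::is_same<decltype(Box<int>(0).get()), int>::value,
              "Box<int>::get returns int");
```

The important detail for the error above is that the compiler can only do this substitution if it can see the template's function bodies at the point where Box<int> is used.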

How does the C++ compiler work?

(This also won't be super in depth because C++ compilers are very complicated)

Remember back when you compiled your very first C(++) program? It probably looked something like this:

g++ main.cpp -o main

Later on you had a project with two files: a main file and a library file. You may have compiled them like this:

g++ -c library.cpp -o library.o # compile the library to object code (-c means "don't link")
g++ -c main.cpp -o main.o # compile main to object code
g++ main.o library.o -o main # link the object code, creating an executable named main

(Even if you compiled them like g++ main.cpp library.cpp -o main I think it is essentially still doing the above)

When the compiler is compiling main it doesn't know anything about library or vice versa. The translation units are completely independent. This is why header files are important: they allow the compiler to see the rough shape of files you're including without having to compile the entire thing.
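As a sketch of what "the rough shape" means, here is the same split squeezed into a single file so it is easy to try. The comments mark which file each piece would normally live in; add and addTwice are hypothetical names, not part of the key value store.

```cpp
// --- what library.h would contain: a declaration only ---
int add(int a, int b);

// --- what main.cpp would contain ---
// This compiles with just the declaration above in view. The compiler
// emits a call to "add" and leaves it to the linker to find the body.
int addTwice(int a, int b)
{
    return add(a, b) + add(a, b);
}

// --- what library.cpp would contain: the definition ---
// In a real project this lives in a separate translation unit and is
// compiled to its own object file.
int add(int a, int b)
{
    return a + b;
}
```

For an ordinary function like add this works fine, because library.cpp always produces a compiled add for the linker to find.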

Now we can understand what's happening with my key value store interface.

What Happened

When the compiler compiles main it looks at main.cpp and any header files that it includes. In this case that's only kvslib.h (and iostream, but let's ignore that for now). kvslib.h contains only the class definitions for KeyValueStoreInterface and InMemoryKVS: it declares their member functions but doesn't include any of the function bodies. This is sufficient to reference them.

However, when we reference InMemoryKVS<int> in main the compiler can't generate a complete definition for it. Why? Because all it has is the declaration of the constructor (and of get, put and remove), not the implementation. Is T used to allocate more memory? It has no way of knowing. So it just doesn't generate the function bodies and assumes they will be defined elsewhere.

Separately the compiler also compiles kvslib.cpp. This file also includes kvslib.h. These files don't contain any usages of InMemoryKVS at all, never mind an int usage, so the compiler doesn't generate an int version of InMemoryKVS here either.

Only when the linker comes along to link these two translation units together does it become apparent that nobody generated the InMemoryKVS<int> functions, so the linker is the one that finally fails.

Solution

The solution is simple. Move the implementation of InMemoryKVS into the header so that when the compiler compiles main, which includes kvslib.h, it has the complete implementation and can generate an int version of InMemoryKVS, complete with function definitions.
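A minimal sketch of what kvslib.h could look like with the bodies moved in from kvslib.cpp. A few things here are my own assumptions rather than part of the original files: the virtual destructor, the public inheritance, the override keywords, and put overwriting an existing key via operator[].

```cpp
// kvslib.h, with the member function bodies defined inside the class
#include <map>
#include <string>

template <typename T>
class KeyValueStoreInterface
{
public:
    virtual ~KeyValueStoreInterface() = default; // assumption: added for safe cleanup
    virtual T get(std::string key) = 0;
    virtual void put(std::string key, T value) = 0;
    virtual void remove(std::string key) = 0;
};

template <typename T>
class InMemoryKVS : public KeyValueStoreInterface<T>
{
private:
    std::map<std::string, T> internalKeyValue;

public:
    // Note: operator[] inserts a default-constructed T for a missing key.
    T get(std::string key) override
    {
        return internalKeyValue[key];
    }

    // Assumption: put replaces any value already stored under key.
    void put(std::string key, T value) override
    {
        internalKeyValue[key] = value;
    }

    void remove(std::string key) override
    {
        internalKeyValue.erase(key);
    }
};
```

With this header, main.cpp can stay as it is and kvslib.cpp is no longer needed. If you would rather keep the bodies in kvslib.cpp, explicit instantiation (a line such as template class InMemoryKVS<int>; at the end of kvslib.cpp) should also satisfy the linker, at the cost of having to list every type you plan to use.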

This was, as a newcomer, super confusing and I'm now beginning to understand why people curse C++. :D

Things I'm going to look into based on this, that you might want to look into as well if you find yourself here:

  • Forward declarations
  • Opaque pointers
  • And, my favorite, PIMPL!