Virtual Constructor
15 September, 2013 - 12 min read
To implement virtual functions, C++ uses a special form of late binding known as the virtual table. The virtual table is a lookup table of functions used to resolve function calls in a dynamic/late binding manner. The virtual table sometimes goes by other names, such as “vtable”, “virtual function table”, “virtual method table”, or “dispatch table”.
Because knowing how the virtual table works is not necessary to use virtual functions, this section can be considered optional reading.
The virtual table is actually quite simple, though it’s a little complex to describe in words. First, every class that uses virtual functions (or is derived from a class that uses virtual functions) is given its own virtual table. This table is simply a static array that the compiler sets up at compile time. A virtual table contains one entry for each virtual function that can be called by objects of the class. Each entry in this table is simply a function pointer that points to the most-derived function accessible by that class.
Second, the compiler also adds a hidden pointer to the base class, which we will call *__vptr. *__vptr is set (automatically) when a class instance is created so that it points to the virtual table for that class. Unlike the *this pointer, which is actually a function parameter used by the compiler to resolve self-references, *__vptr is a real pointer. Consequently, it makes each class object allocated bigger by the size of one pointer. It also means that *__vptr is inherited by derived classes, which is important.
By now, you’re probably confused as to how these things all fit together, so let’s take a look at a simple example:
class Base
{
public:
virtual void function1() {};
virtual void function2() {};
};
class D1: public Base
{
public:
virtual void function1() {};
};
class D2: public Base
{
public:
virtual void function2() {};
};
Because there are 3 classes here, the compiler will set up 3 virtual tables: one for Base, one for D1, and one for D2.
The compiler also adds a hidden pointer to the most base class that uses virtual functions. Although the compiler does this automatically, we’ll put it in the next example just to show where it’s added:
class Base
{
public:
FunctionPointer *__vptr;
virtual void function1() {};
virtual void function2() {};
};
class D1: public Base
{
public:
virtual void function1() {};
};
class D2: public Base
{
public:
virtual void function2() {};
};
When a class object is created, *__vptr is set to point to the virtual table for that class. For example, when an object of type Base is created, *__vptr is set to point to the virtual table for Base. When objects of type D1 or D2 are constructed, *__vptr is set to point to the virtual table for D1 or D2 respectively.
Now, let’s talk about how these virtual tables are filled out. Because there are only two virtual functions here, each virtual table will have two entries (one for function1(), and one for function2()). Remember that when these virtual tables are filled out, each entry is filled out with the most-derived function an object of that class type can call.
Base’s virtual table is simple. An object of type Base can only access the members of Base. Base has no access to D1 or D2 functions. Consequently, the entry for function1 points to Base::function1(), and the entry for function2 points to Base::function2().
D1’s virtual table is slightly more complex. An object of type D1 can access members of both D1 and Base. However, D1 has overridden function1(), making D1::function1() more derived than Base::function1(). Consequently, the entry for function1 points to D1::function1(). D1 hasn’t overridden function2(), so the entry for function2 will point to Base::function2().
D2’s virtual table is similar to D1’s, except the entry for function1 points to Base::function1(), and the entry for function2 points to D2::function2().
Here’s a picture of this graphically:
Although this diagram is kind of crazy looking, it’s really quite simple: the *__vptr in each class points to the virtual table for that class. The entries in the virtual table point to the most-derived version of the function objects of that class are allowed to call.
So consider what happens when we create an object of type D1:
int main()
{
D1 cClass;
}
Because cClass is a D1 object, cClass has its *__vptr set to the D1 virtual table.
Now, let’s set a base pointer to D1:
int main()
{
D1 cClass;
Base *pClass = &cClass;
}
Note that because pClass is a base pointer, it only points to the Base portion of cClass. However, also note that *__vptr is in the Base portion of the class, so pClass has access to this pointer. Finally, note that pClass->__vptr points to the D1 virtual table! Consequently, even though pClass is of type Base *, it still has access to D1’s virtual table.
So what happens when we try to call pClass->function1()?
int main()
{
D1 cClass;
Base *pClass = &cClass;
pClass->function1();
}
First, the program recognizes that function1() is a virtual function. Second, it uses pClass->__vptr to get to D1’s virtual table. Third, it looks up which version of function1() to call in D1’s virtual table. This has been set to D1::function1(). Therefore, pClass->function1() resolves to D1::function1()!
Now, you might be saying, “But what if pClass really pointed to a Base object instead of a D1 object? Would it still call D1::function1()?” The answer is no.
int main()
{
Base cClass;
Base *pClass = &cClass;
pClass->function1();
}
In this case, when cClass is created, __vptr points to Base’s virtual table, not D1’s virtual table. Consequently, pClass->__vptr will also be pointing to Base’s virtual table. Base’s virtual table entry for function1() points to Base::function1(). Thus, pClass->function1() resolves to Base::function1(), which is the most-derived version of function1() that a Base object should be able to call.
By using these tables, the compiler and program are able to ensure function calls resolve to the appropriate virtual function, even if you’re only using a pointer or reference to a base class!
Calling a virtual function is slower than calling a non-virtual function for a couple of reasons: First, we have to use the *__vptr to get to the appropriate virtual table. Second, we have to index the virtual table to find the correct function to call. Only then can we call the function. As a result, we have to do 3 operations to find the function to call, as opposed to 2 operations for a normal indirect function call, or one operation for a direct function call. However, with modern computers, this added time is usually fairly insignificant.
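What happens in the code below?
#include <iostream>
using namespace std ;
class Base
{
public:
    void func() {   // note: not virtual
        cout << "Base Class" << endl ;
    }
} ;
class D1: public Base
{
public:
    virtual void func() {
        cout << "D1 Class" << endl ;
    }
} ;
int main ()
{
    Base * p = new D1() ;
    p->func() ;
}
Will Base’s func() be called, or D1’s? Base::func() will be called. func() is not declared virtual in Base, so a call through a Base pointer is bound statically to Base::func(), even though D1 declares its own func() as virtual.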
With dynamic binding, the appropriate method is selected by fetching the vptr, fetching the address of the appropriate method from the vtable, and then calling that member function.
Thus dynamic dispatch is essentially a fetch-fetch-call, instead of the plain call used with static binding.
_vptr :
This vtable pointer, or _vptr, is a hidden pointer added by the compiler to the base class. It points to the virtual table of the object's class.
This _vptr is inherited by all the derived classes.
Each object of a class with virtual functions transparently stores this _vptr.
Call to a virtual function by an object is resolved by following this hidden _vptr.
Here, I will explain what happens when you use the “virtual” keyword in C++.
This post will cover the following topics:
- virtual keyword
- vtable (virtual function table or virtual method table) and vptr
- this pointer and virtual functions
I will use many code examples, so don’t worry if you don’t understand something, look at the code example and it would be ok :)
Virtual keyword
When a pointer to a base class actually points to an instance of a derived class, the virtual keyword lets calls made through that pointer reach the functions of the real instance.
#include <iostream>
class Human
/* ^ Human is the base class */
{
public:
virtual ~Human() {}
void sayHello() const { std::cout << "Hello, I'm a Human!" << std::endl; }
virtual void talk() const { std::cout << "Hey, how are you?" << std::endl; }
};
class Sprinter : public Human
/* ^ Sprinter is the derived class */
{
public:
virtual ~Sprinter() {}
/* ^ Not mandatory here, the compiler will create a default virtual
destructor because we inherit from Human that already defines a virtual destructor */
void sayHello() const { std::cout << "Hi, I'm a Sprinter!" << std::endl; }
virtual void talk() const { std::cout << "Do you like to run?" << std::endl; }
};
int main()
{
Human* fakeHuman = new Sprinter();
/* ^ implicit static_cast from Sprinter* to Human* */
Human* human = new Human();
fakeHuman->sayHello();
/* ^ non-virtual function: static call to Human::sayHello()
output: Hello, I'm a Human! */
fakeHuman->talk();
/* ^ virtual function: dynamic call to Sprinter::talk()
output: Do you like to run? */
human->sayHello();
/* ^ non-virtual function: static call to Human::sayHello()
output: Hello, I'm a Human! */
human->talk();
/* ^ virtual function: dynamic call to Human::talk()
output: Hey, how are you? */
delete fakeHuman;
/* ^ thanks to the virtual destructor, it will be cleaned by calling Sprinter::~Sprinter() */
delete human;
/* ^ Human::~Human() is called here */
return 0;
}
OK, we understand how to use the virtual keyword, but how does the compiler manage to call the correct function?!
vtable and vptr
A virtual function table is like a static array that contains pointers to the (virtual) functions of a class.
Each class having at least one virtual function will get its own vtable.
When one of these classes is instantiated, it gets an extra member variable (called vptr) that contains a pointer to its vtable.
Here’s what the vtables might look like:
You have to know that when you compile a C++ code, the class functions are transformed into regular functions having an extra parameter: a pointer to the instance of the class, named “this” (yes, that’s where the “this” pointer comes from).
So, a class function void Human::talk(); could be transformed to a C function like void _human_talk(Human* this); (you can Google C++ name mangling if you are interested in how compilers name symbols after C++ function names).
To simplify, let’s use a FuncPtr type as a basic pointer to function: typedef void (*FuncPtr)(void* this);
The Human vtable might be:
static const FuncPtr _vtable_Human[] =
{
&_human_destructor,
&_human_talk
};
The compiler just has to add an extra member variable to these classes: private: const FuncPtr* _vptr;
Then it initializes that member in the constructor of each class.
In the constructor of Human: this->_vptr = _vtable_Human;
In the constructor of Sprinter: this->_vptr = _vtable_Sprinter;
When you call a virtual function, the compiler just has to look at the vtable to call the correct function.
When you write:
Human* human = new Sprinter();
human->talk();
A compiler might generate this code:
Human* human = new Sprinter();
/* ^ the constructor of Sprinter will initialize the _vptr attribute. */
FuncPtr talkPtr = human->_vptr[1];
/* ^ [0]: destructor, [1]: talk */
(*talkPtr)(human);
/* ^ Call to the correct function! Here, "human" is the "this" pointer.
It's sometimes a little bit more complicated to get a correct
"this" pointer, more on that below. */
For your information, you can call a specific implementation of a class function by explicitly naming it:
Human* human = new Sprinter();
human->Human::talk();
/* ^ Will NOT use the vtable, and will directly call "talk"
implementation of the Human class */
this pointer and virtual functions
Sometimes, the compiler has to do a little bit of arithmetic to pass the correct this pointer to a virtual function.
Imagine a class Centaur that inherits from both Human and Horse:
Here is a possible memory representation of the Centaur class (note: in this example, the compiler optimizes the memory by using the same vptr attribute for both the Human and Centaur classes):
So, what happens if we delete an instance of Centaur through a pointer to Human?
As the data for a Human and a Centaur starts at the same position in memory, no special operations are needed. The this pointer is the same for both classes.
But what happens when we use an instance of Centaur through a pointer to Horse?
Well… the static cast from a Centaur pointer to a Horse pointer will move the data pointer by some bytes (from the Centaur data to the Horse data).
Centaur* centaur = new Centaur();
Human* human = static_cast<Human*>(centaur);
/* ^ Internally: human = ((void*)centaur) + 0; // <= same pointer ! */
Horse* horse = static_cast<Horse*>(centaur);
/* ^ Internally: horse = ((void*)centaur) + sizeof(Human); // <= different pointer ! */
So, when you delete the “horse” pointer, which actually points to an instance of Centaur, how does the compiler pass the correct this pointer?
A way to do that (the g++ way!) is to use a “wrapper” that will modify the this pointer then call the correct function.
void _destructor_horse_fromCentaur(Horse* this)
{
Centaur* centaur = static_cast<Centaur*>(this);
/* ^ Internally: centaur = ((void*)this) - sizeof(Human); */
_destructor_centaur(centaur);
/* ^ Call to Centaur::~Centaur() then operator delete(centaur) to free the memory. */
}
The vtable of Centaur will use the _destructor_centaur function as destructor,
while the vtable of Horse instantiated from a Centaur will use the _destructor_horse_fromCentaur function as destructor.
I hope this explanation helped you to understand how the “magic” of C++ really works!
Please tell me if I made some mistakes or if something is unclear. Thank you!
The runtime class of the object is a property of the object itself. In effect, vptr represents the runtime class, and therefore can't be static. What it points to, however, can be shared by all instances of the same runtime class.
The whole point of the vptr is that you don't know at compile time exactly which class an object will have at runtime. If you knew that, then the virtual function call would be unnecessary. That is, in fact, what happens when you're not using virtual functions. But with virtual functions, if I have
class Sub : Parent {};
and a value of type Parent*, I can't tell from the static type alone whether it really points to an object of type Parent or one of type Sub. The vptr lets me figure that out at runtime.