Wednesday, June 17, 2009

C++ name mangling

Ever wondered what the 'extern "C" { ... }' block is for in C++? In C and C++ the 'extern' keyword is used to declare variables that are defined in other translation units. Using 'extern' on a declaration tells the compiler not to allocate space for the variable, because the space is allocated elsewhere. The keyword can also appear on a variable definition (one with an initializer). In that case it denotes that the variable has external linkage, that is, it may be referenced from other translation units.
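To make the difference between declaring and defining concrete, here is a minimal sketch. The names 'counter' and 'next_value' are made up for illustration, and both halves are kept in one file only so that the snippet compiles on its own. In a real project the declaration would usually sit in a header and the definition in exactly one source file.

```cpp
// Declaration only: tells the compiler that 'counter' exists and has type
// int, but reserves no storage for it.
extern int counter;

// Uses 'counter' through the declaration above.
int next_value() {
    return ++counter;
}

// Definition: this is where the storage is actually allocated. It would
// normally live in a different translation unit.
int counter = 41;
```

Calling next_value() once increments the variable defined at the bottom of the file, even though the function only ever saw the 'extern' declaration.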

The 'extern "C" { ... }' block, however, has additional semantics. Let's first explain what name mangling is and why it is required.

In C, functions cannot be overloaded by type. Thus we cannot declare, for example, functions with prototypes 'int foo(void)' and 'int foo(int)' in the same program. In C++, however, such overloading is possible. Linkers identify functions by symbol name alone, and because of the tight coupling of C and C++, particularly in the past (C++ was first implemented by compiling C++ code to C code and then using a regular C compiler to produce machine code), C++ compilers have to work with the same linkers as C compilers. They must therefore emit a distinct symbol for each overloaded function. Name mangling is the act of encoding type information into function names so that overloaded functions can be told apart.
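As a quick illustration, the two prototypes mentioned above can live side by side in one C++ file. The function bodies and the helper 'call_both' are invented for this sketch; only the two signatures come from the text.

```cpp
// Two overloads of the same name, distinguished only by their parameter
// lists. A C compiler would reject the second one; a C++ compiler accepts both.
int foo(void) { return 0; }
int foo(int x) { return x + 1; }

// The compiler picks the overload from the arguments at each call site.
int call_both() {
    return foo() + foo(41);
}
```

Each of the two definitions ends up in the object file under its own symbol, which is what the rest of this post is about.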

For example, the GNU C++ compiler (g++) mangles the first 'foo' function [int foo(void)] to '_Z3foov' and the second 'foo' function [int foo(int)] to '_Z3fooi'. The '_Z' prefix marks a mangled name. Identifiers that begin with an underscore followed by an uppercase letter are reserved for the implementation in C and C++, so the prefix cannot conflict with user-defined identifiers. The number is the length of the original name, three in the case of 'foo', and the following letters encode the argument types (here 'v' for void and 'i' for int, respectively).
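The scheme is simple enough to imitate for the easy cases. The following is a toy sketch, not what g++ does internally: 'mangle_simple' is a made-up helper that handles only free functions whose parameters are plain builtin types, and the table lists just a handful of the type codes. Namespaces, classes, templates, pointers and references all follow further rules that are not modeled here.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy mangler: "_Z" + length of the name + the name + one code letter per
// parameter type. Hypothetical helper for illustration only.
std::string mangle_simple(const std::string& name,
                          const std::vector<std::string>& params) {
    // Code letters for a few builtin types.
    static const std::map<std::string, std::string> codes = {
        {"void", "v"}, {"bool", "b"}, {"char", "c"}, {"int", "i"},
        {"long", "l"}, {"float", "f"}, {"double", "d"},
    };

    std::string out = "_Z" + std::to_string(name.size()) + name;
    if (params.empty()) {
        return out + "v";  // an empty parameter list is encoded as 'v'
    }
    for (const std::string& p : params) {
        auto it = codes.find(p);
        if (it == codes.end()) {
            throw std::invalid_argument("unsupported type: " + p);
        }
        out += it->second;
    }
    return out;
}
```

With this, mangle_simple("foo", {}) produces the first symbol from the example and mangle_simple("foo", {"int"}) produces the second.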

Now, suppose there is a C++ function 'bar' that we want to use in our C module. We would declare a prototype for 'bar' in the C module and link the C module with the C++ module, which provides the definition of 'bar'. Let's try just that:


-------module.c--------
int bar(void);
int hooray() {
    return bar() + 42;
}
-----------------------


-------bar.cpp--------
int bar(void) {
    return 7;
}
-----------------------


Let's first compile our modules:

$ g++ -c -O0 -o bar.o bar.cpp
$ gcc -c -O0 -o module.o module.c


Then link them together to produce an executable (a complete program also needs a 'main' function, which the listings here leave out):

$ gcc -O0 bar.o module.o


module.o: In function `hooray':
module.c:(.text+0x7): undefined reference to `bar'
collect2: ld returned 1 exit status


The linker could not find 'bar' in the symbol table, and that is precisely due to C++ name mangling. Let's have a look inside bar.o:

$ objdump -D bar.o

-- [clip] --
00000000 <_Z3barv>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   b8 07 00 00 00          mov    $0x7,%eax
   8:   5d                      pop    %ebp
   9:   c3                      ret
-- [/clip] --


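There it is again: the function is called '_Z3barv' instead of 'bar'. This is the point where the extern "C" block comes to the rescue. It gives the functions declared inside the block C linkage, meaning they are to be referenced from a C context, and thus the compiler does not mangle their names. The price is that such functions cannot be overloaded, since without mangling two overloads would end up with the same symbol.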

If the code in bar.cpp is wrapped inside the extern "C" block, everything will work:

-------bar.cpp--------
extern "C" {

int bar(void) {
    return 7;
}

}
-----------------------


Recompiling bar.cpp:

$ g++ -c -O0 -o bar.o bar.cpp

... and relinking the modules:

$ gcc -O0 bar.o module.o


No errors. If you now disassemble bar.o you will see that the name of the bar function is now 'bar' instead of '_Z3barv', and thus the C module is able to reference the C++ function.
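In practice the prototype usually goes into a header that both the C and the C++ side include. A common way to write such a header is sketched below. The file name 'bar.h' is an assumption for illustration, and the header and the definition are shown in one listing only to keep the snippet self-contained. The '__cplusplus' macro is defined only by C++ compilers, so a C compiler sees a plain prototype while a C++ compiler sees it wrapped in extern "C".

```cpp
// ---- hypothetical bar.h, included from both C and C++ code ----
#ifdef __cplusplus
extern "C" {
#endif

int bar(void);

#ifdef __cplusplus
}
#endif

// ---- bar.cpp ----
// The declaration above already gave 'bar' C linkage, so this definition is
// emitted under the unmangled name even without its own extern "C".
int bar(void) {
    return 7;
}
```

With a header like this, module.c can simply include bar.h instead of repeating the prototype by hand.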

Friday, June 5, 2009

Software recommendations

It took a while, but I have now replaced grep with ack for my daily grepping needs.

Another tip of the day is that if you are a Vim user you will want to start using ctags to move around in your codebase.