A few years ago, the folks at the Free Software Foundation quietly changed the meaning of
gcc.
Whereas it originally meant "GNU C Compiler", they now say that it stands for "GNU Compiler Collection" since it has become
a front end for many different compilers.
However, I have my own theory and have decided that it really stands for "GNU Command Collection".
This is in honor of the more than 10,000 command line options
that one can pass to gcc.
Alright, maybe not 10,000... perhaps I exaggerate a wee bit. But gcc is certainly *rich* with
options -- more than I will ever use in my lifetime, I suspect. Nevertheless, I recently discovered,
among the dense forest of choices, a gcc option that is very cool and that I can't resist playing with.
This option allows you to trace all function calls in the source code. Simply include:
-finstrument-functions
on the compile command line, and gcc will insert calls to special hook functions at the entry and exit
of every function in the source code. To make use of this, you must supply a pair of hook functions with these signatures:
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit (void *this_fn, void *call_site);
Not surprisingly, __cyg_profile_func_enter() is called at the entry of every function.
The address of the function being entered is the first parameter;
the address of the calling function is the second parameter.
Likewise, __cyg_profile_func_exit() is called at the exit point of every function,
using the same two parameters.
Ah, system hooks! (eyes widen with excitement). What fun you can have when you are provided
with hooks into the system. With entry and exit hooks such as these, the obvious exploits are to
implement tracing (to observe the call tree) and profiling (to obtain statistics on relative
function usage and timing). You could probably even add in enough backing code to support a kind of
primitive debugger, but the first two are good starters.
Having seen the availability of these hooks, there was no way I could resist trying to write
some code to take advantage of them. So what kind of service should I implement?
Tracing or profiling? Hmmm... how about both? It's not too hard to do a little of each,
and, without being too greedy, generate scads of useful info for a small amount of work.
Tracing the call tree
Functions call other functions (which, in turn, call other functions).
This creates a run time structure that is like a tree.
For instance, function A calls function B which calls function C:
enter A
enter B
enter C
...
exit C
...
exit B
...
exit A
Tracing function calls allows you to view the tree-like traversal of the execution environment.
Of course, you can do your own tracing by hand, as in:
void
foo ()
{
    printf ("entering foo...\n");
    . . .
    // function code
    . . .
    printf ("...exiting foo\n");
}
Probably every programmer has written code like that at least a few times, usually in the midst
of a bug hunt. But imagine doing that for every single function in the source code. It's too
tedious to even contemplate. That's the beauty of the gcc profile hooks: they'll generate the
entry and exit calls for you, automatically, and for every single function. All you have to do
is flesh out the hook routines with something useful.
There is one little problem with this scenario. If the compiler generates entry and exit hooks
for every function in the source code, won't it generate those for the hooks themselves
(after all, they are functions too)? Well, yes, it will. Unless you instruct gcc to not generate
the hooks. There is a special function attribute that will stop the compiler from trying to trace the tracing
functions themselves.
A function attribute? What is that? Well, gcc has a set of extensions called attributes that allow
you to control some specific aspects of the compiler's behavior. Mostly these attributes concern
low-level, architecture specific details.
Look at the document Declaring Attributes of Functions
for more information about these directives. They're a pretty fascinating topic in and of themselves.
Of course, since they are extensions, they are specific to gcc, so any source code that makes use of
attributes will be non-portable to other compilers.
The attribute that prevents gcc from generating the tracing hooks is:
__attribute__((__no_instrument_function__))
Yes, the double parentheses are required.
Just place this tag immediately before a function name in the definition, and gcc will not
emit the extra calls. All other functions without this special attribute will have the tracing hooks
generated. To make this easier to use, I define a macro called NO_TRACE that expands to this
rather awkward text.
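As a quick illustration (this is just a sketch of my own, not the ezprof code), hooks marked with such a macro could print an indented call tree like the one shown earlier:

#include <stdio.h>

#define NO_TRACE __attribute__((__no_instrument_function__))

static int depth = 0;   // current call nesting level

void
NO_TRACE
__cyg_profile_func_enter (void *this_fn, void *call_site)
{
    printf ("%*senter %p (called from %p)\n", depth * 2, "", this_fn, call_site);
    ++depth;
}

void
NO_TRACE
__cyg_profile_func_exit (void *this_fn, void *call_site)
{
    --depth;
    printf ("%*sexit  %p\n", depth * 2, "", this_fn);
}

Compile this along with your program (using -finstrument-functions, of course) and every call is echoed, nested to its depth, as it happens.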
Ezprof
Ezprof is the cheesy name that I chose for my tracer/profiler (cuz it's so EZ.. see?).
Since tracing is a rather dynamic thing, I thought it would be fun to write ezprof as a server.
That is, the running program would send messages (via IPC) to ezprof, which would then display the call
tree as each function was invoked. The overhead for sending the messages is small, so there is little
(if any) effect on the program execution. When the client program finishes, the server then computes
a few profiling statistics and writes them to stdout.
So how is the client/server messaging set up?
A standard message port is used to deliver the function addresses from the client app to the profile server.
The client calls write_port() each time a hook is invoked, and the server calls read_port() to receive the messages.
There is a file called trace.c that contains the needed tracing hooks.
You need only include this file with any source program that you wish to profile.
Another file, ezprof.c, implements the server.
To set this up, first compile the server and run it.
Then compile your source code with the -finstrument-functions compiler option and the trace.c file included.
Then run the client program -- it will send messages to the server at the entry and exit of each function call.
The server counts how many times each function is called, and stores the cumulative time spent in each function.
When the client exits, the server displays the final statistics and stops.
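To give a flavor of the server side (this is only a sketch in the same spirit, not the actual ezprof.c), the receive loop boils down to a port read:

#include <stdio.h>
#include <OS.h>

int
main ()
{
    // create the port that trace.c looks up by name
    port_id port = create_port (64, "ezprof_port");
    int32   code;     // message code: 'entr' or 'exit'
    uint32  addr;     // function address sent by the client

    if (port < 0)
    {
        fprintf (stderr, "unable to create ezprof_port\n");
        return 1;
    }

    for (;;)
    {
        if (read_port (port, &code, &addr, sizeof addr) < 0)
            break;    // port deleted or some other error

        if (code == 'entr')
            ;         // record the entry: bump the call count, note the start time
        else if (code == 'exit')
            ;         // record the exit: accumulate the elapsed time

        // (the real server also decides here when to quit -- see "Closing the connection" below)
    }

    // ... compute and display the final statistics ...

    delete_port (port);
    return 0;
}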
Here's the file trace.c containing the needed client interface:
#include <stdio.h>
#include <stdlib.h>
#include <OS.h>

#define NO_TRACE __attribute__((__no_instrument_function__))

void NO_TRACE __ezprof_trace (int, void *);


void
NO_TRACE
__cyg_profile_func_enter (void *this_fn, void *call_site)
{
    // called upon function entry
    __ezprof_trace ('entr', this_fn);
}


void
NO_TRACE
__cyg_profile_func_exit (void *this_fn, void *call_site)
{
    // called at function exit
    __ezprof_trace ('exit', this_fn);
}


void
NO_TRACE
__ezprof_trace (int mode, void *this_fn)
{
    // sends the function address to the ezprof server
    //
    // portability warning:
    // this routine assumes the function address is 4 bytes
    // (true for IA-32 machines -- Intel PCs, etc.)

    static bool    initialized = false;
    static port_id port;
    static uint32  addr;

    if (!initialized)
    {
        port = find_port ("ezprof_port");
        if (port < 0)
        {
            fprintf (stderr, "unable to connect to ezprof server\n");
            exit (1);
        }

        initialized = true;
    }

    addr = (uint32) this_fn;
    write_port (port, mode, &addr, sizeof addr);
}
Not much to it, really. The compiler hooks are just wrappers for the one
tracing routine which sends the given function address along with the
message code 'entr' or 'exit' to the ezprof server.
The 'addr' variable isn't strictly required, but it's useful for two reasons.
First, we need to pass the address of a buffer to write_port(), and using a
static variable avoids passing the address of a value on the stack. Secondly,
it balances the write_port() and read_port() calls by allowing both to use a
4-byte integer for the message buffer.
Of course, for this to be valid, it has to be assumed that the function address
is, in fact, 4 bytes. This will be true on any IA-32 machine
(Intel Architecture-32 bit) and probably most other 32-bit computers as well.
Function addresses are not the same size as integers on every machine
architecture, but they are on most, and none of the exceptions are of any
importance to us in the BeOS world.
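If you did care about other machines, a simple variation (just a sketch) would be to send the pointer itself and size the buffer with sizeof -- then the same code works whatever the pointer width:

static void *addr;

addr = this_fn;
write_port (port, mode, &addr, sizeof addr);   // 4 bytes on IA-32, 8 on a 64-bit machine

The server, of course, would then have to read into a void * as well.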
Gathering symbols
As convenient as the gcc tracing hooks are, there is one obvious problem that has to be dealt with:
only the function addresses are passed, not the symbolic names. This limits the usefulness of
the tracing and profiling output considerably. It's not very convenient to report that
function 0x800009f0 was invoked 23 times. What we really want to report is that function
my_function() was invoked 23 times. Thus we need to load the symbolic names that are
associated with each address.
Ordinarily the symbols are located within the image file itself.
For some compilers, you have to specify that the symbols be kept, otherwise they are stripped out
during the compilation. Gcc, however, keeps all symbols in the image file -- at least by default.
Because of this, you should not have to worry about anything, just compile as usual.
Obviously, you don't want to use any command options that will remove the symbols.
How to go about gathering the symbols? Well, a couple of options come to mind.
The first idea I had was to shell out to 'nm',
a binutils program
that returns a listing of all the symbols found in an image file, and just read from its output.
This program may or may not be included with your version of the BeOS. If it's not already present,
you can download it as part of the BeOS R5 Developer Tools found here.
But then, I remembered that you can load the image symbols directly by using some image functions
in the Kernel Kit. It's pretty simple really. Here's a test program I wrote to do just that.
// ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
// symdump.c
//
// displays the symbols found in a given BeOS image file
//
// Copyright (c) Daniel Reinhold
// written May 2002
//
// ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <image.h>
#include <Errors.h>
#include <OS.h>
char *
symboltype (int32 type)
{
    switch (type)
    {
        case B_SYMBOL_TYPE_DATA: return "DATA";
        case B_SYMBOL_TYPE_TEXT: return "TEXT";
        case B_SYMBOL_TYPE_ANY:  return "ANY";
        default:                 return "????";
    }
}


void
set_absolute_path (char *path, int len, char *arg)
{
    // if the argument is not already a path, prepend the current directory
    char *p = strchr (arg, '/');

    if (p)
        strcpy (path, arg);
    else
    {
        getcwd (path, len);
        strcat (path, "/");
        strcat (path, arg);
    }
}


int
main (int argc, char *argv[])
{
    if (argc == 2)
    {
        image_id img;
        char     imgfile[256];

        set_absolute_path (imgfile, 256, argv[1]);

        img = load_add_on (imgfile);
        if (img < 0)
            printf ("unable to load image: '%s'\n", strerror (img));
        else
        {
            char   buf[600];
            int32  len;
            int32  type;
            void (*addr)(void);
            int    i;

            for (i = 0; ; ++i)
            {
                len = sizeof buf;
                if (get_nth_image_symbol (img, i, buf, &len, &type, (void **) &addr) != B_OK)
                    break;

                printf ("[%4d]:%4s\t %p\t '%s'\n", i, symboltype (type), addr, buf);
            }

            unload_add_on (img);
        }
    }
    else
        puts ("usage: symdump imagefile");

    putchar ('\n');
    return 0;
}
This is nice, but there's still a problem... C++ name mangling. C++ programs generate mangled names for the symbols and
the simple routine above can't unmangle them back.
For example, a function with the signature "int doit (char *, int, int)"
becomes "doit__FPcii" in the image file.
There is no demangling function available in the Kernel Kit, which is too bad
(perhaps we need to add one). I don't know the algorithm gcc is using either. I tried looking it up,
found a few references, but nothing definitive. You can somewhat guess the algorithm used by
comparing mangled versions to their original form, but writing a general function to do this correctly
for all cases is very non-trivial.
So I went back to my original idea. You see, 'nm' has an option, --demangle, that will decode
the mangled names back to their readable versions in the source code. Thus, ezprof executes the call
"nm --demangle" at startup time and receives the program output via a pipe. The output is
parsed and the symbols loaded into a hash table.
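The shell-and-pipe part is easy enough with popen(). Here is a sketch of roughly how that parsing might look (the real ezprof.c differs in details, and the hash table insert is only hinted at):

#include <stdio.h>
#include <string.h>

void
load_symbols (const char *imgfile)
{
    char  cmd[512];
    char  line[600];
    FILE *nm_pipe;

    sprintf (cmd, "nm --demangle %s", imgfile);

    nm_pipe = popen (cmd, "r");
    if (nm_pipe == NULL)
        return;

    // typical line of output:  "00000c80 T doit(char *, int, int)"
    while (fgets (line, sizeof line, nm_pipe))
    {
        unsigned long addr;
        char          type;
        int           pos = 0;

        if (sscanf (line, "%lx %c %n", &addr, &type, &pos) == 2  &&  pos > 0
            &&  (type == 'T' || type == 't'))
        {
            char *name = line + pos;
            name[strcspn (name, "\n")] = '\0';   // chop the newline

            // ... insert (addr, name) into the hash table here ...
        }
    }

    pclose (nm_pipe);
}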
Keeping statistics
Given that the server can load up the symbolic names and addresses for functions in an image file,
what kind of statistics should be kept? Well, your imagination is the limit here, but I only do the
obvious stuff.
Naturally, we want to know how many times a function is called, so we keep a count.
We also want to know how long each function invocation took, so we keep some timing info too.
In the case of ezprof, this is stored in two ways: one variable marks the system time when the
function is invoked, another stores the cumulative (total) time that all the function calls have used.
Here's the node structure used by the hash table:
typedef struct node node;
struct node
{
    char      *name;        // function name
    u4         addr;        // function address
    u4         calls;       // number of times the function has been called
    bigtime_t  start_time;  // time the current function call began
    double     cum_time;    // cumulative time spent in function
    node      *next;
};
This is pretty much the basic info that any profiler would want.
You could do more if you care to add in the extra processing.
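Updating those fields from the incoming messages is about as simple as it sounds. A sketch (find_node() here is just a stand-in name for the hash table lookup):

#include <OS.h>     // for system_time() and bigtime_t

node * find_node (uint32 addr);     // hash table lookup (stand-in)

void
handle_message (int32 code, uint32 addr)
{
    node *n = find_node (addr);
    if (n == NULL)
        return;                     // address not in the table

    if (code == 'entr')
    {
        n->calls++;
        n->start_time = system_time ();                 // microseconds since boot
    }
    else if (code == 'exit')
        n->cum_time += system_time () - n->start_time;
}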
Fixing up addresses
When the server starts up, it loads the symbolic names and addresses of each function
as found by the 'nm' utility. However, the addresses aren't quite right. They need to
be "fixed up" before being added to the hash table. Fortunately, the fixup
is very, very simple. But the reason for the fixup is a little more complicated.
I can't really explain it all without delving a bit into the topic of linking and loading.
The process of compiling/executing programs is naturally divided into two phases
usually called "compile time" and "run time".
Compilers themselves don't execute code. They merely translate the source code (text files)
into a binary format and save it to disk. The binary file is available for
execution later. To execute the binary file, another program, called a loader,
loads the binary file into memory and then runs it.
To be more precise, the compiler proper creates the object (*.o) files, and another program,
called a linker, stitches the object files together to create the image file, which is saved to disk.
For BeOS Intel, the binary format is ELF, so I'll just refer to the image files as ELF files
for simplicity.
Now there is a mix of terms used to describe these utilities that can be very confusing.
The linker is also sometimes called a "linking loader" or even just a "loader".
This is reflected in the name of gcc's linker
which is called "ld". The program that loads existing image files is called a "program loader", "run time loader",
or just "loader" or, sometimes, it's even referred to as the "run time linker". Not very helpful.
For my part, I will always refer to the compile time utility that creates ELF files
as the "linker" and the run time utility that loads existing ELF files as the "loader".
Obviously, for the compilation/execution process to succeed, the linker and loader have to work together.
Originally, there was something of a linking/loading mismatch in ezprof -- at least, without the function address fixup.
I was quite disappointed when my first sample runs wouldn't produce a function trace,
but instead only gave "address not found" errors when doing table lookups.
An inspection of the addresses made it clear very quickly what was going wrong:
if function foo() had address 0x80000c80 sent by the client app,
it had address 0x00000c80 stored in the hash table.
There was an additional offset of 0x80000000 in the client addresses that wasn't present in the addresses stored by the server.
Of course, the difference lies in the source of the information. The hash table was loaded from
the output of the 'nm' utility which read the stored ELF file. In other words, it was spitting
back the addresses exactly as they had been written by the linker. But the client app was actually
running, which meant it had been processed by the loader. The loader is a sub-system of the kernel
and does any address fixups needed to make the freshly created process fit the memory model that the kernel uses.
For the BeOS, the memory map looks like this:
0x00000000 - kernel
0x80000000 - application code
heap follows code
0xea000000 - addons
0xec000000 - libraries
0xfc000000 - stack
Notice that addresses 0x0 up to 0x7fffffff are in the kernel.
Userland starts at address 0x80000000.
But the linker doesn't know this. And doesn't need to know. It just assumes that all addresses are relative
to address 0x0. It lets the loader do any address fixups needed to create a running process.
The linker is part of gcc, written by members of the Free Software Foundation.
It is not part of the kernel and knows nothing about the BeOS memory map.
The loader, in turn, was written by Be programmers and is naturally tied to the specifics of the BeOS kernel.
Of course, the two can work together because they both understand and use the ELF format.
Well, knowing what the problem was, the fix was easy enough: just add 0x80000000 to any function address
before storing it in the server's hash table. That is, if it's a userland function. So I defined a macro
called USER_MODE that causes the fixup to occur if the macro is true. So, does this mean kernel code can be
traced/profiled with ezprof as well? Um... actually, I don't know.
I think it should work, but honestly, I haven't even tried it yet (*shrug*).
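For the record, the fixup itself is about as small as a fix can get -- a sketch of the adjustment made as each symbol goes into the table:

#define USER_MODE 1     // profiling a userland app (the normal case)

#if USER_MODE
    addr += 0x80000000;     // relocate the linker's address to where the loader put it
#endif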
Closing the connection
When the client program finishes, the server can compute the final profiling statistics and display them.
How does it know when the client is finished? Continually poll the client with ping messages? Look up the
client program in the roster and try to detect when it is no longer there?
Nah, I didn't want to fiddle with any of that mess.
Instead, ezprof has two different ways of closing the connection.
The first method (the nice one) is when the client's main() function exits. To enable detecting this,
the server makes note of the address of the main function at startup. Every time an exit function message is
received, the address is checked to see if it's the main function that is ending. If so, the server loop is
broken, and the stats can be generated.
If, however, the client program does not exit normally, it's up to the user, watching the command line, to
kill the server manually. Typing Ctrl-C (or Alt-C) raises a SIGINT, which the server catches and uses to
display the final stats and exit. Not all that graceful, but it works.
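A sketch of the two shutdown paths (main_addr would hold the address of main() noted at startup; the names are mine):

#include <signal.h>
#include <OS.h>

static volatile sig_atomic_t done = 0;
static uint32 main_addr;        // filled in at startup from the symbol table

static void
sigint_handler (int sig)
{
    done = 1;                   // method two: the user typed Ctrl-C
}

// installed at startup with:
//     signal (SIGINT, sigint_handler);
//
// and inside the server loop, after each message:
//     if (code == 'exit' && addr == main_addr)
//         done = 1;            // method one: main() has returned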
Excluded functions
Ezprof does not actually store info on every function -- only the functions defined in the source code you compile.
Library functions used by the client program are not profiled, for the simple reason that the profile
hooks can only be generated for functions being compiled.
This may seem like a limitation that will weaken the value of the profiling numbers, but it isn't really.
If you have the source for a needed library, you can compile it with tracing too -- i.e. make a bigger client program.
If you don't have the source, then profiling info isn't very useful anyway -- you won't be able to do a damned thing
about a bottleneck in a library function if you don't have access to the code (I guess you could complain to the vendor).
Also, the profile hooks themselves don't need to be profiled. Clearly, they are going to be called every time a
local function is called, so counting their invocations isn't useful. Hopefully, the timing penalty is minimal,
but it doesn't matter anyway. The purpose of profiling is to determine what parts of the source code could be
re-written to improve their speed or efficiency. The hooks are only present when debugging, so their effect is
ultimately not an issue.
Additionally, the standard startup and closedown routines found in all C programs are not profiled, again because the
hooks cannot be generated for them. All C programs have a run time environment that is initialized before the
main program starts, and then closed down after main exits to release any resources. That is, when the loader
runs a C program, it invokes a run time sequence that looks like this:
init_C_runtime();
main();
end_C_runtime();
Actually, the C runtime routines have names like _init, _init_one, and _end.
Additionally, C++ programs will run initialization code that runs constructors for global objects and then
the destructors for these objects at closedown. The same idea, basically.
Since they are run before and after the actual code in the client program, profiling them doesn't make sense
anyway.
What all these functions have in common is that their names all begin with an underscore, which is the usual convention
for naming internal system functions (user programs should never call them directly). Therefore, when ezprof
is loading the symbols into the hash table, it simply skips all the functions whose names begin with
an underscore. If you've named a local function in your own source code to start with an underscore, then it
will be skipped. But shame on you for doing that! That's a big no-no!
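In code, the filter amounts to a one-line check as each symbol comes out of the nm output (again just a sketch; add_symbol() is a stand-in name):

if (name[0] == '_')
    continue;                   // skip _init, _init_one, _end, and friends

add_symbol (addr, name);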
Finally, the main() function itself is not profiled. Since it represents the sum total of all activity
in the client program, including it would only cause all the other numbers to be skewed downward since the
entire program run (except for the init/close above) occurs inside main(). It is added to the hash table,
however, because the call tree would look odd if the main (root) function wasn't there.
Conclusion
There is a GNU profiler that is included with many POSIX systems called 'gprof'.
It is included with BeOS R5 as well (or at least with the developer tools mentioned above), but frankly, it doesn't work.
However, Be did provide their own profiler called "profile".
This used to be on their ftp server. You should be able to find it on one of the Be mirrors;
here's one that was good at the time of this writing:
ProfileTools_x86.zip.
Run "profile imagefile" to examine a particular program.
Calls to library functions are recorded, although, as I mentioned before,
there is not much you can do about this information.
Given this, is there any reason to look at ezprof? Sure. As a learning tool, if nothing else.
Obviously, ezprof is not a high powered, production quality profiler.
Still, it does provide some insights into how profilers work and how we might build our own tools within
the project to examine and critique the quality of the source code.
Source Code:
ezprof.zip