This is a sort of back-to-basics article, so all you coding wizards out there can go back to sleep.
Being able to see the exact memory layout of data structures is very useful, even vital in some instances, so most
programmers will have a handy function for doing this lying around. Here's mine:
#include <stdio.h>
typedef unsigned char byte;
/* Print 'size' bytes starting at 'arg' as two-digit hex values. */
void
show_bytes (void *arg, int size)
{
    byte *b = (byte *) arg;
    printf ("\n{ ");
    while (size--)
    {
        printf ("%02x ", *b++);
    }
    printf ("}\n");
}
This allows me to dump the binary contents of any variable like this:
int x = 3476;
show_bytes (&x, sizeof x);
On my machine this will display:
{ 94 0d 00 00 }
The hex value of 3476 is 0xd94, so you can clearly see that my machine, an Intel PC, uses little-endian byte order to
store data in memory. I like to think of it as "psychotic byte order" because you surely have a need to mess with people's
heads to store data in a screwed-up, reverse order like that!
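If you'd rather have the program tell you which byte order it is running on, the same trick of peeking at the first byte works. Here is a minimal sketch; the function names are made up for this illustration, and it assumes the machine is either plain little-endian or plain big-endian:

```c
#include <stdio.h>

/* Returns 1 on a little-endian host, 0 otherwise.
   (host_is_little_endian is a hypothetical name, not a library call.) */
int
host_is_little_endian (void)
{
    unsigned int probe = 1;

    /* On a little-endian machine the low-order byte sits at the lowest address. */
    return *(unsigned char *) &probe == 1;
}

/* Prints "little-endian" or "big-endian" for the machine running the code. */
void
report_byte_order (void)
{
    printf ("%s-endian\n", host_is_little_endian () ? "little" : "big");
}
```

Calling report_byte_order () on the Intel PC described above should agree with what the show_bytes dump already suggested.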
By the way, if you want to quickly see the hex value of an integral value, here's a tip: go to a command line and
type "pc 3476" (or whatever number) to see it displayed in several formats. Pc stands for "programmer's calculator"
and was written by Dominic Giampaolo of BFS fame. It is integer only, but still very useful. It's included with every copy
of BeOS (and you probably didn't even know it!)
If the variable above had been stored as { 00 00 0d 94 } this would be big-endian byte order. Different machines store data
in either byte order, so it was decided (long ago) that all data sent across networks would be put in big-endian byte order
(i.e. the "sane byte order") as a standard format to avoid confusion and scrambling of data. Therefore, this format is usually called
network byte order or just NBO. This is important for network programming (using sockets, etc.) because you must make sure
that any multi-byte values sent across the wire are first put into NBO for delivery and that any values received are converted back to
the byte order used by the receiving computer -- usually called host byte order.
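As a rough illustration of that round trip, here is a sketch built on the POSIX conversion functions htonl and ntohl. It assumes a system that provides <arpa/inet.h> (the header may differ on other platforms), and the two helper names are invented for this example:

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX; header location is an assumption) */
#include <stdint.h>
#include <string.h>

/* Store 'value' at out[0..3] in network (big-endian) byte order.
   put_u32_nbo is a hypothetical helper name. */
void
put_u32_nbo (unsigned char *out, uint32_t value)
{
    uint32_t wire = htonl (value);   /* host order -> network order */

    memcpy (out, &wire, sizeof wire);
}

/* Read four bytes in network byte order and return the host-order value.
   get_u32_nbo is a hypothetical helper name. */
uint32_t
get_u32_nbo (const unsigned char *in)
{
    uint32_t wire;

    memcpy (&wire, in, sizeof wire);
    return ntohl (wire);             /* network order -> host order */
}
```

Filling a four-byte buffer with put_u32_nbo (buf, 3476) and passing it to show_bytes should give 00 00 0d 94 whatever the host byte order is, and get_u32_nbo (buf) should hand back 3476.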
Now on to structures.
The function above works fine for structures too. For example:
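typedef struct
{
    char c;
    short h;
    int n;
    char s[11];
}
foo;
. . .
foo f;
memset (&f, 0, sizeof f);    /* zero the padding and unused bytes so the dump is repeatable */
f.c = 'A';
f.h = 4533;
f.n = -777;
strcpy (f.s, "hello!");      /* memset and strcpy need <string.h> */
show_bytes (&f, sizeof f);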
On my machine, this displays:
{ 41 00 b5 11 f7 fc ff ff 68 65 6c 6c 6f 21 00 00 00 00 00 00 }
This is all right, but it would be more useful to see the offsets of the different fields marked in some way.
This would make it easier to see the binary contents of each field as well as the padding used to fill out the structure.
So I have another function that just dumps structures.
Note that we need to pass a little bit more info here. Every struct can have a different number of fields and field sizes,
so we need to pass in the offsets to let the dump function know where the boundaries are. Here's my version:
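/* Like show_bytes, but prints "| " in front of each byte whose index appears in offs[].
   offs[] must be in ascending order. The loop keeps reading offs[k] until the last byte,
   so the array needs a final entry that never matches, such as -1. */
void
show_struct (void *arg, int size, int offs[])
{
    int i;
    int k = 0;
    byte *b = (byte *) arg;
    printf ("\n{ ");
    for (i = 0; i < size; ++i)
    {
        if (i == offs[k])
        {
            if (i > 0) printf ("| ");
            ++k;
        }
        printf ("%02x ", *b++);
    }
    printf ("}\n");
}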
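The call for this would look about the same as above, except for adding an array to hold field offsets. The simplest
way to do this is to initialize an array using the 'offsetof' macro, a standard C macro defined in
<stddef.h>. Because show_struct keeps comparing against offs[k] even after the last field has been marked, the array
should end with a value that can never match, such as -1. Here's how to use it for my example:
foo f;
memset (&f, 0, sizeof f);    /* zero the padding so the dump is repeatable */
f.c = 'A';
f.h = 4533;
f.n = -777;
strcpy (f.s, "hello!");
{
    int offs[] =
    {
        offsetof (foo, c),
        offsetof (foo, h),
        offsetof (foo, n),
        offsetof (foo, s),
        -1    /* terminator: never equals a byte index */
    };
    show_struct (&f, sizeof f, offs);
}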
On my machine, this displays:
{ 41 00 | b5 11 | f7 fc ff ff | 68 65 6c 6c 6f 21 00 00 00 00 00 00 }
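Here we can see how the first field member, a character, has nonetheless been stored in a two-byte slot: one byte of
padding follows it so that the short can start on a two-byte boundary. Also, the character buffer, defined as 11 bytes,
is padded to 12 bytes. This gives the entire structure a 20-byte size, so we can infer that the compiler lays out
structures so that both individual members and the entire structure are memory aligned on at least two-byte boundaries.
(The int may well need a four-byte boundary; this example cannot tell the difference, since 19 bytes round up to 20 either way.)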
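The padding can also be worked out with arithmetic instead of being read off the dump. The sketch below repeats the foo definition so that it stands alone. The macros FIELD_SIZE and FIELD_END and both function names are made up for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct
{
    char c;
    short h;
    int n;
    char s[11];
}
foo;

/* Size of one field, and the offset of the first byte past it (hypothetical helpers). */
#define FIELD_SIZE(type, field) (sizeof (((type *) 0)->field))
#define FIELD_END(type, field)  (offsetof (type, field) + FIELD_SIZE (type, field))

/* Padding bytes between the end of one field and the start of the next. */
size_t
pad_between (size_t end, size_t next)
{
    return next - end;
}

/* Print the number of padding bytes that follow each field of foo. */
void
show_foo_padding (void)
{
    printf ("after c: %lu\n",
            (unsigned long) pad_between (FIELD_END (foo, c), offsetof (foo, h)));
    printf ("after h: %lu\n",
            (unsigned long) pad_between (FIELD_END (foo, h), offsetof (foo, n)));
    printf ("after n: %lu\n",
            (unsigned long) pad_between (FIELD_END (foo, n), offsetof (foo, s)));
    printf ("after s: %lu\n",
            (unsigned long) pad_between (FIELD_END (foo, s), sizeof (foo)));
}
```

On a machine with the layout dumped above, this should report one padding byte after c, none after h or n, and one after s. Other compilers and settings may give different numbers.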
Now to make the output even just a bit more friendly, I'll throw in another version that displays bytes as characters
when possible. This is basically just a clone of 'show_struct' modified slightly:
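/* Same as show_struct, but prints each byte as a character, or '.' if it is
   not printable. Needs <ctype.h> for isprint. */
void
show_struct_chars (void *arg, int size, int offs[])
{
    byte c;    /* unsigned, so it is always a valid argument for isprint */
    int i;
    int k = 0;
    byte *b = (byte *) arg;
    printf ("\n{ ");
    for (i = 0; i < size; ++i)
    {
        if (i == offs[k])
        {
            if (i > 0) printf ("| ");
            ++k;
        }
        c = *b++;
        /* width 2 keeps each character under its two-digit hex value */
        printf ("%2c ", isprint (c) ? c : '.');
    }
    printf ("}\n");
}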
Combining a call to this function with the previous version gives:
{ 41 00 | b5 11 | f7 fc ff ff | 68 65 6c 6c 6f 21 00 00 00 00 00 00 }
{  A  . |  .  . |  .  .  .  . |  h  e  l  l  o  !  .  .  .  .  .  . }
Gosh, ain't it perty?
Notice that the string "hello!" is in the proper left-to-right order. String data is composed only of single bytes,
so byte order doesn't come into play. As such, string data (i.e. a contiguous sequence of characters) can always be sent
from one machine to another regardless of local byte order conventions and still be interpreted correctly on the other end.
It's only multi-byte data types -- ints, floats, structs, etc. -- that must be packed carefully in NBO in order to be
correctly interpreted on the receiving machine.
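To make "packed carefully" a little more concrete, here is one possible way to put the foo structure from this article on the wire, field by field and with no padding. It is only a sketch under several assumptions: short is 16 bits, int is 32 bits, the POSIX functions htons and htonl are available from <arpa/inet.h>, and the wire layout is one I invented for this example. The names FOO_WIRE_SIZE and pack_foo are hypothetical as well.

```c
#include <arpa/inet.h>   /* htons, htonl (POSIX; header location is an assumption) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    char c;
    short h;
    int n;
    char s[11];
}
foo;

/* Invented wire layout: 1 byte c, 2 bytes h, 4 bytes n, 11 bytes s, no padding. */
#define FOO_WIRE_SIZE (1 + 2 + 4 + 11)

/* Copy *f into out[] (at least FOO_WIRE_SIZE bytes) with the multi-byte
   fields in network byte order. Returns the number of bytes written. */
size_t
pack_foo (const foo *f, unsigned char *out)
{
    unsigned char *p = out;
    uint16_t h = htons ((uint16_t) f->h);   /* assumes a 16-bit short */
    uint32_t n = htonl ((uint32_t) f->n);   /* assumes a 32-bit int */

    *p++ = (unsigned char) f->c;            /* a single byte has no byte order */
    memcpy (p, &h, sizeof h);
    p += sizeof h;
    memcpy (p, &n, sizeof n);
    p += sizeof n;
    memcpy (p, f->s, sizeof f->s);          /* character data is copied as-is */
    p += sizeof f->s;
    return (size_t) (p - out);
}
```

The receiver would reverse the steps with ntohs and ntohl. Packing each field separately also keeps the compiler's padding bytes off the wire, so the two machines do not need to agree on structure alignment.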
Unfortunately, there's a big limitation to the technique used in my struct dumping routines. They only handle "flat"
structures -- i.e. structures whose fields are basic data types. What about structs that contain other structs?
Well, the dump function would need to be a bit more intelligent, but it's not too hard to implement.
However, I'll save that for Part 2 in the next newsletter.