C tooling
One of my favorite pastimes lately is looking into tools that make my day-to-day C development experience better. These range from formatters to memory leak detectors to debuggers. This blog isn’t a tutorial; I’m not going to explain how to use these tools effectively. Instead, I want to make you aware of what’s possible and how large your toolkit can actually be!
Compiler
Let’s start simple with your compiler. Whether you’re using gcc, clang, or any other compiler, there are a few things you should be aware of.
Warnings
Take this code:
int x = -8;
unsigned int y = 2;
if (x > y)
    printf("%d is greater than %u\n", x, y);
else
    printf("%d is less than or equal to %u\n", x, y);
Logically, you’d assume that since -8 is less than 2, it would print that -8 is less than or equal to 2. Well, let’s compile and see what happens!
> gcc main.c
> ./a.out
-8 is greater than 2
What we’re observing here happens because we’re comparing a signed integer (x) to an unsigned integer (y).
I won’t go into detail here, but to sum it up: a signed and an unsigned integer interpret the same bit pattern differently.
On a typical platform with a 32-bit int, -8 has the bit pattern 0xFFFFFFF8, which represents a totally different value when read as an unsigned int, namely 4294967288.
Because the signed operand is converted to unsigned int before the comparison (C’s usual arithmetic conversions), you’re actually checking whether 4294967288 is greater than 2, which it obviously is (wanna read more on this? Take a look at my previous blog).
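To make the conversion concrete, here is a minimal sketch of what happens inside the comparison, plus one possible way to compare the two values by what they actually mean. The names to_unsigned, int_greater_than_uint and check_sign_compare are made up for this illustration; they aren’t standard functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* What the compiler does implicitly in `x > y`: the int is converted to
 * unsigned int, which wraps negative values around modulo UINT_MAX + 1. */
static unsigned int to_unsigned(int value) {
    return (unsigned int)value;
}

/* Hypothetical helper (not a standard function): compares an int with an
 * unsigned int by mathematical value instead of by converted value. */
static bool int_greater_than_uint(int a, unsigned int b) {
    if (a < 0)
        return false; /* a negative number is never greater than an unsigned one */
    return (unsigned int)a > b;
}

/* Small self-check that can be called from main(). */
static void check_sign_compare(void) {
    assert(to_unsigned(-8) == UINT_MAX - 7u); /* -8 wraps to UINT_MAX + 1 - 8 */
    assert(to_unsigned(-8) > 2u);             /* the surprising comparison */
    assert(!int_greater_than_uint(-8, 2u));   /* the intended result */
    assert(int_greater_than_uint(3, 2u));
}
```

Checking for a negative value first means the cast only ever sees non-negative numbers, so nothing gets reinterpreted.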
It would be really useful if the compiler gave you a heads-up about this, right? Well, luckily it can; just add the -Wsign-compare flag when compiling and it will warn you about this potential danger:
> gcc -Wsign-compare main.c
main.c: In function ‘main’:
main.c:7:9: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
7 | if (x > y)
| ^
Or, if you wanna catch a much broader set of warnings (this one included), you can use -Wall -Wextra:
> gcc -Wall -Wextra main.c
main.c: In function ‘main’:
main.c:7:9: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
7 | if (x > y)
| ^
Sanitizers
Warnings are really great, but your compiler can do so much more, for example catching memory leaks and undefined behavior at runtime through a sanitizer. To activate the sanitizer that catches memory leaks, use the -fsanitize=address flag.
Take the following code:
#include <stdlib.h>
int main() {
    char* str = malloc(1000);
    // Not freeing our memory here
    return 0;
}
Now, when we compile and run this code, it notifies us that we aren’t freeing all of the memory:
> gcc -fsanitize=address main.c
> ./a.out
=================================================================
==28796==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1000 byte(s) in 1 object(s) allocated from:
#0 0x78eb5b0fd9c7 in malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
#1 0x5690c6fe819e in main (/tmp/tmp.JnHHZDQBFH/a.out+0x119e) (BuildId: a2d08c4cdc083bbbc8bf78d733164d056c13f521)
#2 0x78eb5ac2a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#3 0x78eb5ac2a28a in __libc_start_main_impl ../csu/libc-start.c:360
#4 0x5690c6fe80c4 in _start (/tmp/tmp.JnHHZDQBFH/a.out+0x10c4) (BuildId: a2d08c4cdc083bbbc8bf78d733164d056c13f521)
SUMMARY: AddressSanitizer: 1000 byte(s) leaked in 1 allocation(s).
The sanitizer can catch way more memory-related bugs, like use-after-free or NULL dereferences.
You have more options for your sanitizer though, like -fsanitize=undefined, which catches undefined behavior like signed integer overflows. You can combine sanitizers like this: -fsanitize=undefined,address.
Sanitizers are great, but do keep in mind that a sanitizer slows down your code significantly, so it’s better if you just keep it for testing and leave it out of any performance-dependent production code!
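As an illustration of the kind of bug -fsanitize=undefined is after: adding two ints whose sum doesn’t fit in an int is undefined behavior. Below is a rough sketch of one way to avoid it, by checking the range before adding. checked_add and check_checked_add are hypothetical names for this example, not library functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds two ints and reports overflow instead of
 * triggering undefined behavior. Returns true and stores the sum in *out
 * when the result fits in an int, false otherwise. */
static bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false; /* a + b would overflow, the case the sanitizer reports */
    *out = a + b;
    return true;
}

/* Small self-check that can be called from main(). */
static void check_checked_add(void) {
    int sum = 0;
    assert(checked_add(2, 3, &sum) && sum == 5);
    assert(!checked_add(INT_MAX, 1, &sum)); /* a plain INT_MAX + 1 is undefined */
    assert(!checked_add(INT_MIN, -1, &sum));
}
```

Replacing the body with a bare `a + b` and calling it with INT_MAX and 1 is a quick way to see the sanitizer complain at runtime.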
Valgrind
A sanitizer isn’t the only way to catch memory bugs in your code; another classic tool is valgrind.
Let’s use our previous code:
#include <stdlib.h>
int main() {
    char* str = malloc(1000);
    return 0;
}
If we run this program using valgrind, it’ll point out how many bytes are still in use after exiting, as follows:
> gcc main.c
> valgrind ./a.out
==33848== Memcheck, a memory error detector
==33848== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==33848== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==33848== Command: ./a.out
==33848==
==33848==
==33848== HEAP SUMMARY:
==33848== in use at exit: 1,000 bytes in 1 blocks
==33848== total heap usage: 1 allocs, 0 frees, 1,000 bytes allocated
==33848==
==33848== LEAK SUMMARY:
==33848== definitely lost: 1,000 bytes in 1 blocks
==33848== indirectly lost: 0 bytes in 0 blocks
==33848== possibly lost: 0 bytes in 0 blocks
==33848== still reachable: 0 bytes in 0 blocks
==33848== suppressed: 0 bytes in 0 blocks
==33848== Rerun with --leak-check=full to see details of leaked memory
==33848==
==33848== For lists of detected and suppressed errors, rerun with: -s
==33848== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
As you can see in the HEAP SUMMARY, there are still 1,000 bytes in use when the program exits. It also keeps track of the number of memory allocations and frees.
You might be wondering, “Why use Valgrind over a sanitizer, or vice versa?” The answer comes down to a trade-off: sanitizers are generally faster than Valgrind, but Valgrind doesn’t require recompiling your program.
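For comparison, here is a sketch of the same 1,000-byte allocation with the missing free added, which is the version neither tool should complain about. make_buffer, use_buffer and check_use_buffer are made-up names for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a zero-filled heap buffer, or NULL on failure.
 * The caller owns the buffer and must free() it. */
static char *make_buffer(size_t size) {
    return calloc(size, 1);
}

/* Same allocation size as in the leaky program, but released before returning.
 * Returns the length of the string written, or -1 if allocation failed. */
static int use_buffer(void) {
    char *str = make_buffer(1000);
    if (str == NULL)
        return -1;
    strcpy(str, "no leak here");
    size_t len = strlen(str);
    free(str); /* the call that was missing from the leaky program */
    return (int)len;
}

/* Small self-check that can be called from main(). */
static void check_use_buffer(void) {
    assert(use_buffer() == 12);
}
```

Calling check_use_buffer() from main and running the result under valgrind, or building it with -fsanitize=address, is an easy way to compare both tools on a program without leaks.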
Pahole
One of the easiest ways to shave a few bytes off your program is to remove padding from your structs. We can use pahole to analyze the layout of our structs; it displays all kinds of useful data about each struct, which helps us minimize the padding that gets added.
Take the following code:
struct Person {
    char first_name_initial;
    int8_t age;
    int64_t id;
    enum Colors fav_color;
    char *name;
};
Our Person has 5 different fields. Let’s analyze!
> clang -g main.c
> pahole ./a.out
struct Person {
char first_name_initial; /* 0 1 */
int8_t age; /* 1 1 */
/* XXX 6 bytes hole, try to pack */
int64_t id; /* 8 8 */
enum Colors fav_color; /* 16 4 */
/* XXX 4 bytes hole, try to pack */
char * name; /* 24 8 */
/* size: 32, cachelines: 1, members: 5 */
/* sum members: 22, holes: 2, sum holes: 10 */
/* last cacheline: 32 bytes */
};
Let’s first explain the structure of the output.
As you can see, the struct is listed with each field, and after each field there is a comment with two numbers.
The first number is the offset of the field and the second is its size, both in bytes.
At the end of the struct you see some more general information, like the overall size of Person, the number of cachelines it uses, and the number of members it has.
One thing you might’ve noticed is that there are two holes in the struct, one 6-byte hole and one 4-byte hole. I won’t go into detail here (you can read my other blog), but to put it simply: every member must start at an offset that is a multiple of its alignment, so the compiler inserts padding wherever that isn’t the case. A simple way to minimize that padding is to order the members from the highest alignof to the lowest alignof value. In this case that order would look like this:
struct Person {
    int64_t id;
    char *name;
    enum Colors fav_color;
    int8_t age;
    char first_name_initial;
};
Which, after analyzing the struct again, gives us the following output:
struct Person {
int64_t id; /* 0 8 */
char * name; /* 8 8 */
enum Colors fav_color; /* 16 4 */
int8_t age; /* 20 1 */
char first_name_initial; /* 21 1 */
/* size: 24, cachelines: 1, members: 5 */
/* padding: 2 */
/* last cacheline: 24 bytes */
};
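As you can see, there are no more holes in our struct, and it’s 8 bytes smaller! The 2 bytes of padding that remain at the end keep the total size a multiple of the struct’s 8-byte alignment.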
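If you’d like the compiler to confirm the same thing without an extra tool, sizeof and offsetof are enough for a rough check. The sketch below puts both layouts side by side. The enum Colors definition is invented here, since the real one isn’t shown, and the struct and function names are made up for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the real enum Colors isn't shown, so this definition is made up. */
enum Colors { RED, GREEN, BLUE };

struct PersonOriginal {
    char first_name_initial;
    int8_t age;
    int64_t id;
    enum Colors fav_color;
    char *name;
};

struct PersonReordered {
    int64_t id;
    char *name;
    enum Colors fav_color;
    int8_t age;
    char first_name_initial;
};

/* Sum of the member sizes; whatever sizeof reports beyond this is padding. */
#define PERSON_MEMBER_BYTES \
    (sizeof(char) + sizeof(int8_t) + sizeof(int64_t) + sizeof(enum Colors) + sizeof(char *))

static size_t padding_original(void) {
    return sizeof(struct PersonOriginal) - PERSON_MEMBER_BYTES;
}

static size_t padding_reordered(void) {
    return sizeof(struct PersonReordered) - PERSON_MEMBER_BYTES;
}

/* Small self-check that can be called from main(). */
static void check_person_layout(void) {
    /* Each member sits at a multiple of its own alignment. */
    assert(offsetof(struct PersonOriginal, id) % _Alignof(int64_t) == 0);
    assert(offsetof(struct PersonReordered, name) % _Alignof(char *) == 0);
    /* Reordering never makes this struct bigger. */
    assert(sizeof(struct PersonReordered) <= sizeof(struct PersonOriginal));
    assert(padding_reordered() <= padding_original());
}
```

On the same kind of 64-bit setup as the pahole run, I’d expect padding_original() and padding_reordered() to come out at 10 and 2 bytes, matching the holes and padding pahole reported, but the exact numbers depend on your platform.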
Clang-format
It’s no lie that code can get pretty messy from time to time, especially when working across a team. Therefore, it’s important to use a formatter, not only for consistency but also for everyone’s sanity. Luckily, clang-format exists!
Take our (overly messy) code for example:
#include <stdio.h>
#include <stdlib.h>
int main() {
int * m= NULL;
if ( !m)
printf("m is null\n");
else
printf("m is real\n");
return 0;
}
We can format this whole file to follow just one format style using the following command:
> clang-format main.c -i
Now, looking at our code again:
#include <stdio.h>
#include <stdlib.h>
int main() {
  int *m = NULL;
  if (!m)
    printf("m is null\n");
  else
    printf("m is real\n");
  return 0;
}
Neat and organized indeed.
I like to add a format target to my makefile to format every file in my src directory:
format:
	@find src/ -iname "*.h" -o -iname "*.c" | xargs clang-format -i
Debuggers
Debuggers are some of the most powerful tools for C. They let you pause execution, inspect variables, and step through your code line by line, which can be really useful for catching tricky bugs. gdb is one of these debuggers. To use gdb, you should include debug symbols in your binary using the -g flag at compile time. If you want debug information tailored specifically to gdb, you can also use the -ggdb flag:
> gcc -ggdb main.c
Now, if you want to actually use the debugger, just run the binary with gdb as follows:
> gdb ./a.out
gdb is a really complicated program, but I’ll give you a really simple demo of how you can run and analyze the following program:
int main() {
    int i = 5;
    i += 7;
    return 0;
}
After compiling and loading the binary in gdb, you land in the gdb CLI. Now you can set your breakpoints and run your code:
(gdb) break main
Breakpoint 1 at 0x1131: file main.c, line 2.
(gdb) run
Starting program: /tmp/tmp.7q8c456djm/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, main () at main.c:2
2 int i = 5;
(gdb) next
3 i += 7;
(gdb) info local
i = 5
(gdb) next
4 return 0;
(gdb) info local
i = 12
(gdb) next
5 }
(gdb) next
__libc_start_call_main (main=main@entry=0x555555555129 <main>, argc=argc@entry=1, argv=argv@entry=0x7fffffffdc68) at ../sysdeps/nptl/libc_start_call_main.h:74
warning: 74 ../sysdeps/nptl/libc_start_call_main.h: No such file or directory
(gdb) next
[Inferior 1 (process 51643) exited normally]
(gdb) exit
On the first line we set our breakpoint; that’s where our program will pause execution and wait for further commands.
Then we execute run, which simply runs the program until it hits the breakpoint.
info local (short for info locals) is a command that prints all of the local variables and their values; in this case you can see i changing from 5 to 12.
next executes the current line and moves on to the next one.
This is a really basic rundown of what you can do with gdb. It’s way more powerful than this, and there’s far too much for me to cover in this little section of the article, so hopefully this has left you with a basic understanding of what is possible.
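If you want something slightly bigger to practice on, here is a small sketch with a function call and a loop. sum_to and check_sum_to are just made-up names for this exercise. Compiled with -g, it gives you a place to try a few more gdb commands, such as break sum_to, print total, backtrace and continue.

```c
#include <assert.h>

/* Hypothetical practice target: sums the integers 1..n. */
static int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i; /* a good line to stop on and inspect total and i */
    return total;
}

/* Small self-check that can be called from main(). */
static void check_sum_to(void) {
    assert(sum_to(0) == 0);
    assert(sum_to(4) == 10);
}
```

Calling check_sum_to() from main and stepping through it with next and step shows the difference between stepping over a call and stepping into it.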
Final thoughts
I’ve barely scratched the surface of many of the tools showcased in this blog, but hopefully I’ve left you with some new tools you can experiment with and incorporate into your toolkit!
I’d love to hear some suggestions on tools I’m not aware of myself or any general feedback on any of these topics!
Happy coding!