Debugging with gdb - Fixing a NULL Pointer Dereference in dhcpcd

About the Project

Several tutorials exist on how to leverage the GNU Debugger (GDB) to debug misbehaving applications. However, most of them just show commands that poke at memory addresses, and don’t show the process of actually resolving a bug. This blog post will walk through how I recently identified, tried to fix, and ultimately reported a bug in dhcpcd 10.0.1 using gdb.

Identifying The Issue

The command-line utility coredumpctl is used to interact with the core dumps saved by the systemd-coredump service. Simply executing coredumpctl will show recent crashes along with metadata about each one. The image below shows a series of personal applications identified by a.out, followed by the dhcpcd entry. The metadata associated with this segfault shows the user ID, the command-line arguments, timestamps, and other information that gives the end user context about the environment in which the crash occurred.

00_coredump_ctl

The command coredumpctl gdb will drop us into a gdb shell with the core file that was generated during the segfault. Per the gdb documentation, a core file (or core dump) is:

a file that records the memory image of a running process and its process status (register values etc.). Its primary use is post-mortem debugging of a program that crashed while it ran outside a debugger. A program that crashes automatically produces a core file, unless this feature is disabled by the user.

Next, executing the gdb command bt (short for backtrace) will show the series of function calls (the stack trace) that led up to the segfault. This information is also available in the image above, but when debugging with gdb it’s helpful to have this command in your back pocket to identify which function called the function you’re currently debugging.

01_coredump_bt

The image above shows that ps_root_unlink is the last function called before the segfault occurred. This means the crash was triggered while this function was executing. To try to recreate the issue, the gdb command r (short for run) can be used to re-run the binary and see if the crash occurs again. In some situations it’s worth setting a breakpoint at a given function (b function_name_here) to inspect the variables passed to that function.
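If you want to practice bt and b before pointing them at a real daemon, a tiny stand-in program is enough. The sketch below is hypothetical (none of these function names come from dhcpcd): three nested calls, so that a breakpoint on the innermost one should give you a three-deep backtrace. It assumes you build it with debug info and no optimization (for example gcc -g -O0) and call report() from your own main.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical call chain for practicing gdb; this is not dhcpcd code.
 * In gdb: `b count_chars`, then `r`, then `bt` should list count_chars,
 * measure and report, innermost frame first.
 */

/* Innermost frame: counts characters up to the terminating NUL. */
size_t count_chars(const char *s)
{
	size_t n = 0;

	assert(s != NULL);	/* caller contract in this sketch */
	while (s[n] != '\0')
		n++;
	return n;
}

/* Middle frame: passes its argument straight through. */
size_t measure(const char *s)
{
	return count_chars(s);
}

/*
 * Outermost frame: adds one for the terminating NUL, similar in
 * spirit to a strlen(file) + 1 length calculation.
 */
size_t report(const char *s)
{
	return measure(s) + 1;
}
```

Once the breakpoint hits, p s and p n are a low-stakes way to get used to inspecting arguments and locals in the current frame.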

02_coredump_recreate

Huzzah! The bug can be recreated. It appears that running dhcpcd version 10.0.1 as a non-root user causes a segfault. Now that the bug has been recreated, further root cause analysis can be done to resolve the issue.

-ggdb, Your New Favorite GCC Flag

Still in the gdb shell, next we’ll execute disassemble ps_root_unlink, and be presented with just a wall of x86-64 assembly. Oh boy…lots of offsets from rsp referencing local variables, but what are these variables? How can we make this process easier? Introducing -ggdb!

03_coredump_disassembly

The gcc flag -ggdb produces additional debugging information meant to be used by gdb. Given that dhcpcd is an Open Source project, the Makefile can be modified and the -ggdb flag can be added.

....truncated....
CFLAGS+=  -g -ggdb -Wall -Wextra -Wundef
....truncated....

Identifying the Bug!

After compiling dhcpcd with the additional debugging information, the binary can be re-executed to cause the segfault and force a new core dump to be generated. Now, after executing coredumpctl gdb, the backtrace shows each function along with its arguments and associated local variables.

04_with_debugsymbols

These variables can then be explored with the p command.

06_error_accessing_struct.png

The image above shows that some of the values referenced through this dhcpcd struct are NULL (0x0). With the information gained so far, it appears that when running as a non-root user a struct does not get populated as expected, and a NULL pointer dereference occurs, leading to our segfault.
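To make the failure mode concrete, here is a minimal sketch of the pattern. Everything in it is a hypothetical stand-in (demo_ctx, demo_root and the function names are mine, and the real dhcpcd structs look different): a context struct whose pointer member is only populated on the privileged path, plus a reader that trusts the pointer blindly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real dhcpcd structs are different. */
struct demo_root {
	int fd;
};

struct demo_ctx {
	struct demo_root *root;	/* stays NULL on the unprivileged path */
};

/* Only the privileged path populates ctx->root. */
void demo_ctx_init(struct demo_ctx *ctx, struct demo_root *root,
    bool privileged)
{
	assert(ctx != NULL);	/* caller contract in this sketch */
	ctx->root = privileged ? root : NULL;
}

/*
 * Reads through ctx->root with no check. If root is NULL, this is the
 * dereference that would raise SIGSEGV. In gdb, `p ctx->root` showing
 * 0x0 in the crashing frame is the giveaway.
 */
int demo_root_fd_unchecked(const struct demo_ctx *ctx)
{
	return ctx->root->fd;
}
```

Calling demo_root_fd_unchecked() after an unprivileged demo_ctx_init() should crash in the same general way, which makes it a handy target for practicing p *ctx and p ctx->root on a core file of your own.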

NULL pointer dereferences are sometimes associated with CVEs depending on the severity/implications of the bug. In this case, a non-privileged user can cause the dhcpcd daemon to crash. That’s not great, but is it CVE worthy? I don’t know. Either way, it’s a bug and bugs should be fixed.

As a rudimentary fix, I checked whether the uid of the user executing the dhcpcd binary was root (uid 0). If it was not, a negative value would be returned.

ps_root_unlink(struct dhcpcd_ctx *ctx, const char *file)
{
+	if (getuid() != 0) {
+		return -1;
+	}

	if (ps_sendcmd(ctx, PS_ROOT_FD(ctx), PS_UNLINK, 0,
	    file, strlen(file) + 1) == -1)

This did resolve the segfault when a non-root user executed the dhcpcd binary, as shown in the image below. However, this fix was rejected by the maintainers, as a much better fix would be to simply check if the value was NULL. That’s the nice thing about Open Source development: even though my fix was wrong, the bug was resolved and I walked away with a better understanding of how to contribute in the future.
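I won’t reproduce the maintainers’ actual change here, but the idea of checking for NULL can be sketched in a few lines. The code below is a hypothetical illustration (demo_ctx, demo_root and demo_unlink are made-up names, not dhcpcd’s API). My reading of why this shape is preferable: it tests the condition that actually causes the crash, a missing pointer, rather than a proxy for it such as the caller’s uid, so it should hold up on any code path that leaves the pointer unset.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real dhcpcd structs are different. */
struct demo_root {
	int fd;
};

struct demo_ctx {
	struct demo_root *root;	/* may be NULL when unprivileged */
};

/*
 * Guarded variant: bail out with -1 when the state we are about to
 * dereference is missing, instead of guessing from the caller's uid.
 * On success it returns the number of bytes that would be sent
 * (the file name plus its terminating NUL) as a stand-in for real work.
 */
int demo_unlink(const struct demo_ctx *ctx, const char *file)
{
	assert(file != NULL);	/* caller contract in this sketch */

	if (ctx == NULL || ctx->root == NULL) {
		errno = EINVAL;
		return -1;
	}
	return (int)(strlen(file) + 1);
}
```

With a populated context, demo_unlink(&ctx, "a.pid") returns the byte count; with root left NULL it returns -1 and sets errno instead of crashing.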

07_segfault_fixed

Conclusion

Knowing how to re-compile, debug, and troubleshoot strangely behaving programs is an important skill in just about any part of the computing field. GDB is a powerful tool, but comes with a bit of a learning curve. My hope is that you walk away from this blog with a practical understanding of how gdb can be used to resolve bugs. Thank you for reading! If you found this helpful, please share!