The PostgresChecker: Hunting Use-after-free Bugs in PostgreSQL (2)
[1] Use-after-free – Problem Description
[2] Static Program Analysis – Approaches to Handle Use-after-frees
[3] Analyzing PostgreSQL – Extracting Information from the Codebase
[4] Implementing a Custom Static Analysis Tool – The PostgresChecker
[5] Applying the Checker – Finding Bugs in PostgreSQL
[6] Conclusion – Lessons Learned
In this blog series I am going to present the PostgresChecker: a static analysis tool that detects use-after-free bugs in source code that uses the PostgreSQL API.
In the last blog post we looked at why use-after-free bugs are hard to spot and potentially dangerous, and at the problems developers face when working with the C programming language in general and the PostgreSQL API in particular. In this post I am going to look at static analysis tools that can help us identify these kinds of issues so that we can address them. So grab another cup of coffee and let's dig in.
Static Program Analysis
Tools for static program analysis offer great value because they detect problems early in the development cycle. The earlier bugs are detected, the easier they are to fix. Conversely, bugs detected later in the development process take more time to fix and are therefore more costly.
So what is static program analysis exactly? It refers to analyzing a program's source code without actually running the program. We can distinguish several use cases.
Basic Compiler Tasks
First, many of the basic tasks a compiler performs are themselves a form of static analysis. For example, the compiler has to check the syntax so that the program can be translated correctly; if the developer makes a syntax error, the compiler must report it so the error can be resolved. Another basic form of static analysis is type checking. C compilers check types too, but C is weakly typed and permits many implicit conversions, whereas strongly typed languages like Java enforce much stricter rules.
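To illustrate how permissive C's type rules are, the following sketch assigns a double to an int without a cast. A C compiler accepts this (at most emitting a warning when options such as -Wconversion are enabled) and silently drops the fractional part, whereas the Java compiler rejects the equivalent assignment as a possible lossy conversion. The function name whole_units is made up for this example.

```c
/* C allows implicit narrowing conversions: assigning a double to an int
 * compiles without a cast and discards the fractional part. */
int whole_units(double amount)
{
    int units = amount; /* implicit double -> int conversion, truncates toward zero */
    return units;
}
```

For example, whole_units(3.9) yields 3 and whole_units(-2.7) yields -2.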
Compiler Optimizations
Beyond the basic tasks described above, static analysis also allows the compiler to make our code leaner and more performant. The compiler can, for example, eliminate dead or redundant code and even optimize the allocation of CPU registers. All these optimizations make our programs run faster, which is especially important in the context of databases.
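The following sketch (a made-up function, not taken from PostgreSQL) contains the kinds of code these optimizations target: a dead store, an unreachable branch, and a repeated expression. An optimizing compiler may remove the first two (dead-code elimination) and compute a + b only once (common subexpression elimination); whether it actually does so depends on the compiler and the optimization level.

```c
/* Contains code an optimizing compiler is free to drop or merge
 * without changing the function's result. */
int sum_twice(int a, int b)
{
    int unused = a * 42;   /* dead: this value never affects the result */
    int first = a + b;
    int second = a + b;    /* redundant: same expression as 'first' */

    if (0) {
        first = -1;        /* unreachable: the condition is always false */
    }

    (void)unused;          /* the cast only silences unused-variable warnings */
    return first + second;
}
```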
Beyond the Compiler
Dedicated static analysis tools can go even further than a compiler: they are able to detect issues such as use-after-free or null pointer dereferences. They do so by working with the parsed code, which in most cases is available as a syntax tree. The tool can then step through the code, scan it for these issues, and give feedback to the developer so they can address the problem.
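As a small example, the following made-up function compiles without errors, yet it dereferences a null pointer whenever count is zero or negative. A compiler typically accepts it silently, while a path-sensitive analyzer such as the Clang Static Analyzer introduced below should flag the dereference, because it explores the path on which p is never assigned.

```c
#include <stddef.h>

/* Compiles cleanly, but dereferences a null pointer on the path where
 * count <= 0: the kind of issue a static analyzer looks for. */
int first_value(const int *values, int count)
{
    const int *p = NULL;

    if (count > 0)
        p = values;

    return *p; /* null dereference when the if-branch was not taken */
}
```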
Static Analysis Tool: Clang
One tool that we already encountered in our last blog post is Clang. Clang is a full-fledged compiler that handles everything from parsing your source code to translating it into machine code. In addition, it comes with the Static Analyzer [1], which provides a framework for developing static analysis checks, typically bundled into so-called 'Checkers'. A particularly useful example for us is the MallocChecker [2], which provides a wide variety of memory-related checks, including use-after-free, double frees, memory leaks, and more.
Below you can see an example of how Clang can be used to analyze the simple 'Hello Memory' program we looked at in our last blog post:
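A minimal stand-in with the same kind of bug might look like the following (the function names make_greeting and print_greeting_buggy are hypothetical, not taken from the previous post): the greeting is copied into a heap buffer, released with free(), and then read again.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of the greeting,
 * or NULL if the allocation fails. The caller owns the memory. */
char *make_greeting(void)
{
    const char *text = "Hello Memory";
    char *msg = malloc(strlen(text) + 1);

    if (msg != NULL)
        strcpy(msg, text);
    return msg;
}

/* Releases the buffer and then reads it again: a use-after-free. */
void print_greeting_buggy(void)
{
    char *msg = make_greeting();

    if (msg == NULL)
        return;
    free(msg);
    printf("%s\n", msg); /* BUG: msg was already released by free() */
}
```

Running the analyzer on a file containing this code (for example clang --analyze hello_memory.c, with the file name assumed) should report a use-after-free at the printf call. make_greeting on its own is correct; only print_greeting_buggy misuses the buffer.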
As you can see, we simply pass the '--analyze' option to Clang and it gives us feedback on our program. It detects the use-after-free, warns us about it, and even identifies the function responsible.
However, Clang only recognizes memory being released through the standard C function 'free'. It has no notion of functions from the PostgreSQL API, such as our 'output_simple_statement' function from the last blog post. In fact, it does not even know PostgreSQL exists!
Fortunately, the Static Analyzer is highly extensible, and we can provide this information to it (either by extending the MallocChecker or by implementing our own checker). Doing so presents a new challenge: which functions of the PostgreSQL API actually manage memory? There are thousands of functions, and examining all of them manually is not feasible. We need to be smart about it, and again there are tools at our disposal.
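One way to tell the MallocChecker about custom allocation functions, not necessarily the one the PostgresChecker uses, is Clang's ownership_returns and ownership_takes attributes. The sketch below annotates two hypothetical wrappers, demo_palloc and demo_pfree (made up here; they are not PostgreSQL functions). As far as I know, the analyzer only honors these attributes when the MallocChecker runs in its "optimistic" mode, an analyzer configuration option whose exact name varies between Clang versions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Clang-only ownership annotations; other compilers get empty macros. */
#if defined(__clang__)
#define OWNERSHIP_RETURNS __attribute__((ownership_returns(malloc)))
#define OWNERSHIP_TAKES(arg) __attribute__((ownership_takes(malloc, arg)))
#else
#define OWNERSHIP_RETURNS
#define OWNERSHIP_TAKES(arg)
#endif

/* Hypothetical stand-ins for palloc()/pfree(). In a real project these
 * declarations would live in a header and the definitions would be
 * compiled separately, so the analyzer could not simply inline them. */
void *demo_palloc(size_t size) OWNERSHIP_RETURNS;
void demo_pfree(void *pointer) OWNERSHIP_TAKES(1);

void *demo_palloc(size_t size)
{
    return malloc(size);
}

void demo_pfree(void *pointer)
{
    free(pointer); /* parameter 1 gives up ownership, as annotated above */
}
```

With the wrappers annotated, a sequence such as demo_pfree(p) followed by a write through p should be reported the same way as the free() case.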
Pattern Matching Tool: Coccinelle
Coccinelle [3] was developed to manage changes in the Linux kernel source code. It is able to perform so-called 'transformations' on a C codebase. Imagine you are using a function throughout your codebase and in the next release the signature of that function suddenly changes. With Coccinelle you can programmatically update all calls to that function across the rest of your code. Pretty neat.
Coccinelle achieves this using so-called 'semantic patches'. Here is an example of a patch that replaces all occurrences of 'free' with the PostgreSQL-native equivalent 'pfree':
These patches always have a header section (lines 1-3) where you can define placeholders, called metavariables. In our example we define a type 't' and an identifier 'x', which we then use in the body (lines 4-8). There, we specify that when a variable is declared (line 5) and later released with 'free' (line 7), the call should be replaced with 'pfree' (line 8). The '-' marks code to remove and the '+' marks its replacement. The ellipsis ('...', line 6) tells Coccinelle to ignore any amount of code between the two statements.
The nice thing about these patches is that they resemble C code; they feel native to the language. For pattern matching, this makes them more approachable for many developers than regular-expression tools like grep, while also being considerably more powerful. They are called 'semantic' patches because Coccinelle actually parses your code, meaning it understands control-flow constructs such as loops and if statements. This is very useful when trying to detect more complex patterns.
Using Coccinelle we can define patterns and run them against the PostgreSQL source code to extract all the functions we are interested in. How we do that is the topic of the next blog post in this series.
[1] https://clang-analyzer.llvm.org/
[2] https://clang.llvm.org/doxygen/MallocChecker_8cpp_source.html
[3] https://coccinelle.gitlabpages.inria.fr/website/


