Many Static Application Security Testing (SAST) tools struggle with false positives: they report that a vulnerability is present when, in reality, it does not exist. This inaccuracy weighs on the engineering team, which ends up spending productive hours triaging false alarms.
By setting a benchmark for false positives (a limit above which the rate is unacceptable), you establish a point of reference against which to measure the efficacy of your SAST tool, and you make explicit how many false positives you are willing to tolerate from your security analysis tool.
This blog post discusses how to set a benchmark of false positives for SAST tools.
Performing application security testing is an important way to identify flaws that attackers could use to compromise the application. If a security tool can properly identify vulnerabilities, developers can fix them and thus improve the security of their applications.
SAST tools produce results that are usually grouped into four categories:

- True positives: real vulnerabilities that the tool correctly reports.
- False positives: findings the tool reports that are not actually vulnerabilities.
- True negatives: non-vulnerable code that the tool correctly leaves unflagged.
- False negatives: real vulnerabilities that the tool fails to report.
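To make these categories concrete, here is a minimal sketch in Python (the function and parameter names are ours, purely for illustration) that classifies a single finding by comparing what a tool reported against whether the code is actually vulnerable:

```python
def classify(reported: bool, actually_vulnerable: bool) -> str:
    """Map a tool's verdict on one code location to a confusion-matrix category."""
    if reported and actually_vulnerable:
        return "true positive"    # real flaw, correctly reported
    if reported and not actually_vulnerable:
        return "false positive"   # reported, but no real flaw exists
    if not reported and actually_vulnerable:
        return "false negative"   # real flaw, missed by the tool
    return "true negative"        # no flaw, correctly left unreported

# Example: the tool flags a location that is not actually vulnerable.
print(classify(reported=True, actually_vulnerable=False))  # -> false positive
```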
The objective of any SAST tool should be to maximize the number of true positives and true negatives while minimizing the false negatives and false positives. However, this is difficult to accomplish from an engineering perspective.
Thus, the engineer needs to make a design decision: should the SAST tool be tuned to generate more true positives while also generating more false positives, or should it be dialed down, generating fewer false positives at the expense of missing some true positives?
The most common decision is to lean towards the former design because this is considered to be a “winning” strategy in a sales situation, when a customer tests one product against another. The engineer is betting that the customer is going to simplistically choose the product that produces the most “results” without bothering to examine the validity of those results.
In reality, this might win the sale, but it hurts the customer in the long run. Why? Because a security product that produces too many false positives will overwhelm your developers and make them avoid using the tool, or at least avoid paying much attention to the results the tool produces. And when that happens, your security program suffers. Applications containing vulnerabilities are deployed to production.
To solve this problem, it’s important to set a benchmark for false positives with SAST tools: an agreed-upon level of false positives that your organization considers acceptable, so that you avoid wasting time hunting vulnerabilities that do not actually exist.
A simple way to measure the success of a SAST tool is to subtract its false positive rate from its true positive rate. If you get a perfect accuracy score of 100%, it implies that the true positive rate for the SAST tool is 100%, and the false positive rate is 0%.
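As a worked example with purely hypothetical counts, the sketch below derives the true positive rate, the false positive rate, and the resulting accuracy score:

```python
def simple_score(tp: int, fp: int, tn: int, fn: int) -> float:
    """Accuracy score as described above: true positive rate minus false positive rate."""
    tpr = tp / (tp + fn)  # share of real vulnerabilities the tool actually reported
    fpr = fp / (fp + tn)  # share of clean code the tool wrongly flagged
    return tpr - fpr

# Hypothetical tool: finds 80 of 100 real flaws, wrongly flags 10 of 100 clean spots.
print(f"{simple_score(tp=80, fp=10, tn=90, fn=20):.0%}")  # -> 70%
```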
Let’s say scanning the vulnerabilities in an application with three different SAST tools generates the following results:
The OWASP Foundation has established a free and open source Benchmark Project that assesses the speed, coverage, and accuracy of automated software vulnerability identification tools.
The Benchmark Project is a sample application seeded with thousands of test cases, some of which are real, exploitable vulnerabilities and some of which are deliberately designed to look like vulnerabilities but are not (false positives). You can run a SAST tool against it and score the tool’s results against this known ground truth.
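The sketch below is a rough illustration of that scoring step, not the official OWASP Benchmark scorecard generator; the CSV file name, its columns, and the findings format are assumptions made for the example:

```python
import csv

def score_against_ground_truth(truth_csv: str, reported: set[str]) -> dict[str, int]:
    """Tally confusion-matrix counts for one tool.

    truth_csv: hypothetical CSV with columns 'test_case' and 'is_real' ("true"/"false").
    reported:  names of the test cases that the tool flagged as vulnerable.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    with open(truth_csv, newline="") as f:
        for row in csv.DictReader(f):
            is_real = row["is_real"].strip().lower() == "true"
            flagged = row["test_case"] in reported
            if flagged and is_real:
                counts["tp"] += 1
            elif flagged and not is_real:
                counts["fp"] += 1
            elif is_real:
                counts["fn"] += 1
            else:
                counts["tn"] += 1
    return counts
```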
Ideally, the best results for a security tool would fall in the upper left corner of the Benchmark’s scorecard chart, indicating minimal false positives and maximal true positives.
We mentioned above that a simple way to measure the success of a SAST tool is to subtract its false positive rate from its true positive rate. But this measure by itself is not adequate because it does not look at other important factors.
Take the example of these two different SAST tools, each of which has been scored against the OWASP Benchmark:
Although the simplistically derived accuracy scores for these two different tools are the same, you may strongly prefer one over the other. So we need to introduce additional metrics, such as completeness: the proportion of the real vulnerabilities in an application that the tool actually detects.
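To see how two tools with an identical simple score can still differ, consider metrics such as completeness (also known as recall) and precision. The counts below are invented purely for illustration:

```python
def metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    return {
        "score": tp / (tp + fn) - fp / (fp + tn),  # the simple TPR-minus-FPR measure
        "completeness": tp / (tp + fn),            # share of real flaws found (recall)
        "precision": tp / (tp + fp),               # share of reported findings that are real
    }

# Two hypothetical tools with the same simple score but different trade-offs:
tool_a = metrics(tp=90, fp=20, tn=80, fn=10)   # score 0.70, completeness 0.90, precision ~0.82
tool_b = metrics(tp=75, fp=5,  tn=95, fn=25)   # score 0.70, completeness 0.75, precision ~0.94
print(tool_a, tool_b, sep="\n")
```

With identical simple scores, the first tool finds more of the real vulnerabilities, while the second wastes less developer time on false alarms; these are exactly the trade-offs that the additional metrics expose.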
Setting appropriate benchmarks for your application testing program needs to be done collaboratively, because different teams have different goals. The security team naturally wants every application to introduce the lowest possible security risk, which means they want security tools that score very high on the completeness scale regardless of the number of false positives they produce. The development team has almost opposite goals. They want to spend their time developing new features, and they don’t want to be slowed down by unproductive work such as dealing with false positives.
Furthermore, your benchmarks might also differ depending on which application you are testing. Some applications might be of higher value than others, or more exposed to attack. For these sensitive applications, you might accept a higher false positive rate in order to obtain higher completeness.
And all of these accuracy scores are just one dimension of a SAST tool. Other important dimensions include how fast the tool runs, how conveniently the results can be consumed by developers, and how easily the tool can be deployed and automated as part of your workflow.
Based on all of these considerations, Mend SAST has proven to be an extremely effective and efficient tool for modern organizations that are striving for both speed and security. If you aren’t yet familiar with Mend SAST, check it out!