Welcome to the inaugural edition of the Mend.io Open-Source Reliability Leaderboard! Powered by data from Renovate Bot, Mend.io’s wildly popular open-source dependency management tool, the Leaderboard presents the top packages in terms of reliability across three of the most widely used languages.
We built the Leaderboard for several reasons, starting with the risk imposed by our increasingly vulnerable software supply chain. The ongoing rise in cyberattacks that target the software supply chain, coupled with a shifting regulatory landscape, highlights the growing urgency of building secure applications.
We also wanted a different lens through which to view application security. While existing technologies like software composition analysis (SCA) and static application security testing (SAST) are vital for detecting and remediating problems, little has been done to build a more holistic strategy of preventing, or at least preparing for, problems. Taking that broader view is akin to adopting fitness and healthy living routines as a way to avoid longer-term health problems.
Successful implementation of the strategy hinges on having the knowledge necessary to prevent vulnerable open-source packages from ever being installed in the first place. For that to happen, companies need to know not only which packages they use, but also how safe those packages are. This is becoming more important at larger companies, as we see enterprise customers increasingly take this approach by standardizing on a pre-curated selection of reliable open-source code packages.
And finally, we wanted to leverage and share a valuable resource. The Mend.io team knows that there is no better arbiter of package reliability than Renovate, which has gathered crowd-sourced data on over 25 million dependency updates.
While evaluating software reliability is a challenge for any development program, the world of open source software adds hurdles in the form of variances in how open-source code is created, distributed, and supported. Software reliability should naturally be considered when selecting software components, but in reality ‘should be’ does not always translate into practice. Therefore, tapping Renovate’s rich trove of data to create some reliability rankings seemed like a worthy project.
By analyzing which packages consistently release good updates, we can arrive at an accurate picture of each package’s overall reliability for software engineers trying to balance functional risk with security risk.
As with any data-driven project, selecting filtering criteria proved to be a complex and nuanced process. What languages should we evaluate? Did we want to rank the reliability of only those packages that were updated individually? Doing so would omit packages that were updated as part of a group, an increasingly common practice. Should we filter by major and minor releases? Major versions are, by the nature of the update, more apt to cause dependency trouble downstream. The criteria we ultimately settled on produced a pretty comprehensive profile for npm, PyPI, and Maven. However, we also wanted to give a tl;dr option for curious readers who are strapped for time. The shorthand version is below: a top 25 list aggregated from both our individual and group rankings. You can find more granular details in the General Findings section.
Data was pulled for 2022 across three languages: npm, Maven, and PyPI. Keep in mind that the test results come from users’ own tests, and sometimes users write bad tests. A failure caused by a bad test says nothing about the package itself.
As such, there will nearly always be some level of failed tests for every package. Nobody scores 100 percent at scale, because users write bad tests.
After quite a bit of discussion, the team employed the following filters to build the tables:
Non-grouped (individual) updates and grouped updates were analyzed separately. While most updates are individual, it is increasingly common to run automated batch updates. With that in mind, we felt it was worthwhile to see which packages performed reliably in groups as well as individually. We were also looking for All-Star packages: those that ranked well in both categories.
Minor updates only. By this, we mean that the previous version and the updated version have the same major semantic version number. Because updates to a different major version are expected to introduce breaking changes, we did not include those.
Sourced from reliable repos. We excluded repos deemed somewhat iffy due to consistently failing tests.
Tests run after a prior successful run. This was done to avoid counting a failure that was introduced prior to the current update.
Number of versions. With fewer than three releases, the success data for a given package varies too wildly to be useful. So we limited it to packages with at least three releases.
Package popularity. We defined this as the top packages in each language used by Renovate users. Because the data sets varied considerably by language, we chose what we considered to be reasonable cut-off filters for each language, which are noted in the pertinent section. When possible, we also presented separate leaderboards on what we call the Titans: packages for which Renovate has recommended more than 10,000 minor version updates.
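Several of the filters above can be sketched in code. The following is a minimal illustration, not the pipeline the team actually used: the `UpdateRecord` schema, the helper names, and the sample data are all hypothetical, the version parsing assumes simple MAJOR.MINOR.PATCH strings, and "releases" is interpreted here as distinct target versions. The repo-reliability and popularity filters are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    """One observed dependency update (hypothetical schema)."""
    package: str
    from_version: str
    to_version: str
    prior_run_passed: bool  # tests passed before the update was applied
    tests_passed: bool      # tests passed after the update was applied


def major(version: str) -> int:
    """Return the major component of a simple MAJOR.MINOR.PATCH string."""
    return int(version.split(".")[0])


def is_eligible(rec: UpdateRecord) -> bool:
    """Keep same-major updates that follow a prior successful test run."""
    return major(rec.from_version) == major(rec.to_version) and rec.prior_run_passed


def success_rates(records, min_releases=3):
    """Map package -> pass rate, for packages with enough distinct releases."""
    by_package = defaultdict(list)
    for rec in records:
        if is_eligible(rec):
            by_package[rec.package].append(rec)
    rates = {}
    for package, recs in by_package.items():
        releases = {r.to_version for r in recs}
        if len(releases) >= min_releases:
            rates[package] = sum(r.tests_passed for r in recs) / len(recs)
    return rates


# Made-up sample data, for illustration only.
records = [
    UpdateRecord("alpha", "1.0.0", "1.1.0", True, True),
    UpdateRecord("alpha", "1.1.0", "1.2.0", True, True),
    UpdateRecord("alpha", "1.2.0", "1.3.0", True, False),
    UpdateRecord("alpha", "1.3.0", "2.0.0", True, False),   # major bump: excluded
    UpdateRecord("alpha", "1.2.0", "1.3.0", False, False),  # prior run failed: excluded
    UpdateRecord("beta", "0.4.0", "0.4.1", True, True),     # too few releases: excluded
]
print(success_rates(records))
```

In this sample only `alpha` survives the filters, with two passing updates out of three eligible ones.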
The team created Reliability Leaderboards for the following categories in each of the three programming languages:
We had some general predictions going into this, and while some proved correct, others were busted.
Prediction: Group runs bring down overall package reliability.
Any fan of the TV show Survivor can tell you that in competition, groups are often hurt by their weakest link. The same holds true when it comes to group updates: if any one package in the group breaks the tests, the whole update fails. A group of ten packages is therefore roughly ten times more likely to encounter a failure than a single package.
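As a rough worked example of the weakest-link effect, assume (purely for illustration) that each package in a group fails independently and with the same probability p. A group of n packages then fails with probability 1 - (1 - p)^n, which is close to n times p when p is small and falls below that as p grows. The function name and the sample rates below are hypothetical and are not taken from the Leaderboard data.

```python
def group_failure_probability(p: float, n: int) -> float:
    """Chance that at least one of n independent updates fails."""
    return 1 - (1 - p) ** n


# Compare a single package's failure rate with that of a group of ten.
for p in (0.01, 0.05, 0.20):
    group = group_failure_probability(p, 10)
    print(f"per-package {p:.0%} -> group of 10: {group:.1%} ({group / p:.1f}x)")
```

With a 1 percent per-package failure rate, a group of ten fails about 9.6 percent of the time, which is close to ten times the individual rate. At higher per-package rates the multiplier shrinks, because the group's failure probability cannot exceed 100 percent.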
Prediction: Frequent releases improve average success rates.
You would think frequent releases would correlate with better reliability through faster bug fixes and an engaged maintainer community, but nope! Release frequency had no effect at all on how reliably a package updated. Maybe those teams that take more time between releases are doing better testing.
Looking across the categories, the most reliable packages for each language are the following.
There are always more questions than answers, but that doesn’t stop us from asking them. We hope to come up with data-driven answers to some of these in the next issue of the Open-Source Reliability Leaderboard.
Why do some packages update well individually and fall off the group update chart?
Yes, group runs bring everything down, but that’s just one aspect. We also wonder about interdependencies in a group—that is, would a package have a better success rate with a different group of packages?
Why are some packages more reliable in a group update?
Some packages need to be updated at the same time as others, so will be more likely to fail (or even destined to fail) if upgraded alone. We also wonder whether some good team players are playing on good teams. That is, a package is being updated with other reliable players.
How can we use this to improve group updates? We know that the way packages are grouped likely affects the success rate at which some are updated, which means that people need to be more intentional about what goes into group updates. Knowing which packages don’t perform well in a group allows companies to improve the groups already in use by updating the troublemakers individually. Used in conjunction with automated merge confidence tools, this could prove helpful when planning group updates and provide visibility into the dependency update gap. Data like this will allow people to create larger groups that will succeed with higher confidence, which means that companies will spend less time processing updates. Bottom line? Improving the quality of group updates helps an organization improve application security.
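The idea of pulling troublemakers out of a group can be sketched as a simple threshold split. This is only an illustration under stated assumptions: the `split_group` helper, the 0.95 cut-off, and the package names and success rates are all hypothetical, and a real decision would also need to account for packages that must be updated together.

```python
def split_group(rates: dict[str, float], threshold: float = 0.95):
    """Return (keep_grouped, update_individually) for one update group."""
    keep_grouped = sorted(p for p, r in rates.items() if r >= threshold)
    update_individually = sorted(p for p, r in rates.items() if r < threshold)
    return keep_grouped, update_individually


# Hypothetical in-group success rates for the members of one update group.
observed = {"pkg-a": 0.99, "pkg-b": 0.97, "pkg-c": 0.81, "pkg-d": 0.96}
keep, solo = split_group(observed)
print("keep grouped:", keep)
print("update individually:", solo)
```

Here `pkg-c` falls below the cut-off and would be updated on its own, while the other three stay in the batch.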
Note: Some of these packages are not typically used as software dependencies, but instead are used as pipeline tools. Pipeline tools will run after the testing phase of a build has succeeded. So naturally, the failure rate would be close to zero. In a way, it’s cheating a bit, so we thought it worth noting.
Color Coding Explained
In the following charts, these colors denote the following:
Package appears in Individual Champion list
Package appears in Team Player list
Package is ranked in both lists
For npm, the filter was limited to the top 1,000 packages. The lint and nestjs communities did well, both individually and in group updates.
For Maven, the filter was limited by the number of updates. We were happy to see strong representation from Google, Apache, and SpringFramework packages. Indeed, Apache ended up with four packages on the All-Star list. It’s reassuring to see that the big players are producing safe updates.
For PyPI, the filter was limited by the number of updates.
When it comes to preventive application security, one of the smartest things companies can do is to proactively update open source dependencies. Doing so reduces the application attack surface and limits the problems that out-of-date libraries can cause when emergency updates are needed.
Granted, there will always be a need to balance new development efforts with security requirements. But if done right, this preventive approach won’t negatively impact developers’ workload and may even free up development resources. We recommend the following:
Automate dependency management. Organizations should aim to put in place an automated dependency management routine that checks open source dependencies consistently, flags issues, and assists in the remediation process. A good example is the Smart Merge Control feature within Mend SCA, which is essentially autopilot for component updates. It can examine dependencies within a project and batch only the updates that are highly likely to pass build tests and not break the build.
Smart Merge Control provides a high degree of automation, including identifying all the high confidence updates, generating the associated pull requests, and then merging them, all automatically but with ultimate developer oversight and control.
Assess confidence levels. Any time an organization updates open-source components to newer, potentially more secure versions, it risks functional problems with existing applications. That’s where confidence levels come in. Mend’s Merge Confidence ratings, available as a standard feature in Mend SCA, assess the likelihood that updating a given component will negatively impact application functionality or cause other issues. Mend.io’s Merge Confidence levels are based on peer crowd-sourced data from over 25 million dependency updates tracked by Mend Renovate. Developers can merge updated components with high confidence levels with assurance that they are unlikely to break the build. And when a component has a lower confidence level, that signals to developers that extra work may be required to merge it, so they can plan appropriately.
Create batch updates. Most open source development projects are getting increasingly complex, with more and more components. That means checking and updating dependencies tends to take more time and effort. That’s why the Batch Update capability in Mend SCA is especially important. Mend’s Smart Merge Control Batch Update functionality provides a way for developers to batch updates (typically high confidence ones) into a single collection that can be applied all at once. Even though it’s not complex work, manually generating 10, 20, or 50 pull requests for components that need updating can be time-consuming and boring. Mend.io’s Batch Update functionality eliminates the need to do that manually and helps automate the update process.
Justin Clareburt is the Product Owner for Renovate at Mend.io. He has been building software solutions since last century, and most recently for Microsoft, Google, and Amazon. Justin is passionate about developer productivity and is renowned for his love of keyboard shortcuts. He is an avid supporter of open-source development, and is responsible for many free popular productivity tools and keyboard shortcut packs.
Rhys Arkins is Vice President of Product Management, responsible for developer solutions at Mend.io. He was the founder of Renovate Bot – an automated tool for software dependency updating, which was acquired by Mend.io in 2019. Rhys is particularly fond of automation and a firm believer in never sending humans to do a machine’s job.
A veteran of Computerworld and CIO magazine, Hildebrand is an award-winning technology writer who writes extensively about cybersecurity and how it impacts business innovation.
Mend.io, formerly known as WhiteSource, has over a decade of experience helping global organizations build world-class AppSec programs that reduce risk and accelerate development—using tools built into the technologies that software and security teams already love. Our automated technology protects organizations from supply chain and malicious package attacks, vulnerabilities in open source and custom code, and open-source license risks. With a proven track record of successfully meeting complex and large-scale application security needs, Mend.io is the go-to technology for the world’s most demanding development and security teams. The company has more than 1,000 customers, including 25 percent of the Fortune 100, and manages Renovate, the open source automated dependency update project. For more information, visit www.mend.io, the Mend.io blog, and Mend.io on LinkedIn and Twitter.