Whenever we work with programs, we eventually encounter bugs. These bugs affect the functionality, the performance, or the security of the program, and they compromise its reliability.
When we test a conventional program, we focus on its input/output behaviour: we craft an input, define the expected output, and check whether the output the program generates matches it. Although some behaviours are specific to certain kinds of programs, such as concurrency or network traces, this testing strategy covers the majority of cases. But for machine learning, it is not enough.
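This input/output strategy can be sketched in a few lines. The `normalise` function below is a hypothetical system under test, chosen only for illustration:

```python
# A minimal sketch of input/output testing, using a hypothetical
# normalise() function as the system under test.

def normalise(values):
    """Scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def test_normalise():
    # Craft an input, define the expected output, and compare.
    result = normalise([2.0, 4.0, 6.0])
    expected = [0.0, 0.5, 1.0]
    assert result == expected

test_normalise()
print("input/output test passed")
```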
When we test a machine learning system, we need to take two components into account: the machine and the learner. The machine is the software that makes everything work. The learner is the simulated intelligence behind the system. Testing the first is relatively easy, or at least it can be performed with some of the current testing tools; the only difference is that we need to adapt those tools to the structure these systems normally use, the machine learning pipeline. The harder question is how we test the learner. Machine learning has its own mechanisms to evaluate whether a model, or learner, is learning correctly. We have metrics such as accuracy to evaluate individual cases. But we also need to evaluate whole scenarios in which multiple individual cases follow a specific pattern: noise, an adversary, or an error in the data we are collecting.
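The two evaluation levels can be illustrated with a small sketch. The labels and predictions here are invented for the example; the point is the contrast between a per-case metric (accuracy) and a scenario-level check that looks for a pattern across the errors:

```python
def accuracy(y_true, y_pred):
    """Fraction of individual cases the learner got right."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Invented example data: true labels and the learner's predictions.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 1]

print(accuracy(y_true, y_pred))  # 0.75

# Scenario-level check: are the errors concentrated in one class?
# Here every misclassified case is a true 1 predicted as 0, which
# suggests a systematic problem (noise, an adversary, or an error
# in the collected data) rather than random mistakes.
errors = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
print(all(t == 1 and p == 0 for t, p in errors))  # True
```

A high accuracy alone would hide this pattern: 75% of cases are correct, yet all the failures share one direction, which is exactly the kind of scenario that per-case metrics miss.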
Having to test these two components made us ask whether there are potential bugs between them. Is it possible that the connection between the learner and the machine is broken, creating unexpected behaviours in specific circumstances? This question led us to identify blind spots in the main APIs. Our initial steps were in R, where we identified several bugs, some of which were exploitable vulnerabilities. We then moved to Python and applied a similar strategy to identify new bugs and vulnerabilities in Python libraries. Our database collects these bugs to give you the option of reviewing your code quickly. We also aim to report these bugs upstream, although you can evaluate them as soon as we add them to the database. Finding these bugs early in your production system will save your company a significant amount of money, and it will spare you the drama of having someone else find the bug in your system.