Uncovering Nuances in Data-led QA for AI/ML Applications

QA for AI/ML applications requires a different approach from that used for traditional applications. Unlike the latter, which follow set business rules with defined outputs, the continuously evolving nature of AI models makes their outcomes ambiguous and harder to predict. QA methodologies need to adapt to this complexity and address issues of comprehensive scenario coverage, security, privacy, and trust.

How to test AI and ML applications?

The standard approach to AI model creation, known as the cross-industry standard process for data mining (CRISP-DM), starts with data acquisition, preparation, and cleansing. The resulting data is then applied iteratively to multiple modeling approaches before the best-performing model is selected. Testing this model starts with a subset of data that has undergone the same process. This test data is fed into the model while multiple combinations of hyperparameters are evaluated to gauge the model's correctness or accuracy, supported by appropriate metrics.

Groups of such test data are generated randomly from the original data set and applied to the model. Much like simulating new data, this process indicates how accurately the model will generalize to unseen data in the future.
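As a minimal illustration of this evaluation loop, the sketch below scores several hyperparameter combinations across repeated random train/test splits. It assumes scikit-learn; the dataset, classifier, and parameter grid are placeholders, not a prescribed setup.

```python
# Hedged sketch: evaluate hyperparameter combinations over repeated random test groups.
# Dataset, model choice, and parameter grid are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit

X, y = load_breast_cancer(return_X_y=True)

# Repeated random subsets of the original data act as the test groups.
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",   # swap in F1, recall, etc. as the appropriate metric
    cv=splitter,
)
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
print("Mean accuracy across random test groups:", round(search.best_score_, 3))
```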

Also Read: How to adopt the right testing strategies to assure the quality of AI/ML-based models

Challenges in data-led QA for AI/ML applications

The data-led testing and QA approach for AI/ML applications outlined above suffers from a number of issues, some of which are discussed below.

Explainability

The decision-making algorithms of AI models have long been perceived as black boxes. Of late, there has been a strong push to make them transparent by explaining how a model arrived at a set of outcomes from a given set of inputs. This helps data scientists understand and improve model performance and helps recipients grasp model behavior. It is even more critical in compliance-heavy areas such as insurance or healthcare. Several countries have also started mandating that AI models be accompanied by an explanation of the decisions they make.

Post facto analysis is key to addressing explainability. By retrospectively analyzing specific instances misclassified by an AI model, data scientists can understand which parts of the data set the model focused on to arrive at its decision. Along similar lines, positively classified instances are also analyzed.

Combining both helps quantify the relative contribution of each part of the data set and reveals which attribute classes the model weighs most heavily in its decisions. It also enables data scientists to work with domain experts to evaluate whether the data needs more variation across sensitive variables, and whether the decision-making parameter set used by the model should be re-engineered. In short, the data science process itself is being changed to incorporate explainability.
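A minimal post facto sketch, assuming scikit-learn and the permutation-importance technique (one of several attribution methods; SHAP or LIME could equally be used), might look like this. The dataset and model are placeholders.

```python
# Hedged sketch: inspect which features a trained model leaned on,
# comparing misclassified and correctly classified instances.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
preds = model.predict(X_test)

# Split the test set into misclassified and correctly classified instances.
wrong = preds != y_test
for label, mask in [("misclassified", wrong), ("correctly classified", ~wrong)]:
    if mask.sum() == 0:
        continue
    result = permutation_importance(
        model, X_test[mask], y_test[mask], n_repeats=10, random_state=0
    )
    top = np.argsort(result.importances_mean)[::-1][:5]
    print(label, "- most influential feature indices:", top)
```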

You may also like: 5 points to evaluate before adopting AI in your organization

Bias

The decision-making ability of an AI model hinges to a large extent on the quality of the data it is exposed to. Numerous instances show biases seeping into the input data or into how the models are trained, such as Facebook's gender-discriminatory ads or Amazon's AI-based automated recruiting system that discriminated against women.

The historical data that Amazon used for its system was heavily skewed because men had dominated its workforce, and the tech industry, for over a decade. Even large models such as OpenAI's language models or GitHub Copilot suffer from the percolation of real-world biases, since they are trained on global data sets that are themselves biased. While removing biases, it is essential to understand what has gone into data selection and which feature sets contribute to decision-making.

Detecting bias in a model requires evaluating and identifying attributes that influence the model excessively compared to others. The attributes so unearthed are then tested to see whether they are representative of all available data points.
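One simple, hedged check, assuming a binary classifier and a binary protected attribute, is to compare positive-prediction rates across groups, a demographic-parity style test. The data and the 0.8 rule of thumb below are illustrative.

```python
# Hedged sketch: compare positive-prediction rates across a protected attribute.
# Synthetic data; in practice predictions and group labels come from your model/dataset.
import numpy as np

rng = np.random.default_rng(0)
preds = rng.integers(0, 2, size=1000)   # model's binary decisions
group = rng.integers(0, 2, size=1000)   # protected attribute (e.g., group 0 vs group 1)

rate_a = preds[group == 0].mean()
rate_b = preds[group == 1].mean()
disparate_impact = min(rate_a, rate_b) / max(rate_a, rate_b)

print(f"Selection rate group A: {rate_a:.2f}, group B: {rate_b:.2f}")
print(f"Disparate impact ratio: {disparate_impact:.2f}")
# A common rule of thumb flags ratios below ~0.8 for further investigation.
```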

Security

According to Deloitte’s State of AI in the Enterprise survey, 62% of respondents view cyber security risks as a significant concern while adopting AI. ‘The Emergence Of Offensive AI’ report from Forrester Consulting found that 88% of decision-makers in the security industry believe offensive AI is coming.

Since AI models are built on the principle of becoming smarter with each iteration of real-life data, attacks on such systems also tend to become smarter. The matter is further complicated by the rise of adversarial attackers whose goal is to fool AI models by modifying a small aspect of the input data, sometimes as little as a single pixel in an image. Such small changes can produce significant perturbations in the model's output, leading to misclassifications and erroneous outcomes.
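To make the idea concrete, here is a hedged, self-contained sketch of a fast-gradient-sign style perturbation against a toy logistic-regression classifier. The weights, input, and epsilon are made-up values; real attacks target far more complex models.

```python
# Hedged sketch: FGSM-style adversarial perturbation of a single input
# against a toy logistic-regression model. All numbers are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "trained" model parameters and an input the model classifies correctly.
w = np.array([1.5, -2.0, 0.7])
b = 0.1
x = np.array([0.6, -0.4, 1.2])
y_true = 1.0

p = sigmoid(w @ x + b)
print("Original prediction:", round(p, 3))   # well above 0.5 -> positive class

# Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
grad_x = (p - y_true) * w

# FGSM: nudge the input in the direction that increases the loss.
epsilon = 0.8
x_adv = x + epsilon * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv + b)
print("Prediction on perturbed input:", round(p_adv, 3))  # drops below 0.5, flipping the decision
```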

The starting point for overcoming such security issues is to understand the types of attacks and the vulnerabilities in the model that attackers can exploit. Building a repository of known attack patterns, drawing on published research and domain knowledge, is critical for anticipating such attacks in the future. Adopting AI-based cyber security systems is another effective way to thwart hacking attempts, since such systems can predict attacker behavior much the same way they predict other outcomes.

Privacy

With the increased uptake of privacy regulations such as GDPR and CCPA across applications and data systems, AI models have also come under the scanner, more so because AI systems depend heavily on large volumes of real-time data for intelligent decisions, data that can reveal a tremendous amount about a person's demographics, behavior, and consumption patterns, at a minimum.

To address privacy concerns, the AI model in question needs to be audited to evaluate how it might leak information. A privacy-aware AI model takes adequate measures to anonymize or pseudonymize data, or applies techniques such as differential privacy. The model can be evaluated for privacy leakage by analyzing how attackers could reconstruct input training data from the model and reverse-engineer it to obtain PII (Personally Identifiable Information). A two-stage process of first detecting training data that can be inferred through inference attacks, and then checking that data for PII, can help surface privacy concerns before the model is deployed.
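A hedged illustration of the first stage, assuming a simple loss-threshold membership inference test (shadow-model attacks are the more rigorous alternative), might look like this. The dataset, model, and threshold are illustrative.

```python
# Hedged sketch: loss-threshold membership inference test. If a model's loss on a
# record is unusually low, the record was likely in its training set.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def per_sample_loss(model, X, y):
    probs = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(probs, 1e-12, None))

loss_members = per_sample_loss(model, X_train, y_train)
loss_non_members = per_sample_loss(model, X_out, y_out)

# Flag records whose loss falls below a threshold as "probably in the training data".
threshold = np.median(np.concatenate([loss_members, loss_non_members]))
member_rate = (loss_members < threshold).mean()        # members correctly flagged
non_member_rate = (loss_non_members < threshold).mean()  # non-members wrongly flagged
print(f"Inferred membership rate - members: {member_rate:.2f}, non-members: {non_member_rate:.2f}")
# A large gap indicates the model leaks membership information about its training data.
```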

Want to know more? Read: Best practices for test data management in an increasingly digital world

Ensuring accuracy in QA for AI/ML applications

Accurate testing of AI-based applications calls for extending the notion of QA beyond the confines of performance, reliability, and stability to the newer dimensions of explainability, security, bias, and privacy. The international standards community has embraced this notion by expanding the conventional ISO 25010 standard to include these facets. As AI/ML model development progresses, attention to all of these facets will lead to a better-performing, continuously learning, and compliant model capable of generating far more accurate and realistic results.

Need help? Ensure seamless performance and functionality for your intelligent application. Call us now

Adopt the Right Testing Strategies for AI/ML Applications

The adoption of systems based on Artificial Intelligence (AI) and Machine Learning (ML) has risen exponentially in the past few years and is expected to continue to do so. Markets and Markets forecasts that the global AI market will grow from USD 58.3 billion in 2021 to USD 309.6 billion by 2026, at a CAGR of 39.7% over the forecast period. In a recent Algorithmia survey, 71% of respondents reported an increase in budgets for AI/ML initiatives, and some organizations are even looking at doubling their investments in these areas. With the rapid growth of these applications, QA practices and testing strategies for AI/ML models also need to keep pace.

An ML model life cycle involves multiple steps. The first is training the model on a set of feature sets. The second involves deploying the model, assessing its performance, and modifying it constantly to make more accurate predictions. This differs from traditional applications in that the model's outcome is not necessarily a single exact answer; it can be considered correct depending on the feature sets used for training. The ML engine is built around predictive outcomes from datasets and focuses on constant refinement based on real-life data. Further, since it is impossible to gather all possible data for a model, using a small percentage of data to generalize results to the larger picture is paramount.

Since the architecture of ML systems is steeped in constant change, traditional QA techniques need to be replaced with approaches that take the following nuances into account.

The QA approach in ML

Traditional QA approaches require a subject matter expert to understand possible use case scenarios and outcomes. These scenarios, across modules and applications, are documented from real-world usage, which makes test case creation easier. The emphasis is on understanding the functionality and behavior of the application under test, and automated tools that draw from databases enable the rapid creation of test cases with synthesized data. In a Machine Learning (ML) world, the focus shifts to the decision made by the model and to understanding the various scenarios and data that could have led to that decision. This calls for an in-depth understanding of the possible outcomes that lead to a conclusion, along with knowledge of data science.

Secondly, the data available for creating a Machine Learning model is only a subset of real-world data. The model therefore needs to be re-engineered consistently with real data, and rigorous manual follow-up is necessary once the model is deployed so that its prediction capabilities keep improving. This also helps overcome trust issues with the model, since in real life the decision would have involved human intervention. QA focus needs to lean in this direction so that the model moves closer to real-world accuracy.

Finally, business acceptance testing in a traditional QA approach involves creating an executable module that is then tested in production. This is more predictable, as the same set of scenarios continues to be tested until a new addition is made to the application. The situation is different with ML engines: business acceptance testing should be seen as an integral part of refining the model and improving its accuracy based on real-world usage.

The different phases of QA

Every machine learning model creation is characterized by three phases, and the QA focus, be it functional or non-functional, applies to the ML engine across all of them:

  • Data pipeline: The quality of the input data sets plays a significant role in a Machine Learning system's ability to predict. The success of an ML model lies in testing the data pipelines that ensure clean and accurate data availability, using big data and analytics techniques.
  • Model building: Measuring the effectiveness of a model is very different from traditional techniques. Of the available data, 70-80% is typically used to train the model, while the remainder is used to validate and test it. The model's reported accuracy is therefore based on the smaller, held-out portion of the data. It is essential that the data sets used for validating and testing the model are representative of real-world scenarios; the model should not fail in production for a category that was represented in neither the training nor the testing data. There is a strong need to ensure equitable distribution and representation in the data sets (see the split sketch after this list).
  • Deployment: Since an ML model's accuracy depends on all-round scenario coverage, and the ability to achieve that in real life is limited, the system cannot be expected to be performance-ready in one go. A host of tests, such as candidate testing and A/B testing, needs to be run to ensure that the system works correctly and can ease into a real-life environment. The concept of a 'sweat drift' becomes relevant here, whereby we arrive at a measure of how long it takes before the model starts behaving reliably. During this time, the QA person needs to manage data samples and validate model behavior appropriately. The tool landscape supporting this phase is still evolving.
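As a minimal, hedged sketch of the representation point above, assuming scikit-learn and a placeholder dataset, a stratified split keeps class proportions consistent across the training, validation, and test sets:

```python
# Hedged sketch: 70/15/15 stratified split so every class is represented
# in training, validation, and test data. Dataset is illustrative.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# First carve out 70% for training, stratified on the label.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0
)
# Split the remaining 30% evenly into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    proportions = np.bincount(labels) / len(labels)
    print(name, "class proportions:", np.round(proportions, 2))
```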

QA approaches need to emphasize the following to ensure the development and deployment of a robust ML engine.

Fairness:

The ideal ML model should be nonjudgmental and fair. Since it learns largely from data received in real-life scenarios, there is a strong chance that the model will become biased if its data comes disproportionately from a particular category or feature set. For example, if a chatbot that learns through an ML engine is made live and receives many racist inputs, the datasets the engine learns from become heavily skewed towards racism. The feedback loops that power many of these models then ensure that racist bias makes its way into the ML engine. There have been instances of such chatbots being pulled down after noticeable changes in their behavior.

In a financial context, the same can happen when a model receives too many loan approval requests from a particular category of requestors, for example. Adequate effort needs to go into removing these biases while aggregating, or slicing and dicing, the datasets that are added to the ML engine.

One approach commonly used to remove bias that can creep into a model is to build a second model (an adversary) that learns the potential bias from the various parameters and captures that bias within itself. By frequently alternating between these two models as real-life data becomes available, the chance of arriving at a model with the bias removed becomes higher.
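A minimal, hedged sketch of this adversarial idea, using plain NumPy and toy logistic models: the predictor learns the task while being penalized whenever the adversary can recover the protected attribute from its output. The data is synthetic; real implementations typically use a deep-learning framework and richer architectures.

```python
# Hedged sketch: adversarial debiasing with toy logistic models and synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
z = (X[:, 0] > 0).astype(float)   # protected attribute, correlated with feature 0
y = ((X[:, 1] + 0.8 * z + rng.normal(scale=0.5, size=n)) > 0).astype(float)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

w, b = np.zeros(d), 0.0           # predictor parameters
u, c = 0.0, 0.0                   # adversary parameters (reads the predictor's output)
lr, alpha = 0.1, 1.0              # alpha = strength of the fairness penalty

for _ in range(300):
    p = sigmoid(X @ w + b)        # predictor output
    a = sigmoid(u * p + c)        # adversary's guess of z from p

    # Adversary step: minimize its loss for predicting the protected attribute.
    u -= lr * np.mean((a - z) * p)
    c -= lr * np.mean(a - z)

    # Predictor step: minimize task loss minus alpha * adversary loss.
    grad_logit = (p - y) - alpha * (a - z) * u * p * (1 - p)
    w -= lr * X.T @ grad_logit / n
    b -= lr * np.mean(grad_logit)

preds = (sigmoid(X @ w + b) > 0.5).astype(float)
print("Accuracy:", round((preds == y).mean(), 3))
print("Selection rate by group:", round(preds[z == 0].mean(), 2), round(preds[z == 1].mean(), 2))
```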

Security:

Many ML models are finding widespread adoption across industries and are already being used in critical real-life situations. ML model development is very different from conventional software development. It is more error-prone, because loopholes can open the door to malicious attacks and erroneous input data makes the model more likely to err.

Many of these models do not start from scratch; they are built atop pre-existing models through transfer learning. If a pre-existing model was created by a malicious actor, a model built on it through transfer learning can have its purpose corrupted. Further, even after the model goes into production, maliciously crafted data fed into it can change the predictions it generates.
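One simple, hedged safeguard against corrupted upstream models is to verify the integrity of a downloaded pre-trained artifact against a checksum published by a trusted provider before loading it. The file name and expected hash below are hypothetical placeholders.

```python
# Hedged sketch: verify a downloaded pre-trained model file against a known checksum
# before using it for transfer learning. Path and expected hash are hypothetical.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "replace-with-hash-published-by-the-model-provider"
MODEL_PATH = Path("pretrained_model.bin")

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of(MODEL_PATH) != EXPECTED_SHA256:
    raise RuntimeError("Model checksum mismatch - refusing to load untrusted weights")
print("Checksum verified; safe to proceed with transfer learning.")
```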

In conclusion, assuring the quality of AI/ML-based models and engines requires a fundamentally different approach from traditional testing. It must evolve continuously, focusing on the data fed into the system and the predictive outcomes made from it. Continuous testing, with a focus on data quality, the ability to influence predictive outcomes, and the removal of biases in predictions, is the answer.

(This blog was originally published in EuroStar)

Artificial Intelligence (AI) and Its Impact on Software Testing

Enterprises impacted by 'digital disruption' are forced to innovate on the go while delighting customers and increasing operational efficiency. As a result, software development teams that are used to long development cycles no longer have the luxury of time. Delivery times are shrinking, while technical complexity and the emphasis on user experience keep increasing.

Continuous Testing has been able to cope somewhat with rigorous software development cycles, but given the rapid speed at which innovation is transforming the digital world, it may fall short. What is needed is the ability to deliver world-class user experiences while maintaining delivery momentum and without compromising on technical complexity. Meeting the challenges of accelerated delivery and technical complexity requires test engineers to test smarter, not harder.

So what has all this got to do with Artificial Intelligence (AI)?

The fact is, AI and software testing were never discussed together. However, AI can play an important role in testing; it has already begun transforming testing as a function, helping development teams identify bug fixes early and assess and correct code faster than ever before. Using test analytics, AI-powered systems can generate predictive analytics to identify the specific areas of the software most likely to break.
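A hedged sketch of such a predictive-analytics step, assuming you have historical per-module change metrics labeled with whether a defect later surfaced. The feature set, numbers, and model choice are illustrative, not a prescribed pipeline.

```python
# Hedged sketch: predict which modules are most likely to break, based on
# historical change metrics. The feature set and data are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Columns: lines changed, recent commits, cyclomatic complexity, past defects
history = np.array([
    [450, 12, 38, 5],
    [30,   2,  7, 0],
    [220,  8, 25, 3],
    [15,   1,  4, 0],
    [600, 20, 44, 7],
    [80,   3, 10, 1],
])
broke_later = np.array([1, 0, 1, 0, 1, 0])   # did a defect surface after release?

model = RandomForestClassifier(random_state=0).fit(history, broke_later)

# Score the modules in the upcoming release by predicted risk.
upcoming = np.array([[500, 15, 40, 4], [25, 1, 5, 0]])
risk = model.predict_proba(upcoming)[:, 1]
for i, r in enumerate(risk):
    print(f"Module {i}: predicted break risk {r:.2f}")
```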

Before delving into AI-based software testing, it helps to understand what AI actually means. Forrester defines AI as "A system, built through coding, business rules, and increasingly self-learning capabilities, that is able to supplement human cognition and activities and interacts with humans naturally, but also understands the environment, solves human problems, and performs human tasks."

Related: Improved time to market and maximized business impact with minimal schedule variance and business risk

AI provides the canvas for software testing, but its uses have to be defined by testers. Some engineers have already put their imagination to the test, using AI to simplify test management by creating test cases automatically. They know that AI can help reduce the level of effort (LOE) while ensuring adherence to built-in standards.

AI could also enable code-less test automation, creating and running tests automatically on a web or mobile application. AI-based testing could even identify a 'missing requirement' in the requirements document, based on bug-requirement maps.
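A hedged sketch of such a bug-requirement map, using simple TF-IDF text similarity as a stand-in for more sophisticated matching. The requirements, bug reports, and similarity threshold are made up for illustration.

```python
# Hedged sketch: map bug reports to requirements by text similarity, and flag
# bugs with no close requirement as hints of a missing requirement.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

requirements = [
    "User can reset password via email link",
    "Search results must load within two seconds",
    "Cart total updates when item quantity changes",
]
bugs = [
    "Password reset email never arrives",
    "App crashes when switching language to French",   # no matching requirement
]

vectorizer = TfidfVectorizer().fit(requirements + bugs)
req_vecs = vectorizer.transform(requirements)
bug_vecs = vectorizer.transform(bugs)

similarity = cosine_similarity(bug_vecs, req_vecs)
for bug, scores in zip(bugs, similarity):
    best = scores.max()
    status = "covered" if best > 0.2 else "possible missing requirement"
    print(f"{bug!r}: best match {best:.2f} -> {status}")
```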

Machine learning bots are capable of helping with testing, especially with end-user experience taking the front seat. When trying to understand the role of bots in software testing, bear in mind that most applications share similarities: screen sizes, shopping carts, search boxes, and so forth. Bots can be trained to specialize in a particular area of an app, and AI bots can manage tens of thousands of test cases, far more than conventional regression testing can handle. It is this ability that elevates the importance of AI testing in the DevOps age, where iteration happens on the go.

To summarize, while bots do some of the routine stuff, testers can focus on more complex tasks, taking the monotony out of testing and replacing it with the word ‘exciting’.

Learn more about Trigent’s automation testing services.

Read Other Blog on Artificial Intelligence: 

The Impact of Artificial Intelligence on the Healthcare Industry
