Data Science Quality Assurance Kickoff

What is Data Science? In simple terms, you could say it is the “study of data”. Data Science has become a prominent subject because it helps industries with the predictions and analysis that inform their business decisions.

Some real-world use cases of Data Science are: Internet search, recommendation systems, targeted advertising, image recognition, speech recognition, gaming, airline route planning, and fraud and risk detection. Many other applications are also built on top of Data Science.

Now let’s get to the point where Quality Assurance comes into play. Assuring the quality of Data Science solutions is not an easy task, since the testing strategy varies with the solution being tested. Unfortunately, there are not many resources to refer to when it comes to Quality Assurance for Data Science. Based on the information that is available, we can split the process of testing a Data Science application into three main components: Data Validation, Process Validation and Output Validation.

Let’s take them one by one and clarify each component further.

Data Validation:

This component validates that the data is accurate and not corrupted. Below are a few techniques you could follow to achieve this:

  • Validate the schema of each record
  • Validate the data source
  • Validate the data format
  • Check for duplicate records
  • Verify that there are no empty records
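
The checks above could be sketched in Python roughly as follows. This is only a minimal illustration, not a prescribed implementation: the schema, the field names (`id`, `name`, `signup_date`), the date format and the sample records are all assumptions made up for the example, and validating the data source is left out because it depends entirely on where your data comes from.

```python
from datetime import datetime

# Hypothetical schema for this example: field name -> expected Python type.
EXPECTED_SCHEMA = {"id": int, "name": str, "signup_date": str}
DATE_FORMAT = "%Y-%m-%d"  # assumed date format


def validate_schema(record):
    """True when the record has exactly the expected fields and types."""
    if set(record) != set(EXPECTED_SCHEMA):
        return False
    return all(isinstance(record[f], t) for f, t in EXPECTED_SCHEMA.items())


def validate_format(record):
    """True when signup_date parses with the assumed date format."""
    try:
        datetime.strptime(record["signup_date"], DATE_FORMAT)
        return True
    except (KeyError, TypeError, ValueError):
        return False


def find_duplicates(records, key="id"):
    """Return every record whose key value was already seen earlier."""
    seen, duplicates = set(), []
    for record in records:
        value = record.get(key)
        if value in seen:
            duplicates.append(record)
        seen.add(value)
    return duplicates


def find_empty_records(records):
    """Return records with no fields, or where every value is None or ''."""
    return [r for r in records
            if not r or all(v in (None, "") for v in r.values())]


# Made-up sample data, one problem per record after the first.
records = [
    {"id": 1, "name": "Ann", "signup_date": "2021-03-01"},
    {"id": 2, "name": "Bob", "signup_date": "01/03/2021"},  # wrong date format
    {"id": 1, "name": "Ann", "signup_date": "2021-03-01"},  # duplicate id
    {"id": None, "name": "", "signup_date": ""},            # empty record
]

bad_schema = [r for r in records if not validate_schema(r)]
bad_format = [r for r in records if not validate_format(r)]
duplicates = find_duplicates(records)
empty = find_empty_records(records)
```

In a real project these checks would usually run against the actual data store or a sample of it, and each failing record would be reported rather than just collected in a list.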

Process Validation:

This component can simply be called Business Logic Validation. Here you verify the business logic within the system node by node, and then verify how the different nodes work together.
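
One way to picture node-by-node validation is a small pipeline where each step is tested on its own and then the chain is tested end to end. The sketch below is only an illustration under assumptions: the three nodes (`clean`, `convert`, `total_by_customer`), the field names and the conversion rate are hypothetical and do not come from any particular system.

```python
def clean(rows):
    """Node 1 (hypothetical): drop rows with a missing amount."""
    return [r for r in rows if r.get("amount") is not None]


def convert(rows, rate):
    """Node 2 (hypothetical): multiply every amount by a fixed rate."""
    return [{**r, "amount": r["amount"] * rate} for r in rows]


def total_by_customer(rows):
    """Node 3 (hypothetical): sum the amounts per customer."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals


def run_pipeline(rows, rate):
    """All nodes chained together in order."""
    return total_by_customer(convert(clean(rows), rate))


# Made-up input for the example.
rows = [
    {"customer": "a", "amount": 10},
    {"customer": "a", "amount": None},
    {"customer": "b", "amount": 5},
]

# Verify each node in isolation first ...
assert len(clean(rows)) == 2
assert convert([{"customer": "a", "amount": 10}], 2) == [{"customer": "a", "amount": 20}]
assert total_by_customer([{"customer": "a", "amount": 1},
                          {"customer": "a", "amount": 2}]) == {"a": 3}

# ... then verify the nodes against each other, end to end.
result = run_pipeline(rows, 2)
assert result == {"a": 20, "b": 10}
```

Testing the nodes separately makes it easier to see which step broke the business rule when the end-to-end check fails.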

Output Validation:

In this component, you validate the end result, which is the processed data, against the expected result. There are multiple techniques that can be used for this. The Jaccard Index, Jaccard Distance, Precision & Recall and Area Under the Curve are a few of the basic ones.
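
To make these measures concrete, here is a minimal pure-Python sketch. It assumes binary labels (1 for positive, 0 for negative), and it assumes that "Area Under the Curve" means the area under the ROC curve, computed here as the share of positive/negative pairs that the scores rank correctly. The function names and sample values are made up for illustration; in practice you would more likely rely on an established library than on hand-written versions.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def jaccard_distance(a, b):
    """Dissimilarity: 1 minus the Jaccard index."""
    return 1.0 - jaccard_index(a, b)


def precision_recall(expected, predicted):
    """Precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(expected, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for e, s in zip(expected, scores) if e == 1]
    neg = [s for e, s in zip(expected, scores) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up expected vs. actual outputs.
similarity = jaccard_index({1, 2, 3}, {2, 3, 4})    # 2 shared / 4 total = 0.5
distance = jaccard_distance({1, 2, 3}, {2, 3, 4})   # 1 - 0.5 = 0.5
precision, recall = precision_recall([1, 0, 1, 1], [1, 1, 0, 1])
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])   # 3 of 4 pairs correct = 0.75
```

The Jaccard measures suit set-like outputs such as a list of recommended items, while precision, recall and AUC suit classification outputs where each record has an expected label.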

As Quality Assurance engineers and beginners in the Data Science field, what is our commitment?

  • Keep in mind that the testing life cycle and the QA phases can be applied as they are to Data Science applications. QA practices and standards stay the same no matter the domain, field or application.
  • Get the requirement and the proposed solution from the responsible party, and analyze every point of the design where testing can be involved. These testing points could fall under any of the three components: Data Validation, Process Validation or Output Validation.
  • Come up with your own testing approach for the identified points. Decide whether your approach should be manual or automated, and train yourself with the relevant learning material accordingly. For example, if your testing must be automated and involves scripting, build your expertise in the relevant scripting language.

It is true that we lack resources on Data Science Quality Assurance, and Data Science testing is going to be a complicated task since your testing strategy will vary with each requirement. But once you get started, you can always begin publishing your own resources for Data Science QA, and adapting to the changing nature of Data Science will be a snap for you.

Happy Data Science Testing!!!

References:

  1. https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/
  2. https://www.cabotsolutions.com/2017/12/big-data-testing-how-to-overcome-quality-challenges

Image Courtesy: freepik.com/@starline

Elaine Nanayakkara

Senior QA Engineer 
