By Anuradha M on Mar 1, 2018 5:27:22 AM
In a digitally transformed world, as businesses continue to grow, the pressure on data scientists to deliver workable models in accelerated time is immense. Typically, once a valuable insight has been found in the data, data scientists have to make it production ready, i.e. utilize it throughout the organization and integrate it into the business process. However, it is difficult for data scientists to predict accurately how algorithms and models will perform in production, because the conditions surrounding legacy data may no longer be applicable or even pertinent as times change.
Most of us are already familiar with DevOps, a practice that embraces collaboration between IT operations and software development, resulting in a faster pace of going to market. An offshoot of DevOps, DataOps is a buzzword making the rounds in the world of data science. DataOps is designed to do away with roadblocks when developing or deploying data-intensive applications, such as the predictive models built by data scientists. Gartner has defined DataOps as "The hub for collecting and distributing data, with a mandate to provide controlled access to systems of record for customer and marketing performance data, while protecting privacy, usage restrictions and data integrity".
DataOps relies on process change and organizational realignment to smooth data management for everyone who accesses or handles data. It connects the dots between data collection and preparation. It calls for a democratized atmosphere where data infrastructure is centralized and available to all stakeholders. It requires crossing the organizational and cultural barriers that separate data and people, and bringing the two data audiences together. On one side are the data operators, the people responsible for infrastructure, security, etc. On the other side are the actual consumers of data, the people responsible for using data to drive change, such as data scientists. DataOps brings these two audiences together, eliminating friction points. DataOps focuses on governance, operations, delivery, data transformation and version control. In a complex technology landscape of legacy systems and cloud solutions, DataOps will help to leverage the right technology for the right solutions and reduce friction.
To make this model work, it is important not to treat data scientists as separate from the end product. They become part of the team to analyze, question and identify the datasets that need to be analyzed. This information can then be handed over to the data team. There are a lot of considerations to be made when implementing a DataOps model.
Already, there are several DataOps models making the rounds, and organizations are exploring this methodology.
To summarize, DataOps will help to ensure that models that perform well in a lab perform the same way in production.