A Lapse From Model-Centric To Data-Centric AI

What Is Data-Centric AI, and How Does It Help Data-Driven Businesses?

Many AI algorithms with varying degrees of sophistication have been developed to address a variety of problems (e.g., ResNet50, Inception, and VGG16 for image classification). Alongside them, many techniques have been developed to further fine-tune a model, such as regularization and cross-validation. However, these techniques focus on the modeling side of problem-solving, so the emphasis lies on the code rather than on the data.

Characterizing the Aspects of High-Quality Data

For deeper insights, we want refined, high-quality data. But how do we define it, and what does maintaining quality involve?


The data should be well defined, with clear guidelines and definitions for annotation and labeling. Drafting them may require input from multiple labelers and subject matter experts. For example, consider the following object detection problem. In the figure below, two lions are labeled very differently. Both labelings are defensible, but the lack of a clear rule (how to label an object when another object is in the foreground) led to inconsistent annotations. In more complex problems, this inconsistency can be counterproductive, because the model is trained on conflicting targets. Therefore, it is essential to have clear guidelines.
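One way to check whether labeling guidelines are being applied consistently is to compare the boxes that two labelers drew for the same object. The sketch below uses intersection over union (IoU) for that comparison. The `Box` type, the `needs_review` helper, the example coordinates, and the 0.7 threshold are all illustrative assumptions, not part of any particular annotation tool.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def needs_review(a: Box, b: Box, threshold: float = 0.7) -> bool:
    """Flag a pair of annotations whose overlap falls below the threshold.

    The 0.7 default is an arbitrary illustrative value.
    """
    return iou(a, b) < threshold


# Made-up example: one labeler boxes the whole animal, the other only
# the part that is not hidden by a foreground object.
whole_animal = Box(10, 10, 110, 60)
visible_part = Box(10, 10, 60, 60)
flagged = needs_review(whole_animal, visible_part)
```

Pairs that are flagged often point at a gap in the guidelines rather than at a careless labeler, so they are good candidates for discussion with subject matter experts.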


Metadata such as time of creation and source is also important in deciding which data to use. It helps us determine the principles on which the AI solution should be built. The ability to select data precisely is beneficial when dealing with data drift and updating the model.
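As a minimal sketch of what selecting data by its metadata could look like, the snippet below filters records by source and creation time. The `Record` fields, the `select_records` helper, and the sample values are hypothetical and only illustrate the idea; a real pipeline would read this metadata from its own storage.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Record:
    """A data sample together with its metadata (hypothetical schema)."""
    path: str
    source: str
    created_at: datetime


def select_records(
    records: Iterable[Record],
    sources: Set[str],
    start: datetime,
    end: datetime,
) -> List[Record]:
    """Keep records from the given sources created in [start, end)."""
    return [
        r for r in records
        if r.source in sources and start <= r.created_at < end
    ]


# Made-up sample data for illustration.
records = [
    Record("img_001.jpg", "camera_a", datetime(2021, 1, 15)),
    Record("img_002.jpg", "camera_b", datetime(2021, 6, 1)),
    Record("img_003.jpg", "camera_a", datetime(2021, 9, 30)),
]

# Select only recent data from one source, e.g. to retrain after drift.
recent = select_records(
    records,
    sources={"camera_a"},
    start=datetime(2021, 6, 1),
    end=datetime(2022, 1, 1),
)
```

Keeping the time window half-open (start included, end excluded) lets consecutive windows be used without selecting the same record twice.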





Affine is a provider of analytics solutions, working with global organizations to solve their strategic and day-to-day business problems. www.affineanalytics.com