
Task Definition and Specification

Data annotation is a well-established niche in technical fields and the foundation for training neural networks. In recent months, most of the discussion has centered on LLMs and data generation, but artificial intelligence technologies are also actively advancing in the industrial sector.


My name is Alexey Kornilov, and I am a project manager for data collection and annotation at Training Data. I want to talk about how ML developers at a mining company use data collection and annotation. About a year ago, I completed a project to prepare datasets in which the bubbles formed during flotation are annotated. It may sound complicated, but let’s break it down step by step.

The ultimate goal of the project was to train a neural network to control and analyze one of the stages of ore concentrate production at the plant. In other words, the task was to automate and mechanize manual labor performed in conditions that are potentially hazardous for humans.

I want to focus on the stages and specifics of this kind of data annotation, the organization of the team of annotators (AI trainers), and insights into working with industrial data. This article will be of interest to project managers, data scientists, ML engineers, and anyone working in data annotation for machine learning tasks.

Task Definition and Specification

To train a neural network, we needed data with two types of annotations: semantic segmentation and tracking. Often, we parse or collect data for annotation, but when it comes to heavy industry, data is provided to us along with a technical specification.
At first glance, the task may seem daunting because not only do we need to annotate each bubble, but we also need to assign a unique ID to each one and ensure we don’t lose track of them in subsequent frames.
But in reality, for AI trainers, it doesn’t matter what needs to be annotated. From a technical standpoint, all segmentation tasks are the same, whether you’re working with people, vegetables, or bubbles. However, this project had its own unique features and complexities that became apparent during the pilot phase (a pilot is a test project to determine metrics and refine the project’s specifications).
So, according to the pilot project’s specifications, we performed two types of tasks:
1. Semantic Segmentation: Outlining the contours of each bubble with polygons to determine the characteristics of the bubble foam.
2. Object Tracking: Annotating each bubble with a bounding box, assigning a unique ID to each box, and tracking the position of the ID for each bubble across all subsequent frames.
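To make the two annotation types more concrete, here is a minimal Python sketch of how such labels could be represented and sanity-checked. It is an illustration only, not the format or tooling used on the project: the class names `BubblePolygon` and `TrackedBox`, their fields, and the helpers `polygon_area`, `duplicate_ids`, and `track_gaps` are all hypothetical. The sketch assumes polygons are stored as pixel coordinates and that frames are numbered with consecutive integers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BubblePolygon:
    """Segmentation label (hypothetical schema): contour of one bubble in one frame."""
    frame: int
    points: tuple[tuple[float, float], ...]  # (x, y) vertices in pixels


@dataclass(frozen=True)
class TrackedBox:
    """Tracking label (hypothetical schema): bounding box carrying a bubble's ID."""
    frame: int
    track_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def polygon_area(points):
    """Area of a simple polygon in square pixels, using the shoelace formula."""
    twice_area = 0.0
    # Pair every vertex with the next one, wrapping around to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def duplicate_ids(boxes):
    """Return sorted (frame, track_id) pairs where one ID labels several boxes."""
    seen, dupes = set(), set()
    for box in boxes:
        key = (box.frame, box.track_id)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return sorted(dupes)


def track_gaps(boxes):
    """Return {track_id: [missing frames]} for IDs that vanish and later reappear."""
    frames_by_id = defaultdict(set)
    for box in boxes:
        frames_by_id[box.track_id].add(box.frame)
    gaps = {}
    for track_id, frames in frames_by_id.items():
        expected = set(range(min(frames), max(frames) + 1))
        missing = sorted(expected - frames)
        if missing:
            gaps[track_id] = missing
    return gaps


if __name__ == "__main__":
    square = BubblePolygon(0, ((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)))
    print(polygon_area(square.points))  # 12.0

    boxes = [
        TrackedBox(0, 1, 0, 0, 5, 5),
        TrackedBox(1, 1, 1, 0, 6, 5),
        TrackedBox(2, 1, 2, 0, 7, 5),
        TrackedBox(0, 2, 10, 10, 14, 14),
        TrackedBox(2, 2, 12, 10, 16, 14),  # ID 2 is missing in frame 1
        TrackedBox(2, 2, 30, 30, 34, 34),  # ID 2 is reused in frame 2
    ]
    print(duplicate_ids(boxes))  # [(2, 2)]
    print(track_gaps(boxes))     # {2: [1]}
```

Checks like these only flag candidates for manual review. A gap in a track, for example, may be an annotator losing the bubble, or it may reflect something real in the footage, so the flagged frames still need a human look.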


The photos show frames from surveillance cameras located above flotation machines. The images display the flows of numerous bubbles, the sizes and quantities of which indicate the readiness of the ore for the next processing stages. Below is an example of data for annotation. Three streams of bubble movement in the reservoir are highlighted with color: