The ability to automatically create plausible objects
Machine Learning
The system's ability to automatically interpret data and predict outcomes
NLP
The system's ability to understand, analyze, and interpret human languages
Data Collection
Gathering suitable data for subsequent labeling
100,000 text descriptions
Duration: 5 weeks
Case Description
The dataset was collected on the "Toloka" platform by Roman Kutsev in 2019 as part of his personal project, "Impressor".
A user would send any photo to the chat, and within 5 minutes a bot would reply with its first impression of the person in the photo.
The dataset contains 100,000 textual descriptions of photos of people. Each text fragment was checked for grammatical errors, as well as for insults, profanity, and other ethical violations.