Spambase datasets
Text datasets of emails and SMS messages in different formats for training a neural network to identify spam and classify messages
Machine Learning
Enables computer systems to automatically learn from data and make predictions
Training algorithms to recognize situations that could cause harm
The ability of a system to understand, analyze and interpret human language
Data Collection
Gathering data for subsequent annotation
Case descriptions
Content and format:
.csv file with message text
jpg/png screenshots
Collection metrics:
10,000 messages
15 days
SMS spam in English
The dataset consists of a wide range of spam messages, including promotional offers, fraudulent schemes and phishing attempts
Content and format:
.csv file with message text (title, text, type)
jpg/png screenshots
Collection metrics:
15,000 messages
20 days
English, Spanish, French, German, Polish, Czech
Email spam in European languages
The dataset consists of emails divided into two main classes: "spam" and "not spam". The emails are 50 to 7,500 characters long, written in different languages, and cover both colloquial and formal styles
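As a rough illustration of how a file in the listed (title, text, type) layout might be loaded and checked, here is a minimal Python sketch. The column names and the 50 to 7,500 character range come from the description above; the sample rows, the exact label strings "spam" and "not spam", and the helper name `load_messages` are assumptions made for the example, not part of the delivered dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows in the (title, text, type) layout; not real dataset rows.
SAMPLE_CSV = """title,text,type
Win a prize,"Congratulations! You have been selected to receive a free gift card. Click the link to claim it now.",spam
Meeting notes,"Hi team, attached are the notes from the meeting on Tuesday. Please review them before Friday.",not spam
Hola,"Oferta",spam
"""

MIN_CHARS, MAX_CHARS = 50, 7500  # length range stated in the dataset description


def load_messages(handle):
    """Split rows into those inside and outside the stated length range."""
    kept, rejected = [], []
    for row in csv.DictReader(handle):
        if MIN_CHARS <= len(row["text"]) <= MAX_CHARS:
            kept.append(row)
        else:
            rejected.append(row)
    return kept, rejected


kept, rejected = load_messages(io.StringIO(SAMPLE_CSV))
class_counts = Counter(row["type"] for row in kept)
print(class_counts, len(rejected))
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="", encoding="utf-8")`, assuming the export is UTF-8 encoded.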
Content and format:
.csv file with message text
jpg/png screenshots
Collection metrics:
10,000 messages
12 days
Russian spam SMS
The dataset contains examples of unsolicited text messages, including promotional mailings, viral links, microfinance offers and other fraudulent schemes
AI solutions for your business
LLM training to recognize different spam formats and to generate, rewrite or otherwise process spam texts on request

Spam protection in chat applications: NLP to improve spam filtering in chats, block unwanted messages, advertisements and malicious links, and strengthen security
Phishing protection: Classification for recognizing phishing emails and preventing users from interacting with them
Preventing text spam in comments: NLP for detecting and blocking spam in comments, keeping mobile applications safe and comfortable to use
Optimization of marketing campaigns: Classification for automatically filtering unwanted or fraudulent user requests, improving the quality and accuracy of marketing campaigns
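The classification use cases above could start from a very simple baseline before moving to larger models. The sketch below is a pure-Python multinomial naive Bayes classifier with add-one smoothing. It is an illustrative baseline only, not the method used to build or validate these datasets, and the four training messages and the class name `NaiveBayes` are made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of messages
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy training set; a real run would use the labelled .csv data.
texts = [
    "Win a free prize now, click this link",
    "Urgent: verify your account to claim your free reward",
    "Are we still meeting for lunch tomorrow",
    "Please send me the report before the meeting",
]
labels = ["spam", "spam", "not spam", "not spam"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("Claim your free prize"))
print(model.predict("See you at the meeting tomorrow"))
```

A baseline like this mainly gives a reference score to compare against; multilingual data such as the European email set would need language-aware tokenization, which this sketch does not attempt.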
Scope of work
Markup complexity
Markup quality
We guarantee 95% data quality. For orders that require markup quality above 95%, we offer enterprise solutions