The customer is developing a smart TV box with an RGB camera that recognizes users' gestures. To improve the quality of gesture recognition, the company asked the Training Data team to collect and annotate 15,000 videos covering 14 different gestures. The main challenge was to source videos that mirror how real users gesture when the TV box is set up above the TV.