Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling

09/21/2018
by   Choongyeun Cho, et al.

This paper presents Axon AI's solution to the 2nd YouTube-8M Video Understanding Challenge, which achieved a final global average precision (GAP) of 88.733 (not considering the model size constraint) and 87.287 with a model that meets the size requirement. Two sets of 7 individual models belonging to 3 different families were trained separately. The inference results of these models on training data were then aggregated and used to train a compact model that meets the model size requirement. To further improve performance, we explored and employed data over/sub-sampling in feature space, an additional regularization term during training that exploits label relationships, and learned weights for ensembling the individual models.
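To make the idea of learned ensembling weights concrete, the following is a minimal sketch and not the authors' implementation. It assumes the weights are parameterized through a softmax (so they stay positive and sum to 1) and are fit by plain gradient descent on binary cross-entropy. The function names, the loss, the optimizer settings, and the synthetic two-model data are all illustrative assumptions; the abstract does not specify how the weights were learned.

```python
import math
import random


def softmax(logits):
    """Map unconstrained logits to positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble(weights, preds):
    """Weighted average of the per-model probabilities for one example."""
    return sum(w * p for w, p in zip(weights, preds))


def bce(weights, model_preds, labels, eps=1e-7):
    """Mean binary cross-entropy of the weighted ensemble."""
    total = 0.0
    for preds, y in zip(model_preds, labels):
        p = min(max(ensemble(weights, preds), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


def learn_weights(model_preds, labels, steps=500, lr=0.5, eps=1e-7):
    """Fit ensemble weights by gradient descent (hypothetical helper)."""
    n_models = len(model_preds[0])
    logits = [0.0] * n_models  # start from uniform weights
    for _ in range(steps):
        w = softmax(logits)
        grad_w = [0.0] * n_models
        for preds, y in zip(model_preds, labels):
            p = min(max(ensemble(w, preds), eps), 1 - eps)
            dl_dp = (p - y) / (p * (1 - p))  # derivative of BCE w.r.t. p
            for k in range(n_models):
                grad_w[k] += dl_dp * preds[k] / len(labels)
        # Chain rule through the softmax: dL/dz_k = w_k * (g_k - sum_j w_j g_j)
        dot = sum(wj * gj for wj, gj in zip(w, grad_w))
        logits = [z - lr * wk * (gk - dot)
                  for z, wk, gk in zip(logits, w, grad_w)]
    return softmax(logits)


# Synthetic data (assumption): model 0 is informative, model 1 is pure noise.
random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
model_preds = [
    [min(max(y + random.gauss(0, 0.2), 0.01), 0.99),
     random.uniform(0.01, 0.99)]
    for y in labels
]
weights = learn_weights(model_preds, labels)
print("learned weights:", weights)
```

On this toy data the learned weights should favor the informative model over the noisy one. In practice such weights would be fit on held-out predictions rather than on the data the individual models were trained on.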
