EML-NET:An Expandable Multi-Layer NETwork for Saliency Prediction
In this work, we apply state-of-the-art Convolutional Neural Network(CNN) architectures for saliency prediction. Our results show that better saliency features can be delivered by a deeper CNN model. However, it is very space-consuming to apply a complex model due to the large size of input images. The space complexity becomes even more problematic when we extract features from multiple convolutional layers or different models. In this paper, we propose a modular saliency system which aims at splitting the whole network into small modules. The main difference in our approach s that the encoder and decoder can be separately trained for the scalability. Furthermore, the encoder can contain more than one CNN model to extract features and the models can have different architectures or pre-trained on different datasets. This parallel design allows us to better utilize the computational space in order to apply more powerful encoder. More importantly, our network can be easily expanded almost without extra spaces, other pre-trained CNN models can be combined for a wider range of visual knowledge. We denote our expandable multi-layer network as EML-NET in this paper. Our method is evaluated on three public saliency benchmarks, SALICON, MIT300 and CAT2000. The proposed EML-NET achieves state-of-the-art results on the metric of Normalized Scanpath Saliency using a modified loss function.
READ FULL TEXT