Text-to-image synthesis aims to generate a photo-realistic and semantic
...
Co-saliency detection (CoSOD) aims at discovering the repetitive salient...
Recently, the cross-modal pre-training task has been a hotspot because o...
Image-Text Matching is one major task in cross-modal information process...