Large vision-language models (VLMs), such as CLIP, learn rich joint
imag...
The grokking phenomenon as reported by Power et al. (arXiv:2201.02177)...
The application of zero-shot learning in computer vision has been
revolu...
The conceptual blending of two signals is a semantic task that may under...