We introduce the task of automatic human action co-occurrence identifica...
Joint vision-language models have shown great performance over a diverse...
Recent progress in large language models has enabled the deployment of m...
We present Phenaki, a model capable of realistic video synthesis, given ...
Existing video understanding datasets mostly focus on human interactions...
Large-scale pretrained image-text models have shown incredible zero-shot...
We consider the task of temporal human action localization in lifestyle
...
We aim to automatically identify human action reasons in online videos. ...
Work to date on language-informed video understanding has primarily addr...
Sarcasm is often expressed through several verbal and non-verbal cues, e...
This article presents classifiers based on SVM and Convolutional Neural
...
Computational Humor, as the name implies, studies humor from a computati...
While humor has been historically studied from a psychological, cognitiv...