Deep Embedding for Spatial Role Labeling

by Oswaldo Ludwig, et al.

This paper introduces the visually informed embedding of word (VIEW), a continuous vector representation for a word, extracted from a deep neural model trained on the Microsoft COCO data set to predict the spatial arrangement of visual objects given a textual description. The model is composed of a deep multilayer perceptron (MLP) stacked on top of a Long Short-Term Memory (LSTM) network, the latter preceded by an embedding layer. The VIEW is applied to transfer multimodal background knowledge to Spatial Role Labeling (SpRL) algorithms, which recognize spatial relations between objects mentioned in text. This work also contributes a new method for selecting complementary features and a fine-tuning method for the MLP that improves the F1 measure for classifying words into spatial roles. The VIEW is evaluated on Task 3 of the SemEval-2013 benchmark data set, SpaceEval.
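The architecture described above (embedding layer, then LSTM, then a deep MLP on the LSTM's final state) could be sketched as follows in PyTorch. All layer sizes, the vocabulary size, and the number of output units are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class ViewSketch(nn.Module):
    """Hypothetical sketch of the abstract's architecture:
    embedding layer -> LSTM -> deep MLP.
    Dimensions are assumptions for illustration only."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_outputs=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Deep MLP stacked on top of the LSTM's final hidden state
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, num_outputs),
        )

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)      # h_n: (1, batch, hidden_dim)
        return self.mlp(h_n.squeeze(0))        # (batch, num_outputs)

model = ViewSketch()
batch = torch.randint(0, 1000, (2, 10))  # 2 sentences of 10 token ids each
out = model(batch)
print(out.shape)  # torch.Size([2, 4])
```

In the paper, the continuous word representation (the VIEW) would be read out of the trained network's internal layers rather than from the output head; the sketch only shows the stacking order of the components named in the abstract.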

