Searching Scenes by Abstracting Things
In this paper we propose to represent a scene as an abstraction of 'things'. We start from 'things' as generated by modern object proposals and investigate only their immediately observable properties: position, size, aspect ratio, and color. Whereas the recent successes and excitement in the field lie in object identification, we represent the scene composition independently of object identities. We make three contributions in this work. First, we study the simple observable properties of 'things' and call them the things syntax. Second, we propose translating the things syntax into abstract linguistic statements and study how well these descriptions retrieve scenes. Third, we propose querying scenes with abstract block illustrations and study how effectively they discriminate among different types of scenes. The benefit of abstract statements and block illustrations is that we generate them directly from the images, without any prior learning as in standard attribute learning. Surprisingly, we show that even with the simplest features of the 'things' layout and no learning at all, we can still retrieve scenes reasonably well.
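For concreteness, the sketch below shows one way the things syntax described above could be extracted: for each object proposal box, it records only position, size, aspect ratio, and mean color, with no object identity and no learned model. The function name, box format, and record layout are our own illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (not the authors' code) of extracting a "things syntax":
# only immediately observable properties of proposal boxes are kept.
import numpy as np

def things_syntax(image, boxes):
    """Describe a scene by observable properties of its 'things'.

    image : H x W x 3 uint8 array
    boxes : iterable of (x0, y0, x1, y1) proposal boxes, e.g. from an
            off-the-shelf object-proposal method (assumed input format)
    Returns one record per proposal; object identities play no role.
    """
    h, w = image.shape[:2]
    records = []
    for x0, y0, x1, y1 in boxes:
        bw, bh = max(x1 - x0, 1), max(y1 - y0, 1)  # guard degenerate boxes
        patch = image[y0:y1, x0:x1]
        records.append({
            # normalized center of the box within the image
            "position": ((x0 + x1) / (2 * w), (y0 + y1) / (2 * h)),
            # size as the fraction of image area the box covers
            "size": (bw * bh) / (w * h),
            # width over height of the box
            "aspect_ratio": bw / bh,
            # mean RGB color of the pixels inside the box
            "color": patch.reshape(-1, 3).mean(axis=0).tolist(),
        })
    return records
```

Such records could then be mapped directly to abstract statements ("a large red thing in the lower left") or rendered as colored blocks, without any training step.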