Abstract
How can a machine learn to recognize visual attributes emerging from online
communities without a definitive supervised dataset? This paper proposes an automatic
approach to discover and analyze visual attributes from a noisy collection of image-text
data on the Web. Our approach is based on the relationship between attributes and
neural activations in a deep network. We characterize the visual property of an
attribute word as a divergence within a weakly annotated set of images. We show that
the neural activations are useful for discovering and learning a classifier that
agrees well with human perception from noisy real-world Web data. The empirical
study suggests that the layered structure of deep neural networks also gives us
insights into the perceptual depth of a given word. Finally, we demonstrate
that we can utilize highly-activating neurons for finding semantically relevant regions.
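As a rough illustration of the idea of scoring an attribute word by a divergence over neural activations, the sketch below ranks neurons by how differently they respond to images tagged with a word versus images without the tag. The use of KL divergence over histograms, the bin count, the smoothing constant, and the function names `kl_divergence` and `rank_neurons` are assumptions made for this example; they are not details stated on this page and may differ from the method in the paper.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the max value into the last bin
        counts[idx] += 1
    return counts


def kl_divergence(pos, neg, bins=20, eps=1e-8):
    """KL(P || Q) between histograms of one neuron's activations.

    pos: activations on images tagged with the word (hypothetical input)
    neg: activations on images without the tag (hypothetical input)
    """
    lo = min(min(pos), min(neg))
    hi = max(max(pos), max(neg))
    if hi == lo:
        return 0.0  # a constant neuron carries no signal
    # Additive smoothing (an assumption) avoids log(0) for empty bins.
    p = [c + eps for c in histogram(pos, lo, hi, bins)]
    q = [c + eps for c in histogram(neg, lo, hi, bins)]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (a / p_sum) * math.log((a / p_sum) / (b / q_sum)) for a, b in zip(p, q)
    )


def rank_neurons(pos_acts, neg_acts, bins=20):
    """Rank neurons by divergence; each input is a list of per-image activation rows."""
    n_neurons = len(pos_acts[0])
    scores = [
        kl_divergence([row[j] for row in pos_acts],
                      [row[j] for row in neg_acts], bins)
        for j in range(n_neurons)
    ]
    order = sorted(range(n_neurons), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Under these assumptions, a word whose top-ranked neurons have large divergence would be treated as more visual, and the indices in `order` give candidate neurons for locating relevant image regions.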
BibTeX
@inproceedings{VittayakornECCV2016,
title = {Automatic Attribute Discovery with Neural Activations},
author = {Sirion Vittayakorn and Takayuki Umeda and Kazuhiko Murasaki and Kyoko Sudo and Takayuki Okatani and Kota Yamaguchi},
year = {2016},
booktitle = {European Conference on Computer Vision (ECCV)}
}
Download
Dataset | Description |
Etsy dataset | Product metadata from Etsy, such as title, description, tags, materials, or image URLs of 2.8 million product listings sold in Sep 2014. |
Wear dataset | Post metadata from Wear.jp, such as description, tags, item list, or image URLs of 212K posts collected in Oct 2015. |