Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

Liu, Xuejing; Li, Liang; Wang, Shuhui; Zha, Zheng-Jun; Meng, Dechao; Huang, Qingming

arXiv:1908.10568 (cs)

[Submitted on 28 Aug 2019]

Abstract: Weakly supervised referring expression grounding aims to localize the referential object in an image according to a linguistic query, where the mapping between the referential object and the query is unknown during training. To address this problem, we propose a novel end-to-end adaptive reconstruction network (ARN). It builds the correspondence between image region proposals and the query adaptively, through adaptive grounding and collaborative reconstruction. Specifically, we first extract subject, location, and context features to represent the proposals and the query, respectively. Then, we design an adaptive grounding module that computes the matching score between each proposal and the query with a hierarchical attention model. Finally, based on the attention scores and proposal features, we reconstruct the input query with a collaborative loss that combines a language reconstruction loss, an adaptive reconstruction loss, and an attribute classification loss. This adaptive mechanism helps our model alleviate the variance across different referring expressions. Experiments on four large-scale datasets show that ARN outperforms existing state-of-the-art methods by a large margin. Qualitative results demonstrate that the proposed ARN better handles cases where multiple objects of the same category are situated together.
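
The grounding-then-reconstruction idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the hierarchical attention model or the loss weights, so the sketch substitutes a plain weighted sum of per-cue dot products followed by a softmax. The names `ground`, `attended_feature`, `collaborative_loss`, `weights`, and `lambdas` are hypothetical and exist only for illustration.

```python
import math

CUES = ("subject", "location", "context")  # the three feature types named in the abstract


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ground(proposals, query, weights):
    """Score each proposal against the query and normalize to attention.

    Assumption: a weighted sum of per-cue similarities stands in for the
    paper's hierarchical attention model, which the abstract does not detail.
    Returns the attention distribution and the index of the top proposal.
    """
    scores = [
        sum(weights[c] * dot(p[c], query[c]) for c in CUES)
        for p in proposals
    ]
    attn = softmax(scores)
    best = max(range(len(attn)), key=attn.__getitem__)
    return attn, best


def attended_feature(proposals, attn, cue):
    """Attention-weighted average of one cue's features over all proposals.

    A feature like this would feed a query reconstruction step.
    """
    dim = len(proposals[0][cue])
    return [sum(a * p[cue][i] for a, p in zip(attn, proposals)) for i in range(dim)]


def collaborative_loss(l_lang, l_adapt, l_attr, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum of the three loss terms named in the abstract.

    Assumption: a linear combination with hypothetical weights `lambdas`.
    """
    return lambdas[0] * l_lang + lambdas[1] * l_adapt + lambdas[2] * l_attr
```

In this sketch no ground-truth box is used anywhere: the only training signal would come from how well the attended features allow the query to be reconstructed, which is what makes the setting weakly supervised.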

Comments:	Accepted by ICCV 2019
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1908.10568 [cs.CV]
	(or arXiv:1908.10568v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1908.10568

Submission history

From: Xuejing Liu
[v1] Wed, 28 Aug 2019 06:49:54 UTC (5,129 KB)
