Attend More Times for Image Captioning

Du, Jiajun; Qin, Yu; Lu, Hongtao; Zhang, Yonghua

arXiv:1812.03283 (cs)

[Submitted on 8 Dec 2018 (v1), last revised 11 Feb 2019 (this version, v2)]

Abstract: Most attention-based image captioning models attend to the image once per word. However, attending only once per word is rigid and can easily miss information. Attending more times lets the model adjust where it attends, recover missed information, and avoid generating a wrong word. In this paper, we show that attending more times per word improves image captioning without increasing the number of parameters. We propose a flexible two-LSTM merge model that makes it convenient to encode more attention steps than words. Our captioning model uses two LSTMs to encode the word sequence and the attention sequence, respectively. The information from the two LSTMs and the image feature are combined to predict the next word. Experiments on the MSCOCO caption dataset show that our method outperforms the state of the art. Using bottom-up features and self-critical training, our method achieves BLEU-4, METEOR, ROUGE-L, CIDEr and SPICE scores of 0.381, 0.283, 0.580, 1.261 and 0.220 on the Karpathy test split.
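
The abstract does not give the model's equations, so the following is only a rough sketch of the control flow it describes: one recurrent encoder advances once per word, a second advances several times per word over attention results, and both are merged with an image feature. Everything concrete here is an assumption made for illustration: the names `attend`, `recurrent_step` and `caption_step`, the dot-product attention, the parameter-free tanh cell standing in for each LSTM, and the additive merge. It is not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, regions):
    """Dot-product soft attention (assumed form): returns (weights, context)."""
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * region[d] for w, region in zip(weights, regions))
               for d in range(dim)]
    return weights, context

def recurrent_step(state, inp):
    """Parameter-free tanh cell; a stand-in for an LSTM step (assumption)."""
    return [math.tanh(s + x) for s, x in zip(state, inp)]

def caption_step(word_state, attn_state, word_vec, regions, attends_per_word):
    """One word step: advance the word encoder once, the attention encoder
    `attends_per_word` times, then merge both with a global image feature."""
    # Word-sequence encoder: exactly one update per word.
    word_state = recurrent_step(word_state, word_vec)

    # Attention-sequence encoder: several updates per word. The same
    # functions are reused on every pass, so extra passes add no parameters.
    contexts = []
    for _ in range(attends_per_word):
        query = [w + a for w, a in zip(word_state, attn_state)]
        _, context = attend(query, regions)
        attn_state = recurrent_step(attn_state, context)
        contexts.append(context)

    # Merge: both encoder states plus the mean region feature (assumed merge).
    mean_region = [sum(col) / len(regions) for col in zip(*regions)]
    merged = [w + a + g for w, a, g in zip(word_state, attn_state, mean_region)]
    return word_state, attn_state, merged, contexts

# Toy inputs: three image regions with 3-dimensional features (hypothetical).
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
word_vec = [0.5, -0.2, 0.1]
zeros = [0.0, 0.0, 0.0]

word_state, attn_state, merged, contexts = caption_step(
    zeros, zeros, word_vec, regions, attends_per_word=3)
print("merged feature:", merged)
print("attention passes for this word:", len(contexts))
```

In a real model the merged vector would feed a learned output layer over the vocabulary; here it is left as a raw vector because the abstract does not specify that layer.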

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1812.03283 [cs.CV]
	(or arXiv:1812.03283v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1812.03283

Submission history

From: Jiajun Du
[v1] Sat, 8 Dec 2018 08:23:33 UTC (854 KB)
[v2] Mon, 11 Feb 2019 12:10:54 UTC (854 KB)
