Tagged articles
1 articles
Page 1 of 1
Alibaba Cloud Developer
Alibaba Cloud Developer
Oct 25, 2017 · Artificial Intelligence

How Hierarchical Multimodal LSTM Boosts Image Captioning Accuracy

This article reviews an ICCV paper introducing a hierarchical multimodal LSTM that jointly embeds images, phrases, and whole sentences, enabling detailed image descriptions and superior performance on Flickr30K, MS‑COCO, and region‑phrase datasets compared to previous methods.

Image CaptioningMultimodal Learningcomputer vision
0 likes · 8 min read
How Hierarchical Multimodal LSTM Boosts Image Captioning Accuracy