Artificial Intelligence 8 min read

Building a Multimodal Search with Alibaba Cloud Elasticsearch and Qwen‑VL

This article demonstrates how to integrate Alibaba Cloud Elasticsearch with the Qwen‑VL large model and DashScope Embedding API to extract image features and perform multimodal vector search, covering text‑to‑image, text‑to‑text, image‑to‑image, and image‑to‑text queries, with step‑by‑step code, environment setup, data loading, indexing, and a Streamlit demo.

Alibaba Cloud Big Data AI Platform

May 27, 2026

Building a Multimodal Search with Alibaba Cloud Elasticsearch and Qwen‑VL

Background

In multimodal search, unstructured image and text data are converted to vector representations and retrieved via vector search. This guide combines Alibaba Cloud Elasticsearch (vector database) with Qwen‑VL for image description extraction and DashScope Embedding API for image and text embeddings.

Tools

Elasticsearch (ES): vector database for storing and retrieving embeddings.

Qwen‑VL: extracts image descriptions and keywords.

DashScope Embedding API: converts images and text to vectors.

System Architecture

Prerequisites

Elasticsearch instance version 8.17 or later.

DashScope (Bailei) service enabled and API‑Key obtained.

Python 3.8 or newer.

Environment Preparation

Install required Python packages:

pip install elasticsearch dashscope requests streamlit

Download and unzip the example dataset:

wget https://github.com/milvus-io/pymilvus-assets/releases/download/imagedata/reverse_image_search.zip
unzip -q -o reverse_image_search.zip

Directory layout:

multi_modal_search/
├── reverse_image_search.csv    # dataset CSV
├── train/                     # image files
│   └── *.jpg
├── scripts/
│   ├── write.py               # data ingestion
│   ├── read.py                # query script
│   └── demo.py                # Streamlit front‑end

Ingestion (write.py)

The script extracts a textual description for each image using Qwen‑VL, stores it in the text_input field, then calls the DashScope Embedding API to obtain image_embedding and text_embedding. These vectors are indexed into Elasticsearch. The demo processes only the first 200 images.

# Pseudocode
# 1. Extract description → text_input
# 2. Generate embeddings → image_embedding, text_embedding
# 3. Index document into ES

Run the ingestion:

python3 write.py

Query (read.py)

Supports four query types: text‑to‑image, text‑to‑text, image‑to‑image, and image‑to‑text. The query input (text or image) is sent to the DashScope Embedding API; the resulting embedding is matched against the corresponding field ( image_embedding or text_embedding) in Elasticsearch.

# Pseudocode
# 1. Compute query embedding
# 2. Search ES
# 3. Return most relevant documents

Front‑end Demo

A Streamlit application ( demo.py) provides a web UI to select the search type, enter a text query or upload an image, and view retrieved results. streamlit run demo.py Access the UI at http://localhost:8501.

Operational Steps

Edit configuration parameters (ES host, credentials, DashScope API key) in write.py and read.py.

Run python3 write.py to ingest data.

Optionally verify ingestion with python3 read.py.

Start the front‑end demo with streamlit run demo.py and perform multimodal searches.

Sample Output – Text‑to‑Image

Text‑to‑Image – Search keyword "狮子"
✓ Score: 0.8077 – 一只狮子坐在倒下的树干上，周围是茂密的灌木和树枝
✓ Score: 0.7732 – 雄壮的狮子站在草地上，鬃毛在阳光下威武宁静
✓ Score: 0.7566 – 雄狮特写，鬃毛浓密，眼神锐利

References

[1] Image & Video Understanding – https://help.aliyun.com/zh/model-studio/vision

[2] Multimodal Embedding API – https://help.aliyun.com/zh/model-studio/multimodal-embedding-api-reference

[3] Create Elasticsearch instance – https://help.aliyun.com/zh/es/user-guide/create-an-alibaba-cloud-elasticsearch-cluster

[4] Get DashScope API key – https://help.aliyun.com/zh/model-studio/get-api-key

[5] AI Multimodal Search guide – https://help.aliyun.com/zh/es/user-guide/alibaba-cloud-es-ai-multimodal-search-refined#7fad4790f04xn

[6] Query implementation – https://help.aliyun.com/zh/es/user-guide/alibaba-cloud-es-ai-multimodal-search-refined#3f33fa0c792ux

[7] Full repository – https://help.aliyun.com/zh/es/user-guide/alibaba-cloud-es-ai-multimodal-search-refined

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Python Elasticsearch Vector Database Multimodal Search Streamlit Qwen-VL DashScope AI Embedding

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.