Building a Multimodal Search with Alibaba Cloud Elasticsearch and Qwen‑VL
This article demonstrates how to integrate Alibaba Cloud Elasticsearch with the Qwen‑VL large model and DashScope Embedding API to extract image features and perform multimodal vector search, covering text‑to‑image, text‑to‑text, image‑to‑image, and image‑to‑text queries, with step‑by‑step code, environment setup, data loading, indexing, and a Streamlit demo.
Background
In multimodal search, unstructured image and text data are converted to vector representations and retrieved via vector search. This guide combines Alibaba Cloud Elasticsearch (vector database) with Qwen‑VL for image description extraction and DashScope Embedding API for image and text embeddings.
Tools
Elasticsearch (ES): vector database for storing and retrieving embeddings.
Qwen‑VL: extracts image descriptions and keywords.
DashScope Embedding API: converts images and text to vectors.
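As a rough illustration of the embedding step, the sketch below builds a request payload and reads a vector out of a response. The model name `multimodal-embedding-v1`, the `{"text": ...}` / `{"image": ...}` input items, and the `output.embeddings[0].embedding` response layout are assumptions based on the DashScope multimodal embedding documentation, not details given in this article; the helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Assumed model name; check the DashScope documentation for current models.
EMBEDDING_MODEL = "multimodal-embedding-v1"


def build_embedding_input(text: Optional[str] = None,
                          image: Optional[str] = None) -> List[Dict[str, str]]:
    """Build the `input` list for exactly one text or one image.

    `image` is assumed to be a URL or data URI that the service can read.
    """
    if (text is None) == (image is None):
        raise ValueError("pass exactly one of text or image")
    return [{"text": text}] if text is not None else [{"image": image}]


def extract_embedding(response: Dict[str, Any]) -> List[float]:
    """Return the first vector from a response shaped like
    {"output": {"embeddings": [{"embedding": [...]}]}} (assumed layout)."""
    return list(response["output"]["embeddings"][0]["embedding"])


def embed(text: Optional[str] = None, image: Optional[str] = None) -> List[float]:
    """Call the service. Needs `pip install dashscope` and DASHSCOPE_API_KEY."""
    import dashscope  # imported lazily so the helpers above work offline

    resp = dashscope.MultiModalEmbedding.call(
        model=EMBEDDING_MODEL,
        input=build_embedding_input(text=text, image=image),
    )
    if resp.status_code != 200:
        raise RuntimeError(f"embedding request failed: {resp.message}")
    return extract_embedding({"output": resp.output})
```

Keeping payload construction and response parsing separate from the network call makes both easy to check without an API key.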
System Architecture
Prerequisites
Elasticsearch instance version 8.17 or later.
DashScope (Bailian) service enabled and an API key obtained.
Python 3.8 or newer.
Environment Preparation
Install required Python packages:
pip install elasticsearch dashscope requests streamlit
Download and unzip the example dataset:
wget https://github.com/milvus-io/pymilvus-assets/releases/download/imagedata/reverse_image_search.zip
unzip -q -o reverse_image_search.zip
Directory layout:
multi_modal_search/
├── reverse_image_search.csv   # dataset CSV
├── train/                     # image files
│   └── *.jpg
└── scripts/
    ├── write.py               # data ingestion
    ├── read.py                # query script
    └── demo.py                # Streamlit front‑end
Ingestion (write.py)
The script extracts a textual description for each image using Qwen‑VL, stores it in the text_input field, then calls the DashScope Embedding API to obtain image_embedding and text_embedding. These vectors are indexed into Elasticsearch. The demo processes only the first 200 images.
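The ingestion flow described above might be prepared along these lines. This is a minimal sketch, not the article's write.py: the index name, the `image_path` field, the 1024-dimension setting, the cosine similarity, and the `path` CSV column are all assumptions, and only the field names `text_input`, `image_embedding`, and `text_embedding` come from the text.

```python
import csv
from itertools import islice
from typing import Any, Dict, Iterator, List

INDEX_NAME = "multimodal_demo"  # hypothetical index name
EMBEDDING_DIMS = 1024           # assumption: must equal the embedding model's output size


def build_index_body(dims: int = EMBEDDING_DIMS) -> Dict[str, Any]:
    """Mapping with one dense_vector field per embedding type."""
    vector = {"type": "dense_vector", "dims": dims,
              "index": True, "similarity": "cosine"}
    return {"mappings": {"properties": {
        "image_path": {"type": "keyword"},   # hypothetical field
        "text_input": {"type": "text"},      # Qwen-VL description
        "image_embedding": dict(vector),
        "text_embedding": dict(vector),
    }}}


def iter_image_paths(csv_path: str, limit: int = 200) -> Iterator[str]:
    """Yield the `path` column (assumed name) for the first `limit` rows."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in islice(csv.DictReader(fh), limit):
            yield row["path"]


def build_document(image_path: str, text_input: str,
                   image_embedding: List[float],
                   text_embedding: List[float]) -> Dict[str, Any]:
    """Assemble one document; both vectors must share the mapped dimension."""
    if len(image_embedding) != len(text_embedding):
        raise ValueError("embedding lengths differ")
    return {
        "image_path": image_path,
        "text_input": text_input,
        "image_embedding": list(image_embedding),
        "text_embedding": list(text_embedding),
    }
```

With the official Python client, the mapping could then be passed as `es.indices.create(index=INDEX_NAME, mappings=build_index_body()["mappings"])` and each document as `es.index(index=INDEX_NAME, document=doc)`.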
# Pseudocode
# 1. Extract description → text_input
# 2. Generate embeddings → image_embedding, text_embedding
# 3. Index document into ES
Run the ingestion:
python3 write.py
Query (read.py)
read.py supports four query types: text‑to‑image, text‑to‑text, image‑to‑image, and image‑to‑text. The query input (text or image) is sent to the DashScope Embedding API, and the resulting embedding is matched against the corresponding field (image_embedding or text_embedding) in Elasticsearch.
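One way to express the four query types is a lookup from query type to the vector field that is searched, followed by an Elasticsearch kNN search body. The sketch assumes that "X‑to‑Y" means a query of kind X matched against the Y embedding field; the `image_path` source field and the default `k` and `num_candidates` values are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# query type -> (kind of query input, vector field searched in Elasticsearch)
QUERY_TYPES: Dict[str, Tuple[str, str]] = {
    "text-to-image": ("text", "image_embedding"),
    "text-to-text": ("text", "text_embedding"),
    "image-to-image": ("image", "image_embedding"),
    "image-to-text": ("image", "text_embedding"),
}


def build_knn_search(query_type: str, query_vector: List[float],
                     k: int = 3, num_candidates: int = 50) -> Dict[str, Any]:
    """Build a search body that runs approximate kNN on the chosen field."""
    if query_type not in QUERY_TYPES:
        raise ValueError(f"unknown query type: {query_type}")
    if k > num_candidates:
        raise ValueError("num_candidates must be at least k")
    _, field = QUERY_TYPES[query_type]
    return {
        "knn": {
            "field": field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        },
        # Return only the readable fields, not the stored vectors.
        "_source": ["image_path", "text_input"],
    }
```

With the official Python client, the result could be sent as `es.search(index=..., knn=body["knn"], source=body["_source"])`; each hit then carries a `_score` and the selected source fields.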
# Pseudocode
# 1. Compute query embedding
# 2. Search ES
# 3. Return most relevant documents
Front‑end Demo
A Streamlit application (demo.py) provides a web UI to select the search type, enter a text query or upload an image, and view retrieved results. Start it with:
streamlit run demo.py
Access the UI at http://localhost:8501.
Operational Steps
Edit configuration parameters (ES host, credentials, DashScope API key) in write.py and read.py.
Run python3 write.py to ingest data.
Optionally verify ingestion with python3 read.py.
Start the front‑end demo with streamlit run demo.py and perform multimodal searches.
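The article edits the connection parameters directly in write.py and read.py. As an alternative that keeps credentials out of the scripts, the sketch below reads them from environment variables. The variable names `ES_HOST`, `ES_USER`, and `ES_PASSWORD` and the `Settings` class are hypothetical; `DASHSCOPE_API_KEY` is the variable the DashScope SDK conventionally reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    es_host: str
    es_user: str
    es_password: str
    dashscope_api_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read all four settings, failing early with the names that are missing."""
    env = os.environ if env is None else env
    names = ("ES_HOST", "ES_USER", "ES_PASSWORD", "DASHSCOPE_API_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return Settings(
        es_host=env["ES_HOST"],
        es_user=env["ES_USER"],
        es_password=env["ES_PASSWORD"],
        dashscope_api_key=env["DASHSCOPE_API_KEY"],
    )
```

Calling `load_settings()` at the top of each script surfaces a missing value before any request is made.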
Sample Output – Text‑to‑Image
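Text‑to‑Image – Search keyword "狮子" (lion)
✓ Score: 0.8077 – A lion sitting on a fallen tree trunk, surrounded by dense shrubs and branches
✓ Score: 0.7732 – A majestic lion standing on grass, its mane imposing and serene in the sunlight
✓ Score: 0.7566 – Close-up of a male lion with a thick mane and a sharp gaze
References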
[1] Image & Video Understanding – https://help.aliyun.com/zh/model-studio/vision
[2] Multimodal Embedding API – https://help.aliyun.com/zh/model-studio/multimodal-embedding-api-reference
[3] Create Elasticsearch instance – https://help.aliyun.com/zh/es/user-guide/create-an-alibaba-cloud-elasticsearch-cluster
[4] Get DashScope API key – https://help.aliyun.com/zh/model-studio/get-api-key
[5] AI Multimodal Search guide – https://help.aliyun.com/zh/es/user-guide/alibaba-cloud-es-ai-multimodal-search-refined#7fad4790f04xn
[6] Query implementation – https://help.aliyun.com/zh/es/user-guide/alibaba-cloud-es-ai-multimodal-search-refined#3f33fa0c792ux
[7] Full repository – https://help.aliyun.com/zh/es/user-guide/alibaba-cloud-es-ai-multimodal-search-refined
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
