How Amazon Nova’s Multimodal Embedding Model Handles All Modalities in One Go

Amazon Nova's multimodal embedding model, now available on Amazon Bedrock, maps text, documents, images, video, and audio into a single semantic space. It supports up to 8000 tokens of context and multiple output dimensions. This article walks through Python examples for embedding generation, storage, and cross‑modal search.

Amazon Cloud Developers

Oct 29, 2025

Amazon Nova's multimodal embedding model, available on Amazon Bedrock, provides a single embedding service for text, documents, images, video, and audio. Because it maps all modalities into one semantic space, it supports cross‑modal retrieval, semantic search, and Retrieval‑Augmented Generation (RAG).

The model supports up to 8000 tokens of text context and can process 200 languages. It offers four output dimensions (3072, 1024, 384, 256) via Matryoshka Representation Learning, allowing users to balance representation detail against storage and compute costs.
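To make the dimension trade-off concrete, here is a minimal sketch of two things: the raw float32 storage each output dimension implies, and the general Matryoshka idea of keeping a vector's leading components and renormalizing. The function names are hypothetical. Whether client-side truncation of a Nova embedding matches what the service returns for a smaller embeddingDimension is an assumption; requesting the dimension through the API, as the examples in this article do, is the route the article describes.

```python
import math

# Output dimensions listed in the article.
DIMENSIONS = (3072, 1024, 384, 256)
BYTES_PER_FLOAT32 = 4


def storage_bytes(dimension: int, num_vectors: int) -> int:
    """Raw float32 storage for num_vectors embeddings, excluding any index overhead."""
    return dimension * BYTES_PER_FLOAT32 * num_vectors


def truncate_and_normalize(embedding: list[float], dimension: int) -> list[float]:
    """Keep the first `dimension` components and rescale the result to unit length."""
    if dimension > len(embedding):
        raise ValueError("target dimension exceeds embedding length")
    head = embedding[:dimension]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


if __name__ == "__main__":
    for dim in DIMENSIONS:
        mib = storage_bytes(dim, 1_000_000) / (1024 ** 2)
        print(f"{dim:>4} dims: {mib:,.0f} MiB per million vectors")
```

Going from 3072 to 256 dimensions cuts raw vector storage by a factor of 12, which is the kind of saving the smaller dimensions are meant to offer in exchange for some representation detail.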

According to the benchmark table in the original article, Amazon Nova's out‑of‑the‑box accuracy leads comparable models. The model can also chunk long text, video, or audio into manageable segments for embedding.

Basic text embedding example (Python, Boto3):

import json
import boto3
MODEL_ID = "amazon.nova-2-multimodal-embeddings-v1:0"
EMBEDDING_DIMENSION = 3072
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
text = "Amazon Nova is a multimodal foundation model"
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": text}
    }
}
response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId=MODEL_ID,
    contentType="application/json"
)
embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]
print(f"Generated embedding with {len(embedding)} dimensions")
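Since the same request shape recurs for indexing and for queries, it can help to wrap it in small helpers. The sketch below is one possible way to do that; build_text_request and extract_embedding are hypothetical names, not part of Boto3 or Bedrock, and the field names are copied from the example above.

```python
import json


def build_text_request(text: str, dimension: int = 3072, purpose: str = "GENERIC_INDEX") -> str:
    """Return the JSON request body for a single text embedding."""
    body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": purpose,
            "embeddingDimension": dimension,
            "text": {"truncationMode": "END", "value": text},
        },
    }
    return json.dumps(body)


def extract_embedding(response_payload: dict) -> list[float]:
    """Pull the first embedding out of a parsed response, as the example above does."""
    return response_payload["embeddings"][0]["embedding"]
```

With these in place, the call becomes invoke_model(body=build_text_request(text), modelId=MODEL_ID, contentType="application/json"), and the parsed response body is passed to extract_embedding.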

Image embedding example (the image is read, base64‑encoded, and sent to the same endpoint with an image payload):

import base64

# Reuses json, bedrock_runtime, MODEL_ID, and EMBEDDING_DIMENSION from the text example
with open("photo.jpg", "rb") as f:
    image_bytes = base64.b64encode(f.read()).decode("utf-8")
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "image": {"format": "jpeg", "source": {"bytes": image_bytes}}
    }
}
response = bedrock_runtime.invoke_model(body=json.dumps(request_body), modelId=MODEL_ID, contentType="application/json")
embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]

Video and audio files larger than 25 MB must be processed through the asynchronous API. The workflow uploads the video to an S3 bucket, starts an async job with the SEGMENTED_EMBEDDING task type, and polls for completion.

S3_VIDEO_URI = "s3://my-video-bucket/videos/presentation.mp4"
S3_EMBEDDING_DESTINATION_URI = "s3://my-video-bucket/embeddings-output/"
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {"s3Location": {"uri": S3_VIDEO_URI}},
            "segmentationConfig": {"durationSeconds": 15}
        }
    }
}
response = bedrock_runtime.start_async_invoke(
    modelId=MODEL_ID,
    modelInput=model_input,
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": S3_EMBEDDING_DESTINATION_URI}}
)
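# Poll until the job leaves the "InProgress" state
import time
invocation_arn = response["invocationArn"]
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    if job["status"] != "InProgress":
        break
    time.sleep(15)
print(f"Async job finished with status: {job['status']}")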
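To estimate how many embeddings a segmented job might produce, the sketch below computes fixed-length time windows for a given media duration, using the 15-second durationSeconds value from the request above. The function name is hypothetical, and treating the last, shorter window as its own segment is an assumption; the article does not say how the service handles a trailing partial segment.

```python
def segment_windows(total_seconds: float, segment_seconds: float = 15.0) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into consecutive windows of at most segment_seconds."""
    if total_seconds < 0 or segment_seconds <= 0:
        raise ValueError("duration must be non-negative and segment length positive")
    total = float(total_seconds)
    windows = []
    start = 0.0
    while start < total:
        end = min(start + segment_seconds, total)
        windows.append((start, end))
        start = end
    return windows


if __name__ == "__main__":
    for start, end in segment_windows(40, 15):
        print(f"segment {start:.0f}s to {end:.0f}s")
```

Under these assumptions a 40-second clip yields three windows, so an index should expect roughly one vector per window rather than one per file.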

After embeddings are generated, they can be stored in a vector database. The article demonstrates using Amazon S3 Vectors to create a vector bucket and index, then bulk‑load embeddings for three sample texts.

VECTOR_BUCKET = "my-vector-store"
INDEX_NAME = "embeddings"
# create bucket and index if needed
s3vectors = boto3.client("s3vectors", region_name="us-east-1")
# ... (bucket/index creation omitted for brevity)
texts = ["Machine learning on AWS", "Amazon Bedrock provides foundation models", "S3 Vectors enables semantic search"]
vectors = []
for text in texts:
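    # Same request as the text example, rebuilt for each input string
    request_body = {
        "taskType": "SINGLE_EMBEDDING",
        "singleEmbeddingParams": {
            "embeddingPurpose": "GENERIC_INDEX",
            "embeddingDimension": EMBEDDING_DIMENSION,
            "text": {"truncationMode": "END", "value": text}
        }
    }
    response = bedrock_runtime.invoke_model(body=json.dumps(request_body), modelId=MODEL_ID, contentType="application/json")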
    embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]
    vectors.append({"key": f"text:{text[:50]}", "data": {"float32": embedding}, "metadata": {"type": "text", "content": text}})
s3vectors.put_vectors(vectorBucketName=VECTOR_BUCKET, indexName=INDEX_NAME, vectors=vectors)
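The keys in the example above are built from the first 50 characters of each text, so two texts that share a long prefix would produce identical keys. One possible alternative, sketched here with a hypothetical helper name, is to derive the key from a hash of the full content. The 16-character digest length is an arbitrary choice for illustration.

```python
import hashlib


def vector_key(modality: str, content: str) -> str:
    """Build a deterministic key from the modality and a SHA-256 digest of the content."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    return f"{modality}:{digest}"


if __name__ == "__main__":
    prefix = "x" * 50
    print(vector_key("text", prefix + " first document"))
    print(vector_key("text", prefix + " second document"))
```

The same input always maps to the same key, so re-running the load does not create new keys for unchanged content, while inputs that differ anywhere get different keys.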

Cross‑modal search is illustrated by generating an embedding for a query string, then using query_vectors to retrieve the top‑5 most similar vectors across all stored modalities, with distance scores and optional metadata displayed.

query_text = "foundation models"
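# Generate the query embedding: same request as the text example, but with a retrieval purpose
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_RETRIEVAL",
        "embeddingDimension": EMBEDDING_DIMENSION,
        "text": {"truncationMode": "END", "value": query_text}
    }
}
response = bedrock_runtime.invoke_model(body=json.dumps(request_body), modelId=MODEL_ID, contentType="application/json")
query_embedding = json.loads(response["body"].read())["embeddings"][0]["embedding"]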
response = s3vectors.query_vectors(
    vectorBucketName=VECTOR_BUCKET,
    indexName=INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=5,
    returnDistance=True,
    returnMetadata=True
)
for i, result in enumerate(response["vectors"], 1):
    print(f"{i}. {result['key']} - Distance: {result['distance']:.4f}")
    if result.get("metadata"):
        print(f"   Metadata: {result['metadata']}")
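To show what the distance scores mean, here is a small local sketch that ranks stored vectors against a query by cosine distance, where 0 means the vectors point in the same direction. It is a brute-force illustration with hypothetical function names, not how the service searches. It also assumes the index uses a cosine metric, which the article does not state.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k (key, distance) pairs closest to the query, nearest first."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real embeddings.
    store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for key, distance in top_k([1.0, 0.0], store, k=2):
        print(f"{key} - Distance: {distance:.4f}")
```

Because every modality is embedded into the same space, the ranking works the same way whether a stored vector came from text, an image, or a video segment.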

Practical considerations include choosing the output dimension (larger dimensions capture richer semantics but increase storage and compute costs), handling long inputs (up to 8000 tokens for text, with video and audio split into chunks of up to 30 seconds), and responsible AI features such as content safety filtering and fairness mitigations built into Bedrock. The model is accessible through both synchronous and asynchronous APIs, which makes it suitable for real‑time search interfaces as well as batch processing of large media files.

Amazon Nova is currently available in the US East (N. Virginia) region on Amazon Bedrock. Pricing details are on the Bedrock pricing page, and further documentation is provided in the Amazon Nova user guide and the GitHub sample repository.
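The choice between the synchronous and asynchronous paths can be reduced to a size check against the 25 MB threshold mentioned earlier for media files. The sketch below is one way to express that. The function name is hypothetical, and it assumes the threshold means 25 × 1024 × 1024 bytes of raw file size; the article does not say whether the limit is measured before or after base64 encoding.

```python
SYNC_LIMIT_BYTES = 25 * 1024 * 1024  # 25 MB threshold from the article; byte interpretation is an assumption


def choose_api(size_bytes: int) -> str:
    """Return the name of the Bedrock runtime call suited to a payload of this size."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes > SYNC_LIMIT_BYTES:
        return "start_async_invoke"
    return "invoke_model"


if __name__ == "__main__":
    for size in (512 * 1024, 40 * 1024 * 1024):
        print(f"{size} bytes -> {choose_api(size)}")
```

In practice the size would come from something like os.path.getsize on the local file before it is uploaded to S3.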
