Achieving Pro‑Level Vision Detection with Minimal Cost: Fine‑Tuning Amazon Nova Lite

Fine‑tuning Amazon Nova Lite 1.0 on Amazon Bedrock with a small training dataset markedly improves instruction following, cutting the number of detection boxes by up to 92%, and reaches Pro‑grade accuracy in aerial group detection and low‑light monitoring at a fraction of the cost.

Amazon Cloud Developers

Apr 1, 2026

Amazon Nova Lite 1.0 provides strong cost‑performance for generic vision tasks but does not reliably follow complex instructions required in specialized scenarios.

Background

The study fine‑tunes Nova Lite on two real‑world use cases: (1) aerial‑view group detection, where users prefer a few large bounding boxes instead of many fine‑grained ones, and (2) low‑light monitoring, where high‑confidence detection is required to reduce false alarms.

Case Study 1 – Aerial‑View Group Detection

Problem definition: In dense object layouts, customers want fewer, larger boxes that highlight key regions. This improves user experience and system performance and lowers API cost.

Instruction‑following method: The prompt appends a control clause to the task description, telling the model to output only a few large boxes when an image contains more than 10 target objects. The full prompt:

Your task is to perform object detection on input images, identifying all relevant elements based on the given descriptions of target objects, and output the results in the required JSON format.
if there are many (more than 10) target objects in the image, ONLY OUTPUT FEW BIG bounding boxes to show the areas or group of target objects
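As a minimal sketch, the prompt above can be assembled from its two parts so that the control clause is easy to switch on or off when comparing model behavior. The function name `build_detection_prompt` and its flag are hypothetical; only the two text strings come from the article.

```python
# Both strings are copied verbatim from the prompt shown in the article.
BASE_TASK = (
    "Your task is to perform object detection on input images, identifying all "
    "relevant elements based on the given descriptions of target objects, and "
    "output the results in the required JSON format."
)
CONTROL_CLAUSE = (
    "if there are many (more than 10) target objects in the image, ONLY OUTPUT "
    "FEW BIG bounding boxes to show the areas or group of target objects"
)


def build_detection_prompt(include_control_clause: bool = True) -> str:
    """Return the detection prompt, optionally with the grouping clause.

    Hypothetical helper: the article does not show how its prompts are built.
    """
    parts = [BASE_TASK]
    if include_control_clause:
        parts.append(CONTROL_CLAUSE)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_detection_prompt())
```

Keeping the clause as a separate constant makes it straightforward to run the same image set with and without it.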

Baseline performance: The un‑fine‑tuned Nova Lite model outputs more than 60 detection boxes per image on average, which increases inference latency and output token cost. Lite is still far cheaper per token than Nova Pro: its input price is $0.00006 per 1,000 tokens versus $0.0008 for Pro.

Statistical comparison shows Nova Pro follows the control instruction, while Lite continues to produce >90 boxes per image.

Fine‑Tuning Design

A custom fine‑tuned model (referred to as Custom‑Model) was trained on a small dataset (≈10–50 images), with Nova Pro acting as a teacher to generate the labels. The pipeline has four steps: data annotation → JSONL dataset creation → Bedrock fine‑tuning job launch → evaluation.

Label generation example:

uv run generate_labels_by_llm.py \
  --num_threads=1 \
  --upload_to_s3 \
  --model_id=us.amazon.nova-pro-v1:0

Training job creation example:

uv run create_nova_ft_job.py \
  --jsonl ../data/train/train_data_algae_and_cars_20250821.jsonl \
  --job-name drones-nova-lite-with-all-job \
  --custom-model-name drones-nova-lite-with-all

Results

With keyword‑based prompts, the average detection box count fell by 92% (91.47 → 7.04).

Without keywords, the reduction was 49% (94.17 → 47.68).

Custom‑Model is highly sensitive to the keyword prompt (7.04 vs 47.68), whereas Nova Lite remains at ≈91‑94 boxes regardless of prompt.

Distribution analysis shows Custom‑Model outputs cluster in the low‑value region, while Nova Lite consistently produces high‑value outliers.
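The reduction percentages quoted above follow directly from the reported averages. A short check, using only the four averages given in the results (the helper name `percent_reduction` is illustrative):

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, expressed in percent."""
    if before <= 0:
        raise ValueError("`before` must be positive")
    return (before - after) / before * 100.0


# Average box counts reported in the article (base Nova Lite -> Custom-Model).
with_keywords = percent_reduction(91.47, 7.04)      # about 92.3
without_keywords = percent_reduction(94.17, 47.68)  # about 49.4

if __name__ == "__main__":
    print(f"with keywords:    {with_keywords:.1f}% fewer boxes")
    print(f"without keywords: {without_keywords:.1f}% fewer boxes")
```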

Case Study 2 – Low‑Light Detection Optimization

In low‑light surveillance, false positives erode operator trust. The goal is to enforce a high‑confidence threshold (≥95%) and suppress any detection below it. The prompt states the rule explicitly:

HIGH CONFIDENCE REQUIREMENT: Only detect objects with 95%+ certainty. If confidence is insufficient for any detected object, set "object_count": 0, "objects": []

Fine‑tuned Nova Lite learned to obey this rule, eliminating low‑confidence false alarms.
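When evaluating whether a model obeys this rule, it helps to parse each response and check that it is internally consistent. The sketch below assumes the response is a JSON object with the two keys named in the prompt, `object_count` and `objects`; the article does not show the rest of the output schema, and the function names are hypothetical.

```python
import json


def parse_detection_response(raw: str) -> dict:
    """Parse a model response and check that the count matches the list.

    Assumes a top-level JSON object with "object_count" and "objects";
    any other fields are passed through untouched.
    """
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError('response must be a JSON object')
    objects = result.get("objects")
    if not isinstance(objects, list):
        raise ValueError('"objects" must be a list')
    if result.get("object_count") != len(objects):
        raise ValueError('"object_count" does not match len("objects")')
    return result


def is_suppressed(result: dict) -> bool:
    """True when the model returned the empty, low-confidence response."""
    return result["object_count"] == 0 and result["objects"] == []


if __name__ == "__main__":
    empty = parse_detection_response('{"object_count": 0, "objects": []}')
    print(is_suppressed(empty))
```

Counting how often `is_suppressed` returns true on images known to contain no targets gives a simple measure of false-alarm suppression.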

Data Preparation

Images were labeled using Nova Pro, then converted to the Bedrock JSONL schema:

{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [{"text": "You are a smart assistant that can detect objects in images."}],
  "messages": [{
    "role": "user",
    "content": [{"text": "Your task is to perform object detection on input images, identifying all relevant elements based on the given descriptions of target objects, and output the results in the required JSON format."}]
  }]
}
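The sample above shows only the user text. A complete training record also needs the image and the expected answer. The sketch below builds one record under two assumptions that should be checked against the Bedrock documentation: the image is passed as a second content item referencing an S3 object, and the label is an `assistant` turn containing the target JSON as text. The image position (content index 1) matches the error message quoted in the troubleshooting section; the function names and the example bucket are hypothetical.

```python
import json
from typing import Optional


def build_training_record(
    system_text: str,
    user_text: str,
    image_s3_uri: str,
    assistant_text: str,
    bucket_owner: Optional[str] = None,
) -> dict:
    """Build one fine-tuning record (assumed layout, see lead-in)."""
    s3_location = {"uri": image_s3_uri}
    if bucket_owner is not None:
        s3_location["bucketOwner"] = bucket_owner
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": system_text}],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": user_text},
                    # Assumed image block: format plus an S3 source.
                    {"image": {"format": "jpeg",
                               "source": {"s3Location": s3_location}}},
                ],
            },
            # Assumed label turn: the teacher model's JSON answer as text.
            {"role": "assistant", "content": [{"text": assistant_text}]},
        ],
    }


def to_jsonl_line(record: dict) -> str:
    """Serialize one record as a single JSONL line (no embedded newlines)."""
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    record = build_training_record(
        "You are a smart assistant that can detect objects in images.",
        "Your task is to perform object detection on input images.",
        "s3://example-bucket/images/0001.jpg",  # hypothetical location
        '{"object_count": 0, "objects": []}',
    )
    print(to_jsonl_line(record))
```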

Training & Deployment

Training on ~50 images took 90–300 minutes and cost roughly $0.02 in tokens. After fine‑tuning, the model was deployed on‑demand via Bedrock; storage cost is $1.95 / month and inference pricing matches the base Lite model.

Technical Issue Troubleshooting

Several samples failed validation because the files were labeled as JPEG but were actually in MPO format, which triggered an “Invalid input error.” Converting all images to proper JPEG resolved the issue. The validation error read:

Invalid input error: train data problematic samples: [8, 9, ...].
Sample 8 - ('messages', 0, 'content', 1, 'image'): Image is not of type JPEG
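MPO files begin with the same bytes as a JPEG, so the file extension alone does not reveal them. One way to screen a dataset before upload is to look for the APP2 segment tagged `MPF` that Multi‑Picture Format files carry in their header. The standard‑library sketch below does only the detection; the function name is hypothetical, and the article does not say how its images were checked or converted.

```python
import struct


def looks_like_mpo(data: bytes) -> bool:
    """Return True if a JPEG byte stream has an APP2 'MPF' header segment."""
    if data[:2] != b"\xff\xd8":          # missing JPEG start-of-image marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # lost sync with the segment layout
            return False
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):       # start of scan / end of image
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:                   # malformed segment length
            return False
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE2 and payload.startswith(b"MPF\x00"):
            return True
        pos += 2 + length
    return False


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            flag = looks_like_mpo(handle.read())
        print(f"{path}: {'MPO' if flag else 'not MPO'}")
```

Files flagged this way can then be re‑encoded as plain JPEG with an image library of your choice before the JSONL dataset is built.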

Cost Analysis

Fine‑tuning Nova Lite: $0.002 per 1,000 tokens.

Small dataset (~10K tokens): total training cost ≈ $0.02.

Storage: $1.95 per month per fine‑tuned model.

Inference pricing identical to the base Nova Lite model.
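The figures above can be combined into a rough budget. The sketch below uses only the prices quoted in the article and assumes that the token count is the total billed for training and that the storage fee applies to one stored model; the function names are illustrative.

```python
TRAINING_PRICE_PER_1K_TOKENS = 0.002   # USD, from the article
STORAGE_PRICE_PER_MONTH = 1.95         # USD, from the article (one model assumed)
LITE_INPUT_PRICE = 0.00006             # USD, Nova Lite input price from the article
PRO_INPUT_PRICE = 0.0008               # USD, Nova Pro input price from the article


def training_cost(total_tokens: int) -> float:
    """One-off training cost for the given number of billed tokens."""
    return total_tokens / 1000 * TRAINING_PRICE_PER_1K_TOKENS


def fixed_cost(total_tokens: int, months: int) -> float:
    """Training cost plus storage for `months` months (inference excluded)."""
    return training_cost(total_tokens) + months * STORAGE_PRICE_PER_MONTH


if __name__ == "__main__":
    print(f"training (~10K tokens): ${training_cost(10_000):.2f}")
    print(f"first year, fixed:      ${fixed_cost(10_000, 12):.2f}")
    # The ratio does not depend on the unit the two prices are quoted in.
    print(f"Pro / Lite input price: {PRO_INPUT_PRICE / LITE_INPUT_PRICE:.1f}x")
```

Under these assumptions storage, not training, dominates the fixed cost, while per‑request inference stays at the base Lite rate.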

The experiments demonstrate that targeted fine‑tuning of Amazon Nova Lite can achieve Pro‑grade instruction compliance and detection accuracy while preserving the model’s low‑cost advantage, making it a practical solution for cost‑sensitive computer‑vision workloads.
