Achieving Pro‑Level Vision Detection with Minimal Cost: Fine‑Tuning Amazon Nova Lite
By fine‑tuning Amazon Nova Lite 1.0 on Amazon Bedrock, the study demonstrates how a small training dataset can dramatically improve instruction following and reduce detection boxes—up to 92% fewer—while achieving Pro‑grade accuracy in aerial group detection and low‑light monitoring, all at a fraction of the cost.
Amazon Nova Lite 1.0 provides strong cost‑performance for generic vision tasks but does not reliably follow complex instructions required in specialized scenarios.
Background
The study fine‑tunes Nova Lite on two real‑world use cases: (1) aerial‑view group detection, where users prefer a few large bounding boxes instead of many fine‑grained ones, and (2) low‑light monitoring, where high‑confidence detection is required to reduce false alarms.
Case Study 1 – Aerial‑View Group Detection
Problem definition : In dense object layouts, customers want fewer, larger boxes to highlight key regions, improving user experience, system performance, and API cost.
Instruction‑following method : The prompt includes a control clause – “if there are many (more than 10) target objects in the image, ONLY OUTPUT FEW BIG bounding boxes to show the areas or group of target objects.”
Your task is to perform object detection on input images, identifying all relevant elements based on the given descriptions of target objects, and output the results in the required JSON format.
if there are many (more than 10) target objects in the image, ONLY OUTPUT FEW BIG bounding boxes to show the areas or group of target objectsBaseline performance : The un‑fine‑tuned Nova Lite model outputs >60 detection boxes on average (input token cost $0.00006 vs $0.0008 for Nova Pro), increasing inference latency and token cost.
Statistical comparison shows Nova Pro follows the control instruction, while Lite continues to produce >90 boxes per image.
Fine‑Tuning Design
A custom fine‑tuned model (referred to as Custom‑Model ) was trained on a small dataset (≈10–50 images) using Nova Pro as a teacher for label generation. The pipeline: data annotation → create JSONL dataset → launch Bedrock fine‑tuning job → evaluate.
uv run generate_labels_by_llm.py \
--num_threads=1 \
--upload_to_s3 \
--model_id=us.amazon.nova-pro-v1:0Training job creation example:
uv run create_nova_ft_job.py \
--jsonl ../data/train/train_data_algae_and_cars_20250821.jsonl \
--job-name drones-nova-lite-with-all-job \
--custom-model-name drones-nova-lite-with-allResults
Detection box count reduced by 92 % (average 91.47 → 7.04) when using keyword‑based prompts.
Without keywords, reduction was 49 % (94.17 → 47.68).
Custom‑Model is highly sensitive to the keyword prompt (7.04 vs 47.68), whereas Nova Lite remains at ≈91‑94 boxes regardless of prompt.
Distribution analysis shows Custom‑Model outputs cluster in the low‑value region, while Nova Lite consistently produces high‑value outliers.
Case Study 2 – Low‑Light Detection Optimization
In low‑light surveillance, false positives erode operator trust. The goal is to enforce a high‑confidence threshold (≥95 %) and suppress detections below that level.
HIGH CONFIDENCE REQUIREMENT: Only detect objects with 95%+ certainty. If confidence is insufficient for any detected object, set "object_count": 0, "objects": []Fine‑tuned Nova Lite learned to obey this rule, eliminating low‑confidence false alarms.
Data Preparation
Images were labeled using Nova Pro, then converted to the Bedrock JSONL schema:
{
"schemaVersion": "bedrock-conversation-2024",
"system": [{"text": "You are a smart assistant that can detect objects in images."}],
"messages": [{
"role": "user",
"content": [{"text": "Your task is to perform object detection on input images, identifying all relevant elements based on the given descriptions of target objects, and output the results in the required JSON format."}]
}]
}Training & Deployment
Training on ~50 images took 90–300 minutes and cost roughly $0.02 in tokens. After fine‑tuning, the model was deployed on‑demand via Bedrock; storage cost is $1.95 / month and inference pricing matches the base Lite model.
Technical Issue Troubleshooting
Several samples failed because they were mislabeled as JPEG while actually being MPO format, causing an “Invalid input error.” Converting all images to proper JPEG resolved the issue.
Invalid input error: train data problematic samples: [8, 9, ...].
Sample 8 - ('messages', 0, 'content', 1, 'image'): Image is not of type JPEGCost Analysis
Fine‑tuning Nova Lite: $0.002 per 1 000 tokens.
Small dataset (~10 K tokens) total training cost ≈ $0.02.
Storage: $1.95 / month for all fine‑tuned models.
Inference pricing identical to the base Nova Lite model.
The experiments demonstrate that targeted fine‑tuning of Amazon Nova Lite can achieve Pro‑grade instruction compliance and detection accuracy while preserving the model’s low‑cost advantage, making it a practical solution for cost‑sensitive computer‑vision workloads.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Amazon Cloud Developers
Official technical community of Amazon Cloud. Shares practical AI/ML, big data, database, modern app development, IoT content, offers comprehensive learning resources, hosts regular developer events, and continuously empowers developers.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
