Turn a Simple Voice Command into Hands‑Free To‑Do Management with Amazon Nova Sonic

This article demonstrates how to build a smart to‑do application that uses Amazon Nova Sonic’s real‑time bidirectional streaming API to enable hands‑free voice interaction, detailing the underlying AI model, AWS serverless architecture, deployment steps, and verification procedures.

Amazon Cloud Developers

For decades, graphical user interfaces have dominated how people use software, but users now expect to converse directly with applications. Amazon Nova Sonic, an advanced foundation model on Amazon Bedrock, provides a low‑latency, bidirectional streaming API for natural two‑way voice dialogue, allowing applications to move beyond mouse‑and‑keyboard interactions.

Voice‑Centric Smart To‑Do App Example

The article uses a Smart Todo App to illustrate how voice can become the core interaction mode, converting traditional task management into a hands‑free conversational flow. Voice interaction is presented as a universal modality that complements, not replaces, existing UI controls and enhances accessibility.

Nova Sonic Capabilities

Nova Sonic can handle multi‑step workflows, request calls to backend tools, and maintain context across multiple turns. It recognizes the user's intent, asks the application to call the required APIs, and then returns a confirmation, so the user never has to fill in a form.
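To make the tool flow concrete, here is a minimal sketch of how to‑do tools might be described to the model. It assumes the tool‑specification shape used in Nova Sonic's promptStart event (a toolSpec with a name, a description, and an inputSchema whose json field holds the JSON Schema as a string); the tool names addNote and archiveCompleted are hypothetical, chosen to match the example commands in the Verification section, and are not taken from the sample repository. Check the field names against the Amazon Bedrock documentation before relying on them.

```python
import json


def make_tool_spec(name: str, description: str, properties: dict, required: list) -> dict:
    """Build one entry for the promptStart toolConfiguration (assumed shape)."""
    schema = {"type": "object", "properties": properties, "required": required}
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            # Assumption: the JSON Schema is passed as a string, not a nested object.
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


# Hypothetical tools backing "Add a note ..." and "Archive all completed tasks".
tool_configuration = {
    "tools": [
        make_tool_spec(
            name="addNote",
            description="Create a note for the signed-in user.",
            properties={"text": {"type": "string", "description": "Note body"}},
            required=["text"],
        ),
        make_tool_spec(
            name="archiveCompleted",
            description="Archive every task that is marked as completed.",
            properties={},
            required=[],
        ),
    ]
}
```

The model never runs these tools itself: when a request needs one, it emits a toolUse event and waits for the application to execute the operation and return a result.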

Bidirectional Streaming API Workflow

The streaming session is initiated with InvokeModelWithBidirectionalStream. The process includes:

Session start: client sends a sessionStart event with model parameters (e.g., temperature, topP).

Prompt and content start: the client sends a promptStart event, then a contentStart event that declares whether the data that follows is audio, text, or a tool result.

Audio streaming: microphone audio is sent as base64‑encoded audioInput events.

Model response: the model asynchronously streams back speech‑recognition (ASR) transcripts, tool‑use requests, text replies, and audio output.

Session end: client sends contentEnd, promptEnd and sessionEnd events.
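The client side of this sequence can be sketched as plain JSON payloads. This minimal sketch only builds the events in send order; it does not open the stream or call the Bedrock SDK. The event and field names (promptName, contentName, audioInputConfiguration, and so on) reflect our reading of the Nova Sonic event schema and should be verified against the Amazon Bedrock documentation; the inference values, the voice, the 16 kHz 16‑bit mono LPCM input format, and the chunk size are illustrative assumptions.

```python
import base64
import json
import uuid


def event(name: str, body: dict) -> str:
    """Wrap a body in the {"event": {name: body}} envelope and serialize it."""
    return json.dumps({"event": {name: body}})


def build_turn_events(pcm_audio: bytes, chunk_size: int = 1024) -> list:
    """Client-side events for one spoken user turn, in the order they are sent."""
    prompt = str(uuid.uuid4())
    content = str(uuid.uuid4())
    events = [
        # Session start: inference parameters (values are illustrative).
        event("sessionStart", {"inferenceConfiguration": {
            "maxTokens": 1024, "topP": 0.9, "temperature": 0.7}}),
        # Prompt start: the output formats the client wants back.
        event("promptStart", {
            "promptName": prompt,
            "textOutputConfiguration": {"mediaType": "text/plain"},
            "audioOutputConfiguration": {
                "mediaType": "audio/lpcm", "sampleRateHertz": 24000,
                "sampleSizeBits": 16, "channelCount": 1,
                "voiceId": "matthew", "encoding": "base64", "audioType": "SPEECH"},
        }),
        # Content start: what follows is interactive user audio.
        event("contentStart", {
            "promptName": prompt, "contentName": content,
            "type": "AUDIO", "interactive": True, "role": "USER",
            "audioInputConfiguration": {
                "mediaType": "audio/lpcm", "sampleRateHertz": 16000,
                "sampleSizeBits": 16, "channelCount": 1,
                "audioType": "SPEECH", "encoding": "base64"},
        }),
    ]
    # Audio streaming: raw PCM bytes split into chunks, each base64-encoded.
    for start in range(0, len(pcm_audio), chunk_size):
        chunk = pcm_audio[start:start + chunk_size]
        events.append(event("audioInput", {
            "promptName": prompt, "contentName": content,
            "content": base64.b64encode(chunk).decode("ascii")}))
    # Session end: close the content block, then the prompt, then the session.
    events.append(event("contentEnd", {"promptName": prompt, "contentName": content}))
    events.append(event("promptEnd", {"promptName": prompt}))
    events.append(event("sessionEnd", {}))
    return events
```

In a live session the audioInput events are sent while model responses are consumed concurrently, and the closing events go out only when the conversation ends; a system‑prompt text block is also usually sent before the audio and is omitted here for brevity.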

Solution Architecture

The solution adopts a serverless model with a React single‑page front end and a containerized backend API. Core AWS services include:

Amazon Bedrock (Nova Sonic model)

Amazon CloudFront (CDN for the React app)

AWS Fargate for Amazon ECS (runs the WebSocket and REST API services)

Application Load Balancer (routes /api and /novasonic traffic)

Amazon VPC, NAT Gateway, AWS WAF, Amazon Cognito, Amazon DynamoDB, and Amazon S3

These services collaborate to provide low‑latency, bidirectional streaming for voice interactions.
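The load balancer's job of splitting /api and /novasonic traffic can be modeled as a prefix match. This is a hypothetical sketch for reasoning about which service receives a request, not the stack's actual configuration: in the deployed stack the split is done by ALB listener rules (path‑pattern conditions evaluated in priority order), and the target service names below are invented for illustration.

```python
from typing import Optional

# Hypothetical mapping from path prefix to the ECS service behind the ALB.
ROUTES = {
    "/api": "rest-api-service",         # REST endpoints for notes and tasks
    "/novasonic": "websocket-service",  # WebSocket relay for Nova Sonic audio
}


def route(path: str) -> Optional[str]:
    """Return the target service for a request path, or None if no rule matches."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match the prefix itself or the prefix followed by a sub-path,
        # so "/apiary" does not accidentally match "/api".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    return None
```

Keeping the two paths on separate services lets the long‑lived WebSocket connections scale independently of the short REST requests.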

Deployment Prerequisites

AWS account with appropriate permissions (least‑privilege principle)

Docker Engine installed locally

AWS CLI configured with admin credentials

Node.js (≥20.x) and npm

Model access to Amazon Nova Sonic enabled in Amazon Bedrock
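Before running the deployment, a quick preflight check can confirm the local tooling. This is a hypothetical helper, not part of the sample repository: it looks for the docker, aws, node, and npm executables on PATH and checks that `node --version` reports major version 20 or later. It cannot verify IAM permissions or Bedrock model access, which still need to be checked in the AWS console.

```python
import shutil
import subprocess
import sys

REQUIRED_TOOLS = ["docker", "aws", "node", "npm"]
MIN_NODE_MAJOR = 20


def node_major(version: str) -> int:
    """Parse the major version from output such as 'v20.11.1'."""
    return int(version.strip().lstrip("v").split(".")[0])


def preflight() -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"{tool} not found on PATH"
                for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    if shutil.which("node") is not None:
        out = subprocess.run(["node", "--version"], capture_output=True,
                             text=True, check=True).stdout
        if node_major(out) < MIN_NODE_MAJOR:
            problems.append(f"Node.js {out.strip()} is older than {MIN_NODE_MAJOR}.x")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("FAIL:", issue)
    sys.exit(1 if issues else 0)
```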

Deployment Steps

Clone the repository:

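# clone the samples repository and enter the project folder
git clone https://github.com/aws-samples/sample-amazon-q-developer-vibe-coded-projects.git
cd sample-amazon-q-developer-vibe-coded-projects/NovaSonicVoiceAssistant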

Run the first‑time deployment script:

npm run deploy:first-time

This script installs dependencies, builds the Docker images, bootstraps and synthesizes the CDK stack, updates the Cognito environment variables, rebuilds the UI, and finally deploys the infrastructure.
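The Cognito step amounts to copying stack outputs into the front end's build‑time configuration. The sketch below shows one plausible way to do that, assuming the outputs were saved with `cdk deploy --outputs-file outputs.json` (a real CDK CLI option that writes a JSON object keyed by stack name). The stack name, the output keys UserPoolId and UserPoolClientId, and the COGNITO_* variable names are hypothetical; the repository's own scripts may use different names.

```python
import json
from pathlib import Path

# Hypothetical mapping from CDK output keys to front-end environment variables.
OUTPUT_TO_ENV = {
    "UserPoolId": "COGNITO_USER_POOL_ID",
    "UserPoolClientId": "COGNITO_USER_POOL_CLIENT_ID",
}


def render_env(outputs: dict, stack_name: str) -> str:
    """Render KEY=value lines from the outputs of one stack."""
    stack_outputs = outputs[stack_name]
    lines = []
    for output_key, env_name in OUTPUT_TO_ENV.items():
        if output_key not in stack_outputs:
            raise KeyError(f"missing stack output: {output_key}")
        lines.append(f"{env_name}={stack_outputs[output_key]}")
    return "\n".join(lines) + "\n"


def write_env_file(outputs_path: Path, stack_name: str, env_path: Path) -> None:
    """Read the CDK outputs file and write the .env file the UI build reads."""
    outputs = json.loads(outputs_path.read_text())
    env_path.write_text(render_env(outputs, stack_name))
```

Because a bundled single‑page app reads such values at build time, writing the file is presumably why the script rebuilds the UI before the final deploy.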

Validate the deployment by accessing the CloudFront URL shown in the CDK output, creating a user via the registration page, and testing voice commands.

To clean up, remove the stack:

# move to the infra folder
cd infra
# destroy the AWS stack
npm run destroy

Verification

After deployment, users can log in, grant microphone access, and issue voice commands such as “Add a note reminding me to follow up on the project charter” or “Archive all completed tasks.” The app processes these commands end‑to‑end, updating notes and task status without manual interaction.
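Behind a command such as "Archive all completed tasks", the application receives a toolUse event from the model, runs the matching backend operation, and sends the result back. The sketch below shows that dispatch step against an in‑memory store. It assumes the toolUse event carries toolName, toolUseId, and a JSON‑string content field, per our reading of the Nova Sonic event schema; the tool names addNote and archiveCompleted and the task fields are hypothetical, and a real backend would write to DynamoDB rather than Python lists. Wrapping the result in the events that return it to the model is left out.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TodoStore:
    """In-memory stand-in for the app's notes and tasks (hypothetical)."""
    notes: list = field(default_factory=list)
    tasks: list = field(default_factory=list)


def add_note(store: TodoStore, args: dict) -> dict:
    store.notes.append(args["text"])
    return {"status": "ok", "noteCount": len(store.notes)}


def archive_completed(store: TodoStore, args: dict) -> dict:
    archived = 0
    for task in store.tasks:
        if task["completed"] and not task["archived"]:
            task["archived"] = True
            archived += 1
    return {"status": "ok", "archived": archived}


# Hypothetical tool names mapped to their handlers.
HANDLERS = {"addNote": add_note, "archiveCompleted": archive_completed}


def handle_tool_use(store: TodoStore, tool_use: dict) -> dict:
    """Run the requested tool and build a result payload keyed by toolUseId."""
    handler = HANDLERS.get(tool_use["toolName"])
    if handler is None:
        result = {"status": "error", "message": f"unknown tool {tool_use['toolName']}"}
    else:
        # Assumption: the tool arguments arrive as a JSON string in "content".
        result = handler(store, json.loads(tool_use.get("content") or "{}"))
    return {"toolUseId": tool_use["toolUseId"], "content": json.dumps(result)}
```

Returning an error payload for unknown tools, instead of raising, lets the model tell the user that the request could not be completed.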

Conclusion

Voice interaction is more than an accessibility add‑on; it is becoming a core modality for complex business workflows. The demonstrated solution shows how Amazon Nova Sonic can be integrated into a full‑stack, serverless application to achieve efficient, hands‑free task management.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

