Building a Python Voice Chatbot with Baidu AI Speech Recognition and Qingyunke

This tutorial explains how to create a Python voice chatbot by recording audio, converting speech to text with Baidu AI, sending the text to the Qingyunke chatbot API for a response, and finally synthesizing the reply back into speech using pyttsx3.

Python Programming Learning Circle

Apr 22, 2022

I recently needed a small Python program that lets a person hold a spoken conversation with a chatbot through a dialogue API. The workflow has four steps: capture voice input, convert the audio to text, send the text to a dialogue API to get a text reply, and convert that reply back to speech.

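Required environment: install the following Python packages:

pip install pyaudio – for recording audio from the microphone (the WAV file itself is written with the standard wave module)

pip install baidu-aip – Baidu AI SDK for speech‑to‑text

pip install pyttsx3 – text‑to‑speech synthesis

pip install requests – HTTP client used to call the Qingyunke API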
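Before running the rest of the tutorial, it can help to confirm that the packages are importable. The following is a minimal sketch, not part of the original program. The helper name missing_modules is hypothetical, and the import names are assumptions based on the snippets in this article (baidu-aip is imported as aip, and requests is included because the chatbot step calls requests.get).

```python
import importlib.util

# Import names are assumptions based on the snippets in this article:
# baidu-aip is imported as `aip`; the others share their package name.
REQUIRED_MODULES = ["pyaudio", "aip", "pyttsx3", "requests"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is reported as missing, install the matching package with pip and run the check again.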

Recording voice and saving as WAV:

import time
import wave
from pyaudio import PyAudio, paInt16

framerate = 16000   # sampling rate in Hz (matches the rate passed to Baidu ASR)
num_samples = 2000  # frames read per buffer
channels = 1        # mono
sampwidth = 2       # 2 bytes per sample (16-bit, matches paInt16)
FILEPATH = '../voices/myvoices.wav'  # make sure the ../voices directory exists

class Speak():
    def save_wave_file(self, filepath, data):
        # Write the recorded chunks as a 16 kHz, 16-bit mono WAV file
        with wave.open(filepath, 'wb') as wf:
            wf.setnchannels(channels)
            wf.setsampwidth(sampwidth)
            wf.setframerate(framerate)
            wf.writeframes(b''.join(data))

    def my_record(self):
        pa = PyAudio()
        stream = pa.open(format=paInt16, channels=channels, rate=framerate, input=True, frames_per_buffer=num_samples)
        my_buf = []
        t = time.time()
        print('Recording...')
        while time.time() < t + 5:  # record for 5 seconds
            string_audio_data = stream.read(num_samples)
            my_buf.append(string_audio_data)
        print('Recording finished')
        # Release the audio device before writing the file
        stream.stop_stream()
        stream.close()
        pa.terminate()
        self.save_wave_file(FILEPATH, my_buf)
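The WAV-writing step can be checked without a microphone. The sketch below is an illustration rather than part of the original program: it writes one second of silence with the same parameters used above (16000 Hz, mono, 2 bytes per sample) and reads the header back. The helper names write_silence and describe_wav and the temporary file name are hypothetical.

```python
import os
import tempfile
import wave


def write_silence(path, seconds=1, framerate=16000, channels=1, sampwidth=2):
    """Write `seconds` of silence as a PCM WAV file (a stand-in for recorded chunks)."""
    frame_count = framerate * seconds
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(framerate)
        wf.writeframes(b"\x00" * frame_count * channels * sampwidth)


def describe_wav(path):
    """Return (channels, sample width in bytes, frame rate, duration in seconds)."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), duration


demo_path = os.path.join(tempfile.gettempdir(), "demo_silence.wav")
write_silence(demo_path, seconds=1)
print(describe_wav(demo_path))  # → (1, 2, 16000, 1.0)
```

Calling describe_wav on the file produced by my_record should report the same first three values and a duration close to 5 seconds. The duration can run slightly over, because the loop always finishes reading a whole 2000-frame buffer (0.125 s) before it checks the clock.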

Speech‑to‑text with Baidu AI (you need APP_ID, API_KEY and SECRET_KEY from the Baidu AI console):

from aip import AipSpeech

# Credentials from the Baidu AI console; replace these with your own
APP_ID = '25990397'
API_KEY = 'iS91n0uEOujkMIlsOTLxiVOc'
SECRET_KEY = ''  # fill with your secret key

client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

class ReadWav():
    def get_file_content(self, filePath):
        # Read the recorded WAV file as raw bytes
        with open(filePath, 'rb') as fp:
            return fp.read()

    def predict(self):
        # Send 16 kHz WAV audio to Baidu ASR; dev_pid 1537 selects the Mandarin model
        return client.asr(self.get_file_content('../voices/myvoices.wav'), 'wav', 16000, {'dev_pid': 1537})

readWav = ReadWav()
print(readWav.predict())  # prints the response dict; the recognized text is in its 'result' list
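Recognition can fail, for example when the recording is silent, and in that case the response may not contain a 'result' list at all. The following sketch shows one way to pull the text out defensively. The helper name extract_text is hypothetical, and the 'err_no' and 'err_msg' fields and the two sample dictionaries are assumptions used for illustration, not captured API output.

```python
def extract_text(response):
    """Return the first recognized sentence, or None if recognition failed.

    Assumes a dict whose 'result' key holds a list of candidate strings on
    success, and whose 'err_no' is non-zero on failure.
    """
    if not isinstance(response, dict):
        return None
    if response.get("err_no", 0) != 0:
        return None
    candidates = response.get("result") or []
    return candidates[0] if candidates else None


# Hypothetical responses for illustration, not real API output
ok = {"err_no": 0, "err_msg": "success.", "result": ["你好"]}
failed = {"err_no": 3301, "err_msg": "speech quality error."}

print(extract_text(ok))      # → 你好
print(extract_text(failed))  # → None
```

Returning None instead of raising lets the calling code decide whether to re-record or skip the turn.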

Chatbot interaction using Qingyunke (free GET‑based API):

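import urllib.parse

import requests

def talkWithRobot(msg):
    # Percent-encode the message so Chinese text is safe in the query string
    url = 'http://api.qingyunke.com/api.php?key=free&appid=0&msg={}'.format(urllib.parse.quote(msg))
    response = requests.get(url, timeout=10)
    return response.json()["content"]

print(talkWithRobot("你好呀!"))  # example reply: 哟~ 都好都好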
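The request URL and the reply handling can be separated from the network call, which makes them easy to test offline. This is a sketch under assumptions: build_url and parse_reply are hypothetical helper names, and the "{br}" line-break marker is an assumption about how the service formats multi-line replies. Only the endpoint, the query parameters, and the "content" key come from the article.

```python
import urllib.parse

API_URL = "http://api.qingyunke.com/api.php"  # endpoint taken from the article


def build_url(msg):
    """Build the GET URL with the message percent-encoded as UTF-8."""
    query = urllib.parse.urlencode(
        {"key": "free", "appid": 0, "msg": msg},
        quote_via=urllib.parse.quote,  # same encoding as urllib.parse.quote(msg)
    )
    return "{}?{}".format(API_URL, query)


def parse_reply(payload, fallback="..."):
    """Return the reply text from a decoded JSON payload, or `fallback` if absent."""
    content = payload.get("content") if isinstance(payload, dict) else None
    if not isinstance(content, str):
        return fallback
    # Assumption: "{br}" marks a line break in replies
    return content.replace("{br}", "\n")


print(build_url("你好"))
# → http://api.qingyunke.com/api.php?key=free&appid=0&msg=%E4%BD%A0%E5%A5%BD
```

With these helpers, the network step reduces to parse_reply(requests.get(build_url(msg), timeout=10).json()).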

Text‑to‑speech synthesis with pyttsx3:

import pyttsx3

class RobotSay():
    def __init__(self):
        self.engine = pyttsx3.init()
        # Slow the default speaking rate by 50 words per minute
        self.rate = self.engine.getProperty('rate')
        self.engine.setProperty('rate', self.rate - 50)

    def say(self, msg):
        # Queue the text, then block until it has been spoken
        self.engine.say(msg)
        self.engine.runAndWait()

robotSay = RobotSay()
robotSay.say("你好呀")  # speaks the text aloud
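pyttsx3 speaks with whatever voices the operating system provides, so Chinese replies may be read poorly or skipped if the default voice is not a Chinese one. The sketch below shows one possible way to select a voice through the engine's 'voices' and 'voice' properties. The helper names, the keyword list, and the demo voice entries are all hypothetical, and the voices actually available depend on the machine.

```python
from types import SimpleNamespace


def pick_voice_id(voices, keywords=("zh", "chinese", "huihui")):
    """Return the id of the first voice whose id or name contains a keyword."""
    for voice in voices:
        label = "{} {}".format(voice.id, voice.name).lower()
        if any(keyword in label for keyword in keywords):
            return voice.id
    return None


def use_chinese_voice(engine):
    """Switch a pyttsx3 engine to a matching voice if one is installed."""
    voice_id = pick_voice_id(engine.getProperty("voices"))
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    return voice_id


# Hypothetical voice entries for illustration; real ones come from the engine
demo_voices = [
    SimpleNamespace(id="voice.en", name="English Voice"),
    SimpleNamespace(id="voice.zh", name="Chinese Voice"),
]
print(pick_voice_id(demo_voices))  # → voice.zh
```

In the RobotSay class, use_chinese_voice(self.engine) could be called once at the end of __init__.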

Putting everything together into an interactive loop:

robotSay = RobotSay()
speak = Speak()
readTalk = ReadWav()

while True:
    speak.my_record()                               # record voice
    result = readTalk.predict()                     # speech‑to‑text
    if 'result' not in result:                      # recognition failed, e.g. silence
        print("Recognition failed:", result.get('err_msg'))
        continue
    text = result['result'][0]
    print("Me:", text)
    response_dialogue = talkWithRobot(text)          # chatbot reply
    print("Qingyunke:", response_dialogue)
    robotSay.say(response_dialogue)                  # speak the reply

The program repeatedly records five seconds of speech, converts it to text via Baidu AI, gets a chatbot reply from Qingyunke, and reads the reply aloud, printing both sides of the conversation to the console. The loop has no exit condition, so stop it with Ctrl+C.
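One way to make the loop easier to stop and to test is to pass the four steps in as functions. The following is a sketch of that idea, not the original implementation: run_conversation, the exit words, and the stub functions in the demo are all hypothetical.

```python
def run_conversation(record, recognize, chat, say,
                     exit_words=("再见", "退出"), max_turns=None):
    """Run the record, recognize, chat, say loop until an exit word is heard.

    The four callables are supplied by the caller. Returns the transcript
    as a list of (user_text, reply) pairs.
    """
    transcript = []
    turns = 0
    while max_turns is None or turns < max_turns:
        turns += 1
        record()
        text = recognize()
        if not text:  # nothing recognized, try again
            continue
        if any(word in text for word in exit_words):
            break
        reply = chat(text)
        say(reply)
        transcript.append((text, reply))
    return transcript


# Demo with stubs instead of a microphone, Baidu AI, and Qingyunke
heard = iter(["你好", None, "再见"])
spoken = []
log = run_conversation(
    record=lambda: None,
    recognize=lambda: next(heard),
    chat=lambda text: "echo: " + text,
    say=spoken.append,
)
print(log)  # → [('你好', 'echo: 你好')]
```

With the classes from this tutorial, the call would look like run_conversation(speak.my_record, lambda: readTalk.predict().get('result', [None])[0], talkWithRobot, robotSay.say).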

Future work includes adding a graphical user interface and extending functionality.
