Building a Python Voice Chatbot with Baidu AI Speech Recognition and Qingyunke
This tutorial explains how to create a Python voice chatbot by recording audio, converting speech to text with Baidu AI, sending the text to the Qingyunke chatbot API for a response, and finally synthesizing the reply back into speech using pyttsx3.
I recently needed a small Python program that lets a person hold a voice conversation with a chatbot through a dialogue API. The workflow has four steps: capture voice input, convert the audio to text, call a dialogue API to obtain a text reply, and convert that reply back to speech.
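The four steps map naturally onto four functions. As a minimal sketch (the names `run_turn`, `record`, `transcribe`, `reply` and `speak` are hypothetical and not part of any library used here), one conversation turn can be written against plain callables, so the flow can be exercised with stand-ins before a microphone or API key is involved:

```python
from typing import Callable, Tuple


def run_turn(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    reply: Callable[[str], str],
    speak: Callable[[str], None],
) -> Tuple[str, str]:
    """Run one capture -> text -> reply -> speech turn; return (heard, answer)."""
    audio = record()           # step 1: capture voice input
    heard = transcribe(audio)  # step 2: convert the audio to text
    answer = reply(heard)      # step 3: ask the dialogue API for a reply
    speak(answer)              # step 4: convert the reply back to speech
    return heard, answer


if __name__ == "__main__":
    # Stand-in callables; the real ones would wrap the recorder, Baidu AI,
    # Qingyunke and pyttsx3.
    spoken = []
    heard, answer = run_turn(
        record=lambda: b"\x00\x00" * 16000,
        transcribe=lambda audio: "hello",
        reply=lambda text: "you said: " + text,
        speak=spoken.append,
    )
    print(heard, "->", answer)
```

Passing the steps in as arguments is only one possible design; it keeps each stage replaceable and testable in isolation.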
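Required environment: install the following Python packages:
pip install pyaudio – for recording audio from the microphone (the WAV file itself is written with the standard-library wave module)
pip install baidu-aip – Baidu AI SDK for speech‑to‑text
pip install pyttsx3 – text‑to‑speech synthesis
pip install requests – HTTP client used for the chatbot API call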
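Before running anything, it can help to confirm that the packages are importable. The sketch below is a convenience and not part of the original program: `missing_modules` is a hypothetical helper, and the names checked (`pyaudio`, `aip`, `pyttsx3`, `requests`) are the modules the code in this article imports. Note that the `baidu-aip` package is imported as `aip`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["pyaudio", "aip", "pyttsx3", "requests"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```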
Recording voice and saving as WAV:
import time
import wave
from pyaudio import PyAudio, paInt16

framerate = 16000    # sampling rate in Hz
num_samples = 2000   # frames per buffer
channels = 1         # mono
sampwidth = 2        # 2 bytes (16 bits) per sample
FILEPATH = '../voices/myvoices.wav'  # ensure the directory exists

class Speak():
    def save_wave_file(self, filepath, data):
        # Write the recorded chunks as a 16 kHz, mono, 16-bit WAV file.
        wf = wave.open(filepath, 'wb')
        wf.setnchannels(channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(framerate)
        wf.writeframes(b''.join(data))
        wf.close()

    def my_record(self):
        pa = PyAudio()
        stream = pa.open(format=paInt16, channels=channels, rate=framerate, input=True, frames_per_buffer=num_samples)
        my_buf = []
        t = time.time()
        print('Recording...')
        while time.time() < t + 5:  # record for 5 seconds
            string_audio_data = stream.read(num_samples)
            my_buf.append(string_audio_data)
        print('Recording finished')
        self.save_wave_file(FILEPATH, my_buf)
        stream.stop_stream()
        stream.close()
        pa.terminate()  # release the audio device

Speech‑to‑text with Baidu AI (you need APP_ID, API_KEY and SECRET_KEY from the Baidu AI console):
from aip import AipSpeech

# Replace these with the credentials of your own Baidu AI application.
APP_ID = '25990397'
API_KEY = 'iS91n0uEOujkMIlsOTLxiVOc'
SECRET_KEY = ''  # fill with your secret key
client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

class ReadWav():
    def get_file_content(self, filePath):
        with open(filePath, 'rb') as fp:
            return fp.read()

    def predict(self):
        # 16000 must match the recording sample rate; dev_pid 1537 selects the Mandarin model.
        return client.asr(self.get_file_content('../voices/myvoices.wav'), 'wav', 16000, {'dev_pid': 1537})

readWav = ReadWav()
print(readWav.predict())  # the returned dict contains the recognized text

Chatbot interaction using Qingyunke (free GET‑based API):
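import urllib.parse
import requests

def talkWithRobot(msg):
    # URL-encode the message so that Chinese text and punctuation survive the query string.
    url = 'http://api.qingyunke.com/api.php?key=free&appid=0&msg={}'.format(urllib.parse.quote(msg))
    html = requests.get(url, timeout=10)
    return html.json()["content"]

print(talkWithRobot("你好呀!"))  # sends "Hello!"; example reply: 哟~ 都好都好

Text‑to‑speech synthesis with pyttsx3: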
import pyttsx3

class RobotSay():
    def __init__(self):
        self.engine = pyttsx3.init()
        # Lower the default speaking rate by 50 to make the voice easier to follow.
        self.rate = self.engine.getProperty('rate')
        self.engine.setProperty('rate', self.rate - 50)

    def say(self, msg):
        self.engine.say(msg)
        self.engine.runAndWait()  # block until the speech has finished

robotSay = RobotSay()
robotSay.say("你好呀")  # speaks "Hello" aloud

Putting everything together into an interactive loop:
robotSay = RobotSay()
speak = Speak()
readTalk = ReadWav()

while True:
    speak.my_record()  # record voice
    result = readTalk.predict()  # speech‑to‑text
    if 'result' not in result:  # recognition failed; the dict holds the error details
        print("Recognition failed:", result)
        continue
    text = result['result'][0]
    print("You said:", text)
    response_dialogue = talkWithRobot(text)  # chatbot reply
    print("Qingyunke said:", response_dialogue)
    robotSay.say(response_dialogue)  # speak the reply

The program continuously records the user’s speech, converts it to text via Baidu AI, obtains a chatbot reply from Qingyunke, and reads the reply aloud. Sample console output demonstrates a back‑and‑forth conversation.
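Because the reply is spoken verbatim, anything unexpected in the chatbot payload ends up in the audio. The following is a minimal sketch under two assumptions that should be checked against a live response: that Qingyunke answers with JSON of the form `{"result": 0, "content": "..."}` where a non-zero `result` signals an error, and that `{br}` inside `content` marks a line break. `clean_reply` is a hypothetical helper, not part of any API.

```python
def clean_reply(payload, fallback="Sorry, I did not get a reply."):
    """Turn a decoded Qingyunke JSON payload into text suitable for speech.

    Assumes (not verified here) that result == 0 means success and that
    "{br}" stands for a line break inside "content".
    """
    if not isinstance(payload, dict) or payload.get("result") != 0:
        return fallback
    content = str(payload.get("content", "")).replace("{br}", "\n").strip()
    return content or fallback


if __name__ == "__main__":
    print(clean_reply({"result": 0, "content": "line one{br}line two"}))
    print(clean_reply({"result": 1, "content": "error"}))
```

One way to use it would be to have `talkWithRobot` return `clean_reply(html.json())` rather than indexing `["content"]` directly, so that a malformed or error response produces a spoken fallback instead of an exception.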
Future work includes adding a graphical user interface and extending functionality.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.