Batch Image Translation Demo Using Youdao OCR API with Python
This article demonstrates how to build a Python desktop application that batch‑processes cosmetic product images, sends them to Youdao's OCR translation service, and displays the translated text, covering API preparation, request parameters, signature generation, and full source code.
The author describes a personal need to translate cosmetic product labels for his girlfriend and decides to develop a batch image translation demo using Youdao's AI‑powered OCR translation service.
Effect Demonstration
Several screenshots show successful translation results for different product images, including mixed Korean/English text.
Preparing API Call – Generating App ID and Secret
Instructions explain how to create an instance and application on the Youdao platform to obtain the required appKey and appSecret.
API Interface Introduction
The OCR translation endpoint is https://openapi.youdao.com/ocrtransapi, accessed via POST with form-encoded data and a JSON response.
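The article's source code later calls a do_request helper that is not shown in full. Assuming a form-encoded POST as described above, a minimal stdlib-only sketch (the helper name and implementation details are this sketch's assumptions, not the article's) could look like:

```python
import urllib.parse
import urllib.request

YOUDAO_URL = 'https://openapi.youdao.com/ocrtransapi'

def do_request(data):
    """POST form-encoded data to the OCR translation endpoint and return the HTTP response."""
    body = urllib.parse.urlencode(data).encode('utf-8')
    req = urllib.request.Request(YOUDAO_URL, data=body, method='POST')
    return urllib.request.urlopen(req)
```

The original project may instead use a third-party HTTP client such as requests; the form-encoded body is the part the API requires.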
API Call Parameters
Field Name  | Type | Meaning                          | Required | Notes
type        | text | File upload type                 | True     | Set to 1 for Base64
from        | text | Source language                  | True     | Can be "auto"
to          | text | Target language                  | True     | Can be "auto"
appKey      | text | Application ID                   | True     | Found in application management
salt        | text | UUID                             | True     | e.g., 1995882C5064805BC30A39829B779D7B
sign        | text | Signature                        | True     | MD5(appKey+q+salt+appSecret)
ext         | text | Audio format for result          | False    | mp3
q           | text | Image to recognize               | True     | Base64 of image when type=1
docType     | text | Response type                    | False    | json
render      | text | Return rendered image            | False    | 0 (no) or 1 (yes)
nullIsError | text | Return error if no text detected | False    | "false" or "true"
Signature generation steps: concatenate appKey, q, salt, and appSecret in that order, then compute the MD5 hash to obtain sign.
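The signature step above can be sketched with Python's standard hashlib module (the credential values here are placeholders to be replaced with your own appKey and appSecret):

```python
import hashlib
import uuid

# Placeholder credentials; use the values from Youdao application management.
APP_KEY = "your-app-key"
APP_SECRET = "your-app-secret"

def make_sign(app_key: str, q: str, salt: str, app_secret: str) -> str:
    """Concatenate appKey + q + salt + appSecret and return the hex MD5 digest."""
    sign_str = app_key + q + salt + app_secret
    return hashlib.md5(sign_str.encode("utf-8")).hexdigest()

salt = str(uuid.uuid1())
sign = make_sign(APP_KEY, "base64-image-data", salt, APP_SECRET)  # 32-char hex string
```

Note that q in the concatenation is the full Base64 image payload, not its truncated form.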
Development Process
1. API Interface Overview
The core of the project is calling the Youdao OCR translation API.
2. Detailed Development
The demo consists of three Python files: maindow.py (Tkinter UI), transclass.py (image handling), and pictranslate.py (API wrapper).
UI Code (maindow.py)
<code>import tkinter as tk
from tkinter import filedialog, messagebox

# Callbacks get_files, set_result_path and translate_files are defined below
root = tk.Tk()
root.title("netease youdao translation test")
frm = tk.Frame(root)
frm.grid(padx='50', pady='50')
# Button that opens the file picker for the images to translate
btn_get_file = tk.Button(frm, text='Select images to translate', command=get_files)
btn_get_file.grid(row=0, column=0, ipadx='3', ipady='3', padx='10', pady='20')
text1 = tk.Text(frm, width='40', height='10')
text1.grid(row=0, column=1)
# Button that picks the directory for the translation results
btn_get_result_path = tk.Button(frm, text='Select result path', command=set_result_path)
btn_get_result_path.grid(row=1, column=0)
text2 = tk.Text(frm, width='40', height='2')
text2.grid(row=1, column=1)
btn_sure = tk.Button(frm, text='Translate', command=translate_files)
btn_sure.grid(row=2, column=1)
root.mainloop()</code>

File selection, result path selection, and translation trigger are implemented with Tkinter dialogs.
<code>def get_files():
    # Let the user pick one or more .jpg images to translate
    files = filedialog.askopenfilenames(filetypes=[('image files', '.jpg')])
    translate.file_paths = files
    if files:
        for file in files:
            text1.insert(tk.END, file + '\n')
        text1.update()
    else:
        print('No file selected')
</code>

<code>def set_result_path():
    # Directory where the result .txt files will be written
    result_path = filedialog.askdirectory()
    translate.result_root_path = result_path
    text2.insert(tk.END, result_path)
</code>

<code>from tkinter import messagebox

def translate_files():
    if translate.file_paths:
        translate.translate_files()
        messagebox.showinfo('Info', 'Done')
    else:
        messagebox.showinfo('Info', 'No files selected')
</code>

Batch Image Processing (transclass.py)
<code>import os

class Translate():
    def __init__(self, name, file_paths, result_root_path, trans_type):
        self.name = name
        self.file_paths = file_paths
        self.result_root_path = result_root_path
        self.trans_type = trans_type

    def translate_files(self):
        for file_path in self.file_paths:
            file_name = os.path.basename(file_path)
            print('===========' + file_path + '===========')
            trans_result = self.translate_use_netease(file_path)
            result_file = self.result_root_path + '/result_' + file_name.split('.')[0] + '.txt'
            # Write the translated text into the chosen result directory
            with open(result_file, 'w') as f:
                f.write(trans_result)

    def translate_use_netease(self, file_path):
        # Let the API auto-detect source and target languages
        return connect(file_path, 'auto', 'auto')
</code>

Calling Youdao API (pictranslate.py)
<code>import base64
import json
import uuid

def connect(file_path, fromLan, toLan):
    # Base64-encode the image, as required when type=1
    with open(file_path, 'rb') as f:
        q = base64.b64encode(f.read()).decode('utf-8')
    data = {}
    data['from'] = fromLan
    data['to'] = toLan
    data['type'] = '1'
    data['q'] = q
    salt = str(uuid.uuid1())
    # sign = MD5(appKey + q + salt + appSecret)
    signStr = APP_KEY + q + salt + APP_SECRET
    sign = encrypt(signStr)  # encrypt() returns the hex MD5 digest of signStr
    data['appKey'] = APP_KEY
    data['salt'] = salt
    data['sign'] = sign
    response = do_request(data)  # do_request() POSTs the form data to the OCR endpoint
    result = json.loads(str(response.content, encoding="utf-8"))
    # Concatenate the translated text of every recognized region
    translateResults = result['resRegions']
    pictransresult = ""
    for i in translateResults:
        pictransresult = pictransresult + i['tranContent'] + "\n"
    return pictransresult
</code>

Result Summary
The JSON response contains fields such as orientation, lanFrom, lanTo, and resRegions with detailed translation content, bounding boxes, and layout information.
Field       | Description
orientation | Image orientation
lanFrom     | Detected source language
textAngle   | Image tilt angle
errorCode   | Error code
lanTo       | Target language
resRegions  | Translated content per region
The author concludes that leveraging an open AI platform makes image recognition and natural language processing straightforward, allowing more time for personal enjoyment.
Project repository: https://github.com/LemonQH/BatchPicTranslate