
Practical Guide to OCR Text Recognition, Message Push, Image Processing, and Android UI Dump for Automation

This tutorial walks through OCR fundamentals using pytesseract and chineseocr_lite, demonstrates how to push notifications via Server酱, provides reusable Python image‑processing utilities, and shows how to dump and parse Android UI XML for automated interactions.

Rare Earth Juejin Tech Community

Oct 10, 2022

This article introduces a series of practical automation techniques, starting with an overview of OCR (Optical Character Recognition) concepts and common use cases such as license‑plate, document, and captcha recognition.

It then details two OCR libraries. The first, pytesseract, requires both the Python wrapper and the Tesseract engine itself (including language packs). Once both are installed, a short script loads an image and extracts Simplified Chinese text:

# Install the Python wrapper (run in a shell, not in Python):
#   pip install pytesseract

# Install the Tesseract engine (system specific):
#   Windows: download the installer from https://github.com/UB-Mannheim/tesseract/wiki
#   macOS:   brew install tesseract

import pytesseract
from PIL import Image

if __name__ == '__main__':
    image = Image.open('test_ocr.jpg')
    text = pytesseract.image_to_string(image, lang='chi_sim')
    print(text)

If the chi_sim language pack is missing, the guide shows how to download chi_sim.traineddata and place it in Tesseract's tessdata directory.
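Before calling pytesseract, it can help to confirm that the language file is actually where Tesseract will look. The following is a minimal sketch, not part of the original article: the helper name `find_traineddata` is hypothetical, and it assumes Tesseract 4 or later, where the `TESSDATA_PREFIX` environment variable points at the tessdata directory itself.

```python
import os
from pathlib import Path


def find_traineddata(lang, tessdata_dir=None):
    """Return the path to <lang>.traineddata if it exists, else None.

    tessdata_dir defaults to the TESSDATA_PREFIX environment variable.
    Falling back to the current directory is an assumption of this sketch.
    """
    base = Path(tessdata_dir or os.environ.get("TESSDATA_PREFIX", "."))
    candidate = base / f"{lang}.traineddata"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = find_traineddata("chi_sim")
    if path is None:
        print("chi_sim.traineddata not found; download it into tessdata first")
    else:
        print("Found language pack at", path)
```

If the check fails, `lang='chi_sim'` in the script above would raise an error at recognition time, so failing early gives a clearer message.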

The second library, chineseocr_lite, is recommended for better speed and accuracy on Chinese text and can be run as a local service. A cloud API such as Baidu OCR is another option. Example code for Baidu OCR, using the aip module from the baidu-aip package:

from aip import AipOcr

def get_file_content(file_path):
    with open(file_path, 'rb') as fp:
        return fp.read()

class BaiDuOCR:
    def __init__(self):
        self.APP_ID = "xxx"
        self.API_KEY = "xxx"
        self.SECRET_KEY = "xxx"
        self.client = AipOcr(self.APP_ID, self.API_KEY, self.SECRET_KEY)

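    def general(self, pic_path):
        """Run general text recognition and return the raw result dict."""
        ocr_result = self.client.basicGeneral(get_file_content(pic_path))
        # A failed call comes back as a dict that carries an error_code field.
        if ocr_result and 'error_code' not in ocr_result:
            print("Recognition result: " + str(ocr_result))
            return ocr_result
        print("Recognition failed: " + str(ocr_result))
        raise Exception("OCR recognition failed")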

if __name__ == '__main__':
    ocr_client = BaiDuOCR()
    ocr_client.general('test_ocr.jpg')
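The Baidu client returns a dictionary rather than plain text, so a script usually needs one more step to get at the recognized lines. The sketch below assumes the response shape documented for the general OCR endpoint (`words_result` as a list of `{"words": ...}` items on success, `error_code` and `error_msg` on failure). The helper name `extract_words` and the sample data are hypothetical.

```python
def extract_words(ocr_result):
    """Return the recognized text lines from a Baidu general-OCR response.

    Assumed shape: {"words_result": [{"words": "..."}, ...]} on success,
    {"error_code": ..., "error_msg": ...} on failure.
    """
    if not isinstance(ocr_result, dict):
        raise ValueError("unexpected OCR result: %r" % (ocr_result,))
    if "error_code" in ocr_result:
        raise RuntimeError("OCR failed: %s %s" % (
            ocr_result["error_code"], ocr_result.get("error_msg", "")))
    return [item["words"] for item in ocr_result.get("words_result", [])]


if __name__ == "__main__":
    # Hand-written sample, not real API output.
    sample = {"words_result_num": 1, "words_result": [{"words": "签到成功"}]}
    print(extract_words(sample))
```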

Cloud OCR services such as Baidu OCR enforce call quotas, so scripts should stay within them. The same applies to Server酱, used below for message push: its free tier allows five messages per day, which is sufficient for simple auto‑check‑in scripts.
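One way to stay inside a daily allowance like the five messages mentioned above is to count sends locally and refuse once the limit is hit. This is only a sketch under stated assumptions: the class name `DailyQuota`, the JSON state file, and the per-calendar-day reset are all illustrative choices, not something the article or Server酱 prescribes.

```python
import datetime
import json
from pathlib import Path


class DailyQuota:
    """Count sends per calendar day in a small JSON state file."""

    def __init__(self, state_path, limit=5):
        self.state_path = Path(state_path)
        self.limit = limit

    def _load(self, today):
        try:
            state = json.loads(self.state_path.read_text(encoding="utf-8"))
        except (FileNotFoundError, ValueError):
            state = {}
        # Start a fresh counter whenever the stored date is not today.
        if state.get("date") != today:
            state = {"date": today, "count": 0}
        return state

    def try_acquire(self, today=None):
        """Reserve one send; return False once the daily limit is reached."""
        today = today or datetime.date.today().isoformat()
        state = self._load(today)
        if state["count"] >= self.limit:
            return False
        state["count"] += 1
        self.state_path.write_text(json.dumps(state), encoding="utf-8")
        return True
```

A script would call `try_acquire()` before each push and skip the notification when it returns `False`.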

The article then covers message‑push options, focusing on Server酱. After obtaining a SendKey, a Python function sends a WeChat notification:

import requests as r

send_key = "xxx"
send_url = "https://sctapi.ftqq.com/%s.send" % send_key

def send_wx_message(title, desp, short, channel=9):
    """Send a WeChat message via Server酱.
    title: required, max 32 chars
    desp: optional, max 32KB
    short: optional, max 64 chars
    channel: default 9 (方糖服务号, the Fangtang service account)
    """
    resp = r.post(send_url, data={'title': title, 'desp': desp, 'short': short, 'channel': channel})
    if resp and resp.status_code == 200:
        print("Message sent successfully")
    else:
        print("Message failed to send")
    print(resp.text)

if __name__ == '__main__':
    send_wx_message("Test title", "Test message body\n\n" * 16, "Test card")

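The length limits listed in the docstring can be enforced before the request is sent. The helper below is a hedged sketch: `build_payload` is a hypothetical name, and it assumes the title and short limits count characters while the 32KB limit on desp counts UTF-8 bytes (32 * 1024). The source does not say how Server酱 itself treats oversized fields.

```python
def build_payload(title, desp="", short="", channel=9):
    """Build the form data for a send request, trimming fields to the stated limits."""
    if not title:
        raise ValueError("title is required")
    payload = {"title": title[:32], "channel": channel}
    if desp:
        # Cut on the encoded bytes, then drop any character split by the cut.
        trimmed = desp.encode("utf-8")[:32 * 1024]
        payload["desp"] = trimmed.decode("utf-8", errors="ignore")
    if short:
        payload["short"] = short[:64]
    return payload


if __name__ == "__main__":
    print(build_payload("Check-in finished", "All tasks done", "OK"))
```

The returned dictionary can be passed as the `data` argument of the POST request in place of the inline dictionary.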
Next, a collection of reusable image‑processing utilities is provided (crop, resize, grayscale, binary conversion). Example snippet:

import os, time
from PIL import Image

def get_picture_size(pic_path):
    """Return (width, height, format) of an image."""
    img = Image.open(pic_path)
    return img.width, img.height, img.format

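def crop_area(pic_path, start_x, start_y, end_x, end_y):
    """Crop the box (start_x, start_y, end_x, end_y) and save it as a JPEG in the working directory."""
    img = Image.open(pic_path)
    region = img.crop((start_x, start_y, end_x, end_y))
    save_path = os.path.join(os.getcwd(), "crop_" + str(round(time.time()*1000)) + ".jpg")
    # JPEG has no alpha channel, so convert RGBA or palette images (e.g. PNG screenshots) first.
    region.convert("RGB").save(save_path)
    return save_path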
# ... other functions: resize_picture, picture_to_gray, picture_to_black_white
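The grayscale and black‑and‑white helpers are elided in the source. To show what those two conversions do at the pixel level, here is a dependency‑free sketch on plain nested lists. It is not the article's implementation, which presumably relies on Pillow. The luma weights are the ITU‑R 601‑2 ones that Pillow documents for `convert("L")`, and the threshold of 128 is an assumption.

```python
def to_gray(pixel):
    """Map an (R, G, B) tuple to a single 0-255 luma value (ITU-R 601-2 weights)."""
    r, g, b = pixel
    return (r * 299 + g * 587 + b * 114) // 1000


def gray_image(rgb_rows):
    """Convert rows of (R, G, B) tuples into rows of gray values."""
    return [[to_gray(p) for p in row] for row in rgb_rows]


def to_black_white(gray_rows, threshold=128):
    """Binarize gray rows: 255 where value >= threshold, otherwise 0."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray_rows]


if __name__ == "__main__":
    image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
    print(to_black_white(gray_image(image)))
```

Binarizing a cropped region before OCR often makes text edges cleaner, though the best threshold depends on the screenshot.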

Finally, the guide explains how to obtain all UI elements of the current Android screen using the built‑in uiautomator.jar tool. The UI hierarchy is dumped to an XML file via ADB, pulled to the PC, and parsed with lxml to extract node attributes such as bounds, text, and resource‑id. Sample parsing code:

from lxml import etree
import re

bounds_pattern = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")

class Node:
    def __init__(self, index=None, text=None, resource_id=None, class_name=None, package=None, content_desc=None, bounds=None):
        self.index = index
        self.text = text
        self.resource_id = resource_id
        self.class_name = class_name
        self.package = package
        self.content_desc = content_desc
        self.bounds = bounds
        self.nodes = []
    def add_node(self, node):
        self.nodes.append(node)

def analysis_ui_xml(xml_path):
    root = etree.parse(xml_path, parser=etree.XMLParser(encoding="utf-8"))
    root_node_element = root.xpath('/hierarchy/node')[0]
    node = analysis_element(root_node_element)
    print_node(node)
    return node

def analysis_element(element):
    """Recursively convert a <node> element into a Node tree; return None for anything else."""
    if element is None or element.tag != "node":
        return None
    bounds_result = re.search(bounds_pattern, element.attrib['bounds'])
    # Fall back to None if the bounds attribute does not match [x1,y1][x2,y2].
    bounds = tuple(int(v) for v in bounds_result.groups()) if bounds_result else None
    node = Node(
        int(element.attrib['index']),
        element.attrib['text'],
        element.attrib['resource-id'],
        element.attrib['class'],
        element.attrib['package'],
        element.attrib['content-desc'],
        bounds
    )
    for child in element.xpath('node'):
        child_node = analysis_element(child)
        if child_node:
            node.add_node(child_node)
    return node

def print_node(node, space_count=0):
    widget_info = "%d - %s - %s - %s - %s - %s - %s" % (
        node.index, node.text, node.resource_id, node.class_name, node.package, node.content_desc, node.bounds)
    print("  " * space_count + widget_info)
    for child in node.nodes:
        print_node(child, space_count + 1)

if __name__ == '__main__':
    analysis_ui_xml('ui_1665224943842.xml')
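The dump‑and‑pull step that produces the XML file, and the tap that typically follows once a node's bounds are known, can be sketched as plain adb command lists. The function names are hypothetical and the remote path is an assumption (it matches the default location used by `uiautomator dump`). Building the argument lists separately from running them keeps the logic checkable without a connected device.

```python
import subprocess


def _adb(serial=None):
    """Base adb argv, optionally targeting one device by serial."""
    return ["adb"] + (["-s", serial] if serial else [])


def build_dump_commands(local_path, remote_path="/sdcard/window_dump.xml", serial=None):
    """Return the adb commands that dump the UI hierarchy and pull it to the PC."""
    return [
        _adb(serial) + ["shell", "uiautomator", "dump", remote_path],
        _adb(serial) + ["pull", remote_path, local_path],
    ]


def center_of(bounds):
    """Center of a (left, top, right, bottom) tuple as stored in Node.bounds."""
    left, top, right, bottom = bounds
    return (left + right) // 2, (top + bottom) // 2


def build_tap_command(bounds, serial=None):
    """Return the adb command that taps the center of the given bounds."""
    x, y = center_of(bounds)
    return _adb(serial) + ["shell", "input", "tap", str(x), str(y)]


def run_all(commands):
    """Execute each command in order; raises CalledProcessError on failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

With a device attached, `run_all(build_dump_commands("ui.xml"))` would fetch the hierarchy, and `run_all([build_tap_command(node.bounds)])` would tap a parsed node.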

The article concludes with a brief recap: OCR, notification push, image handling, and UI dumping are the practical building blocks for Android automation scripts.
