Building a Weibo Influencer Finder with LangChain and LLM
This article demonstrates how to use LangChain, LLMs, and SerpAPI to create a Weibo influencer‑search tool that extracts UID numbers, scrapes profile data, filters Chinese content, and prepares the information for automated marketing outreach.
The author introduces LangChain as a mature AI application framework and outlines a plan to build a social‑network tool for locating suitable Weibo influencers (大V) to promote a dry‑goods store.
Project requirements: The marketing team wants to identify influential Weibo users who are interested in food-supplement topics, then contact them for collaborations.
Technical analysis: Use a LangChain agent with a search tool to find relevant UIDs, write a crawler that fetches public profile data as JSON, and use LLM prompts to generate invitation messages.
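The three steps can be sketched end to end with stand-in functions. Every name and canned return value below is a placeholder for the real components built in the rest of the article:

```python
# Pipeline sketch: search -> scrape -> draft an outreach message.
# Each body is a stub standing in for the real LangChain agent,
# crawler, and LLM prompt developed below.

def find_influencer_uid(topic: str) -> str:
    # Real version: the LangChain agent + SerpAPI search (find_V below)
    return "3659536733"

def fetch_profile(uid: str) -> dict:
    # Real version: requests against the Weibo ajax profile endpoint
    return {"screen_name": "示例博主", "description": "分享助眠好物"}

def draft_invitation(profile: dict) -> str:
    # Real version: an LLM prompt that writes the invitation message
    return f"Hi {profile['screen_name']}, we'd love to collaborate!"

uid = find_influencer_uid("助眠")
profile = fetch_profile(uid)
print(draft_invitation(profile))
```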
Environment setup and UID lookup code:
# Set the environment variables
import os
os.environ['OPENAI_API_KEY'] = ''
os.environ['SERPAPI_API_KEY'] = ''
# Regular-expression module
import re
# The core of the project: a find_V method in a weibo_agent module
from agents.weibo_agent import find_V

if __name__ == "__main__":
    response_UID = find_V(food_type="助眠")
    print(response_UID)
    # Use a regex to pull the first run of digits (the UID) out of the response
    UID = re.findall(r'\d+', response_UID)[0]
    print("This influencer's Weibo ID is", UID)

Custom agent implementation (find_V):
# tools_search_tool is written below
from tools_search_tool import get_UID
# Prompt template
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
# Building a custom agent
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
def find_V(food_type: str):
    # Chat model with temperature 0 for deterministic answers
    llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
    template = """given the {food} I want you to get a related 微博 UID.
    Your answer should contain only a UID.
    The URL always starts with https://weibo.com/u/
    for example, if https://weibo.com/u/3659536733 is her 微博, then 3659536733 is her UID.
    This is only an example; don't give me this one, give me the actual UID.
    """
    prompt_template = PromptTemplate(
        input_variables=["food"],
        template=template
    )
    tools = [
        Tool(
            name="Crawl Google for 微博 page",
            func=get_UID,
            description="useful for when you need to get a 微博 UID"
        )
    ]
    agent = initialize_agent(
        tools,
        llm,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True
    )
    # format() renders the template into the plain string agent.run() expects
    ID = agent.run(prompt_template.format(food=food_type))
    return ID

The agent uses langchain.agents.initialize_agent and Tool to call a SerpAPI-based search for the Weibo page.
SerpAPI UID retrieval:
# LangChain ships a SerpAPIWrapper integration
from langchain.utilities import SerpAPIWrapper

def get_UID(food: str):
    """Searches for the Weibo page."""
    search = SerpAPIWrapper()
    res = search.run(f"{food}")
    return res

Running the above returns a UID such as 3659536733.
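In practice the raw search result is often a snippet containing the full profile URL rather than a bare number. A small sketch (the sample snippet is made up) of extracting the UID with the same regex approach used in the main script, but anchored on the URL path so stray digits are not mistaken for the UID:

```python
import re

# Hypothetical search snippet containing a Weibo profile URL
snippet = "口碑很好的助眠博主，主页 https://weibo.com/u/3659536733 欢迎关注"

# Anchor the pattern on the /u/ path segment of the profile URL
match = re.search(r'weibo\.com/u/(\d+)', snippet)
uid = match.group(1) if match else None
print(uid)  # 3659536733
```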
Web scraping for profile data:
from tools.scraping_tool import get_data

person_info = get_data(UID)
print(person_info)

The scraping_tool module contains:
import json      # JSON parsing
import requests  # HTTP requests
import time      # delays

def scrape_weibo(url: str):
    '''Scrape the influencer's profile data.'''
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36",
        "Referer": "https://weibo.com"
    }
    cookies = {"cookie": "..."}
    response = requests.get(url, headers=headers, cookies=cookies)
    time.sleep(2)  # 2-second delay to avoid anti-crawler measures
    return response.text

def get_data(id):
    url = "https://weibo.com/ajax/profile/detail?uid={}".format(id)
    html = scrape_weibo(url)
    response = json.loads(html)
    return response

After fetching the JSON profile, non-Chinese fields are removed to keep only Chinese content:
import re

def contains_chinese(s):
    return bool(re.search('[\u4e00-\u9fa5]', s))

def remove_non_chinese_fields(d):
    if isinstance(d, dict):
        to_remove = [key for key, value in d.items()
                     if isinstance(value, (str, int, float, bool))
                     and not contains_chinese(str(value))]
        for key in to_remove:
            del d[key]
        for key, value in d.items():
            if isinstance(value, (dict, list)):
                remove_non_chinese_fields(value)
    elif isinstance(d, list):
        to_remove_indices = []
        for i, item in enumerate(d):
            if isinstance(item, (str, int, float, bool)) and not contains_chinese(str(item)):
                to_remove_indices.append(i)
            else:
                remove_non_chinese_fields(item)
        for index in reversed(to_remove_indices):
            d.pop(index)

Summary: The article completes the workflow of locating a suitable Weibo influencer UID, scraping their profile, and preparing the data for downstream LLM-generated invitation content.
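As a final sanity check of the remove_non_chinese_fields helper, a self-contained demo on a made-up profile fragment (the two helpers are repeated verbatim so the snippet runs on its own; all field names and values are hypothetical):

```python
import re

def contains_chinese(s):
    return bool(re.search('[\u4e00-\u9fa5]', s))

def remove_non_chinese_fields(d):
    # Same logic as in the article: drop scalar fields without Chinese text
    if isinstance(d, dict):
        to_remove = [key for key, value in d.items()
                     if isinstance(value, (str, int, float, bool))
                     and not contains_chinese(str(value))]
        for key in to_remove:
            del d[key]
        for key, value in d.items():
            if isinstance(value, (dict, list)):
                remove_non_chinese_fields(value)
    elif isinstance(d, list):
        to_remove_indices = []
        for i, item in enumerate(d):
            if isinstance(item, (str, int, float, bool)) and not contains_chinese(str(item)):
                to_remove_indices.append(i)
        for index in reversed(to_remove_indices):
            d.pop(index)

# Hypothetical profile fragment
profile = {
    "screen_name": "美食博主",
    "gender": "f",
    "followers_count": 10086,
    "verified_reason": "知名美食视频自媒体",
    "tabs": ["home", "微博"],
}
remove_non_chinese_fields(profile)
print(profile)
# {'screen_name': '美食博主', 'verified_reason': '知名美食视频自媒体', 'tabs': ['微博']}
```

Fields such as "gender" and the purely numeric follower count carry no Chinese characters and are stripped, while nested lists are filtered recursively.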
Key LangChain components used: PromptTemplate, LLM, Chain, Agent, and custom Agent Tool.
Reference: Huang Jia's LangChain course.
Rare Earth Juejin Tech Community