Snowball Knowledge Graph Construction, Applications, and Industrial Deployment
The article describes Snowball's large‑scale financial knowledge graph: the problems that motivated it, its two‑layer ontology and data design, data sourcing and pipeline, graph database selection, search and NLP services, a domain‑specific pre‑trained language model, and reflections on industrial deployment.
Introduction
Snowball, founded in 2010, provides a high‑engagement investment community and comprehensive services for Chinese investors, aiming to connect people with assets.
1.2 Knowledge Graph Construction Background
Data silos and costly cross‑cluster queries
Increasing structured query demands
Inconsistent data granularity
Deep relational queries and inference needs
Strict real‑time requirements for search and recommendation
To serve these needs, Snowball built a financial knowledge graph that integrates products, targets, articles, and users.
Snowball Knowledge Graph Construction
2.1 Knowledge Graph Overview
The graph models funds, stocks, articles, concepts, and scenarios. It holds millions of nodes and edges with billions of attributes, giving broad coverage of financial knowledge.
2.2 Structural Design
The graph consists of an ontology layer (entities such as persons, institutions, targets, articles) and a data layer (actual instances and relationships), enabling multi‑level, multi‑dimensional, cross‑business modeling.
Unified knowledge granularity – from basic terms to product categories and business‑specific needs.
Cross‑scenario knowledge interconnection – linking funds, stocks, texts, events, and more via heterogeneous relationships.
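As a minimal sketch of how the two layers relate, the ontology layer can be read as a set of allowed entity and relation types, and the data layer as instances validated against it. The type names, relation names, and the `KnowledgeGraph` class below are illustrative assumptions, not Snowball's actual schema.

```python
from dataclasses import dataclass, field

# Ontology layer: entity types and the relations permitted between them.
# All names here are hypothetical and chosen only for illustration.
ENTITY_TYPES = {"Person", "Institution", "Target", "Article"}
RELATION_TYPES = {
    ("Person", "manages", "Target"),
    ("Institution", "issues", "Target"),
    ("Article", "mentions", "Target"),
}


@dataclass
class KnowledgeGraph:
    """Data layer: concrete instances and relationships, checked against the ontology."""
    nodes: dict = field(default_factory=dict)  # node id -> (entity type, properties)
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, entity_type, **props):
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.nodes[node_id] = (entity_type, props)

    def add_edge(self, src, relation, dst):
        src_type = self.nodes[src][0]
        dst_type = self.nodes[dst][0]
        if (src_type, relation, dst_type) not in RELATION_TYPES:
            raise ValueError(f"{src_type}-[{relation}]->{dst_type} is not in the ontology")
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id, relation):
        """Targets reachable from node_id over one edge of the given relation."""
        return [d for s, r, d in self.edges if s == node_id and r == relation]


kg = KnowledgeGraph()
kg.add_node("article:42", "Article", title="Example post")
kg.add_node("fund:F001", "Target", name="Example Fund")
kg.add_edge("article:42", "mentions", "fund:F001")
print(kg.neighbors("article:42", "mentions"))
```

Keeping the ontology separate from the instances is what allows a text node and a fund node from different business lines to be linked under one set of rules.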
2.3 Construction Process
2.3.1 Data Sources
Structured expert knowledge (fund, stock, company details) and unstructured self‑generated knowledge (algorithm‑produced data) are combined.
2.3.2 Data Pipeline
Data engineers aggregate and process expert and self‑generated data into Hive tables, then import them into the graph; NLP teams use the graph for search and algorithm development.
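The import step can be sketched as turning table rows into graph insert statements. The example below assumes rows arrive as Python dictionaries and emits nGQL `INSERT VERTEX` and `INSERT EDGE` statements as plain strings; the tag `fund`, the edge type `managed_by`, and the column names are hypothetical, and a production pipeline would more likely rely on a bulk import tool than on hand-built statements.

```python
import json


def quote(value):
    """Render a Python value as an nGQL literal (strings double-quoted and escaped)."""
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def vertex_statement(tag, vid, props):
    """Build one INSERT VERTEX statement for a single vertex."""
    names = ", ".join(props)
    values = ", ".join(quote(v) for v in props.values())
    return f"INSERT VERTEX {tag} ({names}) VALUES {quote(vid)}:({values});"


def edge_statement(edge_type, src, dst, props=None):
    """Build one INSERT EDGE statement; props may be empty."""
    props = props or {}
    names = ", ".join(props)
    values = ", ".join(quote(v) for v in props.values())
    return f"INSERT EDGE {edge_type} ({names}) VALUES {quote(src)}->{quote(dst)}:({values});"


def rows_to_statements(rows):
    """Turn fund rows (as they might come out of a Hive table) into nGQL statements."""
    statements = []
    for row in rows:
        statements.append(vertex_statement("fund", row["fund_code"], {"name": row["name"]}))
        statements.append(edge_statement("managed_by", row["fund_code"], row["manager_id"]))
    return statements


rows = [{"fund_code": "F001", "name": "Example Fund", "manager_id": "P100"}]
for statement in rows_to_statements(rows):
    print(statement)
```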
2.3.3 Maintenance & Monitoring
Near‑real‑time updates at hourly granularity keep market data fresh, while monitoring of data accuracy, completeness, and graph size guards the quality and availability of the service.
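Completeness and graph-size monitoring can be illustrated with a small batch check. This is a sketch under stated assumptions: the function name `check_snapshot`, the required-field list, and the 5% drop threshold are hypothetical and not taken from Snowball's monitoring system.

```python
def check_snapshot(records, required_fields, previous_count, max_drop_ratio=0.05):
    """Return a list of human-readable alerts for one import batch."""
    alerts = []
    # Completeness: every record should carry a non-empty value for each required field.
    for i, record in enumerate(records):
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            alerts.append(f"record {i} missing {missing}")
    # Graph size: a sharp drop against the previous run suggests a failed or partial import.
    if previous_count > 0:
        drop = (previous_count - len(records)) / previous_count
        if drop > max_drop_ratio:
            alerts.append(f"size dropped {drop:.0%} (from {previous_count} to {len(records)})")
    return alerts


batch = [{"code": "F001", "name": "Example Fund"}, {"code": "F002", "name": ""}]
for alert in check_snapshot(batch, ["code", "name"], previous_count=10):
    print(alert)
```

Accuracy checks, which compare graph values against a trusted source, would need a reference dataset and are left out of this sketch.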
2.4 Graph Database Selection
After evaluating Neo4j, Nebula Graph, and JanusGraph, Snowball chose Nebula Graph for its distributed scalability and millisecond‑level query latency, supporting billions of vertices and edges.
Knowledge Graph Applications
3.1 Search Service
The graph powers a structured search system that interprets user intent, rewrites the query into the graph query language, and returns precise results, an improvement over the earlier unstructured search over posts.
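The rewriting step can be sketched as a lookup from a recognized intent to a query template. The intents, the edge type `holds`, and the `rewrite` function below are hypothetical; the statements follow nGQL `GO` syntax in the style of NebulaGraph 3.x and may need adjusting for other versions.

```python
# Hypothetical mapping from intent to an nGQL template; edge types are illustrative.
TEMPLATES = {
    "fund_holdings": 'GO FROM "{vid}" OVER holds YIELD dst(edge) AS stock_id;',
    "funds_holding_stock": 'GO FROM "{vid}" OVER holds REVERSELY YIELD src(edge) AS fund_id;',
}


def rewrite(intent, vid):
    """Fill the template for a recognized intent with the resolved vertex id."""
    if intent not in TEMPLATES:
        raise KeyError(f"no template for intent: {intent}")
    # Reject characters that could break out of the quoted vertex id.
    if '"' in vid or "\\" in vid:
        raise ValueError("vertex id contains unsafe characters")
    return TEMPLATES[intent].format(vid=vid)


print(rewrite("fund_holdings", "F001"))
```

Templates keep the generated queries predictable, at the cost of needing one entry per supported question type.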
Intent recognition combines a rule‑based engine, which uses deterministic finite automata (DFA) for fast regular‑expression matching, with deep‑learning models that generalize to queries the rules do not cover.
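A rules-first, model-second arrangement can be sketched as follows. The example uses a keyword trie, which is the simplest kind of deterministic automaton, rather than a full regular-expression compiler; the keywords, intent labels, and the `model` callable are assumptions for illustration only.

```python
def build_dfa(keyword_to_intent):
    """Build a trie (a simple DFA) whose accepting states carry an intent label."""
    root = {}
    for keyword, intent in keyword_to_intent.items():
        state = root
        for ch in keyword:
            state = state.setdefault(ch, {})
        state["$intent"] = intent  # multi-character key, so it cannot clash with a transition
    return root


def match_intent(dfa, query):
    """Scan the query and return the intent of the longest keyword found, else None."""
    best, best_len = None, 0
    for start in range(len(query)):
        state = dfa
        for end in range(start, len(query)):
            state = state.get(query[end])
            if state is None:
                break
            if "$intent" in state and end - start + 1 > best_len:
                best, best_len = state["$intent"], end - start + 1
    return best


def recognize(query, dfa, model=None):
    """Try the rules first; fall back to a learned model only when no rule fires."""
    intent = match_intent(dfa, query.lower())
    if intent is None and model is not None:
        intent = model(query)
    return intent


dfa = build_dfa({"net profit": "financial_report", "holdings": "fund_holdings"})
print(recognize("Example Corp net profit 2023", dfa))
```

Rules give fast, explainable answers for common phrasings, while the model fallback handles wording the rules have never seen.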
3.2 Front‑End Card Display
For financial‑report queries, the graph delivers structured card views of earnings data, reducing user effort and enhancing the search experience.
3.3 Pre‑trained Language Model
Snowball released SnowBERT, a finance‑domain pre‑trained language model built on a corpus of billions of tokens drawn from the graph’s textual data, achieving notable gains (e.g., +8% recall) on semantic matching and fund classification tasks.
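For readers unfamiliar with the metric, recall on a matching task is commonly computed as the share of relevant items that appear among the top-ranked results. The sketch below shows that common definition only; the source does not state how recall was measured for SnowBERT or whether the reported gain is absolute or relative, and the function name and sample data are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k ranked results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


ranked_results = ["a", "b", "c", "d"]
relevant_items = {"a", "d", "e"}
print(recall_at_k(ranked_results, relevant_items, k=3))
```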
3.4 Future Outlook
Future work includes expanding the graph's content, enhancing search features, and integrating graph knowledge more deeply with pre‑trained models so that the algorithms and the graph reinforce each other.
Industrial Deployment Reflections
Knowledge graphs have largely overcome the barriers that once slowed their adoption; the key challenge now is integrating them into business workflows, using graph‑based inference and deep learning to surface hidden insights.
Snowball Engineer Team
Proactivity, efficiency, professionalism, and empathy are the core values of the Snowball Engineer Team; curiosity, passion, and sharing of technology drive their continuous progress.