Snowball Knowledge Graph Construction, Applications, and Industrial Deployment

This article describes Snowball's large‑scale financial knowledge graph: the challenges that motivated it, its two‑layer ontology and data design, data sourcing and pipeline, graph database selection, search and NLP services, a domain‑specific pre‑trained language model, and considerations for industrial deployment.

Snowball Engineer Team

Sep 1, 2022

Introduction

Snowball, founded in 2010, provides a high‑engagement investment community and comprehensive services for Chinese investors, aiming to connect people with assets.

1.2 Knowledge Graph Construction Background

Several problems motivated the knowledge graph:

Data silos and costly cross‑cluster queries

Increasing structured query demands

Inconsistent data granularity

Deep relational queries and inference needs

Strict real‑time requirements for search and recommendation

To address these needs, Snowball built a financial knowledge graph that integrates products, investment targets, articles, and users.

Snowball Knowledge Graph Construction

2.1 Knowledge Graph Overview

The graph models funds, stocks, articles, concepts, and scenarios. It holds millions of nodes and edges carrying billions of attributes, giving broad coverage of financial knowledge.

2.2 Structural Design

The graph consists of an ontology layer (entity types such as persons, institutions, investment targets, and articles) and a data layer (the actual instances and their relationships), enabling multi‑level, multi‑dimensional, cross‑business modeling. The design serves two goals:

Unified knowledge granularity – from basic terms to product categories and business‑specific needs.

Cross‑scenario knowledge interconnection – linking funds, stocks, texts, events, and more via heterogeneous relationships.
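
The split between the ontology layer and the data layer can be illustrated with a minimal in-memory sketch. It assumes a deliberately tiny schema: the entity types, the `holds` relation, and the class names `Ontology` and `Graph` are all hypothetical and are not taken from Snowball's system. The point is only that the ontology declares which types and relations exist, and the data layer accepts instances that conform to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationType:
    """Ontology-level declaration: which entity types a relation may connect."""
    name: str
    source: str  # entity type at the start of the edge
    target: str  # entity type at the end of the edge


@dataclass
class Ontology:
    """Ontology layer: the allowed entity types and relation types."""
    entity_types: set = field(default_factory=set)
    relation_types: dict = field(default_factory=dict)

    def add_relation(self, name, source, target):
        self.entity_types.update({source, target})
        self.relation_types[name] = RelationType(name, source, target)


@dataclass
class Graph:
    """Data layer: concrete instances, validated against the ontology."""
    ontology: Ontology
    nodes: dict = field(default_factory=dict)  # node id -> entity type
    edges: list = field(default_factory=list)  # (src id, relation, dst id)

    def add_node(self, node_id, entity_type):
        if entity_type not in self.ontology.entity_types:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.nodes[node_id] = entity_type

    def add_edge(self, src, relation, dst):
        rel = self.ontology.relation_types.get(relation)
        if rel is None:
            raise ValueError(f"unknown relation: {relation}")
        if self.nodes.get(src) != rel.source or self.nodes.get(dst) != rel.target:
            raise ValueError(f"{relation} must link {rel.source} -> {rel.target}")
        self.edges.append((src, relation, dst))

    def neighbors(self, node_id, relation):
        return [d for s, r, d in self.edges if s == node_id and r == relation]


# Hypothetical usage: one relation type, two instances.
ontology = Ontology()
ontology.add_relation("holds", "fund", "stock")
graph = Graph(ontology)
graph.add_node("F001", "fund")
graph.add_node("S001", "stock")
graph.add_edge("F001", "holds", "S001")
```

Keeping validation in the data layer means a new business line can extend the ontology without touching existing instances, which is one plausible reading of the cross‑business modeling goal.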

2.3 Construction Process

2.3.1 Data Sources

Structured expert knowledge (fund, stock, company details) and unstructured self‑generated knowledge (algorithm‑produced data) are combined.

2.3.2 Data Pipeline

Data engineers aggregate and process expert and self‑generated data into Hive tables, then import them into the graph; NLP teams use the graph for search and algorithm development.
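
The source does not say how rows move from Hive into the graph, so the following is only a sketch of one possible import step. It assumes table rows arrive as Python dictionaries and turns them into NebulaGraph‑style `INSERT VERTEX` and `INSERT EDGE` statements. The tag `fund`, the edge type `holds`, the column names, and both helper functions are hypothetical, and the generated statements have not been validated against a live cluster.

```python
import json


def literal(value):
    """Render a Python scalar as a quoted/escaped literal for the statement."""
    if isinstance(value, (str, int, float, bool)):
        return json.dumps(value, ensure_ascii=False)
    raise TypeError(f"unsupported property type: {type(value).__name__}")


def vertex_statement(tag, vid_column, rows):
    """Build one batched INSERT VERTEX statement from table rows."""
    if not rows:
        raise ValueError("no rows to import")
    props = [c for c in rows[0] if c != vid_column]
    values = ", ".join(
        f"{literal(r[vid_column])}:({', '.join(literal(r[p]) for p in props)})"
        for r in rows
    )
    return f"INSERT VERTEX {tag}({', '.join(props)}) VALUES {values};"


def edge_statement(edge_type, src_column, dst_column, rows):
    """Build one batched INSERT EDGE statement from table rows."""
    if not rows:
        raise ValueError("no rows to import")
    props = [c for c in rows[0] if c not in (src_column, dst_column)]
    values = ", ".join(
        f"{literal(r[src_column])}->{literal(r[dst_column])}:"
        f"({', '.join(literal(r[p]) for p in props)})"
        for r in rows
    )
    return f"INSERT EDGE {edge_type}({', '.join(props)}) VALUES {values};"


# Hypothetical rows standing in for the contents of two Hive tables.
fund_rows = [{"code": "F001", "name": "Example Fund"}]
holding_rows = [{"fund": "F001", "stock": "S001", "weight": 0.05}]

print(vertex_statement("fund", "code", fund_rows))
print(edge_statement("holds", "fund", "stock", holding_rows))
```

At production volume a bulk loader would normally replace statement‑by‑statement inserts; the sketch only shows how tabular columns map onto vertex IDs, edge endpoints, and properties.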

2.3.3 Maintenance & Monitoring

Near‑real‑time updates (at hourly granularity) keep market data fresh; monitoring of data accuracy, completeness, and graph size helps keep the service highly available.
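
As a rough illustration of what such monitoring might check, the sketch below runs three tests over a batch of records: completeness of required fields, freshness against an hourly window, and a sudden drop in record count. The function name, the record layout, and both thresholds (a one‑hour maximum age and a 5% allowed shrink) are assumptions chosen for the example, not values from the source.

```python
from datetime import datetime, timedelta


def check_graph_health(records, required_fields, previous_count, now,
                       max_age=timedelta(hours=1), max_shrink=0.05):
    """Return a list of alert strings; an empty list means every check passed."""
    alerts = []

    # Completeness: every record must carry a non-empty value for each required field.
    incomplete = [r["id"] for r in records
                  if any(r.get(f) in (None, "") for f in required_fields)]
    if incomplete:
        alerts.append(f"completeness: {len(incomplete)} record(s) missing required fields")

    # Freshness: records not updated within max_age are considered stale.
    stale = [r["id"] for r in records if now - r["updated_at"] > max_age]
    if stale:
        alerts.append(f"freshness: {len(stale)} record(s) older than {max_age}")

    # Size: a sharp drop versus the previous snapshot often signals a failed import.
    if previous_count > 0 and len(records) < previous_count * (1 - max_shrink):
        alerts.append(f"size: graph shrank from {previous_count} to {len(records)} records")

    return alerts


# Hypothetical snapshot: one healthy record, one stale record with an empty name.
now = datetime(2022, 9, 1, 12, 0)
sample = [
    {"id": "a", "name": "A", "updated_at": now - timedelta(minutes=10)},
    {"id": "b", "name": "", "updated_at": now - timedelta(hours=3)},
]
for alert in check_graph_health(sample, ["name"], previous_count=10, now=now):
    print(alert)
```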

2.4 Graph Database Selection

After evaluating Neo4j, Nebula Graph, and JanusGraph, Snowball chose Nebula Graph for its distributed scalability and millisecond‑level query latency at the scale of billions of vertices and edges.

Knowledge Graph Applications

3.1 Search Service

The graph powers a structured search system that interprets user intent, rewrites the query into a graph query language, and returns precise results, an improvement over the previous unstructured post search.
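
The rewriting step can be sketched as a lookup from a recognized intent to a query template. The example assumes the target language is nGQL (Nebula Graph's query language) and uses its `MATCH` form; the intent names, the tags `fund`, `stock`, and `person`, and the edge types `holds` and `manages` are hypothetical, and the generated strings have not been run against a real cluster.

```python
import json

# Hypothetical intent -> query templates. Tag and edge names are placeholders,
# not Snowball's schema.
TEMPLATES = {
    "funds_holding_stock":
        "MATCH (f:fund)-[:holds]->(s:stock) WHERE id(s) == {vid} RETURN id(f) AS fund_id;",
    "funds_by_manager":
        "MATCH (p:person)-[:manages]->(f:fund) WHERE id(p) == {vid} RETURN id(f) AS fund_id;",
}


def rewrite(intent, entity_id):
    """Turn a recognized intent plus a linked entity into a graph query string.

    Returns None when no template exists, so the caller can fall back to
    ordinary text search.
    """
    template = TEMPLATES.get(intent)
    if template is None:
        return None
    # json.dumps quotes and escapes the ID so user input cannot break the query.
    return template.format(vid=json.dumps(entity_id))


print(rewrite("funds_holding_stock", "SH600519"))
```

Returning `None` for unknown intents keeps the structured path optional: queries the graph cannot answer still reach the conventional search backend.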

Intent recognition combines a rule‑based engine, which uses a deterministic finite automaton (DFA) for fast regex matching, with deep‑learning models that generalize to queries the rules miss.
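
To make the rule‑plus‑model combination concrete, here is a small sketch under several assumptions. The DFA is hand‑built for a single illustrative pattern, `(SH|SZ)` followed by six digits, which is not a rule taken from the source. The intent labels are hypothetical, whitespace tokenization is a simplification, and the `model` argument stands in for whatever deep‑learning classifier handles queries the rules do not cover.

```python
def char_class(ch):
    """Map a character to the input symbol used by the transition table."""
    if ch == "S":
        return "S"
    if ch in "HZ":
        return "HZ"
    if "0" <= ch <= "9":
        return "digit"
    return "other"


# Table-driven DFA for the pattern (SH|SZ)[0-9]{6}.
# State 0: start, 1: saw "S", 2: saw exchange prefix, 3..8: one to six digits.
TRANSITIONS = {(0, "S"): 1, (1, "HZ"): 2}
for i in range(6):
    TRANSITIONS[(2 + i, "digit")] = 3 + i
ACCEPT = 8


def dfa_matches(token):
    """Full-match a token in one left-to-right pass, one table lookup per character."""
    state = 0
    for ch in token:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition defined: reject immediately
            return False
    return state == ACCEPT


def recognize_intent(query, model=None):
    """Try the rule engine first; fall back to a learned model if no rule fires."""
    for token in query.upper().split():
        if dfa_matches(token):
            return ("stock_lookup", token)
    if model is not None:
        return (model(query), None)
    return ("unknown", None)


print(recognize_intent("sh600519 earnings"))
print(recognize_intent("best bond funds", model=lambda q: "fund_search"))
```

The rule path costs one table lookup per character and never backtracks, which is the usual reason to compile patterns to a DFA; the model path trades that speed for coverage of phrasings no rule anticipates.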

3.2 Front‑End Card Display

For financial‑report queries, the graph delivers structured card views of earnings data, reducing user effort and enhancing the search experience.

3.3 Pre‑trained Language Model

Snowball released SnowBERT, a finance‑domain pre‑trained language model trained on a corpus of billions of tokens drawn from the graph’s textual data. It achieved notable gains (e.g., +8% recall) on semantic matching and fund classification tasks.

3.4 Future Outlook

Future work includes expanding graph content, enhancing search features, and deeper integration of graph knowledge with pre‑training models to enable mutual reinforcement between algorithms and the graph.

Industrial Deployment Reflections

Knowledge graphs have largely overcome their earlier adoption barriers. The key challenge now is practical integration into business workflows, using graph‑based inference and deep learning to surface hidden insights.

