Social Network Analysis and Graph Database Solution for 58 Community Using JanusGraph and Spark GraphX
This article describes how the 58 community builds a large‑scale social network graph, evaluates graph databases such as Neo4j, JanusGraph and HugeGraph, implements centrality metrics with Spark GraphX, and designs a JanusGraph‑based pipeline for detecting valuable and fraudulent users.
As the 58 community deepens its social network analysis applications, the increasing complexity of data demands a solution capable of mining valuable users, analyzing their relationships, and detecting cheating users among tens of millions of members.
58 Community Network Overview
The network is modeled as a graph: each node represents a user, and each edge represents an interaction such as a post, comment, follow, or like. Linking users through these behaviors yields a massive graph for analysis.
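As a rough illustration of how interactions become a graph, the sketch below aggregates raw interaction records into labeled, counted edges. The record layout `(actor, action, target)`, the user names, and the `build_graph` helper are assumptions made for this example, not the community's actual data format.

```python
from collections import defaultdict

# Hypothetical interaction records: (actor, action, target).
interactions = [
    ("alice", "FOLLOW", "bob"),
    ("alice", "LIKE", "bob"),
    ("bob", "COMMENT", "carol"),
    ("carol", "FOLLOW", "alice"),
    ("alice", "LIKE", "bob"),
]

def build_graph(records):
    """Aggregate raw interactions into labeled directed edges with counts."""
    edges = defaultdict(int)  # (src, label, dst) -> number of interactions
    nodes = set()
    for src, label, dst in records:
        edges[(src, label, dst)] += 1
        nodes.update((src, dst))
    return nodes, dict(edges)

nodes, edges = build_graph(interactions)
```

Repeated interactions between the same pair collapse into one edge with a count, which keeps the graph compact when the same users interact many times.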
Graph Database Research
Popular graph databases such as Neo4j, JanusGraph and HugeGraph were compared. The comparison table shows support for scalability, storage engines, transactions, partitioning, full‑text search, indexing, and other features.
| Feature | Neo4j | JanusGraph | HugeGraph |
| --- | --- | --- | --- |
| Scalability | Not supported | Supported | Supported |
| Storage Engine | Standalone | Supports HBase, Cassandra, etc. | Supports HBase, Cassandra, MySQL, etc. |
| Transactions | Not supported | Supported | RC-level supported |
| Graph Partition | Not supported | Supported | Supported |
| Full-text Search | Lucene | ES, Solr, Lucene | Built-in |
| In-Memory Store | Supported | Supported | Supported |
| Secondary Index | Supported | Supported | Supported |
| Range Index | Supported | Not supported | Supported |
| Persistence | Supported | Supported | Supported |
| Composite Index | Supported | Supported | Supported |
Neo4j and JanusGraph both provide good query capabilities, but Neo4j lacks a distributed architecture and JanusGraph lacks built-in graph algorithms. Spark GraphX is therefore used for large-scale computation, with JanusGraph as the storage engine.
Social Network Centrality
Centrality measures how central a node is within the network. Three common metrics are degree centrality, closeness centrality, and betweenness (intermediary) centrality.
Degree Centrality
Degree centrality is the total number of direct connections a node has. For a node v with neighbor set N(v), its basic, unnormalized form is:

$$C_D(v) = \deg(v) = |N(v)|$$
High-degree users are typically “big V” accounts (verified, influential users) with many followers.
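A minimal single-machine sketch of degree centrality follows. The edge list and the `degree_centrality` helper are hypothetical, and each distinct neighbor is counted once. At the scale described in this article the same quantity would be computed in Spark, where GraphX exposes `degrees` on a graph.

```python
from collections import defaultdict

# Hypothetical undirected interaction edges between users.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

def degree_centrality(edge_list):
    """Return {node: degree}, counting each distinct neighbor once."""
    neighbors = defaultdict(set)
    for u, v in edge_list:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(adj) for node, adj in neighbors.items()}

degree = degree_centrality(edges)
top_user = max(degree, key=degree.get)  # the most connected user
```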
Closeness Centrality
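Closeness centrality is based on the sum of the shortest-path distances from a node to all other nodes: the smaller the sum, the more central the node. For a graph with n nodes, where d(v, u) is the shortest-path distance from v to u, it is defined as the number of other nodes divided by the total distance:

$$C_C(v) = \frac{n - 1}{\sum_{u \neq v} d(v, u)}$$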
Users with high closeness are well‑connected to many others, indicating strong social influence.
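The sketch below computes closeness for an unweighted graph with a breadth-first search from each node. The adjacency map and the `closeness_centrality` helper are hypothetical, and the numerator is taken as the number of other reachable nodes, which is an assumption about the normalization.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency map: a path a - b - c - d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

def closeness_centrality(adj, source):
    """(other reachable nodes) / (sum of BFS distances); 0.0 if isolated."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

closeness = {n: closeness_centrality(graph, n) for n in graph}
```

On this path graph the two middle nodes score higher than the endpoints, because their total distance to everyone else is smaller.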
Betweenness (Intermediary) Centrality
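Betweenness centrality counts how often a node lies on the shortest paths between pairs of other nodes. With σ_st the number of shortest paths from s to t, and σ_st(v) the number of those paths that pass through v:

$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$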
High‑betweenness users act as bridges between community clusters, facilitating information flow.
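To make the bridge effect concrete, here is a small sketch that uses Brandes' algorithm for unweighted, undirected graphs. The choice of Brandes' algorithm, the sample graph, and the `betweenness_centrality` helper are assumptions for illustration; the article does not say which algorithm its Spark job uses.

```python
from collections import deque

# Hypothetical undirected graph: two triangles joined through "bridge".
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "bridge"],
    "bridge": ["c", "x"],
    "x": ["bridge", "y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: total / 2 for v, total in score.items()}

betweenness = betweenness_centrality(graph)
```

In this graph every shortest path between the two triangles passes through `bridge`, so it receives the highest score even though it has only two neighbors.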
JanusGraph Architecture
JanusGraph supports massive graph storage, real‑time traversal, and OLAP analytics. Its architecture includes OLTP query, OLAP computation, transaction management, and compatibility with multiple storage back‑ends (Cassandra, HBase) and index back‑ends (Elasticsearch, Solr).
JanusGraph Cluster
The cluster diagram shows JanusGraph nodes connected to HBase for storage and Elasticsearch for indexing.
58 Community User Graph Framework
Users are modeled as nodes with properties (id, age, gender, level, degree, closeness, betweenness, etc.), and edges represent follow, like, and comment actions with attributes such as timestamp and count.
Node label: User
Edge labels: FOLLOW, LIKE, COMMENT
Node properties: node_id, age, name, degree, closeness, betweenness, …
Edge properties: date, values, …
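The model above can be sketched as plain data classes. Only the property names listed above are used; the Python types, the defaults, the `src`/`dst` fields, and the reading of `values` as an interaction count are assumptions, and this is not JanusGraph schema code.

```python
from dataclasses import dataclass
from enum import Enum

class EdgeLabel(Enum):
    FOLLOW = "FOLLOW"
    LIKE = "LIKE"
    COMMENT = "COMMENT"

@dataclass
class User:
    node_id: int
    name: str
    age: int
    degree: int = 0            # filled in after the centrality job runs
    closeness: float = 0.0
    betweenness: float = 0.0

@dataclass
class Interaction:
    src: int                   # node_id of the acting user (assumed field)
    dst: int                   # node_id of the target user (assumed field)
    label: EdgeLabel
    date: str                  # assumed to be an ISO date string
    values: int = 1            # assumed to be an interaction count
```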
Bulk Import into JanusGraph
Initial imports via JanusGraph server were slow for large datasets. To improve performance, the import tool was extended to connect directly to HBase and Elasticsearch, support batch transactions, enable multi‑worker parallel writes, and automatically create schema and indexes.
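The batching and multi-worker ideas can be sketched as follows. This is only an outline of the control flow: `write_batch` is a hypothetical stand-in for one graph transaction, and the batch size and worker count are illustrative defaults rather than the tool's real settings.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def batched(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def write_batch(batch):
    """Hypothetical stand-in for one transaction over a whole batch.

    A real importer would open a graph transaction here, add the vertices
    or edges, and commit once per batch instead of once per record.
    """
    return len(batch)

def bulk_import(records, batch_size=1000, workers=4):
    """Write batches in parallel and return the number of records written."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, batched(records, batch_size)))

written = bulk_import(range(2500), batch_size=1000, workers=4)
```

Committing once per batch amortizes transaction overhead, and independent batches let several workers write at the same time.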
The optimized tool significantly reduced import time, as shown in the speed comparison chart.
System Effect and Demonstration
The pipeline automatically identifies cheating users (high degree, low closeness/betweenness) and high‑value users (high scores on all centrality metrics), improving community health and user experience.
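The identification rule described above can be sketched as a simple threshold check. The function name, the labels, the assumption that scores are normalized to [0, 1], and the `high` and `low` thresholds are all hypothetical; requiring both closeness and betweenness to be low is one possible reading of the rule.

```python
def classify_user(degree, closeness, betweenness, high=0.8, low=0.2):
    """Label a user from centrality scores normalized to [0, 1].

    The `high` and `low` thresholds are hypothetical, not values from
    the article.
    """
    if degree >= high and closeness <= low and betweenness <= low:
        return "suspected_cheater"   # many links, but peripheral in the graph
    if degree >= high and closeness >= high and betweenness >= high:
        return "high_value"          # high on every centrality metric
    return "normal"

labels = [
    classify_user(0.95, 0.10, 0.05),
    classify_user(0.90, 0.85, 0.90),
    classify_user(0.40, 0.50, 0.30),
]
```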
Conclusion and Outlook
Integrating JanusGraph with Spark GraphX enables effective mining of high-value users for the 58 community. Future work will explore additional graph algorithms, community detection, link analysis, and richer user tagging to support broader business scenarios.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.