How Fangdd Scales Real‑Estate Search with Elasticsearch: Architecture & Lessons
This article explains how Fangdd leverages Elasticsearch to boost search performance across consumer, broker, and internal products, detailing a platformized architecture that separates indexing and querying, addresses operational challenges, and outlines design patterns for index management and incremental updates.
Fangdd has been exploring the use of internet technology to improve service and transaction efficiency in the real‑estate industry, a concept often referred to as the "Industrial Internet".
Long industry chains and many user roles demand high collaborative efficiency across multiple products.
Transforming a traditional industry with internet technology therefore requires a high SLA at every functional node of every product before user efficiency genuinely improves.
The company faces several real‑world search scenarios:
The consumer (C‑end) product provides high‑volume search across listings, projects, and updates.
Broker product offers complex queries for projects, listings, consultations, and online visits.
Back‑office operations products involve even more complex retrieval conditions, wide data associations, and large data volumes.
To improve performance and efficiency, Fangdd uses Elasticsearch as a platform for search and complex query services.
Beyond the obvious search use cases, Elasticsearch also allows the separation of complex query logic from core microservices, reducing both service and database complexity. For example, in the new‑home order‑transaction domain, order and transaction microservices handle write operations and simple queries while a dedicated query service syncs data to Elasticsearch for complex, multi‑service queries.
Challenges encountered include:
Multiple Elasticsearch clusters consume excessive server resources because different business lines cannot share resources.
Development teams focus on business features and lack time for cluster operation and optimization, affecting high availability.
High learning and resource costs for integrating Elasticsearch cause some teams to avoid it, increasing data‑model and code complexity.
To standardize search usage, Fangdd abstracts common problems:
Index creation and updates.
Full or batch index rebuilding.
Index querying.
Authentication, permission control, and monitoring during queries.
Platform maintenance.
Platform Design
The design separates index creation and querying into two services: Indexer and Searcher. All other services interact with Elasticsearch through these wrappers.
Indexer provides functions for creating indexes, incremental updates, and full rebuilds.
Searcher offers query interfaces, performing identity verification, permission checks, and monitoring.
An index management UI allows visual configuration of indexes.
Index Creation Details
Index creation is broken down into sub‑problems such as defining data scope, acquiring data, and synchronizing updates. Fangdd defines an IndexInfoProvider interface for these tasks:
<code>public interface IndexInfoProvider {

    /**
     * Minimum document ID in the index
     */
    long getMinId();

    /**
     * Maximum document ID in the index
     */
    long getMaxId();

    /**
     * Get all documents whose IDs are in [start, end)
     */
    List<IndexDocument> getDocsByIdRange(long start, long end);

    /**
     * Get documents for specific IDs
     */
    List<IndexDocument> getDocsByIds(List<Long> ids);
}
</code>

Implementations of IndexInfoProvider are exposed as Dubbo services, using the group attribute to separate different indexes, e.g.:
<code><dubbo:service group="amc_port" ref="amcPortIndexInfoProvider" interface="com.fangdd.searchplatform.indexer.protocol.IndexInfoProvider" version="1.0.0"/></code>

Administrators add new indexes via the index management UI, configure rebuild schedules and mappings, and the Indexer automatically registers the corresponding Dubbo consumer.
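To make the contract concrete, here is a minimal sketch of what an IndexInfoProvider implementation might look like. The source does not show one, so the class name ListingIndexInfoProvider, the in-memory TreeMap store, and the IndexDocument fields are all hypothetical; a real provider would read from the business database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical document type; the real IndexDocument is not shown in the source.
class IndexDocument {
    final long id;
    final String body;
    IndexDocument(long id, String body) { this.id = id; this.body = body; }
}

// Sketch of a provider backed by an in-memory sorted map; a real
// implementation would query the business database instead.
class ListingIndexInfoProvider {
    private final TreeMap<Long, IndexDocument> store = new TreeMap<>();

    void put(IndexDocument doc) { store.put(doc.id, doc); }

    public long getMinId() { return store.isEmpty() ? 0 : store.firstKey(); }

    public long getMaxId() { return store.isEmpty() ? 0 : store.lastKey(); }

    // All documents with IDs in the half-open range [start, end)
    public List<IndexDocument> getDocsByIdRange(long start, long end) {
        return new ArrayList<>(store.subMap(start, true, end, false).values());
    }

    // Documents for specific IDs; unknown IDs are silently skipped.
    public List<IndexDocument> getDocsByIds(List<Long> ids) {
        List<IndexDocument> out = new ArrayList<>();
        for (Long id : ids) {
            IndexDocument doc = store.get(id);
            if (doc != null) out.add(doc);
        }
        return out;
    }
}
```

Keeping the range half-open, [start, end), lets the rebuild loop chain segments without fetching any document twice.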
Full Rebuild Process
The full‑rebuild sequence splits the primary‑key range [min, max] into fixed‑size segments, calls getDocsByIdRange for each segment, and bulk‑writes the returned documents into a new Elasticsearch index.
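The segment arithmetic above can be sketched as follows; the FullRebuild class and the segment size are illustrative, not Fangdd's actual code. Because getDocsByIdRange uses a half-open range, the final segment must reach max + 1 to include the last document.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the full-rebuild driver: split [min, max] into fixed-size
// half-open segments [start, end) suitable for getDocsByIdRange.
class FullRebuild {
    static List<long[]> splitRange(long min, long max, long step) {
        List<long[]> segments = new ArrayList<>();
        for (long start = min; start <= max; start += step) {
            // end is exclusive, so the last segment extends to max + 1
            long end = Math.min(start + step, max + 1);
            segments.add(new long[] { start, end });
        }
        return segments;
    }
}
```

The rebuild loop would then iterate over these segments, call getDocsByIdRange(start, end) for each, and bulk‑index the results into the new index.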
Incremental Updates
When business data changes, the updater sends the index name and affected IDs to a message queue. The Indexer consumes the message, retrieves the latest documents via getDocsByIds, and updates Elasticsearch, achieving near‑real‑time incremental indexing.
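The incremental path can be sketched like this. The ChangeMessage shape, the delete-on-missing behavior, and the map standing in for the Elasticsearch index are assumptions; the source only describes the message-queue flow.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the incremental-update consumer: a change message carries
// the index name and affected IDs; the consumer re-fetches the latest
// documents and upserts them, mirroring Indexer's use of getDocsByIds.
class IncrementalIndexer {
    static class ChangeMessage {
        final String indexName;
        final List<Long> ids;
        ChangeMessage(String indexName, List<Long> ids) {
            this.indexName = indexName;
            this.ids = ids;
        }
    }

    // Stand-in for the Elasticsearch index: docId -> document body.
    final Map<Long, String> index = new HashMap<>();

    // latestDocs stands in for the IndexInfoProvider lookup.
    void onMessage(ChangeMessage msg, Map<Long, String> latestDocs) {
        for (Long id : msg.ids) {
            String doc = latestDocs.get(id);
            if (doc == null) {
                index.remove(id);   // source record gone: remove from index
            } else {
                index.put(id, doc); // create or update in place
            }
        }
    }
}
```

Re-fetching by ID on every message keeps the index eventually consistent even when messages arrive out of order, since each update writes the current state rather than a delta.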
Query Service
Applications request Elasticsearch queries through the Searcher, providing an appId and appKey for authentication. Each query is logged for monitoring, enabling usage statistics, rate‑limiting, and future throttling.
The Searcher supports:
Encapsulated Elasticsearch requests covering filters, fuzzy matching, aggregations, etc.
Direct Elasticsearch HTTP queries.
Elasticsearch Java API queries.
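The authentication and monitoring steps can be sketched as a small gatekeeper. The key store, the counter used for usage statistics, and the names SearcherAuth and authorize are all hypothetical; the source says only that callers present an appId and appKey and that each query is logged.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Searcher's appId/appKey check with a per-app query
// counter feeding usage statistics and future rate limiting.
class SearcherAuth {
    private final Map<String, String> appKeys = new HashMap<>();
    private final Map<String, Integer> queryCounts = new HashMap<>();

    void register(String appId, String appKey) { appKeys.put(appId, appKey); }

    // Verifies the caller and records the query before the request is
    // allowed through to Elasticsearch.
    boolean authorize(String appId, String appKey) {
        String expected = appKeys.get(appId);
        if (expected == null || !expected.equals(appKey)) {
            return false; // unknown app or bad key: reject before querying
        }
        queryCounts.merge(appId, 1, Integer::sum); // usage stats for monitoring
        return true;
    }

    int queriesFor(String appId) { return queryCounts.getOrDefault(appId, 0); }
}
```

Counting queries at the gate, rather than inside each query path, gives a single place to later attach rate limiting and throttling.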
Future Plans
Generalize incremental interfaces to handle non‑sequential IDs.
Introduce safeguards for heavy aggregation queries to protect cluster resources.
Enhance real‑time monitoring and rate‑limiting for query endpoints.
Develop better operational and optimization tools for long‑term cluster maintenance.
Deployment Strategy
Different business scenarios are deployed on separate Elasticsearch clusters:
To‑C services run on isolated clusters to handle large public traffic and mitigate attacks.
To‑B services, serving enterprise customers, run on dedicated clusters with stricter SLA requirements.
Log‑analysis clusters (ELK) operate under a "write‑heavy, read‑light" pattern with distinct storage and query strategies.
Fangduoduo Tech
Sharing Fangduoduo's product and tech insights, delivering value, and giving back to the open community.