
Designing and Deploying Elasticsearch for Large‑Scale Reading Records and Search in a .NET Platform

This article explains how to evaluate, select, and implement Elasticsearch as a scalable NoSQL search engine for handling tens of millions of reading‑record entries and full‑text work‑search, covering architectural trade‑offs, memory usage, indexing strategies, cluster sharding, pagination limits, server sizing, and .NET integration with code examples.

IT Architects Alliance

The author begins with a common interview question about introducing new technology against leadership resistance and outlines four practical steps: justify with concrete problems, build trust, provide data‑driven proposals, and manage people dynamics.

Background: The company stores tens of millions of rows in SQL Server. Certain use cases—such as detailed reading logs and keyword‑based work search—cannot be satisfied efficiently with relational tables due to full‑table scans and the lack of horizontal scalability.

To address these issues the author evaluates NoSQL options and selects Elasticsearch for three reasons: the operations team is already familiar with it, its Elastic Stack meets upcoming reporting needs, and the workload is primarily read‑only, near‑real‑time search.

Elasticsearch Advantages (presented in a table): horizontal scalability, fast shard‑based indexing, rich full‑text capabilities, high availability via replicas, and easy RESTful API usage.

Key Drawbacks: high memory consumption, discussed alongside a memory‑vs‑disk performance table showing that in‑memory reads are orders of magnitude faster than SSD or HDD access.

Indexing Mechanics: The author explains inverted indexes, shows a sample term‑to‑document table, and describes how doc‑values provide column‑oriented storage for aggregations.
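The inverted‑index idea behind that term‑to‑document table can be illustrated with a toy sketch (not Elasticsearch internals, just the concept): each term maps to a postings list of the documents that contain it, so a term lookup replaces a full scan.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a toy inverted index mapping each term to the set of
// documents that contain it, analogous to the term-to-document table above.
class InvertedIndexDemo
{
    static void Main()
    {
        var docs = new Dictionary<int, string>
        {
            { 1, "journey to the west" },
            { 2, "journey home" },
            { 3, "west wind" }
        };

        // Build: term -> sorted set of document ids (the "postings list").
        var index = new Dictionary<string, SortedSet<int>>();
        foreach (var (id, text) in docs)
            foreach (var term in text.Split(' '))
            {
                if (!index.TryGetValue(term, out var postings))
                    index[term] = postings = new SortedSet<int>();
                postings.Add(id);
            }

        // Query: look the term up directly instead of scanning every row.
        Console.WriteLine(string.Join(",", index["journey"])); // 1,2
        Console.WriteLine(string.Join(",", index["west"]));    // 1,3
    }
}
```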

Cluster Sharding: Queries are broadcast to all shards and the results are merged; pagination depth is limited (max 10,000 hits) to avoid excessive data transfer, so scroll and search_after are recommended for deep paging.
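A minimal search_after sketch, assuming a NEST 7.x client; the monthly index name and the document fields mirror the article's UserViewDuration example but are otherwise illustrative. Each page resumes from the sort values of the previous page's last hit, so the 10,000‑hit from/size window does not apply.

```csharp
using Nest;

// Document shape assumed for illustration.
public class UserViewDuration
{
    public string Id { get; set; }
    public long Timestamp { get; set; }
}

public static class DeepPaging
{
    // Fetch the next page of results, resuming after the sort values of the
    // previous page's last hit instead of using from/size offsets.
    public static ISearchResponse<UserViewDuration> NextPage(
        IElasticClient client, long lastTimestamp, string lastId)
    {
        return client.Search<UserViewDuration>(s => s
            .Index("userviewduration-2021-12")     // monthly index, per the article
            .Size(100)
            .Sort(so => so
                .Descending(f => f.Timestamp)
                .Ascending(f => f.Id))             // tie-breaker for a stable order
            .SearchAfter(lastTimestamp, lastId));  // values from the previous last hit
    }
}
```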

Server Sizing: JVM heap should be ≤32 GB and no more than half of system RAM; the production environment uses three servers, each with 16 cores, 64 GB RAM, and SSD storage, running a 32 GB heap per node.

Design Scheme (sections 8‑16):

Wrap Elasticsearch calls in a .NET 5 WebAPI to hide technical details.

Define a base ElasticsearchEntity with Id and microsecond‑precision Timestamp to avoid missing records during scroll.
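A sketch of that base document type (names are the article's; the implementation details are assumed): an Id plus a microsecond‑precision timestamp used as the sort key, so two records written in the same millisecond still order deterministically during scroll or search_after paging.

```csharp
using System;

// Base type for all Elasticsearch documents in the platform (sketch).
public abstract class ElasticsearchEntity
{
    public Guid Id { get; set; } = Guid.NewGuid();

    // DateTimeOffset ticks are 100 ns each, so dividing by 10 yields
    // microseconds since 0001-01-01 — fine-grained enough that paging by
    // Timestamp does not skip records that share a millisecond.
    public long Timestamp { get; set; } = DateTimeOffset.UtcNow.Ticks / 10;
}
```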

Use DateTimeOffset for UTC storage and map to DateTime in DTOs via Mapster/AutoMapper.
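The storage/DTO split might be configured as follows with Mapster; the entity and DTO names here are illustrative, not the article's. Documents keep a UTC DateTimeOffset, while API clients receive a plain local DateTime.

```csharp
using System;
using Mapster;

// Stored document: UTC with offset, so index-level sorting is unambiguous.
public class UserViewDurationEntity
{
    public DateTimeOffset CreateDateTime { get; set; }
}

// Outgoing DTO: plain local time for client display.
public class UserViewDurationDto
{
    public DateTime CreateDateTime { get; set; }
}

public static class MappingConfig
{
    public static void Register()
    {
        // Convert UTC-with-offset to local DateTime during mapping.
        TypeAdapterConfig<UserViewDurationEntity, UserViewDurationDto>
            .NewConfig()
            .Map(d => d.CreateDateTime, s => s.CreateDateTime.LocalDateTime);
    }
}

// Usage: var dto = entity.Adapt<UserViewDurationDto>();
```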

Implement asynchronous write via RabbitMQ consumer:

public class UserViewDurationConsumer : BaseConsumer
{
    private readonly ElasticClient _elasticClient;

    public UserViewDurationConsumer(ElasticClient elasticClient)
    {
        _elasticClient = elasticClient;
    }

    public override void Excute(UserViewDurationMessage msg)
    {
        // Map the queue message to an Elasticsearch document.
        var document = msg.MapTo<UserViewDuration>();

        // Write into a monthly index, e.g. userviewduration-2021-12.
        var indexName = typeof(Entity.UserViewDuration).GetRelationName()
                        + "-" + msg.CreateDateTime.ToString("yyyy-MM");
        var result = _elasticClient
            .Create(document, a => a.Index(indexName))
            .GetApiResult();

        // Failed writes are logged so the record can be replayed later.
        if (result.Failed) LoggerHelper.WriteToFile(result.Message);
    }
}

Startup registers the consumer with app.UseSubscribe(lifetime) so the same process handles both HTTP requests and message consumption.
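A sketch of that startup wiring in a .NET 5 Startup class; UseSubscribe and UserViewDurationConsumer come from the article, while the rest is assumed standard ASP.NET Core plumbing. One process serves HTTP requests and consumes RabbitMQ messages.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddControllers();
        // Register the message consumer alongside the HTTP controllers.
        services.AddSingleton<UserViewDurationConsumer>();
    }

    public void Configure(IApplicationBuilder app, IHostApplicationLifetime lifetime)
    {
        app.UseRouting();
        app.UseEndpoints(endpoints => endpoints.MapControllers());

        // The article's extension method: starts the RabbitMQ subscription
        // and uses the lifetime token to stop consuming on graceful shutdown.
        app.UseSubscribe(lifetime);
    }
}
```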

Read‑record API example (excerpt):

[HttpGet]
[Route("record")]
public ApiResult<List<UserViewDuration>> GetRecord([FromQuery] UserViewDurationRecordGetRequest request)
{
    // Build the must (AND) clauses from the request parameters.
    var mustQueries = new List<Func<QueryContainerDescriptor<UserViewDuration>, QueryContainer>>();
    // ... (terms, date ranges, etc.)

    // dateTime selects the monthly index to query (derivation elided).
    var searchResult = _elasticClient.Search<UserViewDuration>(a => a
        .Index(typeof(UserViewDuration).GetRelationName() + "-" + dateTime)
        .Size(request.Size)
        .Query(q => q.Bool(b => b.Must(mustQueries)))
        .SearchAfterTimestamp(request.Timestamp)   // resume after the previous page's timestamp
        .Sort(s => s.Field(f => f.Timestamp, SortOrder.Descending)));

    var apiResult = searchResult.GetApiResult<UserViewDuration, List<UserViewDuration>>();
    return ApiResult<List<UserViewDuration>>.IsSuccess(apiResult.Data);
}

For the search‑key use case the author defines a SearchKey document with a Text field KeyName that uses both the standard analyzer and a pinyin analyzer, enabling Chinese phonetic search.
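Querying that mapping might look like the following (NEST 7.x assumed; the index and field names follow the article's SearchKey example). Because key_name is a multi‑field, a single multi_match can hit both the standard tokens and the pinyin tokens, so either the Chinese title or its phonetic spelling matches.

```csharp
using Nest;

// Document shape assumed for illustration.
public class SearchKey
{
    public string KeyName { get; set; }
}

public static class PinyinSearch
{
    public static ISearchResponse<SearchKey> Find(IElasticClient client, string text)
    {
        return client.Search<SearchKey>(s => s
            .Index("searchkey")                     // query through the alias
            .Query(q => q.MultiMatch(m => m
                .Fields(f => f
                    .Field("key_name")              // standard-analyzed tokens
                    .Field("key_name.pinyin"))      // pinyin-analyzed tokens
                .Query(text))));
    }
}
```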

Data synchronization is performed with Quartz.NET jobs that pull batches of rows from SQL Server, enrich them with tag data, and bulk‑index them into a time‑based index (e.g., searchkey-202112261121 ). After each bulk load the alias searchkey is atomically switched to the new index and the old index is deleted.
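The job's skeleton can be sketched with Quartz.NET's IJob interface; the batch‑pull, enrichment, and alias‑switch steps are stand‑ins for the article's services, not real APIs.

```csharp
using System;
using System.Threading.Tasks;
using Quartz;

// Scheduled synchronization job (sketch): rebuilds the search index from
// SQL Server into a fresh time-based index, then swaps the alias.
public class SearchKeySyncJob : IJob
{
    public async Task Execute(IJobExecutionContext context)
    {
        // Target a fresh time-based index, e.g. searchkey-202112261121.
        var indexName = "searchkey-" + DateTime.Now.ToString("yyyyMMddHHmm");

        // 1. Pull rows from SQL Server in batches and enrich them with tag data.
        // 2. Bulk-index each batch into indexName.
        // 3. Atomically switch the "searchkey" alias to indexName and delete
        //    the index it previously pointed at.
        await Task.CompletedTask; // placeholder for the steps above
    }
}
```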

Bulk‑index API (excerpt):

[HttpPost]
public ApiResult Post(SearchKeyPostRequest request)
{
    if (!request.Items.Any()) return ApiResult.IsFailed("no input data");

    var date = DateTime.Now;
    var relationName = typeof(SearchKey).GetRelationName();

    // A new time-based index is created unless the caller supplies one.
    var indexName = request.IndexName.IsNullOrWhiteSpace()
        ? relationName + "-" + date.ToString("yyyyMMddHHmmss")
        : request.IndexName;

    if (request.IndexName.IsNullOrWhiteSpace())
    {
        // Map key_name as a multi-field: the standard analyzer on the root
        // field plus pinyin/standard sub-fields for phonetic search.
        var createResult = _elasticClient.Indices.Create(indexName, a => a
            .Map(m => m.AutoMap()
                .Properties(p => p.Custom(new TextProperty
                {
                    Name = "key_name",
                    Analyzer = "standard",
                    Fields = new Properties(new Dictionary<PropertyName, IProperty>
                    {
                        { new PropertyName("pinyin"), new TextProperty { Analyzer = "pinyin" } },
                        { new PropertyName("standard"), new TextProperty { Analyzer = "standard" } }
                    })
                }))));
        if (!createResult.IsValid)
            return ApiResult.IsFailed("failed to create index");
    }

    var documents = request.Items.MapTo<List<SearchKey>>();
    var result = _elasticClient.BulkAll(indexName, documents);
    return result ? ApiResult.IsSuccess(data: indexName) : ApiResult.IsFailed();
}

Alias‑switching API ensures zero‑downtime by adding the new index to the alias first, then removing the old one, and finally deleting the obsolete index.
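With NEST 7.x (assumed), the add‑then‑remove swap can go through a single atomic aliases request, so readers never observe an empty alias:

```csharp
using Nest;

public static class AliasSwitch
{
    // Atomically point the alias at the new index, then drop the old index.
    public static void Switch(IElasticClient client,
        string alias, string oldIndex, string newIndex)
    {
        // One _aliases request: both actions apply atomically.
        var response = client.Indices.BulkAlias(a => a
            .Add(add => add.Index(newIndex).Alias(alias))
            .Remove(rm => rm.Index(oldIndex).Alias(alias)));

        // Only delete the obsolete index after the swap has succeeded.
        if (response.IsValid)
            client.Indices.Delete(oldIndex);
    }
}
```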

Search API combines should (OR) and must (AND) clauses; minimumShouldMatch=1 is set so that at least one optional term must match, preventing documents that satisfy only the mandatory clauses from being returned.
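That bool query might be expressed as follows (NEST 7.x assumed; the type filter and field names are illustrative). The must clause is mandatory, the should clauses are alternatives, and MinimumShouldMatch(1) forces at least one alternative to match.

```csharp
using Nest;

// Document shape assumed for illustration.
public class SearchKey
{
    public string KeyName { get; set; }
}

public static class CombinedQuery
{
    public static ISearchResponse<SearchKey> Run(
        IElasticClient client, string keyword, int type)
    {
        return client.Search<SearchKey>(s => s
            .Index("searchkey")
            .Query(q => q.Bool(b => b
                .Must(m => m.Term(t => t.Field("type").Value(type)))  // AND: required filter
                .Should(                                              // OR: optional alternatives
                    sh => sh.Match(mm => mm.Field("key_name").Query(keyword)),
                    sh => sh.Match(mm => mm.Field("key_name.pinyin").Query(keyword)))
                .MinimumShouldMatch(1))));                            // at least one should-clause
    }
}
```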

Monitoring is handled with Elastic APM + Kibana (v7.4), providing traceability for the .NET microservices.

Conclusion: The migration to Elasticsearch was performed smoothly, delivering high‑performance search and analytics for reading‑record and work‑search scenarios while keeping operational overhead low.

Tags: architecture, Elasticsearch, NoSQL, scaling, search, .NET
Written by IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
