Tag

data indexing

1 view collected around this technical thread.

Test Development Learning Exchange
Nov 9, 2024 · Fundamentals

Comprehensive Guide to Pandas Indexing Methods: loc, iloc, Boolean Indexing, Set/Reset Index, Multi‑Index, Alignment, Sorting, Dropping, and Advanced Techniques

This article provides a comprehensive guide to Pandas indexing in Python, covering basic loc and iloc selection, Boolean indexing, setting and resetting indices, multi‑level indexing, index alignment, sorting, dropping, and advanced methods such as at, iat, and query, with complete code examples.

boolean-indexing · data indexing · data-analysis
0 likes · 9 min read
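
The selection methods named in the summary above can be sketched briefly. This is a minimal illustration, not code from the article itself; the DataFrame, its column names, and its values are made up for the example.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp": [4, 19, 31]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "temp"]      # label-based: row "b", column "temp"
by_position = df.iloc[2, 1]         # position-based: third row, second column
warm = df[df["temp"] > 10]          # Boolean mask keeps rows where the condition holds
scalar_label = df.at["a", "city"]   # scalar access by label
scalar_pos = df.iat[0, 1]           # scalar access by position
queried = df.query("temp < 20")     # expression-based filtering

by_city = df.set_index("city")      # promote a column to the index
restored = by_city.reset_index()    # move the index back into a regular column
```

Here `loc` and `at` address data by label while `iloc` and `iat` address it by integer position, so the two families can return different rows once the index is no longer the default range.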
DataFunTalk
Aug 21, 2024 · Big Data

Apache Paimon: Real‑Time Lakehouse Architecture, Core Technologies, Application Scenarios, and Frontier Features

This article presents a comprehensive overview of Apache Paimon, covering the concept of real‑time lakehouses, the underlying technologies such as LSM and merge‑on‑write, practical application cases across enterprises, and the latest frontier features like tags, branches, and advanced indexing, illustrating how Paimon bridges batch and streaming workloads in modern big‑data ecosystems.

Apache Paimon · LSM · big data
0 likes · 16 min read
DataFunSummit
Oct 16, 2023 · Big Data

Bilibili's Iceberg‑Based Lakehouse Platform: Technical Practices for Sub‑Second Query Response

This article details Bilibili's implementation of an Iceberg‑based lakehouse platform that unifies storage and analytics, addressing Hive’s performance and latency issues through multidimensional sorting, various file‑level indexes, cube pre‑aggregation, star‑tree structures, and an automated Magnus service for intelligent optimization, achieving near‑second query responses.

OLAP · Query Acceleration · big data
0 likes · 14 min read
Weimob Technology Center
Aug 4, 2023 · Backend Development

How a Scalable Business Search Platform Powers Billions of Queries in WOS

The article outlines the background, design, challenges, and future roadmap of a business search platform within the Weimob Operating System, detailing its architecture, event ingestion, index building, and retrieval services that enable low‑cost, high‑performance search across multiple business domains.

Microservices · backend architecture · data indexing
0 likes · 9 min read
JD Retail Technology
Jul 19, 2022 · Backend Development

Design and Architecture of JD Retail Product Selection Platform

This article details the design and implementation of JD Retail’s product selection platform, covering its business background, core data retrieval capabilities, domain model, and system architecture, which includes frontend configurability, a backend query engine, ClickHouse indexing, and both offline and real-time data processing pipelines.

Backend Development · System Architecture · big data
0 likes · 14 min read
DevOps Cloud Academy
Jan 2, 2020 · Big Data

Introduction, Use Cases, Installation, and Basic Operations of Elasticsearch

This article introduces Elasticsearch as a distributed search and analytics engine, outlines its common application scenarios, provides step‑by‑step installation commands, explains core concepts such as documents and indices, and demonstrates basic indexing, retrieval, bulk processing, and aggregation operations.

Bulk API · Distributed · Elasticsearch
0 likes · 4 min read
Weidian Tech Team
Feb 24, 2017 · Big Data

How We Built a Scalable Dump Index Architecture for 60M Users and 1.3B Products

Facing the challenges of searching across 60 million users and 1.3 billion products, Weidian’s engineering team designed a dump‑based indexing pipeline—Ergate—that consolidates, transforms, version‑controls, and monitors data from MySQL to HBase, enabling fast, flexible, and reliable search across massive datasets.

HBase · Search Engine · big data
0 likes · 7 min read
Ctrip Technology
Sep 2, 2016 · Big Data

Why Druid? Architecture, Indexing, Use Cases, and Lessons Learned

This article introduces Druid as an open‑source, distributed column‑store OLAP engine, explains its architecture and indexing mechanisms, discusses real‑time and batch data ingestion for order analytics at Qunar, compares it with other engines, and shares practical tips and pitfalls.

Caravel · Druid · OLAP
0 likes · 8 min read
Qunar Tech Salon
Feb 14, 2016 · Big Data

Accelerating Real‑Time Data Queries with Solr in Alibaba's Jushita Platform

This article explains how Alibaba's Jushita platform leverages Apache Solr with a wide‑table data model and a custom QParser plugin to achieve real‑time, multi‑dimensional buyer filtering that traditional relational databases cannot handle efficiently in big‑data scenarios.

Real-time Query · Search Engine · data indexing
0 likes · 10 min read