Tag

Impala


DataFunTalk
Jan 14, 2024 · Big Data

Optimizing Object Storage and Impala Engine in NetEase NDH: Performance Enhancements and Feature Additions

This presentation outlines NetEase's NDH big‑data platform, detailing its background, object‑storage upload and rename optimizations, Impala engine adaptations—including file‑handle caching, transparent URI handling, and getFileBlockLocations improvements—and a suite of operational enhancements such as dynamic proxy user configuration and audit‑log extensions.

Alluxio · Impala · NDH
14 min read
DataFunSummit
Aug 7, 2023 · Big Data

Performance Optimizations in Impala for Data Lake Queries: Iceberg and Codegen Enhancements

This article presents a comprehensive overview of Impala's high‑performance MPP query engine, its architecture for data‑lake workloads, and detailed performance optimizations including Iceberg table format improvements, manifest caching, and various Codegen techniques such as asynchronous compilation and caching.

Codegen · Impala · Query Optimization
17 min read
DataFunSummit
Dec 5, 2022 · Big Data

Impala Cluster Performance Optimization Based on Historical Queries: Practices and Solutions

This article presents a comprehensive overview of Impala cluster performance optimization using historical query analysis, covering background, high‑performance data‑warehouse construction principles, identified pain points, HBO implementation details, optimization techniques, and future development plans for the Impala ecosystem.

HBO · Historical Queries · Impala
16 min read
DataFunSummit
Sep 24, 2022 · Big Data

Evolution of 37 Mobile Games' Multi-Dimensional Analysis Platform: From MySQL to StarRocks

The article details how 37 Mobile Games built and continuously evolved a multi-dimensional analytics platform—covering business background, data challenges, the migration from MySQL through Druid, Impala, ClickHouse to StarRocks, self‑service data tools, monitoring, and future roadmap—highlighting technical decisions and lessons learned.

Analytics · ClickHouse · Impala
20 min read
DataFunSummit
Apr 9, 2022 · Big Data

Impala Deployment and Optimization: Practical Experience with Sensors Data's Multi‑dimensional Analysis Platform

This article walks through Sensors Data's multi‑dimensional analysis platform, covering product architecture, an Impala‑based real‑time query engine, query performance tuning, resource‑estimation strategies, and future plans, with diagrams, test results, and community contributions.

Data Architecture · Impala · Query Optimization
19 min read
DataFunTalk
Apr 4, 2022 · Big Data

Impala Deployment and Optimization in Sensors Data's Multi-Dimensional Analytics Platform

This article details the architecture of Sensors Data's analytics platform, the implementation of a real‑time Impala query engine, multiple query‑performance optimizations—including storage redesign, user‑behavior sequence tuning, join elimination and expression push‑down—and a resource‑estimation framework that dramatically reduces query failures and latency.

Impala · Query Optimization · Resource Estimation
16 min read
DataFunTalk
Oct 7, 2021 · Big Data

Impala Architecture, Concurrency, CBO Join Optimization, and Storage Layer in Tencent Financial Big Data Scenarios

This article introduces Impala's overall architecture, storage options, key features, concurrency mechanisms, CBO‑based join optimization techniques, storage‑layer principles and data‑filtering strategies, and summarizes practical performance‑tuning experiences from Tencent's financial big‑data platform.

CBO · Concurrency · Impala
12 min read
Tencent Tech
Sep 10, 2021 · Big Data

How Sohu Changyou Migrated 1 PB of Game Data to the Cloud Without Downtime

This article details how Sohu Changyou’s data team, together with Tencent Cloud engineers, planned and executed a seamless migration of over one petabyte of game data to Elastic MapReduce, Elasticsearch Service and Oceanus, achieving zero service impact and dramatically improving analytics performance.

EMR · Impala · Tencent Cloud
9 min read
DataFunTalk
Feb 14, 2021 · Big Data

Impala at NetEase: Architecture, Iceberg Integration, Management System, Optimizations and Future Roadmap

This talk presents NetEase's practical experience with Impala, covering its core architecture, new features in version 3.x, integration with Apache Iceberg, a custom management platform, profiling and statistics enhancements, as well as future plans involving Kubernetes, Alluxio caching and pre‑computation strategies.

Apache Iceberg · Impala · SQL Optimization
13 min read
DataFunTalk
Oct 19, 2020 · Big Data

Impala Optimization and Practices at NetEase Big Data Platform

This article presents a comprehensive overview of NetEase's use of Impala as an OLAP query engine, detailing its architectural advantages, performance benefits, enhancements such as management servers, metadata synchronization, high‑availability via Zookeeper, expanded storage support, and real‑world deployment cases in the "Mammoth" platform and NetEase Cloud Music.

High Availability · Impala · Metadata Sync
11 min read
DataFunTalk
Sep 17, 2020 · Big Data

Design and Implementation of a Scalable User Tag Production Platform

The article explains how a flexible, high‑performance user‑tagging system is built on a batch‑stream integrated architecture using big‑data technologies such as Impala, HDFS, and Flink to support both offline and real‑time label generation for precise marketing, product improvement, and operational analytics.

Flink · Impala · Real-time Streaming
15 min read
Big Data Technology Architecture
Feb 3, 2020 · Big Data

NetEase Data Foundation Platform Construction – Technical Sharing

This article, originally shared by NetEase’s data expert Jiang Hongxiang on DataFun, outlines the construction of NetEase’s data foundation platform, covering database kernel insights and the implementation of the ad‑hoc query engine Impala with the distributed storage system Kudu, offering valuable big‑data engineering practices.

Data Infrastructure · Impala · Kudu
4 min read
DataFunTalk
Feb 18, 2019 · Big Data

Hulu’s Big Data Architecture and Sophon OLAP Cache Layer Overview

This article presents an in‑depth overview of Hulu’s big‑data platform, detailing its multi‑layer architecture, the design and functionality of the Sophon OLAP cache layer, and how Impala is employed for high‑performance query processing and integration with cloud‑native engines.

Data Architecture · Hulu · Impala
16 min read
DataFunTalk
Jan 16, 2019 · Big Data

NetEase Data Infrastructure: Database Technologies and Big Data Platform Overview

This article presents NetEase Hangzhou Research Institute's experience in building a data infrastructure, covering database innovations such as InnoSQL, NTSDB, and InnoRocks, as well as the integration of big‑data components like HDFS, Spark, Impala, and Kudu to enable efficient storage, processing, and real‑time analytics.

Database · Impala · InnoSQL
12 min read
Architecture Digest
Sep 21, 2016 · Big Data

Log Platform Architecture and Scaling Lessons from Vipshop's 419 Promotion

This article presents a detailed case study of Vipshop's log platform during the 419 sales event, analyzing the 2013 architecture, bottlenecks in RabbitMQ and Storm, and the subsequent redesign using Kafka, Impala, and HBase to achieve scalable, reliable big‑data processing.

Impala · Kafka · architecture
16 min read