Tag

adaptive query

1 view collected around this technical thread.

Big Data Technology Architecture
Apr 8, 2021 · Big Data

Managing Small Files in Spark SQL: Causes, Impact, and Practical Solutions

This article explains the small-file problem in Spark SQL on HDFS and its impact on NameNode memory and query performance. It describes how dynamic partition inserts and shuffle settings generate many files, and presents practical ways to control the file count: distributing rows by partition key, random bucketing, and adaptive query execution.

Hadoop · Performance · Small Files
0 likes · 12 min read