Tagged articles

157 articles

May 13, 2026 · Big Data

How Vivo Upgraded a Million‑Node YARN Cluster: Architecture, Scheduler Switch, and Performance Optimizations

This article details Vivo's end‑to‑end upgrade of a YARN 2.6.0 cluster to a modern version for a million‑node, hundred‑thousand‑tasks‑per‑day platform, covering architectural evolution, scheduler migration, compatibility fixes, performance tuning, and service‑continuity strategies.

Big Data · Capacity Scheduler · Cluster Upgrade

0 likes · 28 min read

Coder Trainee

Feb 28, 2026 · Frontend Development

Automating Front‑End Deployment with Jenkins and Yarn

This guide walks through installing Node plugins in Jenkins, configuring a NodeJS tool, creating a freestyle project, discarding old builds, setting up Git source, defining the build environment, running Yarn commands to compile the front‑end, and deploying the artifacts via SSH with a custom script.

Front-end Automation · Jenkins · YARN

0 likes · 4 min read

Raymond Ops

Jan 30, 2026 · Big Data

Build an Enterprise‑Grade HDFS HA and YARN Scheduler from Scratch

This guide walks you through designing and deploying a highly available HDFS architecture with dual NameNodes, ZooKeeper‑based failover, and a tuned YARN resource scheduler, covering detailed configuration files, failover testing, performance tuning, monitoring, automated health checks, capacity planning, and best‑practice checklists for production‑grade big‑data platforms.

Big Data · HA · HDFS

0 likes · 28 min read

MaGe Linux Operations

Sep 8, 2025 · Big Data

Build Enterprise‑Grade HDFS HA and Optimize YARN Scheduling from Scratch

This comprehensive guide walks you through constructing a fault‑tolerant HDFS high‑availability architecture, configuring dual NameNodes with ZooKeeper and JournalNode clusters, fine‑tuning YARN resource schedulers, implementing monitoring, automated failover testing, and performance optimization, all backed by real‑world production experiences and code examples.

Big Data Operations · HDFS · YARN

0 likes · 24 min read

AI Algorithm Path

Jul 26, 2025 · Artificial Intelligence

Qwen3-Coder: Alibaba’s 480‑Billion‑Parameter Open‑Source Code Model Takes on Claude 4

Alibaba’s Qwen team has released Qwen3-Coder, a 480‑billion‑parameter open‑source LLM specialized for code. It supports a 1‑million‑token context via YaRN, outperforms most open models across a wide range of benchmarks, and rivals Claude 4 Sonnet while remaining openly available.

API · Large Language Model · Qwen3-Coder

0 likes · 12 min read

DataFunTalk

Jul 23, 2025 · Artificial Intelligence

Qwen3‑Coder: Open‑Source AI Programming Agent That Beats the Competition

Alibaba’s Tongyi team unveiled the open‑source Qwen3‑Coder, a massive 480‑billion‑parameter programming model that outperforms leading closed‑source solutions, supports a context of up to 1M tokens, offers a free CLI tool, and demonstrates impressive code generation across animations, games, and real‑world tasks.

AI programming · Large Language Model · Open Source

0 likes · 5 min read

Big Data Tech Team

Jun 8, 2025 · Big Data

Master Hadoop: A Step-by-Step Learning Roadmap for Big Data Professionals

This guide outlines a comprehensive Hadoop learning roadmap, covering essential prerequisites, core concepts such as HDFS, MapReduce, and YARN, hands‑on projects, advanced ecosystem tools like Hive, Pig, HBase and Spark, plus curated resources and community channels for aspiring big‑data engineers.

Distributed computing · HDFS · Hadoop

0 likes · 7 min read

iQIYI Technical Product Team

May 15, 2025 · Big Data

Introducing AMD and ARM Bare‑Metal Instances for iQIYI Big Data Computing: Cloud Selection, Performance Evaluation, and Heterogeneous Scheduling

To reduce costs and boost compute density, iQIYI's big data team migrated from aging private‑cloud Intel servers to public‑cloud AMD and ARM bare‑metal instances, establishing a systematic machine‑selection process, performance testing framework, and YARN‑based heterogeneous scheduling to fully leverage the new hardware.

AMD · ARM · YARN

0 likes · 16 min read

Rare Earth Juejin Tech Community

Feb 21, 2025 · Frontend Development

Understanding pnpm: Solving Dependency Management Issues in Modern Frontend Development

This article explains the evolution of JavaScript package managers, the shortcomings of npm and Yarn such as duplicated installations, phantom dependencies and unpredictable dependency trees, and demonstrates how pnpm’s content‑addressable store, hard‑link and symlink strategy provides faster installs, reduced disk usage, and more reliable dependency isolation for frontend projects.

YARN · dependency management · frontend development

0 likes · 22 min read

360 Zhihui Cloud Developer

Feb 17, 2025 · Cloud Native

Optimizing Offline Pod Scheduling with Koordinator and Yarn-Operator

To reduce resource contention and improve offline task reliability, this article examines the challenges of using Koordinator with Hadoop YARN pods on Kubernetes, proposes real‑time resource reporting and task‑level eviction strategies, details community and custom solutions, and outlines future enhancements with Volcano.

Big Data · Cloud Native · Koordinator

0 likes · 9 min read

Rare Earth Juejin Tech Community

Jan 14, 2025 · Backend Development

Understanding npm, Yarn, and pnpm: Dependency Management, Flat Dependencies, and pnpm's Store Mechanism

This article examines the evolution of JavaScript package managers—from npm's nested node_modules structure to Yarn's flat dependencies and finally pnpm's global store with hard‑ and soft‑link mechanisms—highlighting how each approach addresses path length, disk‑space waste, installation speed, and ghost‑dependency issues.

Hard Link · YARN · dependency management

0 likes · 8 min read

Full-Stack Cultivation Path

Dec 6, 2024 · Frontend Development

Corepack: The Next‑Generation Node.js Package Manager

The article reviews the evolution of JavaScript package managers, compares npm, Yarn, and pnpm, and introduces Corepack, an experimental tool bundled with Node.js since version 16.9.0 that keeps package‑manager versions consistent across a project. It also explains Corepack's features and usage steps and discusses remaining challenges such as version conflicts and limited advanced capabilities.

Corepack · Node.js · YARN

0 likes · 8 min read

360 Smart Cloud

Jul 9, 2024 · Big Data

Understanding Shuffle in Spark: From Native Shuffle to External and Remote Shuffle Services (Uniffle)

This article examines the critical role of shuffle in big‑data processing, compares Spark's native shuffle with the External Shuffle Service (ESS) and Remote Shuffle Service (RSS) solutions, introduces Uniffle's architecture and configuration, and shares practical deployment experience and performance results from 360's internal environment.

Big Data · External Shuffle Service · Remote Shuffle Service

0 likes · 15 min read

Goodme Frontend Team

May 6, 2024 · Frontend Development

npm vs Yarn vs pnpm: Which JavaScript Package Manager Wins in Speed and Space?

This article traces the evolution of JavaScript package managers—from early manual inclusion methods to npm, Yarn, and pnpm—detailing their architectures, performance characteristics, version‑locking mechanisms, and trade‑offs, helping developers choose the most suitable tool for modern frontend projects.

Node.js · YARN · frontend development

0 likes · 12 min read

Efficient Ops

Apr 23, 2024 · Big Data

How to Plan, Configure, and Launch a Hadoop 3.3.5 Cluster on Three Nodes

This guide walks through planning a three‑node Hadoop 3.3.5 cluster, explains default and custom configuration files, details core‑site, hdfs‑site, yarn‑site, and mapred‑site settings, shows how to distribute configs, start HDFS and YARN, and perform basic file‑system tests.

Big Data · Cluster Setup · HDFS

0 likes · 11 min read

Open Source Linux

Mar 11, 2024 · Big Data

Step‑by‑Step Guide to Deploying Flink on Standalone, Yarn, and Kubernetes

This tutorial explains how to install and configure Apache Flink in three deployment modes—Standalone, Hadoop YARN, and Kubernetes—covering node preparation, configuration files, package distribution, job submission, and monitoring through the Flink Web UI, with full command‑line examples and code snippets.

Big Data · Flink · Kubernetes

0 likes · 12 min read

Huolala Tech

Jan 4, 2024 · Big Data

How HuoLala Cut Costs by Switching Big Data Workloads to ARM CPUs

This article details HuoLala's exploration of replacing x86 compute nodes with ARM servers in its big‑data platform, covering performance benchmarks, component adaptations for YARN, Tez/MR, security tools, a critical JDK de‑optimization issue, and the resulting production outcomes and future roadmap.

ARM · Big Data · JDK

0 likes · 14 min read

Liangxu Linux

Dec 19, 2023 · Frontend Development

npm vs pnpm vs yarn: Which JavaScript Package Manager Should You Use?

This article compares npm, pnpm, and yarn—detailing their features, installation commands, speed, disk usage, concurrency, and stability—to help developers choose the most suitable JavaScript package manager for their projects.

JavaScript · YARN · frontend development

0 likes · 6 min read

Alibaba Cloud Native

Nov 24, 2023 · Cloud Native

How Koordinator Boosts CPU Utilization and Cuts Costs in Large‑Scale Mixed Workloads

Koordinator, an open‑source cloud‑native mixed‑workload scheduler born from Alibaba’s internal container orchestration experience, enables Xiaohongshu to reclaim idle resources, improve CPU utilization beyond 45%, reduce resource costs by millions of core‑hours, and seamlessly integrate Kubernetes with YARN for batch and AI workloads.

Cloud Native · Resource Optimization · YARN

0 likes · 18 min read

iQIYI Technical Product Team

Nov 17, 2023 · Big Data

Mixed Workload Co-location of Big Data and Online Services at iQIYI: Design, Implementation, and Results

iQIYI’s mixed‑workload system colocates Spark/Hive big‑data jobs with online video services by running YARN NodeManagers inside Kubernetes, using an Elastic YARN Operator, Koordinator‑driven CPU oversubscription, and remote shuffle, boosting online CPU utilization from ~9 % to over 40 % and saving tens of millions of RMB annually.

Big Data · Cloud Native · Kubernetes

0 likes · 19 min read

DevOps

Jun 7, 2023 · Big Data

Deploying Apache Spark on YARN vs Kubernetes: Architecture, Benefits, and Comparison

This article explains how Apache Spark can be deployed using the traditional Hadoop YARN resource manager and the newer Kubernetes approach, detailing configuration steps, submission methods, and a comprehensive comparison of isolation, scalability, learning curve, logging, performance, and cost considerations.

Big Data · Kubernetes · Spark

0 likes · 10 min read

High Availability Architecture

May 26, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster Resource Scheduling

This article introduces Amiya, a self‑developed overcommit component that dynamically increases YARN memory and vCore capacity on Bilibili's offline big‑data clusters, details its architecture and the key implementation of its overcommit, eviction, and mixed‑deployment strategies, and evaluates its impact on resource utilization.

Cluster Management · Overcommit · Resource Optimization

0 likes · 22 min read

Bilibili Tech

May 23, 2023 · Big Data

Amiya: Dynamic Overcommit Component for Bilibili Offline Big Data Cluster

Amiya, a self‑developed dynamic overcommit component for Bilibili’s offline big‑data cluster, inflates the resources reported by under‑utilized nodes and scales them back when load rises. It has added roughly 683 TB of memory and 137k vCores, raising per‑node memory by 15 % and CPU usage by over 20 % while keeping eviction rates below 3 %.

Amiya · Bilibili · Cluster Management

0 likes · 22 min read

WeiLi Technology Team

May 6, 2023 · Big Data

How We Upgraded Our Flink Cluster from 1.10 to 1.14.6 and Overcame Common Pitfalls

This article details the background of a Flink 1.10 cluster on Huawei Cloud, the technical challenges that prompted an upgrade, a step‑by‑step migration plan to Flink 1.14.6, troubleshooting of frequent errors, precautionary measures, and the performance and operational benefits achieved after the upgrade.

Big Data · CDC · Flink

0 likes · 19 min read

Rare Earth Juejin Tech Community

May 6, 2023 · Backend Development

Monorepo Overview, Evolution, Pros & Cons, Pitfalls, and Tool Selection

This article explains what a monorepo is, traces its evolution from single‑repo monoliths to multi‑repo and back to a single repository with many modules, compares its advantages and disadvantages, lists common pitfalls, and evaluates major tooling options such as Turborepo, Rush, Nx, Lerna, Yarn and pnpm for different project sizes.

Lerna · Nx · YARN

0 likes · 21 min read

政采云技术

Apr 18, 2023 · Big Data

Implementing Data Cost Governance: Quantifying Storage and Compute Expenses with Hive, Spark, and HDFS FsImage

This article explains how to perform task‑level data cost governance by collecting storage and compute metrics from Hive tables, Spark jobs, and HDFS FsImage files, then estimating monthly expenses using replication factors and resource‑usage rates, while providing practical SQL and shell examples.

Data Cost Governance · HDFS · Hive

0 likes · 18 min read

ByteFE

Mar 6, 2023 · Frontend Development

Deep Dive into npm, Yarn, and pnpm Dependency Management

This article explains how npm, Yarn, and pnpm manage JavaScript dependencies, detailing installation processes, flat vs nested node_modules structures, lock files, and the hard-link mechanism that improves speed and saves disk space.

YARN · dependency management · npm

0 likes · 16 min read

Big Data Technology & Architecture

Feb 24, 2023 · Big Data

Common Flink Task Submission Issues and Solutions on YARN

This article compiles frequent Flink job submission problems on YARN, including WordCount jar errors, HBase dependency conflicts, MySQL timeouts, checkpoint restoration failures, parallelism limits, and unexpected container termination, and provides root‑cause analysis and step‑by‑step remediation instructions.

Big Data · Checkpoint · Flink

0 likes · 21 min read

StarRing Big Data Open Lab

Feb 15, 2023 · Operations

How YARN and Kubernetes Solve Distributed Resource Management Challenges

This article explains how Apache Hadoop YARN and Kubernetes (originally developed at Google) address the three core problems of resource utilization, task responsiveness, and flexible scheduling in distributed environments, detailing their architectures, scheduling models, and practical implications for modern big‑data and cloud workloads.

Kubernetes · Resource Management · Scheduling

0 likes · 8 min read

ByteFE

Nov 14, 2022 · Frontend Development

Evolution and Innovations of npm, Yarn, and pnpm Package Managers

This article examines the evolution of the three major JavaScript package managers (npm, Yarn, and pnpm), detailing their original designs, the problems they introduced such as nested node_modules, phantom dependencies, and doppelgangers, and the solutions each tool brought to dependency management: flattening, lock files, symbolic and hard links, and Plug'n'Play (PnP) mode.

YARN · node_modules · npm

0 likes · 18 min read

Open Source Linux

Nov 11, 2022 · Big Data

Deploy Hadoop on Kubernetes with Helm: A Complete Step‑by‑Step Guide

This guide walks through deploying Hadoop 3.x on a Kubernetes cluster using Helm, covering repository addition, Docker image creation, Helm chart configuration, service adjustments, installation, verification commands, and clean uninstallation, complete with code snippets and screenshots.

Big Data · Docker · Hadoop

0 likes · 14 min read

ITPUB

Oct 21, 2022 · Big Data

Hadoop Explained: Architecture, Core Components, and Real-World Applications

This article provides a comprehensive overview of Hadoop, covering its historical development, key characteristics, the HDFS storage framework, the MapReduce processing engine, YARN resource manager, and a wide range of real-world application scenarios, as well as the broader Hadoop ecosystem and its major components.

Big Data · Distributed computing · Ecosystem

0 likes · 20 min read

Python Crawling & Data Mining

Oct 16, 2022 · Big Data

What Makes Hadoop the Backbone of Modern Big Data Processing?

This article provides a comprehensive overview of Hadoop, covering its history, core features, the HDFS storage framework, MapReduce computation engine, YARN resource manager, real‑world application scenarios, and the surrounding ecosystem of tools such as Hive, Spark and Kafka.

Distributed computing · HDFS · Hadoop

0 likes · 20 min read

MaGe Linux Operations

Sep 26, 2022 · Big Data

Deploy Hadoop on Kubernetes with Helm: A Complete Step‑by‑Step Guide

This tutorial walks you through deploying Hadoop 3.x on a Kubernetes cluster using Helm, covering repository setup, Docker image creation, Helm chart customization, service configuration, installation, verification, and clean‑up, with all necessary commands and YAML snippets.

Big Data · Docker · Hadoop

0 likes · 14 min read

DataFunSummit

Sep 25, 2022 · Big Data

Practical Optimizations and Resource Management of Hadoop YARN at Xiaomi

This article shares Xiaomi's internal practices of Hadoop YARN, covering scheduling and resource optimization, elastic scheduling, node overcommit handling, federation architecture, metadata warehouse construction, and future plans to improve cluster utilization and cost efficiency.

Big Data · Hadoop · YARN

0 likes · 20 min read

Bilibili Tech

Jul 5, 2022 · Big Data

Multi‑Datacenter Architecture for Offline Big Data Processing at Bilibili

To overcome rapid data growth and on‑premise capacity limits, Bilibili adopted a scale‑out, unit‑based multi‑datacenter architecture that isolates failures, intelligently places jobs, replicates data via an enhanced DistCp service, routes reads with an IP‑aware HDFS router, and throttles cross‑site traffic, enabling stable offline big‑data processing of hundreds of petabytes while preserving throughput.

HDFS · YARN · bandwidth optimization

0 likes · 28 min read

DataFunSummit

Jul 1, 2022 · Big Data

Exploring and Implementing Elastic Scheduling for Xiaomi Hadoop YARN

Shilong Fei from Xiaomi Data Platform presents an in‑depth exploration of elastic scheduling for Hadoop YARN, covering background, design of resource pools, auto‑scaling architecture, challenges such as job stability and user transparency, achieved cost reductions, and future plans for further optimization.

Auto Scaling · Big Data · Hadoop

0 likes · 20 min read

DataFunTalk

Jun 12, 2022 · Big Data

Huya Offline Job Scheduling System: Design, Baseline Scheduling, and Cost Optimization

This article introduces Huya's offline job scheduling platform, covering its positioning, evolution, system architecture, baseline scheduling techniques, cost‑optimization strategies, resource‑balancing methods, and future intelligent data‑warehouse directions, illustrating how data‑driven automation improves YARN utilization and SLA compliance.

DAG · YARN · baseline scheduling

0 likes · 12 min read

DataFunTalk

May 21, 2022 · Big Data

Exploring and Implementing Elastic Scheduling for Xiaomi Hadoop YARN

This talk presents Xiaomi's design and deployment of an elastic scheduling system for Hadoop YARN, covering background analysis, resource‑pool strategy, auto‑scaling architecture, stability challenges, label‑based resource isolation, Spark shuffle handling, cost‑saving results and future plans.

Autoscaling · Big Data · Hadoop

0 likes · 16 min read

DataFunSummit

May 4, 2022 · Big Data

NetEase Big Data Platform: HDFS Optimization and Practices

A senior big‑data engineer at NetEase shares how the company’s large‑scale data platform leverages Hadoop, HDFS, YARN, and related technologies, detailing its multi‑layer architecture, cross‑cloud deployment, storage optimizations, NameNode performance enhancements, RPC prioritization, and practical lessons from operating petabyte‑scale clusters.

Cluster Optimization · HDFS · Performance tuning

0 likes · 23 min read

Big Data Technology & Architecture

Apr 14, 2022 · Big Data

Practical Guide to Monitoring Flink Performance, Detecting Backpressure, and Configuring Alerts

This article explains how to use Flink's Web UI, Kafka metrics, and YARN monitoring to observe performance, diagnose backpressure, and set alert thresholds, providing code examples and practical tips for reliable stream processing in production environments.

Big Data · Flink · Kafka

0 likes · 9 min read

DataFunTalk

Mar 30, 2022 · Big Data

NetEase Big Data Platform: HDFS Optimization and Practice

This article presents NetEase's big data platform architecture, detailing its multi‑layer storage and compute design, HDFS deployment challenges, NameNode and namespace performance optimizations, cluster scaling strategies, data tiering, hardware upgrades, and real‑world business use cases, as an illustration of large‑scale big data engineering in practice.

Big Data · Cluster Optimization · Data Management

0 likes · 23 min read

Bilibili Tech

Mar 25, 2022 · Big Data

Bilibili's YARN Scheduling Optimization Practice: From Heartbeat-Driven to Global Scheduling

Bilibili transformed its YARN CapacityScheduler from a heartbeat‑driven design into a multi‑threaded global scheduler by separating lock handling, adopting Weighted Round‑Robin with DRF, adding batch node selection, fixing proposal inconsistencies, and tuning GC and logging. These changes reduced application allocation time by about 38 % on clusters of up to 8,000 nodes.

Big Data · CapacityScheduler · Hadoop

0 likes · 15 min read

DaTaobao Tech

Mar 23, 2022 · Frontend Development

Why npm, Yarn, pnpm and Deno Manage Dependencies Differently – A Deep Dive

This article analyses the evolution of front‑end package managers—from npm's early nested modules to Yarn's lockfile and Plug'n'Play, pnpm's hard‑link strategy, cnpm/tnpm adaptations, and Deno's URL‑based imports—highlighting their dependency resolution mechanisms, trade‑offs, and remaining challenges.

Deno · Frontend · YARN

0 likes · 19 min read

DataFunTalk

Mar 18, 2022 · Big Data

Scaling LinkedIn’s Hadoop YARN Cluster Beyond 10,000 Nodes: Challenges and Solutions

This article examines how LinkedIn tackled severe scheduling slowdowns when its Hadoop YARN cluster grew to nearly 10,000 nodes, analyzes the root causes of resource‑manager bottlenecks, and describes the fairness‑redefinition and scheduling‑logic patches that restored throughput and scalability.

Big Data · Hadoop · Resource Management

0 likes · 13 min read

Tencent IMWeb Frontend Team

Mar 14, 2022 · Fundamentals

Mastering Yarn Monorepo: A Step‑by‑Step Guide to Scalable SDK Development

This article walks through the evolution from simple SDK repositories to a full‑featured Yarn Berry monorepo, covering workspace setup, configuration, plugin integration, TypeScript settings, package scaffolding, release workflows, and practical tips for dependency management and linking.

JavaScript · TypeScript · Workspace

0 likes · 11 min read

Taobao Frontend Technology

Mar 10, 2022 · Frontend Development

Why npm, Yarn, pnpm, and Deno Differ in Dependency Management – A Deep Dive

This article examines how npm, Yarn, pnpm, cnpm, tnpm and Deno handle dependency installation, version locking, flattening, and module resolution, highlighting the evolution from nested node_modules to lockfiles and Plug'n'Play, and discusses the trade‑offs of each approach.

YARN · dependency management · frontend-development

0 likes · 19 min read

Alibaba Terminal Technology

Mar 10, 2022 · Frontend Development

Why npm, Yarn, pnpm, cnpm, tnpm, and Deno Differ in Dependency Management – A Deep Dive

This article examines how npm, Yarn, pnpm, cnpm, tnpm, and Deno handle dependency installation, version locking, and node_modules structures, highlighting the evolution from nested to flattened layouts, the resulting phantom and duplicated dependencies, and the trade‑offs of each approach.

Deno · YARN · dependency management

0 likes · 21 min read

Tencent Cloud Developer

Feb 17, 2022 · Frontend Development

Exploring Monorepo Strategies and Practices for Front‑end Development

The article explains how adopting a monorepo (housing multiple independent front‑end packages in a single Git repository) simplifies code sharing, tooling, and documentation for Vue 3 component collections, and compares it with monolith and multi‑repo approaches. It outlines essential tools such as pnpm, Changesets, Turborepo, ESLint, and Vitepress, provides step‑by‑step setup guidance, and concludes that monorepos are effective for moderately sized front‑end projects despite potential scaling and permission challenges.

Lerna · YARN · monorepo

0 likes · 27 min read

IT Xianyu

Jan 28, 2022 · Big Data

Step-by-Step Guide to Installing and Configuring Hue on CentOS 7 with Hadoop, Hive, and YARN

This tutorial explains how to set up the Hue web UI on a CentOS 7 machine by installing required dependencies, compiling Hue, configuring HDFS, YARN and Hive integration files, starting Hive services, launching Hue, and accessing the interface, with all commands and configuration snippets provided.

Big Data · CentOS · Hadoop

0 likes · 6 min read

Dada Group Technology

Jan 14, 2022 · Frontend Development

Optimizing Build and Dependency Installation for Dada's Large-Scale Frontend System

This article analyzes the slow build process of Dada's massive frontend platform, identifies bottlenecks in dependency installation and webpack compilation, and presents practical optimizations such as node_modules caching, cp command adjustments, Babel loader caching, and other webpack tweaks that reduced average build time from 600 seconds to around 100 seconds.

Build Optimization · Caching · YARN

0 likes · 8 min read

TAL Education Technology

Jan 13, 2022 · Cloud Native

Offline Mixed Deployment with Kubernetes: Architecture, Implementation, and Performance Evaluation for Big Data Workloads

This article describes a cloud‑native offline mixed‑deployment solution that leverages Kubernetes to share resources between big‑data clusters and business services, outlines its implementation steps, presents detailed performance comparisons between YARN and Kubernetes using TPC‑DS, Spark, and Terasort workloads, and discusses production experience and future plans.

Big Data · Cloud Native · Kubernetes

0 likes · 8 min read

Practical DevOps Architecture

Jan 4, 2022 · Big Data

Step-by-Step Guide to Installing and Configuring Hadoop 2.9.2 Cluster on Three Nodes

This article provides a detailed, step-by-step tutorial for installing Hadoop 2.9.2, configuring environment variables, editing XML configuration files, formatting the NameNode, starting HDFS and YARN services, testing the cluster, and setting up the MapReduce history server on a three‑node Linux environment.

Big Data · Cluster Setup · Hadoop

0 likes · 9 min read

ELab Team

Dec 31, 2021 · Fundamentals

Mastering Inodes, Hard & Soft Links: From Linux to Frontend Tooling

This article explains the fundamentals of inodes, sectors, and blocks, demonstrates how to retrieve file information with Node.js and Linux commands, compares hard and soft links, and shows practical applications of these links in frontend workflows such as yarn link and pnpm installation.

Filesystem · Frontend tooling · Hard Link

0 likes · 14 min read

DataFunTalk

Dec 27, 2021 · Big Data

Comprehensive Big Data Interview Q&A: Hadoop, Spark, Kafka, Hive, and Related Technologies

This article presents a detailed interview-style walkthrough covering Hadoop cluster setup, HDFS components, MapReduce workflow, YARN advantages, Spark fundamentals, Kafka replication, Hive table types, and related big‑data concepts, providing concise explanations and practical insights for data engineers.

Big Data · Hadoop · Hive

0 likes · 20 min read

Tongcheng Travel Technology Center

Nov 2, 2021 · Big Data

Hadoop Cluster Cross-Data Center Migration Practice at Tongcheng Travel

This article details Tongcheng Travel’s month‑long, zero‑downtime migration of hundreds of petabytes of Hadoop HDFS and YARN clusters across data centers, describing the background, migration strategies, lessons learned, tool enhancements, and future plans to improve data locality, balance, and monitoring.

Big Data · Cluster Migration · Data Center

0 likes · 16 min read

21CTO

Oct 14, 2021 · Big Data

How LinkedIn Scaled Hadoop to 11,000 Nodes and Solved YARN Delays

LinkedIn’s engineers detail how they repeatedly doubled their Hadoop cluster to over 11,000 nodes, tackled YARN scheduling delays caused by workload imbalances, and created the DynoYARN simulation tool to predict performance impacts of massive scaling.

Big Data · DynoYARN · Hadoop

0 likes · 7 min read

Big Data Technology Architecture

Sep 28, 2021 · Big Data

Integrating Apache Kyuubi with CDH 6 and Spark 3: Deployment, Configuration, and Performance Tuning

This guide explains how to deploy Apache Kyuubi on a CDH 6 cluster, replace HiveServer2 with Kyuubi, integrate Spark 3, apply necessary patches, configure environment and Spark settings, and optimize engine sharing for various workloads, providing complete code snippets and step‑by‑step instructions.

CDH · HiveServer2 · Kyuubi

0 likes · 19 min read

Java Architect Essentials

Sep 21, 2021 · Big Data

Interview on Kuaishou's Billion‑Scale Big Data Architecture Evolution and Practices

The interview with Kuaishou senior architect Zhao Jianbo details the three‑phase evolution of its trillion‑scale big data platform, covering foundational Hadoop services, real‑time and OLAP extensions, deep customizations, Spring Festival Gala challenges, scheduling innovations, Hadoop usage, and the relationship between big data and cloud architectures.

Big Data · Flink · Hadoop

0 likes · 19 min read

Big Data Technology & Architecture

Sep 17, 2021 · Big Data

Key Reliability Mechanisms of HDFS, YARN Failover Strategies, and Hadoop Shuffle Process

This article explains HDFS reliability features such as replica policies, rack awareness, heartbeat, safe mode, checksums, trash, metadata protection and snapshots, then details YARN failover handling for ApplicationMaster, NodeManager and ResourceManager, and finally describes the Hadoop MapReduce shuffle workflow and tuning tips.

HDFS · MapReduce · Reliability

0 likes · 13 min read
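
The rack‑awareness rule behind HDFS's default replica policy, with the default replication factor of 3, works as follows. The first replica goes on the writer's DataNode. The second goes on a node in a different rack. The third goes on another node in that same remote rack. The sketch below illustrates that rule. It is a simplified illustration, not HDFS's actual placement code: it ignores node load, free space, and fallback cases. The `topology` dict is a hypothetical stand‑in for the cluster's rack mapping, and `place_replicas` is an illustrative name.

```python
import random


def place_replicas(writer, topology, rng):
    """Simplified 3-replica, rack-aware placement.

    topology maps a rack name to the DataNodes in that rack; writer is the
    client's host name, or None when the client is not a DataNode.
    """
    if len(topology) < 2:
        raise ValueError("this sketch needs at least two racks")
    node_to_rack = {node: rack for rack, nodes in topology.items() for node in nodes}

    # Replica 1: the writer's own DataNode if it is one, otherwise a random node.
    first = writer if writer in node_to_rack else rng.choice(sorted(node_to_rack))
    first_rack = node_to_rack[first]

    # Replica 2: a node on a different rack, so the block survives losing a whole rack.
    remote_racks = [r for r in sorted(topology) if r != first_rack and len(topology[r]) >= 2]
    if not remote_racks:
        raise ValueError("this sketch needs a remote rack with at least two nodes")
    second_rack = rng.choice(remote_racks)
    second = rng.choice(topology[second_rack])

    # Replica 3: a different node on the same remote rack, limiting cross-rack write traffic.
    third = rng.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]


if __name__ == "__main__":
    racks = {"/rack1": ["dn1", "dn2"], "/rack2": ["dn3", "dn4"], "/rack3": ["dn5", "dn6"]}
    print(place_replicas("dn1", racks, random.Random(7)))
```

The design choice is a trade‑off: two racks give fault tolerance against a rack failure, while keeping two replicas on one rack keeps the write pipeline cheaper than three cross‑rack hops.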

ELab Team

Sep 3, 2021 · Frontend Development

Why Deleting node_modules and Reinstalling Works – Inside npm and Yarn’s Dependency Mechanics

This article explores common questions about npm and Yarn dependency management, explains the internal installation processes, lockfile roles, version resolution, and compares the two tools while offering practical tips for handling project dependencies in modern JavaScript development.

YARN · dependency management · lockfile

0 likes · 18 min read

ByteFE

Aug 16, 2021 · Backend Development

Understanding yarn.lock: Why It Changes and How to Manage It

This article explains the purpose and structure of yarn.lock, why it may show unexpected diffs after dependency updates, and provides practical strategies—including using resolutions, frozen lockfiles, and preventive workflows—to keep package.json and yarn.lock in sync and avoid build issues.

YARN · dependency-management · lockfile

0 likes · 12 min read

The Dominant Programmer

Aug 2, 2021 · Big Data

How to Build a Beginner Hadoop Cluster on CentOS 7

This article introduces Apache Hadoop’s open‑source framework, explains its core components (HDFS, MapReduce, and YARN) along with ecosystem projects such as ZooKeeper, HBase, Hive, Pig, Mahout, Sqoop, Flume, Chukwa, Oozie, and Ambari, and outlines the steps to set up a beginner‑level Hadoop cluster on CentOS 7.

Big Data · CentOS 7 · HBase

0 likes · 11 min read

Big Data Technology & Architecture

Jul 19, 2021 · Big Data

Understanding Hadoop: MapReduce, HDFS, YARN, and Core Big Data Concepts

This article provides a comprehensive overview of Hadoop’s core components—including MapReduce programming model, HDFS storage architecture, and YARN resource management—while discussing common challenges like data skew and small files, and offering learning resources for aspiring big‑data engineers.

Data Skew · HDFS · Hadoop

0 likes · 9 min read

Tencent Cloud Developer

Jun 21, 2021 · Industry Insights

How Hadoop YARN on Kubernetes Pods Supercharge Resource Utilization and Cut Costs

This article explains how Tencent Cloud EMR integrated Hadoop YARN with Kubernetes Pods to create a hybrid online‑offline deployment, implement elastic autoscaling and multi‑label resource allocation, and achieve several‑hundred‑percent improvements in CPU utilization while preserving cluster stability.

Autoscaling · Big Data · Cloud Native

0 likes · 11 min read

ELab Team

Jun 10, 2021 · Fundamentals

Why Your Monorepo Is Slowing Down and How pnpm & Rush Can Fix It

This article examines the scalability and reliability problems of a Yarn‑workspace based monorepo—such as command inconsistency, slow publishing, phantom dependencies, duplicate packages, and lockfile conflicts—and presents pnpm and Rush as comprehensive solutions with practical guidelines for package referencing and workspace protocols.

YARN · dependency-issues · monorepo

0 likes · 23 min read

Big Data Technology & Architecture

Jun 4, 2021 · Big Data

Comprehensive Spark Interview Questions and Answers

This article provides a detailed collection of Spark interview questions covering deployment modes, performance advantages over MapReduce, shuffle mechanisms, RDD characteristics, optimization techniques, resource management, and various practical aspects of Spark on YARN, Mesos, and Kubernetes.

Interview · Optimization · RDD

0 likes · 21 min read

58 Tech

May 28, 2021 · Big Data

Practical Upgrade Experience of Hadoop 3.2.1 in 58.com Data Platform: HDFS, YARN, and MR3

This article details the end‑to‑end upgrade of a 5000‑node Hadoop 2.6.0 cluster to Hadoop 3.2.1 at 58.com, covering HDFS migration, RBF and EC adoption, Yarn federation and rolling upgrades, MR3 integration, extensive compatibility testing, and operational lessons learned for large‑scale big‑data platforms.

Big Data · Cluster Upgrade · HDFS

0 likes · 19 min read

DataFunTalk

May 14, 2021 · Big Data

Real‑time Billion‑Scale Data Transmission and AI Pipeline Architecture at Bilibili

This article presents a technical deep‑dive into Bilibili’s evolution from offline to real‑time data processing, describing the challenges of timeliness, ETL, AI feature engineering, and the design of a Flink‑on‑YARN incremental pipeline that supports trillion‑scale message throughput and AI‑driven real‑time applications.

AI · Big Data · Flink

0 likes · 27 min read

Big Data Technology & Architecture

May 10, 2021 · Big Data

Understanding Flink TaskManager Memory Allocation on YARN (Per‑Job Mode)

This article explains how Flink on YARN allocates TaskManager memory, breaks down the JVM heap, network buffers, and Flink Managed Memory, and shows how to calculate each component using configuration parameters and source‑code analysis.

Flink · TaskManager · YARN

0 likes · 11 min read

Practical DevOps Architecture

Apr 28, 2021 · Big Data

Step-by-Step Hadoop Environment Setup and Configuration on Three Linux Servers

This guide walks through preparing three Linux servers, installing JDK 1.8, configuring Hadoop core, HDFS, MapReduce, and YARN XML files, setting Java environment variables, formatting HDFS, and starting all services to access the Hadoop web UI.

Big Data · HDFS · Hadoop

0 likes · 4 min read

dbaplus Community

Mar 16, 2021 · Big Data

How Kuaishou Scales YARN to Tens of Thousands of Nodes with the Kwai Scheduler

This article explains how Kuaishou’s massive offline compute clusters—tens of thousands of machines processing hundreds of petabytes daily—are managed by a heavily customized YARN stack and the home‑grown Kwai Scheduler, detailing architecture, scheduler evolution, multi‑scenario optimizations, and future scaling plans.

Big Data · Cluster Optimization · Kwai Scheduler

0 likes · 14 min read

Taobao Frontend Technology

Mar 11, 2021 · Frontend Development

Mastering Monorepo: Boost Code Reuse and Collaboration in JavaScript Projects

This article explains the monorepo strategy, its advantages and drawbacks, and provides step‑by‑step guidance on setting up a project‑level monorepo using tools like Volta, Yarn workspaces, Lerna, scripty, and commitlint, helping developers streamline code reuse, dependency management, and version synchronization across multiple JavaScript packages.

Lerna · Workspaces · YARN

0 likes · 27 min read

DataFunTalk

Mar 3, 2021 · Big Data

Kwai Scheduler: Scaling YARN for Ultra‑Large Clusters at Kuaishou

This article presents Kuaishou's large‑scale offline computing challenges and describes how the team customized YARN and built the Kwai scheduler to achieve multi‑threaded, pluggable resource scheduling for clusters of tens of thousands of nodes, supporting diverse workloads such as ETL, ad‑hoc queries, machine‑learning training, and real‑time Flink jobs.

Cluster Optimization · Kwai Scheduler · YARN

0 likes · 15 min read

ELab Team

Feb 9, 2021 · Frontend Development

Why Yarn Beats npm: Deep Dive into Its Architecture and Workflow

This article explores Yarn’s architecture and workflow, comparing it with npm, cnpm, and pnpm, detailing multi‑threaded installation, caching, dependency resolution, lockfile handling, and step‑by‑step processes from package fetching to linking, optimization, and common Q&A, illustrated with code snippets.

YARN · dependency resolution · npm

0 likes · 22 min read

Full-Stack Internet Architecture

Jan 27, 2021 · Big Data

Introduction to Hadoop: Architecture, HDFS, MapReduce, and YARN Overview

This article provides a comprehensive overview of Hadoop, covering its origins, core components such as HDFS, MapReduce, and YARN, their architectures, data storage and processing mechanisms, fault‑tolerance features, scheduling strategies, and practical optimization techniques for large‑scale distributed computing.

Big Data · Distributed computing · HDFS

0 likes · 33 min read

Big Data Technology & Architecture

Jan 22, 2021 · Big Data

Key New Features and Improvements in Hadoop 3.x

Hadoop 3.x raises the minimum supported Java version to JDK 1.8 and introduces a range of enhancements across Hadoop Common, HDFS, YARN, and MapReduce. These include erasure coding, high availability with more than two NameNodes, cgroup‑based resource isolation, a native map‑output collector, and split client libraries. Hadoop 3.x also adds file‑system connectors for Microsoft Azure Data Lake and Aliyun OSS.

HDFS · Hadoop · MapReduce

0 likes · 7 min read
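
Erasure coding is the headline storage change in Hadoop 3.x, and its benefit is easy to quantify. With Reed‑Solomon striping, RS(k, m) stores k data blocks plus m parity blocks. That gives a raw‑to‑logical ratio of (k + m) / k and tolerates the loss of any m blocks. 3‑way replication, by comparison, costs 3x and tolerates 2 lost copies. The sketch below compares the two, using RS‑6‑3 (Hadoop 3 ships a built‑in RS‑6‑3‑1024k policy). The function names and the 100 TB figure are illustrative only.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per logical byte under N-way replication."""
    return Fraction(replicas)


def reed_solomon_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per logical byte for an RS(data, parity) striped layout."""
    return Fraction(data_units + parity_units, data_units)


def compare(logical_tb: int) -> dict:
    """Map scheme -> (raw TB needed, number of simultaneous block losses tolerated)."""
    return {
        # 3 copies: any 2 may be lost
        "3x replication": (logical_tb * replication_overhead(3), 2),
        # 6 data + 3 parity blocks: any 3 of the 9 may be lost
        "RS-6-3": (logical_tb * reed_solomon_overhead(6, 3), 3),
    }


if __name__ == "__main__":
    for scheme, (raw_tb, tolerated) in compare(100).items():
        print(f"{scheme}: {float(raw_tb)} TB raw, tolerates {tolerated} simultaneous losses")
```

The catch, not captured by this arithmetic, is that reading or reconstructing striped data costs CPU and network. EC is therefore usually applied to colder data rather than everything.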

Big Data Technology & Architecture

Jan 12, 2021 · Big Data

Hadoop Interview Questions and Topics – HDFS, MapReduce, YARN, and Optimization

This article compiles a comprehensive set of Hadoop interview questions covering HDFS write and read processes, architecture, fault‑tolerance, NameNode metadata management, MapReduce scheduling, combiner and partition roles, YARN scheduling strategies, and various optimization techniques for both MapReduce and HDFS.

HDFS · Hadoop · Interview

0 likes · 5 min read

New Oriental Technology

Jan 11, 2021 · Frontend Development

Understanding npm Dependency Management and Building a Publishable npm Package with Webpack

This article explains how npm flattens dependencies, compares npm, cnpm and yarn, outlines the evolution of package managers, details essential package.json fields, demonstrates using nrm, and provides a step‑by‑step guide to configure Webpack, create, build, and publish a custom npm package.

Frontend · YARN · dependency

0 likes · 11 min read
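
The dependency flattening that npm v3+ performs can be illustrated with a toy hoisting pass. Each transitive package is placed in the root `node_modules` unless a different version already occupies that name. In that case it is nested under the package that needs it. This is a simplified sketch, not npm's algorithm. It assumes exact versions instead of semver ranges and only checks the root rather than walking up intermediate `node_modules` directories. The `flatten` function and the registry shape are hypothetical.

```python
from collections import deque


def flatten(direct, registry):
    """Simplified npm-v3-style hoisting.

    direct:   {name: version} declared in the project's package.json
    registry: {(name, version): {dep_name: dep_version}} with exact versions
    Returns (top_level, nested): packages placed in the root node_modules, and
    conflicting versions nested under the package that needs them.
    """
    top_level = dict(direct)       # direct dependencies always sit at the root
    nested = {}                    # "parent@version" -> {name: version}
    queue = deque(direct.items())  # breadth-first: shallower packages claim the root first
    seen = set()
    while queue:
        name, version = queue.popleft()
        if (name, version) in seen:
            continue
        seen.add((name, version))
        for dep, dep_version in registry.get((name, version), {}).items():
            if dep not in top_level:
                top_level[dep] = dep_version  # hoist: nothing at the root yet
            elif top_level[dep] != dep_version:
                # version conflict: keep a private copy under the parent
                nested.setdefault(f"{name}@{version}", {})[dep] = dep_version
            # same version already at the root: deduplicated, nothing to place
            queue.append((dep, dep_version))
    return top_level, nested


if __name__ == "__main__":
    print(flatten({"a": "1.0.0", "b": "1.0.0"},
                  {("a", "1.0.0"): {"c": "1.0.0"}, ("b", "1.0.0"): {"c": "2.0.0"}}))
```

Which version wins the root slot depends on traversal order. That order sensitivity is part of why lockfiles exist: they pin the resulting tree so every install reproduces it.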

Big Data Technology & Architecture

Jan 5, 2021 · Big Data

Improving Spark Job Parallelism on YARN: Diagnosis, Configuration, and Performance Gains

This article details a real‑world investigation of Spark SQL job latency on a YARN cluster, explains how switching the scheduler to FAIR mode, creating resource pools, and consolidating small Parquet files dramatically reduced scheduler delay and cut execution time from over 100 seconds to under 20 seconds.

Parquet · Performance Optimization · Scheduler

0 likes · 13 min read
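
Suppose the resource pools here are Spark's own fair‑scheduler pools rather than YARN queues. That is an assumption, since the summary does not say which. In that case they are declared in an allocation file with `pool` elements carrying `schedulingMode`, `weight`, and `minShare` (minimum CPU cores). The sketch below generates such a file. The pool names, weights, and shares are hypothetical, not the article's values.

```python
import xml.etree.ElementTree as ET


def build_fair_scheduler_xml(pools):
    """pools: {pool_name: (weight, min_share)} -> fairscheduler.xml contents."""
    root = ET.Element("allocations")
    for name, (weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = "FAIR"    # jobs inside the pool share fairly
        ET.SubElement(pool, "weight").text = str(weight)       # share relative to other pools
        ET.SubElement(pool, "minShare").text = str(min_share)  # CPU cores granted before weighting
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pools: interactive queries get twice the weight of batch ETL.
    print(build_fair_scheduler_xml({"adhoc": (2, 4), "etl": (1, 2)}))
```

To use the file, set `spark.scheduler.mode=FAIR` and point `spark.scheduler.allocation.file` at it. Each submitting thread then selects a pool with `sc.setLocalProperty("spark.scheduler.pool", "adhoc")`. Jobs with no pool set go to the default pool.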

Big Data Technology & Architecture

Dec 17, 2020 · Big Data

Running Flink on Kerberos-secured YARN: Authentication and Configuration Guide

This article explains why Kerberos is needed for Hadoop clusters, details the Kerberos authentication workflow, and provides step‑by‑step instructions for configuring Flink to run on a Kerberos‑protected YARN environment using delegation tokens or keytab files, along with proxy‑user settings.

Delegation Token · Flink · Kerberos

0 likes · 12 min read

Practical DevOps Architecture

Nov 27, 2020 · Big Data

Step-by-Step Guide to Install and Configure a Hadoop 2.8.2 Cluster

This tutorial provides a complete walkthrough for downloading Hadoop 2.8.2, setting up a three‑node master‑slave cluster, configuring core, HDFS, MapReduce and YARN settings, creating required directories, distributing the installation, starting the services, verifying the cluster status, and finally shutting it down.

Big Data · Cluster Setup · HDFS

0 likes · 5 min read

Tencent Cloud Developer

Nov 13, 2020 · Big Data

Apache Spark Core: Architecture, Components, and Execution Flow

Apache Spark Core is a high‑performance, fault‑tolerant engine that abstracts distributed computation through SparkContext, DAG and Task schedulers, supports in‑memory and disk storage, runs on various cluster managers (YARN, Kubernetes, etc.), and unifies batch, streaming, ML and graph processing via its rich ecosystem.

Apache Spark · Big Data · DAG scheduler

0 likes · 17 min read

Big Data Technology & Architecture

Nov 6, 2020 · Big Data

Integrating Flink SQL with Apache Zeppelin: Installation, Configuration, and Usage

This guide explains how to set up Apache Zeppelin as an interactive notebook for Flink SQL, covering download, environment configuration, Zeppelin and Flink interpreter settings on YARN, Hive integration, and step‑by‑step testing of streaming SQL queries.

Flink · Hive · SQL

0 likes · 11 min read

Big Data Technology & Architecture

Aug 22, 2020 · Big Data

Integrating Kerberos with Spark on CDH: Configuration, Deployment, and Troubleshooting Guide

This guide explains how to prepare a CDH‑based Spark environment for Kerberos authentication, covering prerequisite knowledge, classpath adjustments, HBase configuration files, Spark‑Env settings, user permission grants, Spark‑Submit execution, and common troubleshooting steps.

Big Data · CDH · HBase

0 likes · 12 min read

Big Data Technology & Architecture

Aug 11, 2020 · Big Data

Consuming Kerberos‑Protected Kafka Data with Spark Streaming and Storing into Kudu

This guide demonstrates how to configure a Spark Streaming application running on YARN in cluster mode to securely consume Kerberos‑protected Kafka topics and write the processed data into Kudu tables, including necessary Java code, Kerberos keytab setup, Kafka client configuration, and spark‑submit commands.

Big Data · Java · Kafka

0 likes · 11 min read

Big Data Technology & Architecture

Jul 27, 2020 · Big Data

How to View Hadoop/YARN Application Logs via History Server and Yarn Commands

This guide explains how to retrieve Hadoop/YARN application logs using the History Server UI, Yarn command‑line tools, and direct HDFS log access, including commands for listing applications, fetching specific logs, and locating the remote log directory.

Big Data · CLI · HDFS

0 likes · 4 min read

DataFunTalk

Jul 5, 2020 · Big Data

ByteDance’s Optimizations to Hadoop YARN: Enhancing Utilization, Multi‑Load Scenarios, Stability, and Multi‑Region Active‑Active

This article describes ByteDance’s four‑year series of customizations to Hadoop YARN—covering utilization improvements, multi‑load scenario optimizations, stability enhancements, and multi‑region active‑active deployment—along with practical production experiences, architectural details, and future work directions.

ByteDance · Cluster Optimization · Hadoop

0 likes · 12 min read

Big Data Technology & Architecture

Jun 19, 2020 · Big Data

Comparison of Flink and Spark in Standalone and YARN Deployment Modes

This article compares Apache Flink and Apache Spark in both standalone and YARN deployment modes, detailing their architecture, job scheduling differences, and specific configurations such as Flink’s yarn‑cluster and yarn‑session modes versus Spark’s yarn‑client and yarn‑cluster modes.

Big Data · Flink · Spark

0 likes · 4 min read

Big Data Technology & Architecture

Jun 18, 2020 · Big Data

CPU Resource Isolation in YARN with Linux cgroups

This article introduces Linux cgroups, explains their CPU subsystem files and parameters, demonstrates how to create and configure cgroups, and details how YARN leverages cgroups for CPU resource isolation through configuration settings and code implementations, comparing soft and hard limit approaches.

Hadoop · Linux · YARN

0 likes · 10 min read
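
The soft‑ versus hard‑limit distinction maps onto two cgroup v1 CPU controls. `cpu.shares` is a relative weight that only matters under contention, so idle CPU remains usable (soft). `cpu.cfs_quota_us` / `cpu.cfs_period_us` caps usage at quota ÷ period CPUs per period (hard). The sketch below computes both for one container. It assumes 1024 shares per vcore and a hard cap equal to the container's proportional slice of the node's physical cores. YARN switches to hard limits via `yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage`. The real NodeManager code also honors settings such as `yarn.nodemanager.resource.percentage-physical-cpu-limit` and may choose a different period, so treat these numbers as illustrative. The function name is hypothetical.

```python
CPU_WEIGHT_PER_VCORE = 1024      # assumption: cpu.shares proportional to vcores
DEFAULT_CFS_PERIOD_US = 100_000  # kernel default CFS period (100 ms)


def cgroup_cpu_settings(container_vcores, node_vcores, node_physical_cores,
                        strict, period_us=DEFAULT_CFS_PERIOD_US):
    """Return simplified cgroup v1 CPU-controller values for one container."""
    # Soft limit: relative weight, only enforced when CPUs are contended.
    settings = {"cpu.shares": CPU_WEIGHT_PER_VCORE * container_vcores}
    if strict:
        # Hard limit: the container's proportional slice of physical CPUs.
        cpus = node_physical_cores * container_vcores / node_vcores
        settings["cpu.cfs_period_us"] = period_us
        # quota / period = number of CPUs the cgroup may consume each period
        settings["cpu.cfs_quota_us"] = int(period_us * cpus)
    return settings


if __name__ == "__main__":
    # A 2-vcore container on a node advertising 16 vcores backed by 8 physical cores.
    print(cgroup_cpu_settings(2, 16, 8, strict=False))
    print(cgroup_cpu_settings(2, 16, 8, strict=True))
```

With strict mode off, the container above can burst onto idle cores. With it on, it is capped at one core's worth of time every 100 ms even when the node is idle.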

Big Data Technology Architecture

Jun 4, 2020 · Big Data

58.com Big Data Offline Computing Platform: Architecture, Scaling, Optimization, and Cross‑Data‑Center Migration

This article presents a comprehensive case study of 58.com’s massive Hadoop‑based offline computing platform, detailing its architecture, scaling challenges, performance‑tuning measures, YARN and SparkSQL upgrades, and the systematic cross‑data‑center migration of thousands of nodes and petabytes of data.

Big Data · Data Migration · Hadoop

0 likes · 23 min read

Big Data Technology Architecture

May 15, 2020 · Big Data

Performance Tuning of Hive on Spark in YARN Mode

This article explains how to optimize Hive on Spark running on YARN, covering YARN node resource configuration, Spark executor and driver memory settings, dynamic allocation, parallelism, and key Hive parameters to achieve superior performance compared to Hive on MapReduce.

Cluster Configuration · Hive · Performance tuning

0 likes · 11 min read
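
A recurring step in sizing Hive on Spark over YARN is working out how many executors fit on a NodeManager. That number depends on `yarn.nodemanager.resource.memory-mb`, `yarn.nodemanager.resource.cpu-vcores`, `spark.executor.cores`, `spark.executor.memory`, and the executor memory overhead. Recent Spark versions default that overhead to max(384 MiB, 10% of executor memory). The sketch below does this arithmetic. The rounding of each container request up to a multiple of `yarn.scheduler.minimum-allocation-mb` (1024 MB here) is an assumption about scheduler normalization. The example node sizes are hypothetical, not the article's cluster.

```python
import math


def executors_per_node(nm_memory_mb, nm_vcores, executor_cores, executor_memory_mb,
                       overhead_factor=0.10, min_overhead_mb=384, min_allocation_mb=1024):
    """How many Spark executors fit on one NodeManager (simplified)."""
    # Off-heap overhead that YARN must also reserve for each executor container.
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    requested_mb = executor_memory_mb + overhead_mb
    # Assumption: the scheduler rounds requests up to a multiple of the minimum allocation.
    container_mb = math.ceil(requested_mb / min_allocation_mb) * min_allocation_mb
    by_cpu = nm_vcores // executor_cores
    by_memory = nm_memory_mb // container_mb
    return min(by_cpu, by_memory), container_mb


if __name__ == "__main__":
    # Hypothetical node: 100 GB and 32 vcores for YARN; executors of 4 cores / 16 GB.
    print(executors_per_node(102400, 32, 4, 16384))
```

In this example memory, not CPU, is the binding constraint: 5 executors use only 20 of the 32 vcores. Shrinking executor memory or raising `spark.executor.cores` would balance the two.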

Big Data Technology & Architecture

May 6, 2020 · Big Data

Step-by-Step Guide to Installing and Configuring a Hadoop Cluster on Three Virtual Machines

This article provides a comprehensive, hands‑on tutorial for preparing three VMs, installing JDK and Hadoop, configuring core‑site.xml, hdfs‑site.xml, mapred‑site.xml, yarn‑site.xml, setting environment variables, distributing the package, starting HDFS and YARN, and verifying the cluster via web UI and jps commands.

Big Data · Cluster Setup · HDFS

0 likes · 14 min read

dbaplus Community

Apr 15, 2020 · Big Data

How Ctrip Scaled Hadoop Across Data Centers: Architecture and Lessons

This article details Ctrip's Hadoop evolution, the challenges of expanding across multiple data centers, the evaluation of multi‑cluster versus single‑cluster designs, and the concrete architectural changes, migration tools, bandwidth monitoring, and future plans that enabled a stable cross‑datacenter big‑data platform.

Big Data · Cross-DataCenter · HDFS

0 likes · 19 min read

DataFunTalk

Apr 9, 2020 · Big Data

Scaling and Optimizing 58.com’s Hadoop‑Based Offline Computing Platform: Architecture, Challenges, and Solutions

This article details how 58.com built a massive Hadoop‑based offline computing platform with over 4,000 servers and hundreds of petabytes of storage, addressing scaling, stability, GC, YARN scheduling, SparkSQL migration, storage operations, and a large‑scale cross‑datacenter migration.

Big Data · Data Migration · Hadoop

0 likes · 24 min read

Big Data Technology & Architecture

Apr 8, 2020 · Big Data

Common Apache Flink Exceptions and How to Resolve Them

This article enumerates typical Apache Flink deployment, job, and checkpoint errors—such as JDK version issues, resource shortages, task manager timeouts, and state migration problems—and provides practical troubleshooting steps and configuration tips to help engineers quickly diagnose and fix these failures.

Big Data · Checkpoint · Exception

0 likes · 8 min read

Big Data Technology & Architecture

Apr 8, 2020 · Big Data

Spark Job Execution Principles and Parameter Tuning for Hive on Spark

This article explains how Spark jobs run on YARN, describes the impact of stages, shuffle and task parallelism, and provides detailed recommendations for tuning Spark executor, memory, core, and parallelism settings to dramatically improve Hive‑on‑Spark TPCx‑BB benchmark performance on large datasets.

Big Data · Hive · Parameter Tuning

0 likes · 12 min read

Open Source Linux

Mar 12, 2020 · Big Data

Step-by-Step Guide to Build a Hadoop 2.9.2 Cluster on CentOS 7.5

This tutorial walks you through setting up a three‑node Hadoop 2.9.2 cluster on CentOS 7.5, covering environment preparation, password‑less SSH, user creation, JDK installation, Hadoop extraction, configuration file edits, directory setup, ownership changes, service startup, and verification via web UIs.

Big Data · CentOS · Cluster Setup

0 likes · 13 min read

vivo Internet Technology

Mar 11, 2020 · Big Data

Understanding Spark Executor Memory Management and the Unified Memory Model

The article explains Spark’s executor memory layout under the UnifiedMemoryManager, detailing on‑heap and off‑heap divisions, the four memory regions, default fraction settings, how storage and execution memory share space, and provides heuristics and tuning tips for avoiding OOM and optimizing performance.

Executor · Performance tuning · Spark

0 likes · 24 min read
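
Under the UnifiedMemoryManager, an executor's on‑heap memory splits into four regions. A fixed 300 MB is reserved. Of the remainder ("usable"), `spark.memory.fraction` (default 0.6 since Spark 2.0) forms the unified region shared by execution and storage, and the rest is user memory. `spark.memory.storageFraction` (default 0.5) sets how much of the unified region storage is protected from eviction. The sketch below computes these regions. The boundary between storage and execution is soft: each side can borrow the other's free space. Spark also sizes from the JVM's reported max memory, which is slightly below `-Xmx`, so real figures come out a little smaller. The function name is hypothetical.

```python
RESERVED_MB = 300  # fixed reserved memory in the unified memory model


def unified_memory_layout(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Split an executor heap into Spark's four on-heap regions (in MB)."""
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction    # shared by execution and storage
    storage = unified * storage_fraction  # storage's eviction-protected share
    return {
        "reserved": RESERVED_MB,
        "user": usable - unified,         # user data structures, UDF objects, metadata
        "storage": storage,               # cached blocks, broadcast variables
        "execution": unified - storage,   # shuffles, joins, sorts, aggregations
    }


if __name__ == "__main__":
    for region, mb in unified_memory_layout(4096).items():
        print(f"{region:>9}: {mb:8.1f} MB")
```

For a 4 GB heap this gives roughly 1518 MB of user memory and about 1139 MB each for storage and execution. A heap that looks generous can therefore leave surprisingly little room for a large shuffle.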

Big Data Technology & Architecture

Mar 8, 2020 · Big Data

Hive on Spark Tuning Parameters and Best Practices

This article explains how to tune Hive on Spark by adjusting driver, executor, and Hive configuration parameters—including CPU cores, memory allocations, dynamic allocation, and join thresholds—to achieve optimal performance when running on YARN.

Big Data · Hive · Performance tuning

0 likes · 7 min read
