Apache Flink Deployment with Pulsar Connector: Setup, Demos, and Best Practices

This guide shows how to deploy Apache Flink 1.17 in Docker, configure off‑heap memory, connect it to Pulsar via the 4.1.0‑1.17 connector, and run example jobs that copy topics and perform a windowed word count. It also provides Maven dependencies, custom serialization tips, batching settings, and version‑specific best‑practice notes.

Apache Flink · DataStream · Docker deployment

20 min read

ITPUB

Dec 14, 2023 · Big Data

How to Build a Python‑Hadoop Word Count on a Single‑Node Cluster

This step‑by‑step guide shows how to install and configure a single‑node Hadoop 3.2.0 environment on CentOS 7, set up Python 3.7, write MapReduce mapper and reducer scripts in Python, and run a word‑count job using Hadoop streaming, illustrating core Hadoop concepts and their relevance today.

Hadoop · MapReduce · Python

21 min read

Big Data Technology & Architecture

Apr 2, 2019 · Big Data

Understanding Hadoop MapReduce: Programming Model, WordCount Example, and Job Execution Mechanism

The article explains Hadoop's MapReduce framework as both a programming model and execution engine, detailing its map and reduce phases, the WordCount example code, job startup components, data shuffling, partitioning, and how large‑scale distributed computations are orchestrated across a cluster.

Big Data · Distributed computing · Hadoop

10 min read

Big Data Technology & Architecture

Jan 3, 2019 · Big Data

Deploying Apache Flink on YARN and Running Flink Jobs

This tutorial explains how to deploy Apache Flink on a Hadoop YARN cluster, covering both YARN session mode and direct job submission, and demonstrates running the built‑in WordCount example with command‑line options for input, output, and resource configuration.

Apache Flink · Big Data · Flink Deployment

8 min read

37 Interactive Technology Team

Jun 13, 2017 · Big Data

MapReduce Principles and Hadoop Execution Process with WordCount Example

The article explains MapReduce’s divide‑and‑conquer model and Hadoop’s execution pipeline (map, partition, spill, merge, shuffle, and reduce phases), illustrated with a WordCount example that shows how mappers emit (word, 1) pairs and reducers aggregate the counts to produce final word frequencies on HDFS.

Distributed computing · Hadoop · MapReduce

7 min read

Java High-Performance Architecture

Dec 13, 2016 · Big Data

What Is Apache Beam and How Does It Simplify Distributed Data Processing?

Apache Beam is an open‑source, unified programming model for distributed data processing that lets developers write pipelines once and run them on multiple execution engines such as Spark, Flink, or Dataflow, simplifying code reuse and easing migration between frameworks.

Apache Beam · Distributed computing · Java

5 min read
