Tagged articles

Nov 19, 2015 · Big Data

Beyond Hadoop: Modern Big Data Platforms and Technologies Explained

This article surveys the evolution of Hadoop and its ecosystem, explains core storage and processing concepts, and introduces contemporary big‑data technologies such as Spark, Flink, Kafka, Lambda architecture, NoSQL databases, and cloud‑native solutions, highlighting their roles and trade‑offs.

Big Data · Flink · Hadoop

17 min read

Art of Distributed System Architecture Design

Oct 29, 2015 · Big Data

TalkingData’s Journey to Building a Mobile Big Data Platform with Spark and YARN

This article recounts how TalkingData progressively introduced Spark into its Hadoop‑YARN based mobile big‑data platform, detailing the early architectures, migration challenges, performance gains, the fully Spark‑centric redesign with Kafka and Spark Streaming, the pitfalls encountered along the way, and plans for further optimization.

Data Platform · Hadoop · Machine Learning

16 min read

Architect

Oct 17, 2015 · Big Data

Designing an Agile Data Warehouse and Data Platform for Internet Companies

The article outlines the purposes, architecture, data ingestion, storage, analysis, sharing, application, real‑time processing, scheduling, monitoring, and best‑practice recommendations for building a fast, flexible, and reliable big‑data platform in the fast‑changing internet industry.

Big Data · Data Warehouse · Hadoop

12 min read

Efficient Ops

Oct 14, 2015 · Big Data

Spark vs Hadoop, Flink, HBase/Cassandra, Kafka & Tachyon: Expert Q&A

During a lively “Sit and Discuss” session, experts compared Spark and Hadoop, evaluated Flink against Spark, contrasted HBase with Cassandra, explained why Kafka (and sometimes Flink) is preferred for distributed messaging, and shared insights on Tachyon’s role in modern big‑data ecosystems.

Flink · HBase · Hadoop

10 min read

Art of Distributed System Architecture Design

Oct 10, 2015 · Artificial Intelligence

Integrating Deep Learning with Apache Hadoop: Caffe-on-Spark on GPU‑Enhanced Clusters

This article describes how Yahoo integrated deep learning into its massive Hadoop ecosystem by adding GPU nodes, using YARN and Spark to run Caffe at scale, and presents performance results on AlexNet and GoogLeNet alongside open‑source contributions.

Big Data · Caffe · GPU

9 min read

Qunar Tech Salon

Aug 18, 2015 · Big Data

Overview of Spark Big Data Analytics Framework Components

Spark’s big‑data analytics ecosystem comprises core components such as the in‑memory RDD data structure, Streaming for real‑time processing, GraphX for graph analytics, MLlib for machine‑learning, Spark SQL for querying, the Tachyon file system, and SparkR, each enabling scalable, distributed computation.

Big Data · GraphX · MLlib

5 min read

Suning Technology

Jul 29, 2015 · Big Data

Highlights from the 2015 Suning Big Data Meetup: Platforms, Spark, and Octopus

The 2015 Suning Big Data Meetup in Nanjing gathered industry experts and researchers to showcase Suning's data platform architecture, Intel's Spark advancements, ZTE's DAP system, and a unified Octopus programming model, emphasizing open knowledge sharing and practical big‑data solutions.

Octopus Model · Spark · Suning

6 min read

Art of Distributed System Architecture Design

Jun 1, 2015 · Big Data

Overview of Big Data Technologies and Architectures

This article provides a comprehensive overview of major big‑data platforms such as Hadoop, Spark, Flink, Kafka, and related ecosystem components, explaining their core concepts, storage models, processing frameworks, and architectural patterns for handling massive, distributed datasets.

Hadoop · Kafka · NoSQL

18 min read

MaGe Linux Operations

Feb 3, 2015 · Big Data

Why Spark Beats Hadoop: Exploring RDDs, In‑Memory Computing, and Fault Tolerance

This article explains how Apache Spark improves on Hadoop MapReduce by keeping intermediate data in memory, introduces the core RDD abstraction, compares Spark’s transformations and actions with Hadoop MapReduce, and shows how Spark can run in Standalone mode or on YARN and be programmed in Scala, Java, or Python.

Big Data · Java · RDD

20 min read

Baidu Tech Salon

Jan 13, 2015 · Big Data

Inside Spark 1.2: New APIs, In‑Memory Columnar Storage, and Baidu’s High‑Performance Shuffle

This article reviews Spark 1.2’s major enhancements—including the External Data Source API, column pruning, predicate pushdown, and in‑memory columnar storage—while also detailing Baidu’s large‑scale Spark deployments, its custom high‑performance Shuffle service, and the integration of Spark with the Tachyon memory file system.

Baidu · Big Data · External Data Source API

16 min read

Qunar Tech Salon

Dec 4, 2014 · Big Data

Understanding Apache Spark: Architecture, Comparison with Hadoop, Features, and Use Cases

The article explains Apache Spark’s memory‑based distributed computing model, its advantages over Hadoop’s MapReduce, key features, fault tolerance, deployment modes, ecosystem components, and the scenarios where Spark is most effective for large‑scale data analytics.

Distributed computing · Hadoop · Spark

7 min read