Tag

data integration


DevOps
Jun 16, 2025 · Artificial Intelligence

Unlock AI’s Real‑World Power: 6 Must‑Have MCP Tools with Hands‑On Code

This article reviews six open‑source MCP (Model Context Protocol) servers (Bright Data, Graphiti, GitIngest, Terminal, Code Executor, and MindsDB) and shows how they extend large language models with web scraping, long‑term memory, code navigation, command‑line control, sandboxed Python execution, and multi‑source data integration, respectively, complete with practical code examples.

AI tools · MCP · code execution
0 likes · 9 min read
Java Captain
Jun 10, 2025 · Backend Development

Why Spring Batch? Real‑World Scenarios, Core Architecture and Hands‑On Guide

This article explains the necessity of batch processing, presents typical use cases such as daily interest calculation, e‑commerce order archiving, log analysis and medical data migration, then dives deep into Spring Batch's core components, provides step‑by‑step code examples, performance‑tuning tips, production‑grade fault‑tolerance, monitoring solutions and a comprehensive FAQ.

Java · Spring Batch · batch processing
0 likes · 20 min read
DataFunSummit
Jun 2, 2025 · Artificial Intelligence

Enterprise Knowledge Brain Powered by Large Models and Knowledge Graphs

This article explains how the rapid development of large language models and knowledge graph technologies creates new opportunities for enterprise knowledge management, outlines the challenges of massive unstructured data, describes the architecture and core data flow of a corporate knowledge brain, and showcases key technologies and real‑world applications.

AI architecture · Large Models · data integration
0 likes · 13 min read
DataFunSummit
Apr 1, 2025 · Big Data

Understanding Flink CDC 3.3: Features, Improvements, and Future Plans

This article provides a comprehensive overview of Flink CDC 3.3, detailing its CDC fundamentals, new connectors, Transform module enhancements, asynchronous snapshot splitting, community adoption, and upcoming roadmap for broader ecosystem support and batch‑mode execution.

Big Data · CDC · Change Data Capture
0 likes · 15 min read
DataFunSummit
Feb 9, 2025 · Big Data

Modern Data Stack on Alibaba Cloud Using Flink CDC: Architecture, Features, and Use Cases

This article presents a comprehensive overview of Alibaba Cloud's modern data stack built on Flink CDC, detailing its core concepts, extended capabilities, typical application scenarios, performance optimizations, a live demo, and future development plans for large‑scale streaming data integration.

Alibaba Cloud · Big Data · Flink CDC
0 likes · 13 min read
DataFunSummit
Dec 5, 2024 · Big Data

Ping An Financial Services' Big Data Platform Construction and Data Governance Practices

This article details Ping An Financial Services' journey in building a comprehensive big‑data platform, addressing fragmentation, low data timeliness, processing limits, and governance challenges through a four‑stage technical evolution, modular tool development, and a systematic data‑governance framework to support its digital transformation.

Big Data · Data Engineering · Data Governance
0 likes · 16 min read
Bilibili Tech
Nov 26, 2024 · Big Data

Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices

Bilibili migrated its massive user‑behavior, commercial AI training, and database synchronization pipelines from Hive and Kafka to an Iceberg‑based streaming‑batch architecture, using Flink and the Magnus optimizer to achieve minute‑level freshness, reduce CPU and memory usage by about 20–22%, save roughly CNY 3.55 million annually, and dramatically improve query latency and join performance.

Data Lake · Iceberg · Streaming
0 likes · 20 min read
DataFunSummit
Nov 1, 2024 · Big Data

DataFun Summit Session Overview and E‑book Access Instructions

The article explains how to obtain the DataFun Summit e‑book by following the instructions on the official account, and gives concise English summaries of twelve technical sessions covering data lineage, data integration, AI language models, multimodal content, game AI agents, lake‑warehouse governance, big‑data architecture, and cluster management.

AI · Big Data · Data Governance
0 likes · 5 min read
DataFunSummit
Oct 27, 2024 · Artificial Intelligence

How Siemens Harnesses Generative AI to Build the Enterprise Knowledge Chatbot “XiaoYu”

This article describes Siemens' journey in applying generative AI and Retrieval‑Augmented Generation to create an internal knowledge chatbot, detailing the business challenges, technical architecture, data integration, multi‑modal capabilities, deployment outcomes, and strategic lessons for enterprise AI adoption.

AI chatbot · Enterprise Knowledge Management · RAG
0 likes · 21 min read
macrozheng
Sep 27, 2024 · Big Data

Master DataX: Efficient Offline Data Sync for Heterogeneous Sources

This guide walks through the challenges of synchronizing massive datasets across heterogeneous databases, introduces Alibaba's open‑source DataX tool, and explains its framework‑plus‑plugin architecture. It then gives step‑by‑step instructions covering environment setup, installation, job configuration, and both full and incremental MySQL synchronization, complete with code examples and performance metrics.

Big Data · DataX · Incremental Sync
0 likes · 15 min read
DataFunTalk
Jul 10, 2024 · Big Data

Apache SeaTunnel: A Next‑Generation Data Integration Platform for ETL/ELT and OLAP

This article introduces Apache SeaTunnel, a modern data integration platform designed for the EtLT era, detailing its architecture, core connector APIs, checkpoint mechanism, model inference, multi‑table synchronization, the high‑performance SeaTunnel Zeta engine, OLAP use cases, community roadmap, and the commercial WhaleTunnel product.

Apache SeaTunnel · Big Data · ELT
0 likes · 22 min read
DaTaobao Tech
Jul 8, 2024 · Big Data

ODPS (MaxCompute) SQL Basics, Data Integration and Hologres Import Guide

This guide provides a comprehensive, beginner‑to‑advanced reference for ODPS (MaxCompute) SQL, covering table creation, DDL/DML commands, query syntax, join hints, MySQL‑to‑ODPS synchronization, one‑click and custom imports into Hologres, and scheduling variables for automated data pipelines.

Big Data · Hologres · ODPS
0 likes · 37 min read
DataFunSummit
Jun 14, 2024 · Big Data

JD Logistics One‑Stop Agile BI Solution: Architecture, Challenges, and Product Evolution

This article presents JD Logistics' one‑stop agile BI platform, detailing the complex data sources, rapid business demands, the UData solution architecture, performance and usability improvements, and future upgrade plans that together enable faster data integration, self‑service reporting, and enhanced decision‑making across the organization.

Agile Analytics · BI · Big Data
0 likes · 25 min read
DataFunTalk
May 13, 2024 · Big Data

Data Integration Maturity Model: From ETL to EtLT

The article traces the evolution of data integration architectures, from traditional ETL through ELT to the emerging EtLT model, and covers their advantages and disadvantages, industry trends, and maturity stages, along with practical guidance for enterprises and professionals working with modern big‑data pipelines.

Big Data · DataOps · ELT
0 likes · 31 min read
DataFunTalk
May 8, 2024 · Big Data

Risk Control and Data Application in the Bulk Commodity Industry: Challenges, Solutions, and Core Capabilities

The article presents Ant Group's exploration of applying its data‑driven risk control and credit assessment capabilities to the traditional bulk commodity sector, detailing industry background, data pain points, core technical solutions, and the construction of a secure, explainable data‑model platform for digital transformation.

AI · Big Data · Bulk Industry
0 likes · 13 min read
DataFunSummit
Apr 7, 2024 · Big Data

Li Auto’s Flink on Kubernetes Data Integration Practice

This article presents Li Auto’s end‑to‑end data integration journey, detailing the evolution of its data platform, the challenges of heterogeneous sources, and how a unified Flink‑on‑K8s solution with cloud‑native architecture, operator management, monitoring, and checkpointing addresses batch‑stream convergence and future scalability.

Big Data · Kubernetes · Streaming
0 likes · 12 min read
DataFunTalk
Mar 11, 2024 · Artificial Intelligence

Challenges and Future Directions for Knowledge Graph Construction in the Era of Large Models

The article examines the high construction cost and lack of unified standards in knowledge graphs, explains why large language models cannot fully solve core issues such as hallucination and multi‑hop reasoning, and argues that a new, unified semantic framework integrating large models is essential for future progress.

AI · data integration · graph database
0 likes · 5 min read
DataFunTalk
Mar 1, 2024 · Big Data

Understanding Data Fabric and Data Virtualization: Concepts, Practices, and Real‑World Case Study

This article explains the fundamentals of Data Fabric and data virtualization, highlights the limitations of traditional centralized data warehouses, describes the three‑layer virtualization architecture, and presents a detailed securities‑industry case study that demonstrates cost, efficiency, and compliance benefits.

Big Data · Data Virtualization · Logical Data Warehouse
0 likes · 17 min read
DataFunTalk
Feb 23, 2024 · Artificial Intelligence

Challenges and Opportunities in Applying Large‑Model AI to Healthcare

The article analyzes why medical AI built on large models is being adopted rapidly yet struggles in implementation, citing doctor shortages, behavioral resistance, data silos, safety regulations, and the need for strategic alignment, and contrasts this with the more supportive innovation ecosystem in the United States.

AI adoption · Healthcare Innovation · Large Models
0 likes · 6 min read