LakeSoul Joins LF AI & Data Foundation as an Open‑Source Cloud‑Native Lakehouse Framework

LakeSoul, China's only open‑source lakehouse project, has been donated to the LF AI & Data Foundation and becomes the foundation's first lakehouse framework. It offers ACID‑guaranteed high‑concurrency upserts, a high‑performance Rust‑based I/O layer, real‑time data‑warehouse capabilities, and AI/BI integration for modern big‑data applications.

DataFunSummit

Shuyuanling Technology (DMetaSoul) has announced that it has joined the LF AI & Data Foundation, part of the Linux Foundation, and has committed to contributing to global open‑source innovation and ecosystem building in AI and big data.

Since its open‑source launch in December 2021, LakeSoul has attracted nearly 1,300 stars and over 290 forks on GitHub, reflecting strong community interest.

LakeSoul has been donated to the LF AI & Data Foundation and accepted by its Technical Advisory Council as the foundation's first lakehouse sandbox project. The project will now be hosted and governed under the Linux Foundation.

LakeSoul Open‑Source Lakehouse Framework Overview

LakeSoul is designed as a simple, high‑performance, cloud‑native data lakehouse that supports both BI and AI workloads.

Key Features

ACID‑compliant high‑concurrency updates via a two‑phase commit protocol and automatic conflict resolution.

Flexible upsert operations with range and hash partitioning, supporting row‑ and column‑level modifications and multi‑stream joins for real‑time machine‑learning scenarios.

High‑performance I/O layer implemented in Rust, optimizing asynchronous file reads/writes, upserts, and Merge‑on‑Read for object storage and HDFS.

Real‑time data‑warehouse capabilities: supports streaming and batch reads/writes, online CDC sync, DDL change propagation, Kafka ingestion, and incremental Flink changelog streams.

Integrated BI & AI support with native C, Java, and Python APIs, enabling unified data access for analytics and machine‑learning frameworks.
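
To make the upsert and Merge‑on‑Read features above more concrete, here is a minimal in‑memory sketch of the general technique. It is not LakeSoul's API or implementation: the class ToyUpsertTable, its methods, and the bucket count are hypothetical names chosen for illustration. Writes append small delta batches to the hash bucket of each primary key, and readers merge those batches in commit order, so a partial row behaves like a column‑level update.

```python
from collections import defaultdict


class ToyUpsertTable:
    """Toy model of a hash-partitioned, merge-on-read table.

    Hypothetical illustration only; this is not LakeSoul's API.
    """

    def __init__(self, primary_key, num_buckets=4):
        self.primary_key = primary_key
        self.num_buckets = num_buckets
        # bucket id -> list of append-only delta batches (lists of row dicts)
        self.buckets = defaultdict(list)

    def _bucket_of(self, key):
        # The same key always lands in the same bucket within one process.
        return hash(key) % self.num_buckets

    def upsert(self, rows):
        """Append one delta batch per touched bucket; nothing is rewritten."""
        batch = defaultdict(list)
        for row in rows:
            batch[self._bucket_of(row[self.primary_key])].append(dict(row))
        for bucket_id, delta in batch.items():
            self.buckets[bucket_id].append(delta)

    def read(self):
        """Merge deltas in commit order; later values win, column by column."""
        merged = {}
        for deltas in self.buckets.values():
            for delta in deltas:
                for row in delta:
                    merged.setdefault(row[self.primary_key], {}).update(row)
        return sorted(merged.values(), key=lambda r: r[self.primary_key])


table = ToyUpsertTable(primary_key="id")
table.upsert([{"id": 1, "name": "a", "score": 10},
              {"id": 2, "name": "b", "score": 20}])
# The second commit updates only the "score" column of id 1 and inserts id 3.
table.upsert([{"id": 1, "score": 15},
              {"id": 3, "name": "c", "score": 30}])
rows = table.read()
print(rows)
```

In this sketch the write path never touches existing data, which is why upserts stay cheap; the cost of reconciling versions is deferred to read time (or to a background compaction step, which the sketch omits).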

Performance Advantages

Public benchmarks show LakeSoul’s metadata concurrency and optimized I/O layer give it a clear edge in both write and read throughput, with upsert performance several times faster than comparable frameworks.

Typical Application Scenarios

1. Real‑time data platform (data middle platform): LakeSoul provides CDC, message‑queue ingestion, automatic table/topic discovery, and schema evolution, so that heterogeneous upstream data can be brought into a single lakehouse.

2. Real‑time machine‑learning sample generation: multi‑stream joins allow real‑time construction of training samples for recommendation systems and other AI models.
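
The multi‑stream join idea in the second scenario can be sketched as a keyed merge: each stream contributes some columns for the same primary key, and a sample is usable once all required columns have arrived. This is a simplified illustration under stated assumptions, not LakeSoul code; the function join_streams, the stream names, and the column names (request_id, clicked, and so on) are all hypothetical.

```python
def join_streams(events, key, required):
    """Merge rows from several streams by primary key.

    events:   iterable of (stream_name, row_dict) in arrival order
    key:      name of the primary-key column shared by all streams
    required: columns a sample must have before it is emitted
    Hypothetical helper for illustration; not part of any LakeSoul API.
    """
    state = {}
    for _stream_name, row in events:
        # Each stream upserts only its own columns into the shared row.
        state.setdefault(row[key], {}).update(row)
    # Keep only rows for which every required column has arrived.
    return [s for s in state.values() if set(required).issubset(s)]


events = [
    ("impression", {"request_id": "r1", "user_id": 7, "item_id": 42}),
    ("click",      {"request_id": "r1", "clicked": 1}),
    ("impression", {"request_id": "r2", "user_id": 8, "item_id": 43}),
    ("feature",    {"request_id": "r1", "user_age_bucket": 3}),
]
samples = join_streams(events, key="request_id",
                       required={"user_id", "item_id", "clicked"})
print(samples)
```

Only request r1 yields a sample here, because r2 has no click record yet. Expressing the join as per‑key upserts avoids a windowed stream‑to‑stream join with large state, at the cost of deciding separately how long to wait for late columns.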

The LakeSoul community aims to build an open‑source ecosystem in partnership with LF AI & Data by collaborating with other Linux Foundation projects and by supporting AI frameworks such as PyTorch, so that they can read lakehouse data directly.

CEO Zhu Yadong expressed pride in joining LF AI & Data and donating LakeSoul, emphasizing the opportunity for China’s open‑source lakehouse technology to reach a global stage and inviting developers to contribute to the next‑generation lakehouse framework.
