Tagged articles
6 articles
DataFunSummit
May 14, 2026 · Big Data

How Gravitino, Daft, and Lance Enable a Secure, AI‑Driven Multimodal Lakehouse

The article examines the challenges that multimodal data poses for modern lakehouses and presents a three‑tool stack (Gravitino, Daft, and Lance) that provides unified metadata, distributed multimodal compute, and high‑performance storage. It also covers security governance, integration paths, and future directions.

Daft · Gravitino · Lakehouse
11 min read
DataFunSummit
May 11, 2026 · Artificial Intelligence

How Lance Powers Enterprise Multimodal AI Data Lakes

The article analyzes why 74% of AI projects fail because of feedback gaps and data silos, and explains how the open‑source Lance format addresses these issues with unified multimodal storage. It then outlines a layered Lance‑on‑Ray architecture and details three real‑world practices for boosting R&D efficiency: implicit feedback loops, GPU‑accelerated self‑evolution, and semantic knowledge‑graph evolution.

CAGRA · Daft · GPU Indexing
13 min read
Alibaba Cloud Big Data AI Platform
Apr 13, 2026 · Artificial Intelligence

How to Build a Scalable Multimodal Data Pipeline with Alibaba Cloud PAI and DataJuicer

This article is a step‑by‑step guide to building a high‑performance multimodal data pipeline, covering video segmentation, duration filtering, frame extraction, safety and aesthetic scoring, and caption generation. It uses Alibaba Cloud PAI, Paimon, DataJuicer, and distributed frameworks such as Ray and Daft, and reports real‑world performance metrics.

AI · Alibaba Cloud · Daft
30 min read
Big Data Technology & Architecture
Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads. The stack offers a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution. The article also includes benchmark comparisons and practical deployment guidance.

Daft · Lance · Python
21 min read
ByteDance Data Platform
Dec 23, 2025 · Artificial Intelligence

How Daft and Ray Supercharge Million‑Hour Video Processing for AI‑Powered Robotics

This article details a scalable, distributed pipeline that uses LAS AI Data Lake, Daft on Ray, and video‑processing techniques (scene detection, splitting, frame sampling, filtering, and caption generation) to turn tens of millions of hours of robot‑captured video into high‑quality, searchable semantic data while substantially raising CPU and GPU utilization.

AI Pipeline · Daft · Distributed computing
21 min read