Tagged articles
16 articles
Page 1 of 1
Big Data Technology Tribe
Big Data Technology Tribe
May 27, 2026 · Fundamentals

Understanding the Internals of Lance’s describe_indices() Method

The article walks through Lance’s describe_indices() workflow—from reading the manifest and caching index metadata, through optional filtering and grouping by logical index name, to building human‑readable index descriptions and highlighting differences from load_indices and index_statistics, while noting edge cases and limitations.

LancePythonRust
0 likes · 13 min read
Understanding the Internals of Lance’s describe_indices() Method
SuanNi
SuanNi
May 22, 2026 · Artificial Intelligence

All‑In‑One Image & Video: ByteDance’s Deployable Native Multimodal Model Lance

Lance, ByteDance’s newly open‑sourced 3‑billion‑parameter multimodal model, runs on a single 40 GB GPU, tops HuggingFace trend charts, and achieves leading scores on DPG Bench, GenEval, and video generation benchmarks while surpassing several state‑of‑the‑art single‑modal models.

AI researchByteDanceLance
0 likes · 3 min read
All‑In‑One Image & Video: ByteDance’s Deployable Native Multimodal Model Lance
DataFunSummit
DataFunSummit
May 14, 2026 · Big Data

How Gravitino, Daft, and Lance Enable Secure, AI‑Driven Multimodal Lakehouse

The article examines the challenges of multimodal data in modern lakehouses and presents a three‑tool stack—Gravitino, Daft, and Lance—that provides unified metadata, distributed multimodal compute, and high‑performance storage, while detailing security governance, integration paths, and future directions.

DaftGravitinoLakehouse
0 likes · 11 min read
How Gravitino, Daft, and Lance Enable Secure, AI‑Driven Multimodal Lakehouse
DataFunSummit
DataFunSummit
May 11, 2026 · Artificial Intelligence

How Lance Powers Enterprise Multimodal AI Data Lakes

The article analyzes why 74% of AI projects fail due to feedback gaps and data silos, explains how the open‑source Lance format addresses these issues with unified multimodal storage, outlines a layered Lance‑on‑Ray architecture, and details three real‑world practices—implicit feedback loops, GPU‑accelerated self‑evolution, and semantic knowledge‑graph evolution—to boost R&D efficiency.

CAGRADaftGPU Indexing
0 likes · 13 min read
How Lance Powers Enterprise Multimodal AI Data Lakes
DataFunSummit
DataFunSummit
May 10, 2026 · Big Data

How Lance File Format v2.2 Accelerates, Cuts Costs, and Governs Multimodal Data

Lance File Format v2.2 tackles the AI data explosion by delivering hundred‑fold random‑read performance, advanced two‑layer compression, zero‑cost schema evolution, Git‑style versioning, external blob handling, and a roadmap toward native media support and intelligent encoding, positioning it as a core infrastructure for large‑scale multimodal workloads.

File FormatIO performanceLance
0 likes · 14 min read
How Lance File Format v2.2 Accelerates, Cuts Costs, and Governs Multimodal Data
DataFunSummit
DataFunSummit
May 5, 2026 · Big Data

A New Data Lake Paradigm: Volcano Engine’s Multi‑Modal Data Lake Built on Lance

The article presents Volcano Engine’s AI‑focused data lake built on the Lance format, detailing why traditional lakes fall short for multimodal data, the engineering enhancements such as Binary Copy Compaction, Lance Insight, distributed vector indexing, JSON‑based tagging, Row‑ID shuffle optimization, and real‑world case studies that demonstrate significant performance and cost gains.

AIBinary Copy CompactionDistributed Vector Index
0 likes · 18 min read
A New Data Lake Paradigm: Volcano Engine’s Multi‑Modal Data Lake Built on Lance
DataFunSummit
DataFunSummit
May 2, 2026 · Cloud Native

GooseFS + Lance: Accelerating Vector Storage for the AI Era

The article explains how GooseFS integrates with the Lance vector format to overcome the IO bottlenecks of object storage, detailing native acceleration mechanisms such as namespace catalog services, event‑driven warm caching, automatic compaction, native transactions, and page‑level caching that together deliver up to three‑fold performance gains for AI workloads.

AICache AccelerationCloud Native
0 likes · 12 min read
GooseFS + Lance: Accelerating Vector Storage for the AI Era
Big Data Technology & Architecture
Big Data Technology & Architecture
Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

DaftLancePython
0 likes · 21 min read
Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines
Big Data Technology Tribe
Big Data Technology Tribe
Mar 15, 2026 · Databases

How to Build Distributed Scalar Indexes with Lance and Ray

This guide explains the end‑to‑end workflow for constructing a distributed scalar index in Lance by orchestrating validation, fragment sharding, worker‑level indexing via Ray, and final metadata merging, complete with code snippets and detailed step‑by‑step instructions.

LancePythonRay
0 likes · 12 min read
How to Build Distributed Scalar Indexes with Lance and Ray
Big Data Technology Tribe
Big Data Technology Tribe
Feb 26, 2026 · Databases

How optimize_indices Improves Query Performance in Lance

The article explains the purpose and inner workings of Lance's optimize_indices function, detailing how it incorporates newly appended data into existing indexes, merges delta indexes, and manages partition adjustments to maintain fast vector and scalar query performance without full re‑training.

IVFLanceoptimize_indices
0 likes · 8 min read
How optimize_indices Improves Query Performance in Lance
Big Data Technology Tribe
Big Data Technology Tribe
Feb 25, 2026 · Databases

How Lance Implements MVCC Transactions with Optimistic Concurrency and Automatic Conflict Resolution

Lance uses Multi-Version Concurrency Control to provide ACID guarantees, creating immutable table versions on each commit and employing atomic storage primitives, rebase logic, and retry mechanisms to handle concurrent writes, conflict detection, and resolution across multiple transaction types.

Concurrency ControlDatabase InternalsLance
0 likes · 16 min read
How Lance Implements MVCC Transactions with Optimistic Concurrency and Automatic Conflict Resolution
Volcano Engine Developer Services
Volcano Engine Developer Services
Mar 5, 2025 · Artificial Intelligence

How DeepSeek Smallpond Powers AI Data Processing with Ray and DuckDB

This article introduces DeepSeek Smallpond, a lightweight yet high‑performance AI data‑processing engine built on Ray and DuckDB, explains its dual Dataframe and LogicalPlan APIs, showcases integration with Volcano Engine's AI Data Lake LAS, and provides practical code examples for distributed processing, multimodal storage, and RAG pipelines.

AI data processingDistributed computingDuckDB
0 likes · 18 min read
How DeepSeek Smallpond Powers AI Data Processing with Ray and DuckDB