Tagged articles

16 articles

May 27, 2026 · Fundamentals

Understanding the Internals of Lance’s describe_indices() Method

The article walks through Lance’s describe_indices() workflow—from reading the manifest and caching index metadata, through optional filtering and grouping by logical index name, to building human‑readable index descriptions and highlighting differences from load_indices and index_statistics, while noting edge cases and limitations.

Lance · Python · Rust

13 min read

SuanNi

May 22, 2026 · Artificial Intelligence

All‑In‑One Image & Video: ByteDance’s Deployable Native Multimodal Model Lance

Lance, ByteDance’s newly open‑sourced 3‑billion‑parameter multimodal model, runs on a single 40 GB GPU, tops Hugging Face’s trending charts, and achieves leading scores on DPG Bench, GenEval, and video generation benchmarks while surpassing several state‑of‑the‑art single‑modal models.

AI research · ByteDance · Lance

3 min read

DataFunSummit

May 14, 2026 · Big Data

How Gravitino, Daft, and Lance Enable Secure, AI‑Driven Multimodal Lakehouse

The article examines the challenges of multimodal data in modern lakehouses and presents a three‑tool stack—Gravitino, Daft, and Lance—that provides unified metadata, distributed multimodal compute, and high‑performance storage, while detailing security governance, integration paths, and future directions.

Daft · Gravitino · Lakehouse

11 min read

DataFunSummit

May 11, 2026 · Artificial Intelligence

How Lance Powers Enterprise Multimodal AI Data Lakes

The article analyzes why 74% of AI projects fail due to feedback gaps and data silos, explains how the open‑source Lance format addresses these issues with unified multimodal storage, outlines a layered Lance‑on‑Ray architecture, and details three real‑world practices—implicit feedback loops, GPU‑accelerated self‑evolution, and semantic knowledge‑graph evolution—to boost R&D efficiency.

CAGRA · Daft · GPU Indexing

13 min read

DataFunSummit

May 10, 2026 · Big Data

How Lance File Format v2.2 Accelerates, Cuts Costs, and Governs Multimodal Data

Lance File Format v2.2 tackles the AI data explosion by delivering a hundred‑fold improvement in random‑read performance, advanced two‑layer compression, zero‑cost schema evolution, Git‑style versioning, external blob handling, and a roadmap toward native media support and intelligent encoding, positioning it as core infrastructure for large‑scale multimodal workloads.

File Format · IO performance · Lance

14 min read

DataFunSummit

May 5, 2026 · Big Data

A New Data Lake Paradigm: Volcano Engine’s Multi‑Modal Data Lake Built on Lance

The article presents Volcano Engine’s AI‑focused data lake built on the Lance format, detailing why traditional lakes fall short for multimodal data, the engineering enhancements such as Binary Copy Compaction, Lance Insight, distributed vector indexing, JSON‑based tagging, Row‑ID shuffle optimization, and real‑world case studies that demonstrate significant performance and cost gains.

AI · Binary Copy Compaction · Distributed Vector Index

18 min read

DataFunSummit

May 2, 2026 · Cloud Native

GooseFS + Lance: Accelerating Vector Storage for the AI Era

The article explains how GooseFS integrates with the Lance vector format to overcome the IO bottlenecks of object storage, detailing native acceleration mechanisms such as namespace catalog services, event‑driven warm caching, automatic compaction, native transactions, and page‑level caching that together deliver up to three‑fold performance gains for AI workloads.

AI · Cache Acceleration · Cloud Native

12 min read

Big Data Technology Tribe

Apr 21, 2026 · Databases

How Lance Implements Merge‑Insert: Upserts, Deletes, and Deduplication Explained

This article explains the merge‑insert operation in Lance, detailing its SQL‑like semantics, typical use‑cases such as bulk upserts and conditional deletes, the underlying DataFusion planning and execution flow, the generation of the __action column, and the handling of source‑side duplicate rows.

DataFusion · Deduplication · Lance

7 min read

Big Data Technology & Architecture

Apr 3, 2026 · Industry Insights

Why Daft, Ray, and Lance Are Redefining Multimodal Data Pipelines

This article analyzes how the Daft‑Ray‑Lance stack tackles the challenges of multimodal AI workloads by offering a high‑performance Rust engine, adaptive back‑pressure, seamless Ray‑based distributed scheduling, and a storage format optimized for random access, vector indexing, and zero‑copy schema evolution, complete with benchmark comparisons and practical deployment guidance.

Daft · Lance · Python

21 min read

Big Data Technology Tribe

Mar 15, 2026 · Databases

How to Build Distributed Scalar Indexes with Lance and Ray

This guide explains the end‑to‑end workflow for constructing a distributed scalar index in Lance by orchestrating validation, fragment sharding, worker‑level indexing via Ray, and final metadata merging, complete with code snippets and detailed step‑by‑step instructions.

Lance · Python · Ray

12 min read

Big Data Technology Tribe

Mar 10, 2026 · Databases

How Lance Builds Scalar and Vector Indexes: A Deep Dive into create_index

This article explains how Lance's Python API creates scalar and vector indexes, walks through the internal Rust implementation of the create_index workflow, and details the transaction, commit, and error‑handling mechanisms that ensure atomic and consistent index creation.

Database · Lance · Rust

12 min read

Big Data Technology Tribe

Feb 26, 2026 · Databases

How optimize_indices Improves Query Performance in Lance

The article explains the purpose and inner workings of Lance's optimize_indices function, detailing how it incorporates newly appended data into existing indexes, merges delta indexes, and manages partition adjustments to maintain fast vector and scalar query performance without full re‑training.

IVF · Lance · optimize_indices

8 min read

Big Data Technology Tribe

Feb 25, 2026 · Databases

How Lance Implements MVCC Transactions with Optimistic Concurrency and Automatic Conflict Resolution

Lance uses Multi-Version Concurrency Control to provide ACID guarantees, creating immutable table versions on each commit and employing atomic storage primitives, rebase logic, and retry mechanisms to handle concurrent writes, conflict detection, and resolution across multiple transaction types.

Concurrency Control · Database Internals · Lance

16 min read

Big Data Technology Tribe

Oct 18, 2025 · Databases

How Adaptive Structural Encoding Boosts Random Access in Columnar Storage

This article examines how adaptive structural encoding in columnar formats like Lance dramatically improves random‑access performance on NVMe storage, compares it with Apache Parquet and Arrow, and discusses the trade‑offs between scan speed, memory usage, and compression.

Columnar Storage · Lance · NVMe

17 min read

ByteDance Data Platform

Sep 3, 2025 · Artificial Intelligence

Revolutionizing AI Data Lakes: How Daft + Lance Enable Multimodal Processing

This article explores how the LAS team's AI‑driven data lake solution, built on Daft for lake computing and Lance for lake storage, tackles the emerging challenges of multimodal data handling, offering faster I/O, heterogeneous CPU‑GPU scheduling, and seamless integration for AI workloads.

AI · Daft · Distributed computing

11 min read

Volcano Engine Developer Services

Mar 5, 2025 · Artificial Intelligence

How DeepSeek Smallpond Powers AI Data Processing with Ray and DuckDB

This article introduces DeepSeek Smallpond, a lightweight yet high‑performance AI data‑processing engine built on Ray and DuckDB, explains its dual DataFrame and LogicalPlan APIs, showcases integration with Volcano Engine's AI Data Lake LAS, and provides practical code examples for distributed processing, multimodal storage, and RAG pipelines.

AI data processing · Distributed computing · DuckDB

18 min read