Key New Features of Apache Doris 3.0: Storage‑Compute Separation, Lakehouse Integration, Semi‑Structured Data, ETL Enhancements, Materialized Views, and Java UDTF

Apache Doris 3.0 introduces storage‑compute separation, native lakehouse write‑back, optimized Variant handling for semi‑structured data, stronger ETL transaction support, enhanced multi‑table materialized views, and Java UDTF capabilities, providing developers with more flexible, cost‑effective, and high‑performance analytics solutions.

Apache Doris · Data Warehouse · ETL

0 likes · 7 min read

Rare Earth Juejin Tech Community

Jan 31, 2024 · Artificial Intelligence

Advanced RAG with Semi‑Structured Data Using LangChain, Unstructured, and ChromaDB

This tutorial demonstrates how to build an advanced Retrieval‑Augmented Generation (RAG) system for semi‑structured PDF data using LangChain, the unstructured library, the ChromaDB vector store, and OpenAI models. It covers installation, PDF partitioning, element classification, summarization, and query execution.

AI · ChromaDB · LangChain

0 likes · 11 min read

DataFunTalk

Jan 1, 2024 · Big Data

MaxCompute Semi-Structured Data: Concepts, Solutions, and Benefits

This article explains the nature of semi‑structured data, compares traditional schema‑on‑read and schema‑on‑write approaches, and details MaxCompute's columnar storage solution that balances flexibility, performance, and cost for large‑scale data warehouses.

Big Data · Columnar Storage · Data Warehouse

0 likes · 19 min read

Alibaba Cloud Big Data AI Platform

Sep 14, 2023 · Big Data

How MaxCompute Turns Semi‑Structured Data into High‑Performance Columnar Storage

This article explains the nature of semi‑structured data, compares schema‑on‑read and schema‑on‑write approaches, and shows how Alibaba Cloud MaxCompute leverages columnar storage and dynamic parsing to achieve low‑cost, high‑performance analytics for large‑scale data workloads.

Columnar Storage · MaxCompute · schema on write

0 likes · 20 min read

DataFunSummit

Sep 7, 2023 · Big Data

MaxCompute Semi-Structured Data Solutions: Architecture, Comparison, and Performance Benefits

This article explains the concepts of semi‑structured data, compares traditional schema‑on‑read and schema‑on‑write approaches, and details MaxCompute's columnar storage solution—including AliORC, adaptive query processing, and handling of dirty or sparse data—to achieve high performance and low cost in big‑data warehousing.

MaxCompute · semi-structured data

0 likes · 20 min read