Tagged articles
3 articles
Big Data Technology Tribe
Feb 27, 2026 · Fundamentals

What Is pyarrow.Schema and How to Use It?

pyarrow.Schema is the Python representation of an Arrow table schema: it records each column's name, type, and nullability, plus optional metadata. It is used to define, inspect, and serialize table structures and to exchange them between libraries such as pandas, Polars, and Arrow‑based query engines.

Apache Arrow · Data Structures · PyArrow
4 min read
Data STUDIO
Nov 25, 2025 · Big Data

Why Parquet Is the Faster, Lighter, Safer Alternative to CSV in Python

The article explains why CSV becomes a bottleneck for large‑scale data and shows how Parquet’s columnar, typed, and compressed format reduces storage, speeds up reads, and improves data safety. It also provides step‑by‑step Python code for migrating to Parquet and benchmarking the switch.

CSV · Data Engineering · DuckDB
18 min read
Python Crawling & Data Mining
Oct 26, 2024 · Databases

Export MongoDB Data to CSV, Excel, JSON and More with mongo2file

This article introduces mongo2file, a Python library that exports MongoDB collections to file formats such as CSV, Excel, JSON, Pickle, Feather, and Parquet. It explains the library's PyArrow dependency, shows installation and usage examples, discusses performance bottlenecks, and provides API reference details.

CSV · Data Export · Excel
11 min read