GuanYuan Data Tech Team

Practical insights from the GuanYuan Data Tech Team

Recent Articles

GuanYuan Data Tech Team
Jul 28, 2022 · Artificial Intelligence

Unlocking Reinforcement Learning: Core Concepts, Algorithms, and Real‑World Applications

This article introduces reinforcement learning by defining agents, environments, rewards, and policies; explains key concepts such as Markov Decision Processes and Bellman equations; and surveys the major algorithms, including dynamic programming, Monte Carlo methods, TD learning, policy gradients, Q-learning, DQN, and evolution strategies. It also highlights practical challenges and notable case studies such as AlphaGo Zero.

Evolution Strategies · MDP · Machine Learning
27 min read

GuanYuan Data Tech Team
Jul 14, 2022 · Big Data

How to Train Massive GBDT Models on Spark: A Complete Step‑by‑Step Guide

This article walks through large-scale GBDT training on Apache Spark, covering the challenges of massive data, Spark deployment, PySpark code examples, differences from pandas, feature engineering, mmlspark installation, early-stopping tricks, performance bottlenecks, and a systematic evaluation of alternative frameworks.

Big Data · GBDT · Performance Optimization
38 min read

GuanYuan Data Tech Team
Jun 30, 2022 · Big Data

Why Spark 3.2 OOMs After Upgrade: Deep Dive into AQE and StageMetrics

After upgrading Spark from 3.0.1 to 3.2.1, an ETL job began failing with OutOfMemory errors. This article examines the root causes, including AQE-related metric accumulation, skipped stages, and growing stage metrics, and presents the debugging process along with a code-level fix that mitigates the memory pressure.

AQE · Big Data · OutOfMemory
13 min read

GuanYuan Data Tech Team
Jun 16, 2022 · Artificial Intelligence

How Deepchecks Automates Data and Model Validation for Reliable AI Pipelines

This article introduces the open-source Deepchecks library, explains its core concepts (checks, conditions, and suites), and provides step-by-step tutorials on data validation, train-test validation, and model evaluation to help AI engineers build robust, data-centric machine-learning workflows.

Python · data validation · deepchecks
15 min read

GuanYuan Data Tech Team
May 12, 2022 · Backend Development

Why Playwright Beats Selenium for Modern Web Automation

This article compares Playwright and Selenium and highlights Playwright's advantages: broad language support, no separate browser driver to install, faster startup, reliable auto-waiting, stable code generation, asynchronous capabilities, and headless mode. It then walks through environment setup, practical usage tips, and code examples for Java-based UI testing.

Java · Playwright · Selenium
16 min read

GuanYuan Data Tech Team
May 5, 2022 · Artificial Intelligence

Why FLAML Is the Fast, Lightweight AutoML Framework You Should Try

This article introduces FLAML, Microsoft's fast and lightweight AutoML library. It explains the library's design principles and cost-aware search strategy, along with key observations, properties, and experimental results, and provides practical code examples for integrating FLAML into Python machine-learning workflows.

AutoML · Cost-aware Search · FLAML
15 min read

GuanYuan Data Tech Team
Apr 14, 2022 · Artificial Intelligence

Mastering Time Series Forecasting: From Moving Averages to Transformers

Time series analysis, essential in fields such as weather, finance, and commerce, covers tasks like classification, clustering, anomaly detection, and, above all, forecasting. This article explores forecasting's definitions, evaluation metrics, traditional methods, machine-learning approaches, deep-learning models such as TFT (Temporal Fusion Transformer), and emerging AutoML tools, and offers practical insights and best practices.

AutoML · GBDT · Machine Learning
27 min read