Apache DolphinScheduler 3.0.0 Released: Biggest Changes Yet
On August 10, 2022, Apache DolphinScheduler 3.0.0 was officially released. It introduces a brand‑new Vue3‑based UI that loads dozens of times faster, extensive AWS support, custom time‑zone handling, task groups, native data‑quality checks, service splitting for container‑native deployment, numerous new task types, Python API enhancements, and a long list of bug fixes and documentation updates.
Version 3.0.0 is the largest change to Apache DolphinScheduler since the project was first open‑sourced.
Keywords
Four keywords summarize the upgrade: faster, stronger, more modern, and easier to maintain.
1. Faster and more modern UI
The UI was rebuilt with Vue3, TSX and Vite. Compared with the previous UI, the new interface loads dozens of times faster for end users and compiles hundreds of times faster for developers working locally, which dramatically reduces debugging and packaging time. The UI supports language switching without a page reload and provides a dark theme. The original announcement includes screenshots of the project‑management page, workflow definition page, shell task page and MySQL datasource page.
2. New features and enhancements
New UI : modern layout, updated icons and compile‑time interface parameter validation.
Service splitting : backend services are divided into master‑server, worker‑server, api‑server, alert‑server, standalone‑server, ui, bin and tools to align with container‑oriented micro‑service architecture.
Task groups : control concurrent task instances and set priority within a group. A task runs only when the group’s running count is below the configured pool size; otherwise it waits.
Data quality assurance : native support for data‑quality monitoring with threshold‑based alerts for weekly/monthly fluctuations and source‑row‑count accuracy.
Custom time‑zone : users can select their local time zone, solving cross‑region scheduling issues.
AWS support : new task types for Amazon EMR and Amazon Redshift; the resource center now supports Amazon S3 storage.
New task types : Flink (SQL), Zeppelin, and additional alert plugins (Telegram, WebexTeams).
Python API : PythonGatewayServer integrated into the API server, with CLI and configuration modules. Example configuration:
# environment variable
export PYDS_JAVA_GATEWAY_ADDRESS="192.168.1.1"
export PYDS_WORKFLOW_USER="custom-user"
# file change
# Directly change ~/pydolphinscheduler/config.yaml
# CLI
pydolphinscheduler config --set java_gateway.address 192.168.1.1
Bash parameter passing : dynamic variable injection into downstream tasks via setValue and Bash variables.
File upload without extension : the resource center now accepts files lacking a suffix.
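The task‑group rule described above (a task starts only while the group's running count is below the pool size, otherwise it waits, and waiting tasks are ordered by priority) can be sketched as a small in‑memory model. This is an illustrative sketch only, not DolphinScheduler code: the class TaskGroup, its methods, and the convention that a larger number means higher priority are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TaskGroup:
    """Hypothetical in-memory model of a task group (not DolphinScheduler code)."""

    pool_size: int
    running: set = field(default_factory=set)
    # Min-heap of (-priority, arrival order, task name).
    # Assumption: a larger priority number is served first.
    _waiting: list = field(default_factory=list)
    _seq: count = field(default_factory=count)

    def submit(self, task: str, priority: int = 0) -> bool:
        """Start the task if a slot is free, otherwise queue it.

        Returns True if the task started immediately.
        """
        if len(self.running) < self.pool_size:
            self.running.add(task)
            return True
        heapq.heappush(self._waiting, (-priority, next(self._seq), task))
        return False

    def finish(self, task: str):
        """Release the task's slot and start the best waiting task, if any.

        Returns the name of the task that was started, or None.
        """
        self.running.discard(task)
        if self._waiting and len(self.running) < self.pool_size:
            _, _, next_task = heapq.heappop(self._waiting)
            self.running.add(next_task)
            return next_task
        return None


group = TaskGroup(pool_size=2)
group.submit("extract")              # starts: one slot used
group.submit("transform")            # starts: pool is now full
group.submit("report", priority=1)   # queued
group.submit("load", priority=5)     # queued ahead of "report"
started = group.finish("extract")    # frees a slot for the queued "load"
```

The arrival counter only breaks ties between tasks of equal priority, so tasks with the same priority start in submission order.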
3. Major optimizations
Task backend plugin refactor – plugins can be updated independently.
Cron validation of start and end times during workflow submission.
Dependent tasks can now select a global project.
AlertSender and MasterServer optimizations.
Slot‑based database query reduction.
Python gateway migrated to API server to shrink distribution size.
Task acknowledgment changed to a run‑callback mechanism.
Master task event thread pool added.
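The release notes do not explain how the slot‑based query reduction works. One common way to implement such a scheme, shown here purely as an assumption, is to give each master a slot number and let it fetch only the commands whose id maps to that slot, so masters do not all scan the same rows. The function name commands_for_slot and its parameters are hypothetical.

```python
def commands_for_slot(command_ids, master_count, slot):
    """Return the command ids that the master holding `slot` would pick up.

    Assumed rule: a command belongs to the master whose slot equals
    command_id modulo the number of active masters.
    """
    if master_count <= 0:
        return []
    return [cid for cid in command_ids if cid % master_count == slot]


pending = list(range(1, 11))  # ten pending command ids, 1..10
per_master = {slot: commands_for_slot(pending, 3, slot) for slot in range(3)}
# Every command lands in exactly one master's share.
```

In a real deployment the same filter would normally be pushed into the database query itself, so that each master reads only its own share of rows.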
4. Bug fixes
Fixes include S3a Minio tenant creation failures, PostgreSQL connection issues, Spark plugin errors, MapReduce command‑parameter ordering, task‑group priority handling, worker resource exhaustion, time‑zone scheduling bugs, and numerous stability improvements across deployment modes, documentation and alert handling.
5. Documentation updates
Corrected deployment guides, AWS task type docs and Kubernetes FAQ.
Added sections for Telegram/WebexTeams alerts, Zeppelin tasks and Bash parameter usage.
Updated data‑quality, Spark, Flink and resource‑center documentation.
6. Release information
GitHub release page: https://github.com/apache/dolphinscheduler/releases/tag/3.0.0
Download page: https://dolphinscheduler.apache.org/en-us/download/download.html
Past Memory Big Data
A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.
