ODPS Development Guide: Parameters, Built‑in Functions, UDF Creation, and Performance Optimization
This guide is a mini‑encyclopedia for ODPS (MaxCompute) development, covering both beginner and advanced topics: common parameter tuning, built‑in SQL functions, step‑by‑step Java UDF creation, the job lifecycle, and practical performance‑optimization techniques such as parallelism adjustment, map‑join hints, and small‑file mitigation.
Common Parameter Settings
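Typical tuning adjusts the instance count, CPU, and memory of map, join, and reduce tasks. CPU values are in hundredths of a core (100 = one core), memory and split size are in MB, and an instance count of -1 lets the system choose. Example settings include: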
set odps.sql.mapper.cpu=100;
set odps.sql.mapper.memory=1024;
set odps.sql.mapper.split.size=256;
set odps.sql.joiner.instances=-1;
set odps.sql.joiner.cpu=100;
set odps.sql.reducer.instances=-1;
set odps.sql.reducer.cpu=100;
Additional parameters control file merging, UDF resources, map‑join memory, dynamic partition handling, and data‑skew optimization.
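To make the link between split size and map parallelism concrete, here is a minimal Java sketch. It assumes roughly one map instance per split of at most odps.sql.mapper.split.size MB; the real planner also considers file and partition boundaries, so the numbers are a guide only. The class name, method name, and the 100 GB input are hypothetical.

```java
// Rough estimate of how many map instances a given split size implies.
// Assumption: one mapper per split of at most splitSizeMb megabytes.
class MapperEstimate {

    /** Ceiling division: number of splits needed to cover inputMb. */
    static long estimateMappers(long inputMb, long splitSizeMb) {
        if (inputMb < 0 || splitSizeMb <= 0) {
            throw new IllegalArgumentException("inputMb must be >= 0 and splitSizeMb > 0");
        }
        return (inputMb + splitSizeMb - 1) / splitSizeMb;
    }

    public static void main(String[] args) {
        long inputMb = 100L * 1024; // hypothetical 100 GB input
        for (long split : new long[] {64, 256, 1024}) {
            System.out.println("split.size=" + split + " MB -> ~"
                    + estimateMappers(inputMb, split) + " mappers");
        }
    }
}
```

Under this assumption, lowering the split size raises the mapper count and raising it lowers the count, which is the lever the split.size setting provides.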
Built‑in SQL Functions
The guide classifies functions into date, math, window, aggregation, string, complex‑type, encryption, and others, providing typical usage examples such as:
SELECT DATEADD(GETDATE(), -7, 'dd');
to_char('2018-01-11 10:00:00','yyyymmdd') as date_3
split(str, pat)
regexp_replace(msg_id, "\\[|\\]", "") as msg_id
These functions help with date calculations, string manipulation, JSON extraction, and more.
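For readers who think in Java, the following sketch mirrors what three of the calls above compute. It is an analogue for illustration, not how MaxCompute implements them; the class and method names are hypothetical. One difference worth noting: the SQL format string spells month as mm, while Java's DateTimeFormatter uses MM.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Java analogues of DATEADD, to_char, and regexp_replace as used above.
class BuiltinFunctionSketch {
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter YMD =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Analogue of DATEADD(ts, days, 'dd'): shift a timestamp by whole days. */
    static LocalDateTime addDays(LocalDateTime ts, long days) {
        return ts.plusDays(days);
    }

    /** Analogue of to_char(ts, 'yyyymmdd') applied to a timestamp string. */
    static String toYmd(String ts) {
        return LocalDateTime.parse(ts, INPUT).format(YMD);
    }

    /** Analogue of regexp_replace(s, "\\[|\\]", ""): drop square brackets. */
    static String stripBrackets(String s) {
        return s.replaceAll("\\[|\\]", "");
    }

    public static void main(String[] args) {
        System.out.println(addDays(LocalDateTime.now(), -7)); // seven days ago
        System.out.println(toYmd("2018-01-11 10:00:00"));
        System.out.println(stripBrackets("[abc123]"));
    }
}
```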
Custom Java UDF Development
Step‑by‑step instructions show how to install the MaxCompute Studio plugin in IntelliJ IDEA, create a Java project, add a UDF class, configure Maven assembly to package dependencies, and publish the JAR to the ODPS resource library.
Key session settings for UDF JVM memory (in MB) and timeout (in seconds):
set odps.sql.udf.jvm.memory=1024;
set odps.sql.udf.timeout=1800;
After packaging, the UDF is uploaded via “Deploy to server”, linked to a function name, and can be invoked directly in SQL.
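As a rough picture of what the UDF class itself looks like, here is a minimal sketch. In a real project the class would extend com.aliyun.odps.udf.UDF from the MaxCompute SDK and expose an evaluate method; the superclass is left out here so the example compiles without the SDK. The class name NormalizeText and its behavior are hypothetical, chosen only for illustration.

```java
import java.util.Locale;

// Sketch of a scalar UDF. Assumption: in MaxCompute this would be declared as
// "public class NormalizeText extends com.aliyun.odps.udf.UDF"; the superclass
// is omitted so the sketch stands alone.
class NormalizeText {

    /** Called once per input row; returns null for null input. */
    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        return input.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        NormalizeText udf = new NormalizeText();
        System.out.println(udf.evaluate("  Hello ODPS "));
    }
}
```

Handling null explicitly matters because SQL columns are often nullable and an unguarded call would throw inside the job.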
Performance Analysis & Optimization
The article explains the job lifecycle (scheduling, optimization, physical plan generation, execution, and completion) and common bottlenecks such as resource shortage, data skew, excessive small files, and inefficient UDFs.
Typical solutions include adjusting parallelism (set odps.sql.reducer.instances=xxx), enabling HBO (history‑based optimization), using map‑join hints, dynamic filter hints, and materialized views, and reducing small‑file generation (set odps.merge.smallfile.filesize.threshold=64).
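The idea behind a map join can be sketched in a few lines of Java: the small table is held in memory as a hash table, and each row of the large table is matched with a lookup, so the large table never needs to be shuffled by key. This is a conceptual sketch, not MaxCompute's implementation; it assumes the small table has one row per key, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual map-side (broadcast hash) inner join.
// Assumption: smallTable fits in memory and has unique keys.
class MapJoinSketch {

    /** Joins (key, value) rows of the large table against the in-memory small table. */
    static List<String> mapJoin(List<String[]> largeRows, Map<String, String> smallTable) {
        List<String> out = new ArrayList<>();
        for (String[] row : largeRows) {
            String match = smallTable.get(row[0]); // hash lookup instead of a shuffle
            if (match != null) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> dim = new HashMap<>();
        dim.put("1", "electronics");
        dim.put("2", "books");

        List<String[]> facts = new ArrayList<>();
        facts.add(new String[] {"1", "order-100"});
        facts.add(new String[] {"3", "order-101"}); // no match, dropped by inner join
        facts.add(new String[] {"2", "order-102"});

        mapJoin(facts, dim).forEach(System.out::println);
    }
}
```

The same picture explains the memory trade-off: the approach only helps while the small side fits in the memory allotted to each worker.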
Sample SQL‑function creation for reuse:
CREATE SQL FUNCTION IF NOT EXISTS get_json_object_checkboxField(@a STRING,@b STRING) AS REPLACE(REPLACE(REPLACE(GET_JSON_OBJECT(@a,@b),'[\"',''),'\"]',''),'\"','');
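To see what the nested REPLACE calls do, the sketch below applies the same three replacements in Java to a sample value. It assumes that GET_JSON_OBJECT returns a JSON array rendered as a string such as ["a","b"], and that \" in the SQL literals denotes a plain double quote; the class name, method name, and sample value are hypothetical.

```java
// Java analogue of REPLACE(REPLACE(REPLACE(x, '["', ''), '"]', ''), '"', '').
class CheckboxFieldSketch {

    /** Strips the array brackets and quotes from a stringified JSON array. */
    static String stripArrayMarkup(String jsonValue) {
        if (jsonValue == null) {
            return null; // mirrors SQL, where REPLACE on NULL yields NULL
        }
        return jsonValue
                .replace("[\"", "")  // leading ["
                .replace("\"]", "")  // trailing "]
                .replace("\"", "");  // quotes left around inner elements
    }

    public static void main(String[] args) {
        System.out.println(stripArrayMarkup("[\"a\",\"b\"]"));
    }
}
```

Applied to ["a","b"], the three steps leave a,b, which is the flattened form the SQL function is meant to produce.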
Conclusion
After a month of consolidation, the author delivers a foundational ODPS development reference, emphasizing continuous learning, knowledge sharing, and community building.
DaTaobao Tech
Official account of DaTaobao Technology