ODPS (MaxCompute) SQL Basics, Data Integration and Hologres Import Guide
This guide provides a comprehensive, beginner‑to‑advanced reference for ODPS (MaxCompute) SQL, covering table creation, DDL/DML commands, query syntax, join hints, MySQL‑to‑ODPS synchronization, one‑click and custom imports into Hologres, and scheduling variables for automated data pipelines.
This article collects beginner and advanced knowledge of ODPS (MaxCompute) development, providing a mini‑encyclopedia for newcomers.
Basic Functions
It introduces the MaxCompute module and shows DDL statements such as creating tables, adding comments, partitions, clustering, transactional properties, and examples of ALTER statements for changing owner, comments, renaming, and dropping tables.
CREATE TABLE IF NOT EXISTS xxx.xxx_xxxx_xxxx_hh (
xxxxx STRING COMMENT '商品',
xxxxx STRING COMMENT '名字'
) COMMENT 'xxx表' PARTITIONED BY (ds STRING COMMENT 'yyyymmddhh') LIFECYCLE 7;DML Operations
Typical DML commands are demonstrated, including INSERT/OVERWRITE, UPDATE, DELETE, MERGE, LOAD, UNLOAD and the use of ZORDER for data layout optimization.
INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2); MERGE INTO target_table t USING source_table s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.col = s.col
WHEN NOT MATCHED AND s.event_type='I' THEN INSERT VALUES (s.*);Query Language (DQL)
The guide covers SELECT syntax, WITH clause, subqueries, set operations (UNION, INTERSECT, EXCEPT), GROUP BY, HAVING, ORDER BY, DISTRIBUTE BY, SORT BY, and LIMIT.
SELECT region, SUM(price) FROM sale_detail GROUP BY region HAVING SUM(price) < 305 ORDER BY region LIMIT 10;Joins and Hints
Various join types are explained (INNER, LEFT, RIGHT, FULL, NATURAL) and advanced hints such as MAPJOIN and SKEWJOIN for performance tuning.
SELECT /*+ MAPJOIN(a) */ a.*, b.* FROM large_table a JOIN small_table b ON a.id = b.id; SELECT /*+ SKEWJOIN(a) */ * FROM T0 a JOIN T1 b ON a.c0 = b.c0;Data Synchronization (MySQL → ODPS)
Step‑by‑step instructions describe how to configure a DataWorks synchronization job: select source MySQL table, generate target ODPS table, set lifecycle and partitioning, configure scheduling (daily/hourly), and run the job.
ODPS → Hologres Import
Two methods are provided: (1) one‑click visual import via DataStudio, and (2) custom SQL import. The process includes creating an external table that maps the MaxCompute source, defining the target Hologres table (including partitions and indexes), inserting data into temporary partitions, and attaching them to the parent table.
IMPORT FOREIGN SCHEMA public_data LIMIT TO (dwd_product_movie_basic_info) FROM SERVER odps_server INTO public OPTIONS(if_table_exist='update'); BEGIN;
CREATE TABLE public.holo_dwd_product_movie_basic_info (
movie_name TEXT,
director TEXT,
ds TEXT
) PARTITION BY LIST (ds);
COMMIT;Scheduling Configuration
The guide explains how to set scheduling parameters using built‑in variables (${bizdate}, $[yyyymmddhh24miss]), time attributes, resource groups, and dependency configuration. It also lists built‑in variables such as $bizdate, $cyctime, $gmtdate, $bizmonth, $jobid, $nodeid, $taskid.
SET odps.sql.allow.fullscan=true;
INSERT INTO tmp_holo_table_${bizdate} SELECT * FROM dwd_table WHERE ds='${bizdate}';Overall, the article serves as a comprehensive reference for writing ODPS SQL, performing data integration, and automating data pipelines to Hologres.
DaTaobao Tech
Official account of DaTaobao Technology
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.