Big Data 37 min read

ODPS (MaxCompute) SQL Basics, Data Integration and Hologres Import Guide

This guide provides a comprehensive, beginner‑to‑advanced reference for ODPS (MaxCompute) SQL, covering table creation, DDL/DML commands, query syntax, join hints, MySQL‑to‑ODPS synchronization, one‑click and custom imports into Hologres, and scheduling variables for automated data pipelines.

DaTaobao Tech
DaTaobao Tech
DaTaobao Tech
ODPS (MaxCompute) SQL Basics, Data Integration and Hologres Import Guide

This article collects beginner and advanced knowledge of ODPS (MaxCompute) development, providing a mini‑encyclopedia for newcomers.

Basic Functions

It introduces the MaxCompute module and shows DDL statements such as creating tables, adding comments, partitions, clustering, transactional properties, and examples of ALTER statements for changing owner, comments, renaming, and dropping tables.

CREATE TABLE IF NOT EXISTS xxx.xxx_xxxx_xxxx_hh (
  xxxxx STRING COMMENT '商品',
  xxxxx STRING COMMENT '名字'
) COMMENT 'xxx表' PARTITIONED BY (ds STRING COMMENT 'yyyymmddhh') LIFECYCLE 7;

DML Operations

Typical DML commands are demonstrated, including INSERT/OVERWRITE, UPDATE, DELETE, MERGE, LOAD, UNLOAD and the use of ZORDER for data layout optimization.

INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2);
MERGE INTO target_table t USING source_table s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.col = s.col
WHEN NOT MATCHED AND s.event_type='I' THEN INSERT VALUES (s.*);

Query Language (DQL)

The guide covers SELECT syntax, WITH clause, subqueries, set operations (UNION, INTERSECT, EXCEPT), GROUP BY, HAVING, ORDER BY, DISTRIBUTE BY, SORT BY, and LIMIT.

SELECT region, SUM(price) FROM sale_detail GROUP BY region HAVING SUM(price) < 305 ORDER BY region LIMIT 10;

Joins and Hints

Various join types are explained (INNER, LEFT, RIGHT, FULL, NATURAL) and advanced hints such as MAPJOIN and SKEWJOIN for performance tuning.

SELECT /*+ MAPJOIN(a) */ a.*, b.* FROM large_table a JOIN small_table b ON a.id = b.id;
SELECT /*+ SKEWJOIN(a) */ * FROM T0 a JOIN T1 b ON a.c0 = b.c0;

Data Synchronization (MySQL → ODPS)

Step‑by‑step instructions describe how to configure a DataWorks synchronization job: select source MySQL table, generate target ODPS table, set lifecycle and partitioning, configure scheduling (daily/hourly), and run the job.

ODPS → Hologres Import

Two methods are provided: (1) one‑click visual import via DataStudio, and (2) custom SQL import. The process includes creating an external table that maps the MaxCompute source, defining the target Hologres table (including partitions and indexes), inserting data into temporary partitions, and attaching them to the parent table.

IMPORT FOREIGN SCHEMA public_data LIMIT TO (dwd_product_movie_basic_info) FROM SERVER odps_server INTO public OPTIONS(if_table_exist='update');
BEGIN;
CREATE TABLE public.holo_dwd_product_movie_basic_info (
  movie_name TEXT,
  director TEXT,
  ds TEXT
) PARTITION BY LIST (ds);
COMMIT;

Scheduling Configuration

The guide explains how to set scheduling parameters using built‑in variables (${bizdate}, $[yyyymmddhh24miss]), time attributes, resource groups, and dependency configuration. It also lists built‑in variables such as $bizdate, $cyctime, $gmtdate, $bizmonth, $jobid, $nodeid, $taskid.

SET odps.sql.allow.fullscan=true;
INSERT INTO tmp_holo_table_${bizdate} SELECT * FROM dwd_table WHERE ds='${bizdate}';

Overall, the article serves as a comprehensive reference for writing ODPS SQL, performing data integration, and automating data pipelines to Hologres.

Big DataSQLHologresschedulingETLdata integrationODPS
DaTaobao Tech
Written by

DaTaobao Tech

Official account of DaTaobao Technology

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.