查询表基本语法：SELECT语法

总体格式

SQL代码格式

[with <cte>[, ...] ]
SELECT [all | distinct] <SELECT_expr>[, <except_expr>][, <replace_expr>] ...
       from <table_reference>
       [where <where_condition>]
       [group by {<col_list>|rollup(<col_list>)}]
       [having <having_condition>]
       [window <window_clause>]
       [order by <order_condition>]
       [distribute by <distribute_condition> [sort by <sort_condition>]|[ cluster by <cluster_condition>] ]
       [limit <number>];

源数据示例

--创建一张分区表sale_detail。
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

--向源表增加分区。
alter table sale_detail add partition (sale_date='2013', region='china');

--向源表追加数据。
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);

--查询分区表sale_detail中的数据
SELECT * from sale_detail;
--返回结果
+------------+------------+------------+------------+------------+
| shop_name  | price      | customer   | sale_date  | region     |
+------------+------------+------------+------------+------------+
| s1         | 100.1      | c1         | 2013       | china      |
| s2         | 100.2      | c2         | 2013       | china      |
| s3         | 100.3      | c3         | 2013       | china      |
+------------+------------+------------+------------+------------+

WITH子句（Common Table Expression, cte）：可选

WITH子句包含一个或多个常用的表达式CTE，这些CTE充当当前运行环境中的临时表，可以在后续查询中引用。

CTE使用规则

在同一WITH子句中的CTE必须具有唯一的名字
在WITH子句中定义的CTE仅对在同一WITH子句中的其他CTE可以使用

错误命令示例

--A引用A，无效
with 
A as (SELECT 1 from A) 
SELECT * from A;

--A引用B，B引用A，无效
with 
A as (SELECT * from B ), 
B as (SELECT * from A ) 
SELECT * from B;

正确命令示例

with 
A as (SELECT 1 as C),
B as (SELECT * from A) 
SELECT * from B;
--返回结果
+---+
| c |
+---+
| 1 |
+---+

列表达式（SELECT_expr）：必填

SELECT_expr格式为col1_name, col2_name, 列表达式, ...，表示待查询的普通列、分区列或正则表达式。

列表达式使用规则

用列名指定要读取的列

--读取表的列
SELECT shop_name from sale_detail;

--返回结果
+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
+------------+

用星号（*）代表查询所有的列，可以配合where子句指定过滤条件

--读取表中所有列
SELECT * from sale_detail;

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

--在where子句中指定过滤条件
SELECT * from sale_detail where shop_name='s1';

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

可以使用正则表达式

--选出表中所有列名以sh开头的列
SELECT `sh.*` from sale_detail;

--返回结果
+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
+------------+

--选出表中所有列名不为shop_name的所有列
SELECT `(shop_name)?+.+` from sale_detail;

--返回结果
+-------------+-------------+------------+------------+
| customer_id | total_price | sale_date  | region     |
+-------------+-------------+------------+------------+
| c1          | 100.1       | 2013       | china      |
| c2          | 100.2       | 2013       | china      |
| c3          | 100.3       | 2013       | china      |
+-------------+-------------+------------+------------+

--选出表中排除shop_name和customer_id两列的其他列
SELECT `(shop_name|customer_id)?+.+` from sale_detail;

--返回结果
+-------------+------------+------------+
| total_price | sale_date  | region     |
+-------------+------------+------------+
| 100.1       | 2013       | china      |
| 100.2       | 2013       | china      |
| 100.3       | 2013       | china      |
+-------------+------------+------------+

--选出表中排除列名以t开头的其他列
SELECT `(t.*)?+.+` from sale_detail;

--返回结果
+------------+-------------+------------+------------+
| shop_name  | customer_id | sale_date  | region     |
+------------+-------------+------------+------------+
| s1         | c1          | 2013       | china      |
| s2         | c2          | 2013       | china      |
| s3         | c3          | 2013       | china      |
+------------+-------------+------------+------------+

选取列名前可以使用distinct去掉重复字段，只返回去重后的值

--查询表中region列数据，如果有重复值时仅显示一条
SELECT distinct region from sale_detail;

--返回结果
+------------+
| region     |
+------------+
| china      |
+------------+

--去重多列时，distinct的作用域是SELECT的列集合，而非单个列
SELECT distinct region, sale_date from sale_detail;

--返回结果
+------------+------------+
| region     | sale_date  |
+------------+------------+
| china      | 2013       |
+------------+------------+

--distinct可以对窗口函数的计算结果进行去重，即distinct可以配合窗口函数使用
SELECT distinct sale_date, row_number() over (partition by customer_id order by total_price) as rn from sale_detail;

--返回结果
+-----------+------------+
| sale_date | rn         |
+-----------+------------+
| 2013      | 1          |
+-----------+------------+

--distinct不支持与group by联合使用
--错误示例
SELECT distinct shop_name from sale_detail group by shop_name;
--报错信息： GROUP BY cannot be used with SELECT DISTINCT

排除列（except_expr）：可选

except_expr格式为except(col1_name, col2_name, ...)，表示读取表数据时会排除指定列（col1、col2）的数据。

--读取sale_detail表的数据，并排除region列的数据。
SELECT * except(region) from sale_detail;

--返回结果
+-----------+-------------+-------------+-----------+
| shop_name | customer_id | total_price | sale_date |
+-----------+-------------+-------------+-----------+
| s1        | c1          | 100.1       | 2013      |
| s2        | c2          | 100.2       | 2013      |
| s3        | c3          | 100.3       | 2013      |
+-----------+-------------+-------------+-----------+

修改列（replace_expr）：可选

replace_expr格式为replace(exp1 [as] col1_name, exp2 [as] col2_name, ...)，表示读取表数据时会将col1的数据修改为exp1，将col2的数据修改为exp2。

--读取sale_detail表的数据，并修改total_price、region两列的数据。
SELECT * replace(total_price+100 as total_price, 'shanghai' as region) from sale_detail;

--返回结果
+-----------+-------------+-------------+-----------+--------+
| shop_name | customer_id | total_price | sale_date | region |
+-----------+-------------+-------------+-----------+--------+
| s1        | c1          | 200.1       | 2013      | shanghai |
| s2        | c2          | 200.2       | 2013      | shanghai |
| s3        | c3          | 200.3       | 2013      | shanghai |
+-----------+-------------+-------------+-----------+--------+

目标表信息（table_reference）：必填

table_reference表示查询的目标表信息

直接指定目标表名

SELECT customer_id from sale_detail;

--返回结果
+-------------+
| customer_id |
+-------------+
| c1          |
| c2          |
| c3          |
+-------------+

嵌套子查询

SELECT * from (SELECT region,sale_date from sale_detail) t where region = 'china';

--返回结果
+------------+------------+
| region     | sale_date  |
+------------+------------+
| china      | 2013       |
| china      | 2013       |
| china      | 2013       |
+------------+------------+

WHERE子句（where_condition）：可选

where子句为过滤条件，如果表是分区表，可以实现列裁剪

配合关系运算符，筛选满足指定条件的数据。关系运算符包括：
- >、<、=、>=、<= 、<>
- like、rlike
- in、not in
- between...and

--where子句可以指定分区范围，只扫描表的指定部分
SELECT * 
from sale_detail
where sale_date >= '2008' and sale_date <= '2014';
--等价于如下语句。
SELECT * 
from sale_detail 
where sale_date between '2008' and '2014';

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

在列表达式中，如果被重命名的列字段（赋予列别名）使用了函数，则不能在where子句中引用列别名

--错误命令示例
SELECT  task_name
        ,inst_id
        ,settings
        ,GET_JSON_OBJECT(settings, '$.SKYNET_ID') as skynet_id
        ,GET_JSON_OBJECT(settings, '$.SKYNET_NODENAME') as user_agent
from    Information_Schema.TASKS_HISTORY
where   ds = '20211215' and skynet_id is not null
limit 10;

GROUP BY分组查询（col_list）：可选

group by和聚合函数配合使用，根据指定的普通列、分区列或正则表达式进行分组。

`group by` 使用规则

group by取值为正则表达式，必须使用列的完整表达式
SELECT语句中没有使用聚合函数的列必须出现在GROUP BY中

`group by`使用示例

--直接使用输入表列名作为group by的列
SELECT region from sale_detail group by region;

--返回结果
+------------+
| region     |
+------------+
| china      |
+------------+

--以某一列的值分组，返回计算结果
SELECT sum(total_price) from sale_detail group by region;

--返回结果
+------------+
| _c0        |
+------------+
| 300.6      |
+------------+

--同时获取列信息和计算结果
SELECT region, sum (total_price) from sale_detail group by region;

--返回结果
+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
+------------+------------+

--以`SELECT`列的别名分组
SELECT region as r from sale_detail group by r;
--等效于如下语句。
SELECT region as r from sale_detail group by region;

--返回结果
+------------+
| r          |
+------------+
| china      |
+------------+

--以列表达式分组
SELECT 2 + total_price as r from sale_detail group by 2 + total_price;

--返回结果
+------------+
| r          |
+------------+
| 102.1      |
| 102.2      |
| 102.3      |
+------------+

--SELECT的所有列中没有使用聚合函数的列，必须出现在GROUP BY中，否则会报错
--错误示例
SELECT region, total_price from sale_detail group by region;

--正确示例
SELECT region, total_price from sale_detail group by region, total_price;

--返回结果
+------------+-------------+
| region     | total_price |
+------------+-------------+
| china      | 100.1       |
| china      | 100.2       |
| china      | 100.3       |
+------------+-------------+

HAVING子句（having_condition）：可选

通常HAVING子句与聚合函数一起使用，实现过滤

--为直观展示数据呈现效果，向sale_detail表中追加数据。
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
--使用having子句配合聚合函数实现过滤。
SELECT region,sum(total_price) from sale_detail 
group by region 
having sum(total_price)<305;

--返回结果
+------------+------------+
| region     | _c1        |
+------------+------------+
| china      | 300.6      |
| shanghai   | 200.9      |
+------------+------------+

ORDER BY全局排序（order_condition）：可选

order by用于对所有数据按照指定普通列、分区列或指定常量进行全局排序

`order by`使用规则

默认对数据进行升序排序，如果降序排序，需要使用desc关键字
order by默认要求带limit数据行数限制，没有limit会返回错误

`order by`使用示例

--查询表的信息，并按照total_price升序排列前2条
SELECT * from sale_detail order by total_price limit 2;

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

--查询表的信息，并按照total_price降序排列前2条
SELECT * from sale_detail order by total_price desc limit 2;

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s3         | c3          | 100.3       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

DISTRIBUTE BY哈希分片（distribute_condition）：可选

distribute by用于对数据按照某几列的值做Hash分片。distribute by控制Map的输出在Reducer是如何划分的，如果不希望Reducer的内容存在重叠，或需要对同一分组的数据一起处理，可以使用distribute by来保证同组数据分发到同一个Reducer中。

--查询表sale_detail中的列region值并按照region值进行哈希分片。
SELECT region from sale_detail distribute by region;
--等价于如下语句。
SELECT region as r from sale_detail distribute by region;
SELECT region as r from sale_detail distribute by r;

SORT BY局部排序（sort_condition）：可选

`sort by`使用规则

sort by默认对数据进行升序排序，如果降序排序，需要使用desc关键字
如果sort by语句前有distribute by，sort by会对distribute by的结果按照指定的列进行排序

--查询表中的列region和total_price的值并按照region值进行哈希分片，然后按照total_price对哈希分片结果进行局部升序排序
--为直观展示数据呈现效果，向sale_detail表中追加数据。
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
SELECT region,total_price from sale_detail distribute by region sort by total_price;

--返回结果
+------------+-------------+
| region     | total_price |
+------------+-------------+
| shanghai   | NULL        |
| china      | 100.1       |
| china      | 100.2       |
| china      | 100.3       |
| shanghai   | 100.4       |
| shanghai   | 100.5       |
+------------+-------------+

--查询表中的列region和total_price的值并按照region值进行哈希分片，然后按照total_price对哈希分片结果进行局部降序排序
SELECT region,total_price from sale_detail distribute by region sort by total_price desc;

--返回结果
+------------+-------------+
| region     | total_price |
+------------+-------------+
| shanghai   | 100.5       |
| shanghai   | 100.4       |
| china      | 100.3       |
| china      | 100.2       |
| china      | 100.1       |
| shanghai   | NULL        |
+------------+-------------+

--如果sort by语句前没有`distribute by`，`sort by`会对每个Reduce中的数据进行局部排序。保证每个Reduce的输出数据都是有序的，从而增加存储压缩率，同时读取时如果有过滤，能够减少真正从磁盘读取的数据量，提高后续全局排序的效率。
SELECT region,total_price from sale_detail sort by total_price desc;

--返回结果
+------------+-------------+
| region     | total_price |
+------------+-------------+
| china      | 100.3       |
| china      | 100.2       |
| china      | 100.1       |
| shanghai   | 100.5       |
| shanghai   | 100.4       |
| shanghai   | NULL        |
+------------+-------------+

--`order by|distribute by|sort by`的取值必须是`SELECT`语句的输出列，即列的别名。列的别名可以为中文。
--在MaxCompute SQL解析中，`order by|distribute by|sort by`执行顺序在`SELECT`操作之后，因此它们的取值只能为`SELECT`语句的输出列。
--`order by`不和`distribute by`、`sort by`同时使用，`group by`也不和`distribute by、sort by`同时使用。

LIMIT限制输出行数（number）：可选

limit <number>中的number是常数，用于限制输出行数，取值范围为int32位取值范围，即最大值不可超过2,147,483,647。

子查询（SUBQUERY）

总体格式

--创建一张分区表sale_detail。
create table if not exists sale_detail
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);

--向源表增加分区。
alter table sale_detail add partition (sale_date='2013', region='china') partition (sale_date='2014', region='shanghai');

--向源表追加数据。
insert into sale_detail partition (sale_date='2013', region='china') values ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
insert into sale_detail partition (sale_date='2014', region='shanghai') values ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);

select * from sale_detail; 
--返回结果。
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
+------------+-------------+-------------+------------+------------+

基础子查询

普通查询的对象是目标表，但是查询的对象也可以是另一个select语句，这种查询即是子查询。在from子句中，子查询可以被当作一张表，与其他表或子查询进行join操作。
####SQL代码格式

select <select_expr> from (<select_statement>) [<sq_alias_name>];

参数	是否必填	说明
select_expr	必填	格式为`col1_name, col2_name, 正则表达式, ...`，表示待查询的普通列、分区列或正则表达式
select_statement	必填	子查询语句
sq_alias_name	可选	子查询的别名
table_name	必填	目标表名称
####SQL代码示例

--子查询语法
select * from (select shop_name from sale_detail) a;

--返回结果
+------------+
| shop_name  |
+------------+
| s1         |
| s2         |
| s3         |
| null       |
| s6         |
| s7         |
+------------+

--在`from`子句中，子查询可以被当作一张表，与其他的表或查询进行`join`操作
--先新建一张表，再执行join操作。
create table shop as select shop_name,customer_id,total_price from sale_detail;
select a.shop_name, a.customer_id, a.total_price from
(select * from shop) a join sale_detail on a.shop_name = sale_detail.shop_name;

--返回结果
+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| null       | c5          | NULL        |
| s6         | c6          | 100.4       |
| s7         | c7          | 100.5       |
| s1         | c1          | 100.1       |
| s2         | c2          | 100.2       |
| s3         | c3          | 100.3       |
+------------+-------------+-------------+

IN SUBQUERY

in subquery与left semi join用法类似

SQL命令格式

--格式1
select<select_expr1>from<table_name1>where<select_expr2>in(select<select_expr3>from<table_name2>);
--等效于leftsemijoin如下语句。
select<select_expr1>from<table_name1><alias_name1>leftsemijoin<table_name2><alias_name2>on<alias_name1>.<select_expr2>=<alias_name2>.<select_expr3>;

--格式二
select<select_expr1>from<table_name1>where<select_expr2>in(select<select_expr3>from<table_name2>where
<table_name1>.<col_name>=<table_name2>.<col_name>);

参数	是否必填	说明
select_expr1	必填	格式为`col1_name, col2_name, 正则表达式, ...`，表示待查询的普通列、分区列或正则表达式
table_name1、table_name2	必填	表的名称
select_expr2、select_expr3	必填	表示table_name1和table_name2互相映射的列名
col_name	必填	表的列名

SQL注意事项

使用IN的子查询时，在子查询的返回结果中会自动去除NULL值的记录

SQL代码示例

--使用格式1子查询语法
select * from sale_detail where total_price in (select total_price from shop);

--返回结果
+-----------+-------------+-------------+-----------+--------+
| shop_name | customer_id | total_price | sale_date | region |
+-----------+-------------+-------------+-----------+--------+
| s1        | c1          | 100.1       | 2013      | china  |
| s2        | c2          | 100.2       | 2013      | china  |
| s3        | c3          | 100.3       | 2013      | china  |
| s6        | c6          | 100.4       | 2014      | shanghai |
| s7        | c7          | 100.5       | 2014      | shanghai |
+-----------+-------------+-------------+-----------+--------+

--使用格式2子查询语法
select * from sale_detail where total_price in (select total_price from shop where customer_id = shop.customer_id);

--返回结果
+-----------+-------------+-------------+-----------+--------+
| shop_name | customer_id | total_price | sale_date | region |
+-----------+-------------+-------------+-----------+--------+
| s1        | c1          | 100.1       | 2013      | china  |
| s2        | c2          | 100.2       | 2013      | china  |
| s3        | c3          | 100.3       | 2013      | china  |
| s6        | c6          | 100.4       | 2014      | shanghai |
| s7        | c7          | 100.5       | 2014      | shanghai |
+-----------+-------------+-------------+-----------+--------+

NOT IN SUBQUERY

not in subquery与left anti join用法类似，但不完全相同。如果查询目标表的指定列名中有任意一行为NULL，则not in表达式值为NULL，导致where条件不成立，无数据返回，这点与left anti join不同。

--格式1
select <select_expr1> from <table_name1> where <select_expr2> not in (select <select_expr2> from <table_name2>);
--等效于left anti join如下语句。
select <select_expr1> from <table_name1> <alias_name1> left anti join <table_name2> <alias_name2> on <alias_name1>.<select_expr1> = <alias_name2>.<select_expr2>;

--MaxCompute不仅支持not in subquery，还支持Correlated条件。子查询中的where <table_name2_colname> = <table_name1>.<colname>即是一个Correlated条件。
select <select_expr1> from <table_name1> where <select_expr2> not in (select <select_expr2> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);

参数	是否必填	说明
select_expr1	必填	格式为`col1_name, col2_name, 正则表达式, ...`，表示待查询的普通列、分区列或正则表达式
table_name1、table_name2	必填	表的名称
select_expr2、select_expr3	必填	表示table_name1和table_name2互相映射的列名
col_name	必填	表的列名

SQL注意事项

使用NOT IN的子查询时，在子查询的返回结果中会自动去除NULL值的记录

SQL代码示例

--使用格式1子查询语法
--创建一张新表shop1并追加数据。
create table shop1 as select shop_name,customer_id,total_price from sale_detail;
insert into shop1 values ('s8','c1',100.1);

select * from shop1 where shop_name not in (select shop_name from sale_detail);

--返回结果
+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s8         | c1          | 100.1       |
+------------+-------------+-------------+

--使用格式2子查询语法
select * from shop1 where shop_name not in (select shop_name from sale_detail where customer_id = shop1.customer_id);

--返回结果
+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s8         | c1          | 100.1       |
+------------+-------------+-------------+

--not in subquery不作为join条件（where 中包含 and， 所以无法转换为anti join）
select * from shop1 where shop_name not in (select shop_name from sale_detail) and total_price < 100.3;

--返回结果
+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s8         | c1          | 100.1       |
+------------+-------------+-------------+

--假设查询表中有任意一行为NULL，则无数据返回
--创建一张新表sale并追加数据。
create table if not exists sale
(
shop_name     string,
customer_id   string,
total_price   double
)
partitioned by (sale_date string, region string);
alter table sale add partition (sale_date='2013', region='china');
insert into sale partition (sale_date='2013', region='china') values ('null','null',null),('s2','c2',100.2),('s3','c3',100.3),('s8','c8',100.8);

set odps.sql.allow.fullscan=true;
select * from sale where shop_name not in (select shop_name from sale_detail);

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
+------------+-------------+-------------+------------+------------+

EXISTS SUBQUERY

使用exists subquery时，当子查询中有至少一行数据时，返回True，否则返回False
MaxCompute只支持含有Correlated条件的where子查询。exists subquery实现的方式是转换为left semi join。

SQL代码格式

select <select_expr> from <table_name1> where exists (select <select_expr> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);

参数	是否必填	说明
select_expr	必填	格式为`col1_name, col2_name, 正则表达式, ...`，表示待查询的普通列、分区列或正则表达式
table_name1、table_name2	必填	表的名称
col_name	必填	表的列名

SQL注意事项

使用EXISTS的子查询时，在子查询的返回结果会自动去除NULL值的记录

SQL代码示例

select * from sale_detail where exists (select * from shop where customer_id = sale_detail.customer_id);
--等效于以下语句。
select * from sale_detail a left semi join shop b on a.customer_id = b.customer_id;

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| null       | c5          | NULL        | 2014       | shanghai   |
| s6         | c6          | 100.4       | 2014       | shanghai   |
| s7         | c7          | 100.5       | 2014       | shanghai   |
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

NOT EXISTS SUBQUERY

使用not exists subquery时，当子查询中无数据时，返回True，否则返回False
MaxCompute只支持含有Correlated条件的where子查询。not exists subquery实现的方式是转换为left anti join。

SQL代码格式

select <select_expr> from <table_name1> where not exists (select <select_expr> from <table_name2> where <table_name2_colname> = <table_name1>.<colname>);

参数	是否必填	说明
select_expr	必填	格式为`col1_name, col2_name, 正则表达式, ...`，表示待查询的普通列、分区列或正则表达式
table_name1、table_name2	必填	表的名称
col_name	必填	表的列名

SQL注意事项

使用NOT EXISTS的子查询时，在子查询的返回结果中会自动去除NULL值的记录。

SQL代码示例

select * from sale_detail where not exists (select * from shop where shop_name = sale_detail.shop_name);
--等效于以下语句。
select * from sale_detail a left anti join shop b on a.shop_name = b.shop_name;

--返回结果
+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
+------------+-------------+-------------+------------+------------+

ODPS 表的基本操作：查询表(1)

查询表基本语法：SELECT语法

总体格式

SQL代码格式

源数据示例

WITH子句（Common Table Expression, cte）：可选

CTE使用规则

错误命令示例

正确命令示例

列表达式（SELECT_expr）：必填

列表达式使用规则

排除列（except_expr）：可选

修改列（replace_expr）：可选

目标表信息（table_reference）：必填

WHERE子句（where_condition）：可选

GROUP BY分组查询（col_list）：可选

group by 使用规则

group by使用示例

HAVING子句（having_condition）：可选

ORDER BY全局排序（order_condition）：可选

order by使用规则

order by使用示例

DISTRIBUTE BY哈希分片（distribute_condition）：可选

SORT BY局部排序（sort_condition）：可选

sort by使用规则

LIMIT限制输出行数（number）：可选

子查询（SUBQUERY）

总体格式

基础子查询

IN SUBQUERY

SQL命令格式

SQL注意事项

SQL代码示例

NOT IN SUBQUERY

SQL注意事项

SQL代码示例

EXISTS SUBQUERY

SQL代码格式

SQL注意事项

SQL代码示例

NOT EXISTS SUBQUERY

SQL代码格式

SQL注意事项

SQL代码示例

`group by` 使用规则

`group by`使用示例

`order by`使用规则

`order by`使用示例

`sort by`使用规则