如何在使用窗口函数的时候使用groupby
如何在使用窗口函数的时候使用groupby
先说结论:groupby的执行先于窗口函数,也就是说对于非groupby字段而言,聚合函数先于窗口函数执行。过去写的sql很少在使用group by操作的同时使用窗口函数,主要是这种情况少见且在写sql时会刻意规避同时使用的情况。但是,同时使用两者在解决部分查询问题时能提供很大的遍历。今天回顾过去写的sql时发现这么一道必须同时使用窗口函数和groupby的题目。牛客网sql进阶中第五小结SQL1。
题目如下
现有试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间):
id | exam_id | tag | difficulty | duration | release_time |
---|---|---|---|---|---|
1 | 9001 | SQL | hard | 60 | 2021-09-01 06:00:00 |
2 | 9002 | SQL | hard | 60 | 2021-09-01 06:00:00 |
3 | 9003 | 算法 | medium | 80 | 2021-09-01 10:00:00 |
试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
id | uid | exam_id | start_time | submit_time | score |
---|---|---|---|---|---|
1 | 1001 | 9001 | 2021-09-01 09:01:01 | 2021-09-01 09:31:00 | 78 |
2 | 1002 | 9001 | 2021-09-01 09:01:01 | 2021-09-01 09:31:00 | 81 |
3 | 1002 | 9002 | 2021-09-01 12:01:01 | 2021-09-01 12:31:01 | 81 |
4 | 1003 | 9001 | 2021-09-01 19:01:01 | 2021-09-01 19:40:01 | 86 |
5 | 1003 | 9002 | 2021-09-01 12:01:01 | 2021-09-01 12:31:51 | 89 |
6 | 1004 | 9001 | 2021-09-01 19:01:01 | 2021-09-01 19:30:01 | 85 |
7 | 1005 | 9003 | 2021-09-01 12:01:01 | 2021-09-01 12:31:02 | 85 |
8 | 1006 | 9003 | 2021-09-07 10:01:01 | 2021-09-07 10:21:01 | 84 |
9 | 1003 | 9003 | 2021-09-08 12:01:01 | 2021-09-08 12:11:01 | 40 |
10 | 1003 | 9002 | 2021-09-01 14:01:01 | (NULL) | (NULL) |
找到每类试卷得分的前3名,如果两人最大分数相同,选择最小分数大者,如果还相同,选择uid大者。由示例数据结果输出如下:
tid | uid | ranking |
---|---|---|
SQL | 1003 | 1 |
SQL | 1004 | 2 |
SQL | 1002 | 3 |
算法 | 1005 | 1 |
算法 | 1006 | 2 |
算法 | 1003 | 3 |
解决方法
这道题其实逻辑很简单:需要先求出不同种类试卷中不同用户得分的最大值和最小值:
select exam_record.uid,examination_info.tag,
max(max(exam_record.score)) over(partition by examination_info.tag,exam_record.uid) max_score,
min(min(exam_record.score)) over(partition by examination_info.tag,exam_record.uid) min_score
from exam_record
inner join examination_info
on exam_record.exam_id = examination_info.exam_id
where exam_record.score is not null
group by exam_record.uid,examination_info.tag
此时你会发现这里必须同时使用group by和窗口函数,那么问题来了,是group by先执行还是窗口函数先执行呢?答案是groupby先执行。上面的sql中max(exam_record.score)将先执行一次,再由max(max(exam_record.score))执行窗口函数响应的操作。其实这不是最优写法,最优写法如下:
select exam_record.uid,examination_info.tag,
row_number() over (partition by tag order by tag, max(score) desc, min(score) desc, exam_record.uid desc) as ranking
from exam_record join examination_info
on exam_record.exam_id = examination_info.exam_id
group by examination_info.tag, exam_record.uid
这种写法更简洁,但不熟悉窗口函数的同学可能看到会很晕,然后当自己同时使用窗口函数和groupby时一写就报错。事实上,只需要在写sql时注意groupby后面没有出现的字段在select中使用聚合函数进行求解。聚合函数会先于窗口函数执行。
本题最终答案如下:
解法一(本人自己想的解法,主要是一开始没有掌握好聚合函数和窗口函数运行的时机):
select t2.tag,t2.uid,t2.trank from (
select t1.uid,t1.tag,row_number() over(partition by t1.tag order by t1.max_score desc ,t1.min_score desc,t1.uid desc) trank from(
select exam_record.uid,examination_info.tag,max(max(exam_record.score)) over(partition by examination_info.tag,exam_record.uid) max_score,
min(min(exam_record.score)) over(partition by examination_info.tag,exam_record.uid) min_scorefrom exam_record
inner join examination_info
on exam_record.exam_id = examination_info.exam_id
where exam_record.score is not null
group by exam_record.uid,examination_info.tag
) t1
)t2
where t2.trank <=3;
解法二:
select tag, uid, ranking
from(select tag, e_r.uid,row_number() over (partition by tag order by tag, max(score) desc, min(score) desc, e_r.uid desc) as rankingfrom exam_record e_r join examination_info e_ion e_r.exam_id = e_i.exam_idgroup by tag, e_r.uid
)ranktable
where ranking <= 3