经典面试题:部门最高工资的的员工在PostgreSQL(或openGuass)有了新SQL语法
子曰:“君子谋道不谋食。耕也,馁在其中矣;学也,禄在其中矣。君子忧道不忧贫。” , 回顾我的DBA生涯,从2006接触管理SQL Server 2000到2009年开始管理Oracle, 只问耕耘,错过了一些跳跃机会,清晰记的2009年1次想跳槽面试的一次经历,当时就有这道笔试题,我写了3种答案,记的答题的A4纸只有我多写了一张,最后听HR给我说他们技术主管上轮对我很满意(工资double) 。后来因为我一些个人原因拒绝了Offer, 主管和HR又给我打过几次电话,表示如果还想去随时直接可以联系他,HR表示那个主管是老板亲戚他看上的人在公司后期发展会不错(O(∩_∩)O哈!), 我是受宠若惊,至今都感激那位主管的信任, 被认可真的很幸福,次年再看到那公司的消息是国j总理去该公司视察工作。
回到这个题通常是用SQL列出部门最高的人或前几名员工(dept和emp表关联), 这里我们主要演示在PostgreSQL中的新语法,只列一张emp表
anbob=# \d emp Table "public.emp" Column | Type | Modifiers ------------+-----------------------+----------- id | integer | first_name | character varying(50) | last_name | character varying(50) | dept | character varying(50) | email | character varying(50) | salary | integer | anbob=# select * from emp limit 10; id | first_name | last_name | dept | email | salary ----+------------+-------------+-------------+-------------------------------+-------- 1 | Roxana | Spadelli | Garden | rspadelli0@house.gov | 7241 2 | Thorpe | Fiander | Jewelry | tfiander1@who.int | 6591 3 | Raffarty | Goranov | Health | rgoranov2@businessinsider.com | 6996 4 | Kinna | Nightingale | Health | knightingale3@techcrunch.com | 7922 5 | Sydel | Toleman | Books | stoleman4@forbes.com | 9000 6 | Roddie | Rappa | Movies | rrappa5@sakura.ne.jp | 8473 7 | Rodrick | MacMoyer | Tools | rmacmoyer6@bloglines.com | 9101 8 | Clare | Banisch | Grocery | cbanisch7@tumblr.com | 6504 9 | Joyan | Rooke | Electronics | jrooke8@slideshare.net | 9404 10 | Leoine | Aldham | Health | laldham9@ifeng.com | 5498 (10 rows)
求每个部门工资最高的员工
基础的可能是
SELECT * FROM emp WHERE (dept, salary) IN ( SELECT dept, MAX(salary) FROM emp GROUP BY dept ); id | first_name | last_name | dept | email | salary ----+------------+--------------+-------------+--------------------------------+-------- 2 | Thorpe | Fiander | Jewelry | tfiander1@who.int | 6591 4 | Kinna | Nightingale | Health | knightingale3@techcrunch.com | 7922 5 | Sydel | Toleman | Books | stoleman4@forbes.com | 9000 6 | Roddie | Rappa | Movies | rrappa5@sakura.ne.jp | 8473 7 | Rodrick | MacMoyer | Tools | rmacmoyer6@bloglines.com | 9101 8 | Clare | Banisch | Grocery | cbanisch7@tumblr.com | 6504 9 | Joyan | Rooke | Electronics | jrooke8@slideshare.net | 9404 11 | Chucho | Slark | Clothing | cslarka@so-net.ne.jp | 9753 14 | Lion | Duggleby | Baby | ldugglebyd@yellowbook.com | 9642 15 | Mordy | Carss | Computers | mcarsse@symantec.com | 7768 16 | Darin | Bernakiewicz | Music | dbernakiewiczf@istockphoto.com | 6802 17 | Michale | Rochford | Shoes | mrochfordg@nydailynews.com | 5402 19 | Doralynne | Fley | Garden | dfleyi@devhub.com | 9823 20 | Laurie | Hauxby | Toys | lhauxbyj@google.fr | 5412
Note:
注意这里如果同一个dept最大的值有相同的可能会出现多条记录。
如是你更进一步,学了分析Window函数可以这样
WITH v_rnk AS ( SELECT rank() OVER ( PARTITION BY dept ORDER BY salary DESC ) AS rn, * FROM emp ) SELECT * FROM v_rnk WHERE rn = 1; anbob-# ; rn | id | first_name | last_name | dept | email | salary ----+----+------------+--------------+-------------+--------------------------------+-------- 1 | 14 | Lion | Duggleby | Baby | ldugglebyd@yellowbook.com | 9642 1 | 5 | Sydel | Toleman | Books | stoleman4@forbes.com | 9000 1 | 11 | Chucho | Slark | Clothing | cslarka@so-net.ne.jp | 9753 1 | 15 | Mordy | Carss | Computers | mcarsse@symantec.com | 7768 1 | 9 | Joyan | Rooke | Electronics | jrooke8@slideshare.net | 9404 1 | 19 | Doralynne | Fley | Garden | dfleyi@devhub.com | 9823 1 | 8 | Clare | Banisch | Grocery | cbanisch7@tumblr.com | 6504 1 | 4 | Kinna | Nightingale | Health | knightingale3@techcrunch.com | 7922 1 | 2 | Thorpe | Fiander | Jewelry | tfiander1@who.int | 6591 1 | 6 | Roddie | Rappa | Movies | rrappa5@sakura.ne.jp | 8473 1 | 16 | Darin | Bernakiewicz | Music | dbernakiewiczf@istockphoto.com | 6802 1 | 17 | Michale | Rochford | Shoes | mrochfordg@nydailynews.com | 5402 1 | 7 | Rodrick | MacMoyer | Tools | rmacmoyer6@bloglines.com | 9101 1 | 20 | Laurie | Hauxby | Toys | lhauxbyj@google.fr | 5412 (14 rows)
Note:
这里使用了rank,同样如果有重名的会出现多个,如果你只想要一条记录,可以换成row_number(),如果是去前几名时还有dense_rank(), 要注意三者的区别:row_number排序唯一1,2,3,rank可能出现1,1,3,而dense_rank是1,1,2., 如果不需要rn列,记的SQL中列出字段名。
上面是SQL标准,在所有支持SQL的RDBMS中应该是都支持的,但是在PostgreSQL中有一个独特的写法”distinct on (…)”,又可以这样
anbob=# select distinct on (dept) * from emp order by dept,salary asc; id | first_name | last_name | dept | email | salary ----+------------+--------------+-------------+--------------------------------+-------- 12 | Dyanne | Petrou | Baby | dpetroub@accuweather.com | 8369 5 | Sydel | Toleman | Books | stoleman4@forbes.com | 9000 11 | Chucho | Slark | Clothing | cslarka@so-net.ne.jp | 9753 15 | Mordy | Carss | Computers | mcarsse@symantec.com | 7768 9 | Joyan | Rooke | Electronics | jrooke8@slideshare.net | 9404 1 | Roxana | Spadelli | Garden | rspadelli0@house.gov | 7241 18 | Ibrahim | Menichelli | Grocery | imenichellih@oakley.com | 5077 10 | Leoine | Aldham | Health | laldham9@ifeng.com | 5498 2 | Thorpe | Fiander | Jewelry | tfiander1@who.int | 6591 6 | Roddie | Rappa | Movies | rrappa5@sakura.ne.jp | 8473 16 | Darin | Bernakiewicz | Music | dbernakiewiczf@istockphoto.com | 6802 17 | Michale | Rochford | Shoes | mrochfordg@nydailynews.com | 5402 7 | Rodrick | MacMoyer | Tools | rmacmoyer6@bloglines.com | 9101 20 | Laurie | Hauxby | Toys | lhauxbyj@google.fr | 5412 (14 rows)
Note:
语法更加简洁, PostgreSQL在DB层做的更多,这是PG的非标准SQL. 难道也像之前《Oracle 12c new feature:OFFSET n FETCH n row-limit》测试oracle一样只是SQL文本简洁,还是会转换成类似WINDOW函数一样吗?我们看一下这三种写法的执行计划。
执行计划对比
anbob=# explain (analyze,buffers) SELECT * FROM emp WHERE (dept, salary) IN ( SELECT dept, MAX(salary) FROM emp GROUP BY dept ) anbob-# ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Hash Semi Join (cost=1.79..3.27 rows=14 width=53) (actual time=0.213..0.224 rows=14 loops=1) Hash Cond: (((public.emp.dept)::text = (public.emp.dept)::text) AND (public.emp.salary = (max(public.emp.salary)))) (Buffers: shared hit=2) -> Seq Scan on emp (cost=0.00..1.20 rows=20 width=53) (actual time=0.012..0.013 rows=20 loops=1) (Buffers: shared hit=1) -> Hash (cost=1.58..1.58 rows=14 width=11) (actual time=0.037..0.037 rows=14 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 1kB (Buffers: shared hit=1) -> HashAggregate (cost=1.30..1.44 rows=14 width=15) (actual time=0.028..0.028 rows=14 loops=1) Group By Key: public.emp.dept (Buffers: shared hit=1) -> Seq Scan on emp (cost=0.00..1.20 rows=20 width=11) (actual time=0.005..0.006 rows=20 loops=1) (Buffers: shared hit=1) Total runtime: 0.391 ms (14 rows) anbob=# explain WITH v_rnk AS ( SELECT rank() OVER ( PARTITION BY dept ORDER BY salary DESC ) AS rn, * FROM emp ) SELECT * FROM v_rnk WHERE rn = 1 anbob-# ; QUERY PLAN ----------------------------------------------------------------------- Subquery Scan on v_rnk (cost=1.63..2.28 rows=1 width=61) Filter: (v_rnk.rn = 1) -> WindowAgg (cost=1.63..2.03 rows=20 width=53) -> Sort (cost=1.63..1.68 rows=20 width=53) Sort Key: emp.dept, emp.salary DESC -> Seq Scan on emp (cost=0.00..1.20 rows=20 width=53) (6 rows) anbob=# explain (analyze,buffers) select distinct on (dept) * from emp order by dept,salary asc; QUERY PLAN ------------------------------------------------------------------------------------------------------------ Unique (cost=1.63..1.73 rows=14 width=53) (actual time=0.039..0.047 rows=14 loops=1) -> Sort (cost=1.63..1.68 rows=20 width=53) (actual time=0.037..0.039 rows=20 loops=1) Sort Key: dept, salary Sort Method: quicksort Memory: 27kB (Buffers: shared hit=1) -> Seq Scan on emp (cost=0.00..1.20 rows=20 width=53) (actual time=0.008..0.010 rows=20 loops=1) (Buffers: shared hit=1) Total runtime: 0.109 ms (8 rows)
Note:
使用explain analyze对比,似乎使用with Window函数的没有列出关于time的stats, 但是从执行计划看PG的distinct on 并不是简单的语法转换,从cost或仅有的time看,PG的这种独特的写法似乎效率是最高的。
ok, 如果你再遇到这种面试题是不防写上这种答案,也可能会成为你的加分项! enjoy!
对不起,这篇文章暂时关闭评论。