Database Language: SQL Server
Difficulty: Medium
| Column Name | Type | | ----------- | ---- | | id | int | | salary | int |
`id` is the primary key (column with unique values) for this table. Each row of this table contains information about the salary of an employee.
Write a solution to find the second highest salary from the Employee table. If there is no second highest salary, return null (return None in Pandas).
The result format is in the following example.
Employee table:
| id | salary | | -- | ------ | | 1 | 100 | | 2 | 200 | | 3 | 300 |
| SecondHighestSalary | | ------------------- | | 200 |
Employee table:
| id | salary | | -- | ------ | | 1 | 100 |
| SecondHighestSalary | | ------------------- | | NULL |
CREATE TABLE [dbo].[Employee] (id INT PRIMARY KEY, salary INT); TRUNCATE TABLE [dbo].[Employee]; INSERT INTO [dbo].[Employee] (id, salary) VALUES ('1', '100'); INSERT INTO [dbo].[Employee] (id, salary) VALUES ('2', '200'); INSERT INTO [dbo].[Employee] (id, salary) VALUES ('3', '300');
Based on the requirement of finding the second highest salary, one of the first things that may come into mind is the `OFFSET ... FETCH NEXT` clause of the `SELECT` statement.
SELECT DISTINCT salary AS SecondHighestSalary FROM Employee ORDER BY salary DESC OFFSET 1 ROW FETCH NEXT 1 ROWS ONLY;
What this query does is to select all the distinct salary from the `Employee` table (`SELECT DISTINCT salary FROM Employee`), then sort the salary in descending order (`ORDER BY salary DESC`), then return just 1 row (`FETCH NEXT 1 ROWS ONLY`) but skipping the first row (`OFFSET 1 ROW`).
This query will return the second highest salary if there is more than 1 unique salary in the `Employee` table. But if there is only 1 unique salary in the `Employee` table, this query will return an empty set instead of a NULL value, which is what the question requires. To overcome this issue, a NULL salary can be included in the list of salaries using the UNION ALL statement:
SELECT DISTINCT salary FROM Employee UNION ALL SELECT NULL AS salary
Now that a second-highest salary will always exist, the original query above can be modified as follows:
# Final Solution 1 Query - Using OFFSET ... FETCH NEXT SELECT salary AS SecondHighestSalary FROM (SELECT DISTINCT salary FROM Employee UNION ALL SELECT NULL AS salary) emp_salaries ORDER BY salary DESC OFFSET 1 ROW FETCH NEXT 1 ROWS ONLY;
Another way of writing the query is with the use of a common-table expression (CTE) wherein the list of unique salaries are created as a CTE and referenced in the main `SELECT` statement:
# Final Solution 1 Query - Using OFFSET ... FETCH NEXT and CTE WITH emp_salaries AS ( SELECT DISTINCT salary FROM Employee UNION ALL SELECT NULL AS salary ) SELECT salary AS SecondHighestSalary FROM emp_salaries ORDER BY salary DESC OFFSET 1 ROW FETCH NEXT 1 ROWS ONLY;
The fastest run-time for this updated query that uses a common-table expression (CTE) was the same as the one above:
Runtime: 332ms
Beats: 98.01% as of July 16, 2024
This can be explained by the query plan generated by SQL Server, which was the same for both queries:
|--Top(OFFSET EXPRESSION:((1)),TOP EXPRESSION:((1))) |--Merge Join(Concatenation) |--Sort(DISTINCT ORDER BY:([leetcode].[dbo].[Employee].[salary] DESC)) | |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Employee].[PK_Employee])) |--Constant Scan(VALUES:((NULL)))
Another way of finding the second highest salary is with the use of `RANK`, `DENSE_RANK` or even the `ROW_NUMBER` window functions. If the salary values being "ranked" are all unique, then the output of the `RANK`, `DENSE_RANK` and `ROW_NUMBER` window functions will be the same.
Just like the case in the first solution, the first step is to get all the unique salaries from the `Employee` table:
SELECT DISTINCT salary FROM Employee
The next step is to rank these salaries from highest to lowest:
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as ranking FROM (SELECT DISTINCT salary FROM Employee) emp_salaries
Last step is to return the second highest salary:
SELECT salary AS SecondHighestSalary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as ranking FROM (SELECT DISTINCT salary FROM Employee) emp_salaries) salary_ranking WHERE ranking = 2;
Just like the issue in the first solution, if there is only 1 unique salary in the `Employee` table, this query will return an empty set instead of a NULL value, which is the required output. To overcome this issue, a NULL salary can be included in the list of salaries using the UNION ALL statement:
# Final Solution 2 Query - Using DENSE_RANK Window Function SELECT salary AS SecondHighestSalary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) as ranking FROM (SELECT DISTINCT salary FROM Employee UNION ALL SELECT NULL AS salary) emp_salaries) salary_ranking WHERE ranking = 2;
The previous query can be re-written using common-table expressions (CTEs), one for each of the sub-queries:
# Final Solution 2 Query - Using DENSE_RANK Window Function and CTE WITH emp_salaries AS ( SELECT DISTINCT salary FROM Employee UNION ALL SELECT NULL AS salary ), salary_ranking AS ( SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS ranking FROM emp_salaries ) SELECT salary AS SecondHighestSalary FROM salary_ranking WHERE ranking = 2
As mentioned earlier, if the salaries being "ranked" are all unique, which is the case here since only the `DISTINCT salary` is being ranked, then the `DENSE_RANK` in the query can be replaced by either the `RANK` or `ROW_NUMBER` window function and the result will be the same.
Here's the query plan generated by SQL Server for both of the queries above for the second solution:
|--Filter(WHERE:([Expr1004]=(2))) |--Sequence Project(DEFINE:([Expr1004]=dense_rank)) |--Segment |--Segment |--Merge Join(Concatenation) |--Sort(DISTINCT ORDER BY:([leetcode].[dbo].[Employee].[salary] DESC)) | |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Employee].[PK_Employee])) |--Constant Scan(VALUES:((NULL)))
Runtime: 328ms
Beats: 99.16% as of July 16, 2024
There is yet a third way of getting the second highest salary from the `Employee` table without the use of a window function such as `RANK`, `DENSE_RANK` or `ROW_NUMBER` and without the use of the `OFFSET ... FETCH NEXT` clause of the `SELECT` statement.
Another way of looking at the question is, what is the highest salary that is less than the highest salary. The highest salary from the `Employee` table can be derived by the following query:
SELECT MAX(salary) FROM Employee
This query just needs to be plugged into another query that will get the highest salary that is less than the output of this query and that query will look as follows:
# Final Solution 3 Query - Using MAX Aggregate Function and Subquery SELECT MAX(salary) AS SecondHighestSalary FROM Employee WHERE Employee.salary < (SELECT MAX(salary) FROM Employee)
If there is no salary that is less than the highest salary, then this query will return a NULL value and not an empty set. The advantage of this solution is that it does not need to use a common-table expression (CTE) and it doesn't need to "insert" a NULL value to the list of salaries to overcome the issue of having only 1 unique salary.
Here's the query plan for this query:
|--Stream Aggregate(DEFINE:([Expr1006]=MAX([leetcode].[dbo].[Employee].[salary]))) |--Nested Loops(Inner Join, WHERE:([leetcode].[dbo].[Employee].[salary]<[Expr1004])) |--Stream Aggregate(DEFINE:([Expr1004]=MAX([leetcode].[dbo].[Employee].[salary]))) | |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Employee].[PK_Employee])) |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Employee].[PK_Employee]))
And here's the faster runtime for the query:
Runtime: 337ms
Beats: 95.30% as of July 16, 2024
Here's the comparison of the fastest runtime for each of the solutions.
| Solution # | Runtime | Beats | | -------------------------------------------------------- | ------- | ------ | | 1 - Using OFFSET ... FETCH NEXT | 332ms | 98.01% | | 2 - Using RANK. DENSE_RANK or ROW_NUMBER Window Function | 328ms | 99.16% | | 3 - Using MAX Aggregate Function and Subquery | 337ms | 95.30% |
Interestingly, the one that generated the fastest runtime is the second solution. It would have been expected that the third solution would have generated the fastest runtime given that it didn't need to create a "temporary" table that includes an extra NULL salary.