How to display the presence and absence of nth-highest group-wise values in SQL

Digital creative of a browser on the internet

While skimming through SQL to prepare for interviews, I often come across this question: Find the employee with the highest or (second-highest) salary by joining a table containing employee information with another that contains department information. This raises a further question: What about finding the employee who earns the nth-highest salary department-wide?

Our favorite resources about open source

Git cheat sheet

Advanced Linux commands cheat sheet

Open source alternatives

Free online course: RHEL technical overview

Register for your free Red Hat account

Check out more cheat sheets

Now I want to pose a more complex scenario: What will happen when a department doesn't have an employee earning the nth-highest salary? For example, a department with only two employees will not have an employee earning the third-highest salary.

Here's my approach to this question:

Create department and employee tables

I create a table that includes fields such as dept_id and dept_name.

CREATE TABLE department (
    dept_id int,
    dept_name varchar(60)
);

Now I insert various departments into the new table.

INSERT INTO department (dept_id,dept_name)
VALUES (780,'HR');
INSERT INTO department (dept_id,dept_name)
VALUES (781,'Marketing');
INSERT INTO department (dept_id,dept_name)
VALUES (782,'Sales');
INSERT INTO department (dept_id,dept_name)
VALUES (783,'Web Dev');

A table showing the data from the earlier code snippets with the columns "Department ID" and "Department Name"

Image by:

Figure 1. The department table (Mohammed Kamil Khan, CC BY-SA 4.0)

Next, I create another table incorporating the fields first_name, last_name, dept_id, and salary.

CREATE TABLE employee (
    first_name varchar(100),
    last_name varchar(100),
    dept_id int,
    salary int
);

Then I insert values into the table:

INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Sam','Burton',781,80000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Peter','Mellark',780,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Happy','Hogan',782,110000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Steve','Palmer',782,120000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Christopher','Walker',783,140000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Richard','Freeman',781,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Alex','Wilson',782,115000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Harry','Simmons',781,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Thomas','Henderson',780,95000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Ronald','Thompson',783,130000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('James','Martin',783,135000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Laurent','Fisher',780,100000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Brooks',780,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Bennington',783,140000);

A table showing data from the earlier code snippets with first name, last name, dept ID, and salary columns, ordered by department ID number

Image by:

Figure 2. A table of employees ordered by department ID (Mohammed Kamil Khan, CC BY-SA 4.0)

I can infer the number of employees in each department using this table (department ID:number of employees):

780:4
781:3
782:3
783:4

If I want the view the second-highest-earning employees from different departments, along with their department's name (using DENSE_RANK), the table will be as follows:

A table with department ID, department name, first name, last name, and salary columns, listing the second-highest-earning employee in each of four departments, ordered from lowest to highest salary

Image by:

Figure 3. The second-highest-earning employee in each department (Mohammed Kamil Khan, CC BY-SA 4.0)

If I apply the same query to find the fourth-highest-earning employees, the output will be only Tom Brooks of department 780 (HR), with a salary of $85,000.

The table listing fourth-highest-earning employees lists only one employee.

Image by:

Figure 4. The fourth-highest-earning employee (Mohammed Kamil Khan, CC BY-SA 4.0)

Though department 783 (Web Dev) has four employees, two (James Martin and Ronald Thompson) will be classified as the third-highest-earning employees of that department, since the top two earners have the same salary.

Finding the nth highest

Now, to the main question: What if I want to display the dept_ID and dept_name with null values for employee-related fields for departments that do not have an nth-highest-earning employee?

The list of fourth-highest-earning employee by department, showing "null" in the first name, last name, and salary columns for departments that do not have a fourth-highest earner.

Image by:

Figure 5. All departments listed, whether or not they have an nth-highest-earning employee (Mohammed Kamil Khan, CC BY-SA 4.0)

The table displayed in Figure 5 is what I am aiming to obtain when specific departments do not have an nth-highest-earning employee: The marketing, sales, and web dev departments are listed, but the name and salary fields contain a null value.

The ultimate query that helps obtain the table in Figure 5 is as follows:

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
from (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * from null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B;

Breakdown of the query

I will break down the query to make it less overwhelming.

Use DENSE_RANK() to display employee and department information (not involving null for the absence of the nth-highest-earning member):

SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4

Output:

A table of the fourth-highest earners showing only the department with a fourth-highest earner

Image by:

Figure 6. The fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Exclude the rank1 column from the table in Figure 6, which identifies only one employee with a fourth-highest salary, even though there are four employees in another department.

select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A

Output:

The fourth-highest earner table (table six) without the rank 1 column

Image by:

Figure 7. The fourth-highest earner table without the rank 1 column (Mohammed Kamil Khan, CC BY-SA 4.0)

Point out the departments from the department table that do not have an nth-highest-earning employee:

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null))
SELECT * FROM full1)B

Output:

The full1 table listing the departments without a fourth-highest earner by department ID and name: marketing, sales, web dev

Image by:

Figure 8. The full1 table listing the departments without a fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Replace full1 in the last line of the above code with null1:

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null))
SELECT * FROM null1)B

Image by:

Figure 9. The null1 table listing all departments, with null values for those without a fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Now, I fill the null values of the dept_id and dept_name fields in Figure 9 with the corresponding values from Figure 8.

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    from (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * from nulled) B;

The table with department id, department name, first name, last name, and salary columns, with null values in the name and salary columns

Image by:

Figure 10. The result of the nulled query (Mohammed Kamil Khan, CC BY-SA 4.0)

The nulled query uses CASE WHEN on the nulls encountered in the dept_id and dept_name columns of the null1 table and replaces them with the corresponding values in the full1 table. Now all I need to do is apply UNION to the tables obtained in Figure 7 and Figure 10. This can be accomplished by declaring the last query in the previous code using WITH and then UNION-izing it with null1.

SELECT * FROM (WITH null1 AS (select A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
from (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT null)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * from null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B;

The complete table: department ID, department name, first name, last name, salary columns. The first row contains the information of the one fourth-highest earner, and the next three columns show the remaining departments, with ID, and null value in the other three columns.

Image by:

Figure 11. The final result (Mohammed Kamil Khan, CC BY-SA 4.0)

Now I can infer from Figure 11 that marketing, sales, and web dev are the departments that do not have any employees earning the fourth-highest salary.