LeetCode 584 - Find Customer Referee

Database Language: SQL Server

Difficulty: Easy

Problem Description

Input

Table: Customer

| Column Name | Type    |
| ----------- | ------- |
| id          | int     |
| name        | varchar |
| referee_id  | int     |

In SQL, `id` is the primary key column for this table.

Each row of this table indicates the id of a customer, their name, and the id of the customer who referred them.

Requirement

Find the names of the customer that are not referred by the customer with id = 2.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1

Input

Customer table:

| id | name | referee_id |
| -- | ---- | ---------- |
| 1  | Will | NULL       |
| 2  | Jane | NULL       |
| 3  | Alex | 2          |
| 4  | Bill | NULL       |
| 5  | Zack | 1          |
| 6  | Mark | 2          |
Output
| name |
| ---- |
| Will |
| Jane |
| Bill |
| Zack |

SQL Schema

CREATE TABLE Customer (id INT PRIMARY KEY, name VARCHAR(25), referee_id INT);

TRUNCATE TABLE Customer;
INSERT INTO Customer (id, name, referee_id) values ('1', 'Will', NULL);
INSERT INTO Customer (id, name, referee_id) values ('2', 'Jane', NULL);
INSERT INTO Customer (id, name, referee_id) values ('3', 'Alex', '2');
INSERT INTO Customer (id, name, referee_id) values ('4', 'Bill', NULL);
INSERT INTO Customer (id, name, referee_id) values ('5', 'Zack', '1');
INSERT INTO Customer (id, name, referee_id) values ('6', 'Mark', '2');

Solutions

There are three possible ways of finding the names of customers that are not referred by the customer with id = 2, namely:

Solution 1 - Using OR Operator

To find the names of customers that are not referred by the customer with id = 2, all that needs to be done is check if the `referee_id` is not equal to 2. This can be achieved by the following query:

SELECT name FROM Customer
WHERE referee_id != 2

Running this query returns the following result, which is not the required result:

| name |
| ---- |
| Zack |

The query is not returning those customers where the `referee_id` is NULL because `NULL != 2` returns an unknown value. According to SQL Server - NULL and UNKNOWN (Transact-SQL)

"NULL indicates that the value is unknown. A null value is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a null value and any other value, return unknown because the value of each NULL is unknown."

To include customers that were not referred by any other customers, `referee_id IS NULL` needs to be added in the query. The condition `referee_id = NULL` cannot be used because as mentioned above, the result of any arithmetic comparison, such as `=`, with NULL is unknown. Adding the `referee_id IS NULL` condition to the query above now yields the following query:

# Final Solution Query
SELECT name FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL

This query now returns the desired result. Here's the query plan generated by SQL Server for this query:

  |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Customer].[PK_Customer]),
     WHERE:([leetcode].[dbo].[Customer].[referee_id] IS NULL OR [leetcode].[dbo].[Customer].[referee_id]<>(2)))

And here's the fastest runtime for this query:

One may ask, does the order of the conditions used by the `OR` operator matter? So instead of `WHERE referee_id != 2 OR referee_id IS NULL`, will it be faster if the conditions were interchanged to `WHERE referee_id IS NULL OR referee_id != 2`? Will the following query be faster than the one earlier?

# Final Solution Query
SELECT name FROM Customer
WHERE referee_id IS NULL OR referee_id != 2

Comparing the query plan of this updated query with the one above, it can be seen that the same query plan was generated by SQL Server:

  |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Customer].[PK_Customer]),
     WHERE:([leetcode].[dbo].[Customer].[referee_id] IS NULL OR [leetcode].[dbo].[Customer].[referee_id]<>(2)))

In addition, the fastest runtime for the "interchanged" condition is somewhat close to the one above:

There have been a lot of discussions in the SQL Server community about whether SQL Server performs a "short-circuit" when evaluating the `WHERE` clause that involves the `OR` operator and no "official" documentation can be found that neither confirms nor denies that SQL Server performs a "short-circuit` when evaluating the `WHERE` clause that involves the `OR` operator.

One other question that may be asked, given that the `referee_id` is being compared between 2 values, namely the value 2 and NULL, can't the `NOT IN` comparison operator be used:

SELECT name FROM Customer
WHERE referee_id NOT IN (NULL, 2);

Running this query will return an empty set. The reason the query is returning an empty set is because according to SQL Server - IN Logical Operator (Transact-SQL)

"Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results."

Since one of the expressions or values in the list is a NULL value, the result of the condition returns an UNKNOWN value. Thus, the `referee_id NOT IN (NULL, 2)` condition returns False for each row in the table because in SQL Server, 0, NULL or UNKNOWN means false and anything else means true.

Solution 2 - Using ISNULL System Function

The second way of finding the names of customers who were not referred by the customer with id = 2 is with the use of the `ISNULL` system function. The `ISNULL` function accepts 2 parameters and returns the first parameter if it is not NULL; otherwise it returns the second parameter.

Here's how the query will now look using the `ISNULL` flow control function instead of the `OR` operator:

# Final Solution Query
SELECT name FROM Customer
WHERE ISNULL(referee_id, 0) != 2

With the `ISNULL(referee_id, 0)`, if the value of `referee_id` is not NULL, then the function returns the value of `referee_id`. Otherwise, if the value of `referee_id` is NULL, the function returns the second parameter, which is in this case, 0. The value passed as the second parameter of the `ISNULL` function, in this case, can be any number as long as it is not 2 so that if `referee_id` is NULL, the `WHERE` clause will return a true value.

Here's the query plan generated by SQL Server:

  |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Customer].[PK_Customer]),
     WHERE:(isnull([leetcode].[dbo].[Customer].[referee_id],(0))<>(2)))

And here's the fastest runtime for this query using the `ISNULL` function:

Solution 3 - Using COALESCE Comparison Function

The third way of finding the names of customers who were not referred by the customer with id = 2 is with the use of the `COALESCE` comparison function. The `COALESCE` function returns the first non-NULL value in the list or NULL if there are no non-NULL values in the list.

Here's how the query will look like using the `COALESCE` comparison function instead of the `ISNULL` flow control function or the `OR` operator:

# Final Solution Query
SELECT name FROM Customer
WHERE COALESCE(referee_id, 0) != 2

Similar to the `ISNULL` function in the second solution, with the `COALESCE(referee_id, 0)`, if the value of `referee_id` is not NULL, then the `COALESCE` function returns the value of `referee_id` because that's the first non-NULL value in the list of parameters passed to the `COALESCE` function. Otherwise, if the value of `referee_id` is NULL, the `COALESCE(referee_id, 0)` function returns 0 because that's the first non-NULL value in the list of parameters passed to the function. Just like the case for the `ISNULL` function in the second solution, the second parameter passed to the `COALESCE` function can be any number aside from 0 as long as it is not 2 so that the `WHERE` clause will return a true value if `referee_id` is NULL.

Here's the query plan generated by SQL Server:

  |--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Customer].[PK_Customer]),
     WHERE:(CASE WHEN [leetcode].[dbo].[Customer].[referee_id] IS NOT NULL
                 THEN [leetcode].[dbo].[Customer].[referee_id] ELSE (0) END<>(2)))

And here's the fastest runtime for the query that uses the `COALESCE` function:

Solution Runtime Comparison

Here's the comparison of the fastest runtime for each of the solutions.

| Solution #                                | Runtime | Beats  |
| ----------------------------------------- | ------- | ------ |
| 1a - Using OR Operator - NULL Check Last  |  462ms  | 93.82% |
| 1b - Using OR Operator - NULL Check First |  462ms  | 93.82% |
| 2 - Using ISNULL Function                 |  462ms  | 93.82% |
| 3 - Using COALESCE Function               |  464ms  | 92.61% |

Related Articles: