Skip to content

LeetCode 584 - Find Customer Referee

Database Language: PostgreSQL

Difficulty: ⭐

Problem Description

Input

Table: Customer

Column Name Type
id int
name varchar
referee_id int

In SQL, id is the primary key column for this table. Each row of this table indicates the id of a customer, their name, and the id of the customer who referred them.

Requirement

Find the names of the customer that are not referred by the customer with id = 2.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1

Input

Customer table:

id name referee_id
1 Will NULL
2 Jane NULL
3 Alex 2
4 Bill NULL
5 Zack 1
6 Mark 2
Output
name
Will
Jane
Bill
Zack

SQL Schema

CREATE TABLE IF NOT EXISTS Customer (id INT PRIMARY KEY, name VARCHAR(25), referee_id INT);

TRUNCATE TABLE Customer;
INSERT INTO Customer (id, name, referee_id) values ('1', 'Will', NULL);
INSERT INTO Customer (id, name, referee_id) values ('2', 'Jane', NULL);
INSERT INTO Customer (id, name, referee_id) values ('3', 'Alex', '2');
INSERT INTO Customer (id, name, referee_id) values ('4', 'Bill', NULL);
INSERT INTO Customer (id, name, referee_id) values ('5', 'Zack', '1');
INSERT INTO Customer (id, name, referee_id) values ('6', 'Mark', '2');

Solutions

There are two possible ways of finding the names of customers that are not referred by the customer with id = 2, namely:

  • Using the OR operator
  • Using the COALESCE comparison function

Solution 1 - Using OR Operator

To find the names of customers that are not referred by the customer with id = 2, all that needs to be done is check if the referee_id is not equal to 2. This can be achieved by the following query:

SELECT name FROM Customer
WHERE referee_id != 2

Running this query returns the following result, which is not the required result:

name
Zack

The query is not returning those customers where the referee_id is NULL because NULL != 2 returns a NULL value. According to PostgreSQL - Comparison Functions and Operators,

Ordinary comparison operators yield null (signifying “unknown”), not true or false, when either input is null. For example, 7 = NULL yields null, as does 7 <> NULL.

To include customers that were not referred by any other customers, referee_id IS NULL needs to be added in the query. The condition referee_id = NULL cannot be used because as mentioned above, the result of any arithmetic comparison, such as =, with NULL is also NULL. Adding the referee_id IS NULL condition to the query above now yields the following query:

# Final Solution Query
SELECT name FROM Customer
WHERE referee_id != 2 OR referee_id IS NULL

This query now returns the desired result. Here's the query plan generated by PostgreSQL for this query:

Seq Scan on customer  (cost=0.00..19.75 rows=776 width=68)
  Filter: ((referee_id <> 2) OR (referee_id IS NULL))

And here's the fastest runtime for this query:

  • Runtime: 242ms
  • Beats: 94.28% as of July 26, 2024

One may ask, does the order of the conditions used by the OR operator matter? So instead of WHERE referee_id != 2 OR referee_id IS NULL, will it be faster if the conditions were interchanged to WHERE referee_id IS NULL OR referee_id != 2? Will the following query be faster than the one earlier?

# Final Solution Query
SELECT name FROM Customer
WHERE referee_id IS NULL OR referee_id != 2

According to PostgreSQL Value Expressions - Expression Evaluation Rules:

The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.

Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all. For instance, if one wrote:

SELECT true OR somefunc();

then somefunc() would (probably) not be called at all. The same would be the case if one wrote:

SELECT somefunc() OR true;

Note that this is not the same as the left-to-right “short-circuiting” of Boolean operators that is found in some programming languages.

Comparing the query plan of this updated query with the one above, it can be seen that a similar query plan was generated by PostgreSQL:

Seq Scan on customer  (cost=0.00..19.75 rows=776 width=68)
  Filter: ((referee_id IS NULL) OR (referee_id <> 2))

In addition, the fastest runtime for the "interchanged" condition is somewhat close to the one above:

  • Runtime: 246ms
  • Beats: 89.63% as of July 26, 2024

One other question that may be asked, given that the referee_id is being compared between 2 values, namely the value 2 and NULL, can't the NOT IN comparison operator be used:

SELECT name FROM Customer
WHERE referee_id NOT IN (NULL, 2);

Running this query will return an empty set. The reason the query is returning an empty set is because according to PostgreSQL Subquery Expressions - NOT IN Operator:

Note that if the left-hand expression yields null, or if there are no equal right-hand values and at least one right-hand row yields null, the result of the NOT IN construct will be null, not true. This is in accordance with SQL's normal rules for Boolean combinations of null values.

Since one of the expressions or values in the list is a NULL value, the result of the condition returns a NULL value. Thus, the referee_id NOT IN (NULL, 2) condition returns False for each row in the table because in PostgreSQL, 0 or NULL means false and anything else means true.

Solution 2 - Using COALESCE Comparison Function

The second way of finding the names of customers who were not referred by the customer with id = 2 is with the use of the COALESCE comparison function. The COALESCE function returns the first non-NULL value in the list or NULL if there are no non-NULL values in the list.

Here's how the query will look like using the COALESCE comparison function instead of the IFNULL flow control function or the OR operator:

# Final Solution Query
SELECT name FROM Customer
WHERE COALESCE(referee_id, 0) != 2

With the COALESCE(referee_id, 0), if the value of referee_id is not NULL, then the COALESCE function returns the value of referee_id because that's the first non-NULL value in the list of parameters passed to the COALESCE function. Otherwise, if the value of referee_id is NULL, the COALESCE(referee_id, 0) function returns 0 because that's the first non-NULL value in the list of parameters passed to the function. The second parameter passed to the COALESCE function can be any number aside from 0 as long as it is not 2 so that the WHERE clause will return a true value if referee_id is NULL.

Here's the query plan generated by PostgreSQL:

Seq Scan on customer  (cost=0.00..19.75 rows=776 width=68)
  Filter: (COALESCE(referee_id, 0) <> 2)

And here's the fastest runtime for the query that uses the COALESCE function:

  • Runtime: 237ms
  • Beats: 98.23% as of July 26, 2024

Solution Runtime Comparison

Here's the comparison of the fastest runtime for each of the solutions.

Solution # Runtime Beats
1a - Using OR Operator - NULL Check Last 242ms 94.28%
1b - Using OR Operator - NULL Check First 246ms 89.63%
2 - Using COALESCE Function 237ms 98.23%