LeetCode 175 - Combine Two Tables

Database Language: SQL Server

Difficulty: Easy

Problem Description

Input

Table: Person

| Column Name | Type    |
| ----------- | ------- |
| personId    | int     |
| lastName    | varchar |
| firstName   | varchar |

`personId` is the primary key (column with unique values) for this table. This table contains information about the ID of some persons and their first and last names.

Table: Address

| Column Name | Type    |
| ----------- | ------- |
| addressId   | int     |
| personId    | int     |
| city        | varchar |
| state       | varchar |

`addressId` is the primary key (column with unique values) for this table. Each row of this table contains information about the city and state of one person with ID = PersonId.

Requirement

Write a solution to report the first name, last name, city, and state of each person in the Person table. If the address of a personId is not present in the Address table, report null instead.

Return the result table in any order.

The result format is in the following example.

Examples

Example 1

Input

Person table:

| personId | lastName | firstName |
| -------- | -------- | --------- |
| 1        | Wang     | Allen     |
| 2        | Alice    | Bob       |

Address table:

| addressId | personId | city          | state      |
| --------- | -------- | ------------- | ---------- |
| 1         | 2        | New York City | New York   |
| 2         | 3        | Leetcode      | California |
Output
| firstName | lastName | city          | state    |
| --------- | -------- | ------------- | -------- |
| Allen     | Wang     | NULL          | NULL     |
| Bob       | Alice    | New York City | New York |
Explanation

There is no address in the address table for the personId = 1 so we return null in their city and state. addressId = 1 contains information about the address of personId = 2.

SQL Schema

CREATE TABLE [dbo].[Person] (personId int PRIMARY KEY, firstName varchar(255), lastName varchar(255));
CREATE TABLE [dbo].[Address] (addressId int PRIMARY KEY, personId int, city varchar(255), state varchar(255));

TRUNCATE TABLE [dbo].[Person];
INSERT INTO [dbo].[Person] (personId, lastName, firstName) VALUES ('1', 'Wang', 'Allen');
INSERT INTO [dbo].[Person] (personId, lastName, firstName) VALUES ('2', 'Alice', 'Bob');

TRUNCATE TABLE Address;
INSERT INTO [dbo].[Address] (addressId, personId, city, state) VALUES ('1', '2', 'New York City', 'New York');
INSERT INTO [dbo].[Address] (addressId, personId, city, state) VALUES ('2', '3', 'Leetcode', 'California');

Solution

Given that the requirement is to return all rows from the `Person` table even if there are no corresponding row in the `Address` table, the `LEFT OUTER JOIN` (or `LEFT JOIN` for short) will be used to join the `Person` table with the `Address` table with the `Person` table on the left side of the join clause. The 2 tables will be joined on the `personId` column and here's how the join clause will look like:

FROM Person LEFT OUTER JOIN Address
  ON Person.personId = Address.personId

The requirement wants the person's `firstName` and `lastName` as well as the person's `city` and `state` if it exists in the `Address` table and return a NULL value to these 2 columns if they don't exist. Since we are using a `LEFT OUTER JOIN`, the `city` and `state` will automatically return NULL values if an address record does not exist for a person. Here's how the `SELECT` statement will look like:

SELECT Person.firstName, Person.lastName, Address.city, Address.state

It is a good practice to include the table name as part of the column name to avoid any ambiguity if you will be joining tables that has a column that exists in both tables. Putting these 2 together yields the following query:

# Final Query
SELECT Person.firstName, Person.lastName, Address.city, Address.state
FROM Person LEFT OUTER JOIN Address
  ON Person.personId = Address.personId

The query plan generated by SQL Server for this query is as follows:

  |--Nested Loops(Left Outer Join, WHERE:([leetcode].[dbo].[Person].[personId]=[leetcode].[dbo].[Address].[personId]))
   	|--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Person].[PK_Person]))
   	|--Clustered Index Scan(OBJECT:([leetcode].[dbo].[Address].[PK_Address]))

And here's the fastest runtime for the query above:

Related Articles: