T-SQL Join Types
Understand the basic types of T-SQL joins
Feb 4, 2013Steve Stedman | SQL Server Pro
Advertisement
One of the most powerful features of the T-SQL SELECT statement is the ability to retrieve data from multiple tables using the JOIN clause. Although there are other ways of pulling data from more than one table (e.g., using subqueries, using the APPLY operator), the join is the most commonly used technique.
There are different types of joins—inner join, left outer join, right outer join, full outer join, and cross join—that let you to accomplish different objectives. Understanding all the different join types can help you make the right decision on which one to use so that you can get the results you're looking for.
One of the best ways I've found to show the differences between the join types is using Venn diagrams. With these diagrams, you can see the data that's included for each type of join.
For this discussion, I'll be using a small database, which you can think of as a contact-list database. It has two tables: One is named People and the other is name PhoneNumbers. There are some phone numbers that aren't associated with people, and there are some people who don't have phone numbers, simulating a real-world database. If you want to follow along with the examples that I'll present, you can run the script BasicJoins.sql, which creates a sample database. (Click the Download button near the top of the page to download this script.)
No Join
Having worked in several environments where junior programmers were thrown into database programming with limited knowledge of joins, I've seen some creative attempts to do join work inside the web application code (e.g., Active Server Pages—ASP—code), usually with no success. There are reasons why the join should be done in the database and not in the application code. First and foremost, SQL Server is really good at doing joins. Second, if you do the join outside the database, you have to transfer a great deal of data out of the database.
To serve as a baseline, let's look at two basic SELECT statements that retrieve data from the People and PhoneNumbers tables, without any type of join between them:
SELECT PeopleID, Name FROM dbo.People; SELECT PhoneNumberID ,PeopleID, Number FROM dbo.PhoneNumbers;
Figure 1 shows the Venn diagram for these SELECT statements.
Figure 1: No Join
Inner Join
The inner join is the most common of all the join types. Most DBAs use them when they're looking for data that matches up between two tables. The inner join connects two tables on the values that are matching. The result set that you get contains the rows from table A that match table B on the specified join predicate. If there are rows in either table that don't match, they aren't returned in the result set.
Queries that include inner joins are simple to write. You can use the keyword INNER JOIN or just the keyword JOIN. I prefer to use INNER JOIN so that it's explicitly called out as being an inner join to avoid confusion when someone else is reviewing the query at a later time. Here's an example of an inner join:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p INNER JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID;
Figure 2 shows the Venn diagram for this code.
Figure 2: Inner Join
The join predicate is typically based on the primary key–foreign key relationship between the tables involved in the join. Understanding the primary key–foreign key relationship is important in order to join on the correct columns. SQL Server won't prevent you from joining on columns that don't make sense, so it's possible to join on a primary key in one table and the wrong column in another table, which would produce incorrect results.
Outer Join
A common mistake made with inner joins is to join to a table that has optional data, in which case some of the rows that you expect to be returned don't show up. This is where you can use an outer join.
The outer join is similar to the inner join in that it includes all the rows that match. Where the outer join differs is that it also includes rows that exist in one table, but don't have a match in the other table. The rows that don't match are shown as nulls.
The LEFT, RIGHT, or FULL keyword is added to the OUTER JOIN keyword to specify the type of outer join to perform. Consider the following requests:
- List all the people with phone numbers, and show their phone number.
- List all the people, and if they have a phone number, show that number.
- List all the phone numbers, and if there are people associated with the phone numbers, list the people.
- List all the people who don't have a phone number.
- List all the phone numbers that don't have a person assigned to them.
How would you write the queries to get the answers? Request 1 can easily be accomplished with an inner join, but for the other requests, it's not so straightforward until you consider the outer join.
Left Outer Join
Let's start with request 2—obtaining a list of all the people, and if they have a phone number, show that number. You can't use an inner join because that will only list the people who have a phone number—and you need to list all the people, whether or not they have a phone number.
If you didn't know about outer joins, this request would be challenging. You could execute a SELECT statement against one table, then use UNION ALL to combine that result with the result of an inner join between the two tables, and finally do some grouping to eliminate duplicates. This might give you the right result, but it's going to be very confusing and probably won't be very efficient.
Fortunately, you can write a query that uses a left outer join. You can use the keyword LEFT OUTER JOIN or just LEFT JOIN, as is done in this query:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p LEFT JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID;
The left outer join is asking for all the rows from the People table and, if there's a match in the PhoneNumbers table, match it up; otherwise, display nulls. In the Venn diagram in Figure 3, the orange area includes all the rows in the People table and only those rows from the PhoneNumbers table that match. The query output shows that you have nine people, but you're missing phone numbers for three of them (Jack, Donna, and Karen). If you had used an inner join, it would've dropped those three people. That would spell trouble if the People table contained data about customers. Instead of returning all nine customers, the inner join would've returned only six customers—despite the fact that a customer is still a customer, even if you don't have the person's phone number. As this example demonstrates, you should use the left outer join when you know that you have data in one table, but you might not have a match in the second table.
Figure 3: Left Outer Join
Right Outer Join
Request 3 is asking for a list of all the phone numbers, and if there are people associated with the phone numbers, show those people. In this case, you can use a right outer join. The syntax for a right outer join is similar to a left outer join, except that the keyword RIGHT is used. You can use either RIGHT OUTER JOIN or just RIGHT JOIN in the query:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p RIGHT JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID;
Figure 4 shows the Venn diagram for this code.
Figure 4: Right Outer Join
Note that you could use a left outer join if you were to simply switch the tables in the FROM clause and JOIN clause. For example, these two queries produce the exact same results:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p LEFT JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID; SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.PhoneNumbers n RIGHT JOIN dbo.People p ON p.PeopleID = n.PeopleID;
Right outer joins are commonly used to find data in one table that isn't in another table. For example, suppose that you have a database without proper foreign key relationships. In this database, the People table holds the names of your business contacts and the PhoneNumbers table contains phone numbers. Because there isn't any value in having a phone number without knowing whose phone number it is, you can use a right outer join to find all the phone numbers that aren't associated with a business contact name.
Right outer joins can also be useful if you're going to add a foreign key constraint to tables. Before applying the foreign key, you can use a right outer join to find all those values that would be considered a foreign key violation.
Left Outer Join with Exclusions
For Request 4—list all the people who don't have a phone number—you can use a left outer join with an exclusion. This type of join lets you find the data in one table that doesn't exist in another table. It's an alternative to using NOT IN or NOT EXISTS in a WHERE clause like this:
SELECT p.PeopleID, p.Name FROM dbo.People p WHERE p.PeopleID NOT IN (SELECT n.PeopleID FROM dbo.PhoneNumbers n WHERE n.PeopleID IS NOT NULL);
Here's how you can accomplish the same goal using a left outer join with an exclusion:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p LEFT JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID WHERE n.PhoneNumberID IS NULL;
Figure 5 shows the Venn diagram for this query. As you can see, it gives you everything from one table that isn't in the other table.
Figure 5: Left Outer Join with an Exclusion
In the query, notice that the WHERE clause is used to include every row in which the PhoneNumberID is null, meaning every row in which the person doesn't have a phone number. A common mistake when using a left outer join with an exclusion is to include the filtering of those rows that are null in the ON part of the JOIN clause, as shown here:
-- Left outer join with exclusion DONE WRONG SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p LEFT JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID AND n.PhoneNumberID IS NULL;
Let me explain the difference here. In the correct example, the query is joining all the People rows to PhoneNumbers rows, matching up the PeopleID. But since it's a left outer join, all the people are included, even if they don't have a phone number. After the left outer join is complete, the WHERE clause is applied to show only the rows that are null, giving you the result for which you're looking.
In the incorrect example, the query is joining all the People rows to PhoneNumbers rows, matching up the PeopleID and filtering the PhoneNumbers table when the PhoneNumberID is null. Because there are no rows in the PhoneNumbers table with a null PhoneNumberID, this is effectively left joining all the People rows and zero rows from PhoneNumbers. As Figure 6 shows, this gives you all nulls for all the phone numbers, which isn't accurate.
Figure 6: Correctly and Incorrectly Applying an Exclusion
Right Outer Join with Exclusions
For Request 5—list all the phone numbers that don't have a person assigned to them—you can use a right outer join with an exclusion. This query would be:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p RIGHT JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID WHERE p.PeopleID IS NULL;
As you can see, a right outer join with an exclusion is similar to the left outer join with an exclusion, except the exclusion in the WHERE clause is looking for people who have a PeopleID that's null. A null PeopleID means those phone numbers that don't match a person. Figure 7 shows the Venn diagram for this query.
Figure 7: Right Outer Join with an Exclusion
In this example, the right outer join with exclusions would be more useful if the database didn't follow best practices and was created without foreign key relationships. Given there are foreign key relationships, the only way you can end up with a value in the PhoneNumbers table that doesn't match a value in the People table would be if the PeopleID column in the PhoneNumbers table is null. In this case, finding the phone numbers that don't have a person assigned to them could be completed with a simple query:
SELECT n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.PhoneNumbers n WHERE n.PeopleID IS NULL;
Full Outer Join
The full outer join is asking for everything from table A and everything from table B. If they have matching values, match them up. If they don't match, still display the row, but show nulls where they don't match.
You can use the keyword FULL OUTER JOIN or just FULL JOIN in the query. For the People and PhoneNumbers tables, the query would be:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p FULL JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID;
This query is selecting everything from the People table and everything from the PhoneNumbers table, matching up the rows based on the PeopleID foreign key relationship, and filling in nulls where the rows don't match. The Venn diagram for the full outer join is shown in Figure 8.
Figure 8: Full Outer Join
Full Outer Join with Exclusions
The full outer join with an exclusion is an interesting type of join. You would use it if you wanted everything from table A and everything from table B, where none of the rows match up. You can use the same query that you did for the full outer join, but you also include two checks for null in the WHERE clause:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p FULL JOIN dbo.PhoneNumbers n ON p.PeopleID = n.PeopleID WHERE p.PeopleID IS NULL OR n.PeopleID IS NULL;
The full outer join is asking for all rows in which the ID is null from the People table and all rows in which the ID is null in the PhoneNumbers table. Then, the full outer join is grabbing only those rows that don't match up by checking for nulls. Figure 9 shows the Venn diagram for this query.
Figure 9: Full Outer Join with an Exclusion
Cross Join
When I explain a cross join to others, the typical response that I hear is, "When would you ever want to do that?" Let's take a look at the how the cross join works so you can decide. Some people find an immediate use for this type of join, whereas other people go their entire career without using it.
The cross join asks for all the rows in table A to be matched up with all of the rows in table B, without an ON clause, meaning that it creates the Cartesian product between the two tables. For the People and PhoneNumbers tables, the query would look like this:
SELECT p.PeopleID, p.Name ,n.PhoneNumberID, n.PeopleID, n.Number FROM dbo.People p CROSS JOIN dbo.PhoneNumbers n;
This query matches up everyone from the People table with all the rows from the PhoneNumbers table, producing 63 rows in the result set (9 rows from the People table × 7 rows from the PhoneNumbers table).
Quickly filling up tables with test data is the most common reason for using a cross join. One problem in database development is that most queries run fast when there are just a few rows in a table but then run into performance issues when a table is larger. I like to test with tables that have many rows to better simulate the performance that you'll actually see in a production environment. To achieve this, I create four small tables named FirstName, MiddleInitial, LastName, and Age. I then put about 20 items in each table and write a cross join between the 4 tables, producing 160,000 rows (20 × 20 × 20 × 20) of test data, which gets inserted into a table name People. If you run the query a few times, you can quickly have a table that's similar in size to a real production table. This technique can be applied to almost any table to quickly make it large enough to simulate production data.
Bringing It All Together
Some join types, such as the inner join, are much more commonly used than others. However, it's important to know how to use them all. Knowing the right time to use a left outer join versus a right outer join can save you hours when writing and debugging queries. You can think of all these join types as tools that are available to be used, but that don't necessarily need to be used unless there's a specific reason.