School of Computing, Engineering and Information Sciences
University of Northumbria
Joins in SQL
Aim:
To understand how to do the main forms of join in SQL, using SQL1 standard syntax and the SQL2 standard syntax.
Outline of Session:
To do Cartesian Products of relations in SQL1 and SQL2 syntax.
To do Generalised Joins of relations in SQL1 and SQL2 syntax.
To do Natural Joins of relations in SQL1 and SQL2 syntax.
To do several queries in SQL1 and SQL2 that require the equivalent of a Join with a Project and/or Restrict operation.
INTRODUCTION
All the exercises done last week by hand are to be repeated this week using Oracle SQL.
Every retrieval of a join is to be done twice, once using SQL1 syntax and once using SQL2 syntax. Check that you get the same result for both cases.
When you are using the same data in the tables as in last week’s exercises, check that you get the same join results as in the answers to last week’s exercises. If you’re using different data, work out what the answer should be and check that you do indeed get the correct answer. Don’t just assume your result is correct because the retrieval executed without returning an error message.
All the relations used last week were chosen to demonstrate the logic of the various kinds of join. The equivalent SQL tables of all these relations will not already exist in your database, although you should have (as a result of previous exercises) tables that are similar to some of them. Hence you will have to :
1. for similar tables, decide whether to create some new SQL tables to represent them, or alternatively use the existing tables and anticipate the slightly different answers that should result from the SQL joins;
2. for the remaining relations, create new SQL tables to represent them.
Whenever you create a new table for these exercises, do the minimum of work necessary to create the required table. As you are putting in specific data for a specific exercise, and therefore in any case you will have to manually check the data in the table so that you can manually compare the results of a join with the tables used in the join, there is never any need to put in referential integrity or ad hoc integrity constraints. Just put in valid column data types and a primary key. If a primary key is not obvious, make it comprise all the columns in the table – in this way you will ensure all your rows are unique. (For example, you will find that this applies to tables R and S).
Do not make the mistake of thinking there is a logical link between referential integrity and joins. There is none ! This mistake often arises because :
1. In a good database design, there are sometimes attributes/columns that are common to 2 or more relations - e.g. supplier number, employee ID - because this is the way data is related in a relational database[1].
2. It is often desirable to ensure that the common data is consistent in the different relations/tables, and so referential integrity is applied.
3. Because they are common, it is often of interest to join relations/tables on these attributes/columns.
4. Hence the mental association between referential integrity and joins.
Note that it is the fact that the common attributes/columns are used in 2 distinct ways that causes the mental association between these ways, not because joins are logically dependent on referential integrity, or vice versa. Sometimes it is useful to do joins on attributes/columns where there is no referential integrity between them.
CARTESIAN PRODUCT EXERCISES
For convenience, here are the two relations R and S from the previous exercises :-
R S
A / B1 / 2
3 / 5
6 / 8
7 / 4
P / Q / R / S
e / f / g / h
v / w / x / y
Exercise:
Create these tables (make the columns of R have type integer, and the columns of S have a type consisting of a single character) and insert the data above.
Retrieve the Cartesian Product of R and S.
For convenience, here are the two relations CAR and EMPLOYEE from the previous exercises :-
CAR EMPLOYEE
RegNo / Type / OwnerND56 ABC / Corsa 1.3 / E3
JON 1 / Jaguar XK / E1
J111 PQ / Volvo S80 / E6
ENo / EName / M-S / Sal
E3 / Smith / S / 12,500
E1 / Robson / D / 32,500
E6 / Blake / M / 54,000
E8 / Jones / W / 68,000
You will have to create a CAR table, but you should already have in your database a table called EMP that is similar to EMPLOYEE. Decide whether to use EMP instead of creating EMPLOYEE.
Exercise: retrieve the Cartesian Product of CAR and EMP / EMPLOYEE.
For convenience, here are the two relations SUPPLIER and SHIPMENT from the previous exercises :-
SUPPLIER SHIPMENT
SNo / Name / LocationS2 / DL Supplies / Morpeth
S3 / SHS Supplies / Hexham
PNo / SNo / Qty
P3 / S2 / 2
P1 / S3 / 5
P6 / S2 / 10
P1 / S2 / 1
You will have to create both these tables.
Exercise: retrieve the Cartesian Product of SUPPLIER and SHIPMENT.
How is it that you can do this in SQL when in principle the Cartesian Product operator does not permit it ?
GENERALISED JOIN EXERCISES
For convenience, here are the two relations R and S again. Note that they contain different data, and different kinds of data, to the Cartesian Product versions of these
tables :-
R S
A / B1 / a
3 / f
6 / g
7 / f
P / Q / R / S
4 / f / g / 3
6 / w / x / 8
Exercise: drop and re-create tables R and S, and retrieve the Generalised Join of them where the ‘Join Condition’ is
B = Q OR A <= P
Using the two tables CAR and EMP / EMPLOYEE again :-
Exercise: retrieve the Generalised Join of CAR and EMP / EMPLOYEE where the ‘Join Condition’ is
Owner > ENo AND Substr( RegNo, 1, 1 ) = Substr( EName, 1, 1 )
Note that it is not possible to write the second part of the ‘Join Condition’ in the same way as in the previous set of exercises, as SQL does not have the same operators on text. The above is logically equivalent given that we know that capital letters will only occur at the beginning of a person’s name. Substr extracts a substring from a column or text; the second parameter specifies where to start the extraction and the third parameter specifies how many characters to extract; so just the first character in each of the two cases are compared for equality.
Exercise: retrieve the Generalised Join of CAR and EMP / EMPLOYEE where the ‘Join Condition’ is
Owner = ENo
For convenience, here are the two relations STOCK and SHIPMENT from the previous exercises :-
STOCK SHIPMENT
PNo / PName / SLevelP2 / Widget (Bent) / 2
P3 / Widget (Warped) / 6
PNum / CustNo / Qty
P3 / C2 / 3
P2 / C3 / 3
P2 / C10 / 2
P2 / C12 / 1
You will have to create a STOCK table. Note that this SHIPMENT table is different to the previous one. You should create an additional version of the SHIPMENT table rather than replace the original version. Give it a different name, say CUST_SHIPMENT to distinguish it from the previous version.
Exercise: retrieve the Generalised Join of STOCK and CUST_SHIPMENT where the ‘Join Condition’ is
PNo = PNum AND SLevel Qty
NATURAL JOIN EXERCISES
For convenience, here are the two relations R and S from the previous exercises :-
R S
A / C1 / a
3 / f
6 / g
7 / f
A / Q / C / W
6 / f / g / 3
6 / w / x / 8
3 / f / g / 7
Relation R has the same data as in the previous case but has a different attribute name. Relation S has different attribute names and different data.
Exercise: drop and re-create tables R and S, and then retrieve the Natural Join of R and S.
Using the two relations SUPPLIER and SHIPMENT (original version !) again :-
Exercise: retrieve the Natural Join of SUPPLIER and SHIPMENT.
For convenience, here are the two relations ALLOC and PROJECT from the previous exercises. These tables should already exist in your database, but will contain different data, and have different table and column names, from that displayed here. Decide whether to change your database to the tables below, or leave them and use the database tables ‘as-is’, determining for yourself what the result of the join should contain. (The latter strategy is recommended).
ALLOC PROJECT
ENo / ProjNoE2 / P1
E3 / P1
E2 / P2
E4 / P2
E3 / P3
ProjNo / Start / Deadline
P1 / 20-Apr-2004 / 23-Feb-2008
P2 / 21-Jan-2005 / 14-May-2007
Exercise: retrieve the Natural Join of ALLOC and PROJECT.
Using the two relations CAR and EMP / EMPLOYEE again :-
Exercise: retrieve the Natural Join of CAR and EMPLOYEE.
How is it that you can do this in SQL when in principle the Natural Join operator does not permit it ? Are you actually carrying out a Natural Join in SQL ?
MORE ADVANCED RETRIEVALS
Given a relation that includes, among other data, that data which is the answer to a query, you have seen in previous exercises how to form a smaller relation from it that contains only the data that forms the answer to the query. This was done by using the SQL equivalent of Project and Restrict operations to pick out just the relevant attributes and tuples respectively,
However sometimes the answer to a query is spread out over 2 or more relations. In this case the strategy is to merge these relations into one relation, and then pick out from this merger the relevant attributes and/or tuples as before.
The various join operations form one means of merging two relations into one, a means that is very commonly used. (The set operations form another means - they will be covered later in the course).
Note that one can merge 3 relations into one by first merging two of them into one relation, and then merging the result of the merger with the third relation; and so on for 4 or more relations.
The following queries require you to join 2 relations into one, and then to pick the required attributes and/or tuples from the result of the join to get the answer. To achieve this in SQL, you will need to incorporate a suitable Join operation and a suitable Project and/or Restrict operation into one SQL statement. SQL will do the Join first and then the Project and/or Restrict.
Do the following queries twice, once in SQL1 syntax and once in SQL2 syntax. They all use tables DEPT and/or EMP and/or PROJ and/or ALLOC that you have already created.
Exercise: retrieve the names of all employees from the ‘Sales’ department.
Exercise: retrieve the names of those departments that employee people whose marital status is ‘single’.
Exercise: retrieve the employee numbers of those employees working on a project whose deadline is ’01-Jan-09’.
Exercise: which SQL syntax do you think more clearly expresses these queries, SQL1 or SQL2 ? Why ?
7
[1] Of course, it doesn’t mean that the data has to be physically replicated in physical storage, as this could be wasteful of storage space. Other ways of physically relating the data may be used, e.g. pointers.