CS263 Workshop 3: SUBQUERIES , JOINING TABLES and the UNION OPERATOR

We will be using the same data (the ‘company’ database) as in previous workshops. So, again log in to MySQL:Start\AllProgs\Departmental Software\Computing\MySQL

  • Trigger MySQL with root user privileges (allows you to do anything):
  • Mysql –u root

Then tell it to use the ‘company’ database::

USE company;

{If the 'company' database cannot be found, re-build it using the instructions at the beginning of Workshop 2}

Subqueries are always processed starting with the innermost SELECT statement and then working outwards. Here is an example where we want to retrieve the order numbers of all orders items (from the OrderItems table) which contain the item RGAN01 {notice this is a zero, not the letter ‘O’).

Retrieve the customer ID (from the Orders table) of all customers who have orders listed in the order numbers returned in the previous step.

SELECT cust_id, order_num

FROM Orders

WHERE order_num IN (SELECT order_num

FROM OrderItems

WHERE prod_id = ‘RGAN01’);

Note: Subquery SELECT statements can only retrieve a single column. There is no limit to the nesting of subqueries, though deep nesting may affect performance.

Here is an example of using a subquery to create a calculated field. Here we want to retrieve a list of customers and the state they live in from the Customers table, then for each customer retrieved we want to calculate the number of associated orders in the Orders table:

SELECT cust_name, cust_state,

(SELECT COUNT(*)

FROM Orders

WHERE Orders.cust_id = Customers.cust_id) AS number_ordered

FROM Customers

{Note that the indenting of the subquery makes it more readable. Also note the indexing of the table name and the column name (such as Orders.cust_id and Customers.cust_id) – this must be used wherever there is possible ambiguity about column names.}
Why use JOINS? - Breaking data into multiple tables enables more efficient storage, easier manipulation, and greater scalability. However, if data is stored in multiple tables, how can you retrieve that data with a single SELECT statement? To do this we use a JOIN, which associates tables within a SELECT statement, allowing us to join multiple tables so that a single set of outputs is returned, and the join associates the correct rows in each table ‘on the fly’. A join is created by the DBMS as needed, and it only persists for the duration of the query execution. Here is an example EQUIJOIN (INNER JOIN) - of joining the two tables Vendors and Products based on the testing of equality between the tables, and displaying vend_name, prod_name and prod_price.Make sure you have a WHERE clause with the JOIN – or your DBMS may return far more data than you want!

SELECT vend_name, prod_name, prod_price

FROM Vendors, Products

WHERE Vendors.vend_id = Products.vend_id;

In the following example we join multiple (3) tables:

SELECT prod_name, vend_name, prod_price, quantity

FROM OrderItems, Products, Vendors

WHERE Products.vend_id = Vendors.vend_id

AND OrderItems.prod_id = Products.prod_id

AND order_num = 20007;

In many situations, we can use a join where we would use a subquery. For example, we could do the following join:

SELECT cust_name, cust_contact

FROM Customers, Orders, OrderItems

WHERE Customers.cust_id = Orders.cust_id

AND OrderItems.order_num = Orders.order_num

AND prod_id = ‘RGAN01’;

However, joins are often more efficient than nested subqueries

Question: Can you think of your own query that can be done using either a select or a join? Try both out – did you get what you expected?

Self join - suppose we want to send a mailing list to customer contacts who work for the same company as 'Jim Jones'. To do this, we must first find out the company that Jim works for, and then which customers work for that company.

SELECT c1.cust_id, c1.cust_name, c1.cust_contact

FROM Customers as c1, Customers as c2

WHERE c1.cust_name = c2.cust_name

AND c2.cust_contact = 'Jim Jones';

{Notice here we create ALIASES c1 and c2 which both refer to the Customers table}

Natural Joins - the previous kinds of join you have investigated return all data - even multiple occurrences of the same column. A natural join eliminates those multiple occurrences. In this kind of join you select which columns are unique, usually by using 'SELECT *' from one table and just subsets of the columns in the other table(s).

SELECT C.*, O.order_num, O.order_date, OI.prod_id, OI.quantity, OI.item_price

FROM Customers AS C, Orders as O, OrderItems AS OI

WHERE C.cust_id = O.cust_id

AND OI.order_num = O.order_num

AND prod_id = 'RGAN01';

Outer Joins - Most joins relate rows in one table with rows in another. Sometimes you will want to include rows that have no related rows, ,for example count how many orders were placed by each customer, including customers who have yet to place an order.

SELECT Customers.cust_id, Orders.order_num

FROM Customers LEFT OUTER JOIN Orders

ON Customers.cust_id = Orders.cust_id;

{Here all the rows from the Customers table because we have a LEFT OUTER JOIN, to select all the rows from the Orders table we would use a RIGHT OUTER JOIN. If we wanted all rows selected from both tables, we would use a FULL OUTER JOIN - but this is not currently implemented in MySQL}. Here are two more examples using joins and aggregate functions:

SELECT Customers.cust_id, COUNT(Orders.order_num) AS num_ord

FROM Customers INNER JOIN Orders

ON Customers.cust_id = Orders.cust_id

GROUP BY Customers.cust_id;

another example:

SELECT Customers.cust_id, COUNT(Orders.order_num) AS num_ord

FROM Customers LEFT OUTER JOIN Orders

ON Customers.cust_id = Orders.cust_id

GROUP BY Customers.cust_id;

Combining queries - most SQL queries contain a single SELECT statement that returns data from one or more tables. We can also perform UNIONS (compound queries), which allow you to perform multiple SELECT statements and return them as a single result. We do this by specifying each SELECT statement and placing the keyword UNION between each.

Supposing that we need a report on all customers in Illinois, Indiana and Michigan and we also want to include all the 'Fun4All' toy shops (regardless of the state they are in):

SELECT cust_name, cust_contact, cust_email

FROM Customers

WHERE cust_state IN('IL','IN','MI')

UNION

SELECT cust_name, cust_contact, cust_email

FROM Customers

WHERE cust_name = 'Fun4All';

{the UNION statement tells the DBMS to execute both SELECTs and combine the output}

Rules for using UNIONs:

  • Each query in the UNION must contain the same columns or expressions
  • These must occur in the same order in each SELECT
  • Column datatypes must be compatible - they must be of the same type or capable of being converted (cast) into the same type

Duplicate rows - By default the UNION automatically removes duplicate rows from the query, if you want all occurrences of all matches returned, you can use UNION ALL

SELECT cust_name, cust_contact, cust_email

FROM Customers

WHERE cust_state IN('IL','IN','MI')

UNION ALL

SELECT cust_name, cust_contact, cust_email

FROM Customers

WHERE cust_name = 'Fun4All';

Question - is there a difference between the output of this SQL query and the previous UNION? If so, why does it occur?

Leave SQL - Quit

1