CS263 Workshop 3: SUBQUERIES , JOINING TABLES and the UNION OPERATOR
We will be using the same data (the ‘company’ database) as in previous workshops. So, again log in to MySQL:Start\AllProgs\Departmental Software\Computing\MySQL
- Trigger MySQL with root user privileges (allows you to do anything):
- Mysql –u root
Then tell it to use the ‘company’ database::
USE company;
{If the 'company' database cannot be found, re-build it using the instructions at the beginning of Workshop 2}
Subqueries are always processed starting with the innermost SELECT statement and then working outwards. Here is an example where we want to retrieve the order numbers of all orders items (from the OrderItems table) which contain the item RGAN01 {notice this is a zero, not the letter ‘O’).
Retrieve the customer ID (from the Orders table) of all customers who have orders listed in the order numbers returned in the previous step.
SELECT cust_id, order_num
FROM Orders
WHERE order_num IN (SELECT order_num
FROM OrderItems
WHERE prod_id = ‘RGAN01’);
Note: Subquery SELECT statements can only retrieve a single column. There is no limit to the nesting of subqueries, though deep nesting may affect performance.
Here is an example of using a subquery to create a calculated field. Here we want to retrieve a list of customers and the state they live in from the Customers table, then for each customer retrieved we want to calculate the number of associated orders in the Orders table:
SELECT cust_name, cust_state,
(SELECT COUNT(*)
FROM Orders
WHERE Orders.cust_id = Customers.cust_id) AS number_ordered
FROM Customers
{Note that the indenting of the subquery makes it more readable. Also note the indexing of the table name and the column name (such as Orders.cust_id and Customers.cust_id) – this must be used wherever there is possible ambiguity about column names.}
Why use JOINS? - Breaking data into multiple tables enables more efficient storage, easier manipulation, and greater scalability. However, if data is stored in multiple tables, how can you retrieve that data with a single SELECT statement? To do this we use a JOIN, which associates tables within a SELECT statement, allowing us to join multiple tables so that a single set of outputs is returned, and the join associates the correct rows in each table ‘on the fly’. A join is created by the DBMS as needed, and it only persists for the duration of the query execution. Here is an example EQUIJOIN (INNER JOIN) - of joining the two tables Vendors and Products based on the testing of equality between the tables, and displaying vend_name, prod_name and prod_price.Make sure you have a WHERE clause with the JOIN – or your DBMS may return far more data than you want!
SELECT vend_name, prod_name, prod_price
FROM Vendors, Products
WHERE Vendors.vend_id = Products.vend_id;
In the following example we join multiple (3) tables:
SELECT prod_name, vend_name, prod_price, quantity
FROM OrderItems, Products, Vendors
WHERE Products.vend_id = Vendors.vend_id
AND OrderItems.prod_id = Products.prod_id
AND order_num = 20007;
In many situations, we can use a join where we would use a subquery. For example, we could do the following join:
SELECT cust_name, cust_contact
FROM Customers, Orders, OrderItems
WHERE Customers.cust_id = Orders.cust_id
AND OrderItems.order_num = Orders.order_num
AND prod_id = ‘RGAN01’;
However, joins are often more efficient than nested subqueries
Question: Can you think of your own query that can be done using either a select or a join? Try both out – did you get what you expected?
Self join - suppose we want to send a mailing list to customer contacts who work for the same company as 'Jim Jones'. To do this, we must first find out the company that Jim works for, and then which customers work for that company.
SELECT c1.cust_id, c1.cust_name, c1.cust_contact
FROM Customers as c1, Customers as c2
WHERE c1.cust_name = c2.cust_name
AND c2.cust_contact = 'Jim Jones';
{Notice here we create ALIASES c1 and c2 which both refer to the Customers table}
Natural Joins - the previous kinds of join you have investigated return all data - even multiple occurrences of the same column. A natural join eliminates those multiple occurrences. In this kind of join you select which columns are unique, usually by using 'SELECT *' from one table and just subsets of the columns in the other table(s).
SELECT C.*, O.order_num, O.order_date, OI.prod_id, OI.quantity, OI.item_price
FROM Customers AS C, Orders as O, OrderItems AS OI
WHERE C.cust_id = O.cust_id
AND OI.order_num = O.order_num
AND prod_id = 'RGAN01';
Outer Joins - Most joins relate rows in one table with rows in another. Sometimes you will want to include rows that have no related rows, ,for example count how many orders were placed by each customer, including customers who have yet to place an order.
SELECT Customers.cust_id, Orders.order_num
FROM Customers LEFT OUTER JOIN Orders
ON Customers.cust_id = Orders.cust_id;
{Here all the rows from the Customers table because we have a LEFT OUTER JOIN, to select all the rows from the Orders table we would use a RIGHT OUTER JOIN. If we wanted all rows selected from both tables, we would use a FULL OUTER JOIN - but this is not currently implemented in MySQL}. Here are two more examples using joins and aggregate functions:
SELECT Customers.cust_id, COUNT(Orders.order_num) AS num_ord
FROM Customers INNER JOIN Orders
ON Customers.cust_id = Orders.cust_id
GROUP BY Customers.cust_id;
another example:
SELECT Customers.cust_id, COUNT(Orders.order_num) AS num_ord
FROM Customers LEFT OUTER JOIN Orders
ON Customers.cust_id = Orders.cust_id
GROUP BY Customers.cust_id;
Combining queries - most SQL queries contain a single SELECT statement that returns data from one or more tables. We can also perform UNIONS (compound queries), which allow you to perform multiple SELECT statements and return them as a single result. We do this by specifying each SELECT statement and placing the keyword UNION between each.
Supposing that we need a report on all customers in Illinois, Indiana and Michigan and we also want to include all the 'Fun4All' toy shops (regardless of the state they are in):
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN('IL','IN','MI')
UNION
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All';
{the UNION statement tells the DBMS to execute both SELECTs and combine the output}
Rules for using UNIONs:
- Each query in the UNION must contain the same columns or expressions
- These must occur in the same order in each SELECT
- Column datatypes must be compatible - they must be of the same type or capable of being converted (cast) into the same type
Duplicate rows - By default the UNION automatically removes duplicate rows from the query, if you want all occurrences of all matches returned, you can use UNION ALL
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN('IL','IN','MI')
UNION ALL
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All';
Question - is there a difference between the output of this SQL query and the previous UNION? If so, why does it occur?
Leave SQL - Quit
1