1

HANDS ON SECTION

a)Definition:A Subquery is a “query within a query” -- something like a nested query. An example follows:

►HANDS-ON: Type, save, and test this program.

--Subquery1.sql
--Find a list of all employee who earn a salary equal to the minimum salary
--in the Company

SELECT emp_ssn, emp_last_name, emp_salary
FROMemployee
WHERE emp_salary = (SELECTMIN(emp_salary)
FROMemployee); /* Subquery coded inside the WHERE clause */

The highlighted part is called a Subquery or an Inner query. The un-highlighted part is called an Outer query.

A Subquery runs “inside out”. In other words, a subquery runs first. Only after it completes, then the Outer query runs next.

After its run is complete, a Subquery generates a result for the Outer query. The Outer query uses the result provided to it by the Subquery as its input value.In the example above, the Subquery runs first and generates a minimum salary result of $25,000.00. The Outer query then uses this number to find and list all the employees who make this salary. After use, the result provided by the subquery is discarded.

The nature and data type of what the Outer query needs, and what the inner Subquery provides must be compatible.

The result from a Subquery can be a single scalar value (e.g., minimum salary 25000), a result set that contains a single columnwith multiple rows(e.g., a STATE field containing the values “IL”, “KY”, “TN”), or a result set that contains 2 or more columns.

The table names used in the FROM clause in the Outer and inner Subquery may the same, or they may be different based on data requirement. In the example above, both queries use the same table employee.

A Subquery can be coded inside the SELECT, FROM, WHERE or HAVING clauses. In the above example, the Subquery is coded inside the WHERE clause, and this is where subqueries are mostly written. It is also common to see subqueries coded inside the HAVING clause too.

The syntax for the Subquery is generally the same as for a standard query. However, a Subquery does not typically include the GROUP BY, HAVING or ORDER BY clauses inside it.

b)Nested Subqueries: A subquery can be nested within other subqueries for a maximum nesting of 32 levels. However, subqueries that are nested more than two or three levels deep can be difficult to read and can result in poor performance. Here is an example of subqueries nested two-level deep.

►HANDS-ON: Type, Save and Test this program.

--Subquery2.sql
--List the salary of all employees who work more than 10 hours at a project
--that is located onthe site called ‘Marina’.
SELECT emp_ssn, emp_last_name, emp_salary
FROMemployee
WHERE emp_ssn(SELECT work_emp_ssn
FROMassignment
WHERE work_pro_numberIN(SELECT pro_number
FROMproject
WHERE pro_location = 'Marina'));
/* Outer Query *//* Subquery 1 *//* Subquery 2 */

The Subquery 2 runs first and generates a result that is made up of one column with two project numbers: 10 and 30. Because the result set is a single column, we must use the IN operator before Subquery 2 instead of the equal “=” operator. The equal operator can only be used for a result that is a single scalarvalue like 25.

Next, the Subquery 1uses the single column result provided to it by Subquery 2 (i.e., 10, 30) to find the ssn of all employees who work in projects 10 and 30. The result is again a single column made up of a list of ssn values of ‘999222222’, ‘999222222’, ‘999444444’, ‘999555555’, ‘999887777’ , and ‘999887777’. Notice that some employee ssn are listed twice because the same employee works in two projects concurrently. You can test this out by running Subquery 2 separately all by itself and hard wiring the numbers (10, 30) after the IN operator. Once again, we need to use the operator IN.

Finally, the Outer Query uses the list of ssn provided by Subquery 1 to find and display the respective employee names and their salaries.

Notice the use of THREE different tables in the queries: employee, assignment, and project.

a)When a Query (outer) needsprocessed data that is generated in run-time.

It is often necessary to process a query based on other processed data that is generated “on the fly” or “in run time”. For example, in example <Subquery1.sql> the Outer query needs the MINIMUM salary value to find relevant employees who make that salary. One way to provide thisneeded datato the Outer query is to visually check the employee table, find that 25000 is the minimum salary, and physicallyplug that number in the WHERE clause of the Outer query. This is simply not practical in large tables and tables that are changing frequently!Also, this process is very manual.

Another alternative is run a small query separately on the side to find the minimum salary, and then plug this minimum value into the Outer query. This method is tedious and also not automated!

Therefore, the most practicalsolution is to physically attach a Subquery to the Outer query that needs the required data that can be provided by the subquery. Whenever you run the Outer query, the Subquery will run first and automatically generate the needed data for the Outer query “on the fly”.

b)When a Query needs to span across multiple tables to get needed data (called a multi-table query).

It is often necessary for a query to navigate across two or more tables to extract and display required data. One way to achieve this to write a JOIN query that will join 2 or more related tables as if there were one big table and then pull and display the required fields (see previous JOINS lab exercise). Some people prefer this style. For example, the example <Subquery2.sql>shown above can be rewritten in terms of the JOIN operation to get the same result as follows.

►HANDS-ON: Type, Save and Test this program.

--JOINquery1.sql
--List the salary of all employees who work more than 10 hours at a project that is located at
--the location ‘Marina’.
--Notice the use of DISTINCT. Why?
SELECT DISTINCT emp_ssn, emp_last_name, emp_salary
FROM employee JOIN assignment ON (emp_ssn = work_emp_ssn)
JOIN project ON (work_pro_number = pro_number)
WHERE pro_location = 'Marina';

On the other hand, you may also use nested subqueries to navigate across multiple tables to get the same result as the JOIN method.Some people prefer this subquery method because they think that it is more elegant!And, this is what we already did in the example <Subquery2.sql>shown before.

Remember that most subqueries can be recoded as JOINs and most JOINs can be recoded as subqueries. However, there are subtle differences as the following discussion shows.

  • A JOIN can be used for multi-table queries.
  • The result of a JOIN operation can include columns from ALL tables that are participating in the JOIN operation. On the other hand, the result of a query that includes a Subquery can only include columns from the table named in the Outer query. Be careful!
  • A JOIN tends to be more intuitive when it uses an ad hoc relationship among tables when the relationships are not necessarily nested. An ER diagram will come in very handy here.
  • A JOIN query typically performs faster than the same query with a Subquery.

►HANDS-ON: Reload <Subquery2.sql>, Modify, Save and Test the program.

--Subquery3.sql
--List the salary of all employees who work more than 10 hours at a project that is located
--at ‘Marina’.
--Notice the “pro_location” field from the “project” table

SELECT emp_ssn, emp_last_name, emp_salary, pro_location
FROMemployee
WHERE emp_ssn (SELECT work_emp_ssn
FROMassignment
WHERE work_pro_numberIN(SELECT pro_number
FROMproject
WHERE pro_location = 'Marina'));
/* Outer Query *//* Subquery 1 *//* Subquery 2 */

You should get an error message about the field “pro_location”. Why? Read the error message.

  • A Subquery can be used for multi-table queries.
  • You can use a Subquery to pass an aggregate value like SUM and AVG to the Outer query.
  • A Subquery tends to be more intuitive when it uses an ad hoc relationship among tables when the relationships appear to be nested. An ER diagram will come in very handy here.
  • Long, complex queries can sometimes be easier to code using Subqueries.

The type of result (i.e., a table) returned by the subquery determines how it may be used, and which operators may be used by the outer query, to interact with the table returned by the subquery.

There are three basic types of subqueries.

a)Scalar Subqueries: Subqueries that return a single scalar value (1 row with 1 column) to the Outer query (e.g., 25). All of the comparison operators =, >, <, >=, <=, >can be usedin the WHERE clause of the Outer query. (See example Subquery1.sql above).

b)1 Col with N Rows Subqueries: Subqueries that return a single-column, multiple rows result to the Outer query. Only the following operators are allowed in the WHERE clause of the Outer query:

  • The operator IN (See example Subquery2.sql above).
  • One of the comparison operator =, >, <, >=, <=, >followed by the optional keywords ANY or ALL. (See examples below).

c)Correlated Subqueries which (unlike regular queries) depend on the data provided by the Outer query to the Subquery. Many a time, a correlated subquery is preceded by an EXISTS operator in the WHERE clause of the Outer query. If the correlated subquery returns some data (any type of data) to EXISTS, the outcome of EXISTS is true. If the correlated subquery returns NULL to EXISTS, the EXISTS evaluates to false. Therefore, EXISTS tests the existence of some data. (See examples below).

Since you are already familiar with the IN operator from previous exercises, we will jump right into hands-on without further ado. Remember, IN can only work with a result set that is a list (single columnwithmultiple rowsresult).

►HANDS-ON: Type, Save and Test this program.

--Subquery4.sql
--Find the employees who have "male" dependents using a subquery
SELECT emp_ssn, emp_last_name
FROM employee
WHERE emp_ssn IN (SELECT dep_emp_ssn
FROM dependent
WHERE dep_gender = 'M');

►HANDS-ON: Type, Save and Test this program.

--Subquery5.sql
--Find the list of employees who do not have dependents using a subquery
SELECT emp_ssn, emp_last_name
FROM employee
WHERE emp_ssn NOT IN (SELECT dep_emp_ssn
FROM dependent);
  • You can use the keyword ANY to test that a condition is true for one or more of the values returned by a Subquery.
  • If no rows are returned by the Subquery or all the rows returned by the subquery contain a NULL value, a comparison that uses ANY keyword is always false.
  • ANY and SOME are equivalent. SOME is the ANSI standard keyword.
  • The following table will help you understand the concept better.

Condition / Equivalent Expression / Description
X > ANY (1, 2) / X > 1 / X must be greater than at least one of the values returned by the subquery list, which means that it must be greater than the minimumvalue returned by the subquery.
X < ANY (1, 2) / X < 2 / X must be less than at least one of the values returned by the subquery list, which means that it must be less than the maximum value returned by the subquery.
X = ANY (1, 2) / (X = 1) OR (X = 2) / This condition is equivalent to X IN (1, 2).
X < >ANY (1, 2) / (X > 1) OR (X > 2) / This condition will evaluate to true for any non-empty result set containing at least one non-null value that is not equal to X

►HANDS-ON: Reload <Subquery1.sql>, Modify, Save and Test the program. Notice that the IN operator is substituted by an equivalent “= ANY”.You should get the same resultas before.

--Subquery6.sql
--List the salary of all employees who work more than 10 hours at a project that is located in
--‘Marina’.
SELECT emp_ssn, emp_last_name, emp_salary
FROM employee
WHERE emp_ssn = ANY(SELECT work_emp_ssn
FROM assignment
WHERE work_pro_number =ANY (SELECT pro_number
FROM project
WHERE pro_location = 'Marina'));

►HANDS-ON: Type, Save and Test this program.

--Subquery7.sql
--Find a list of any employee who has a salary
--greater than that of "any" employee with a salary that exceeds $30,000.
SELECT emp_ssn, emp_last_name, emp_salary
FROM employee
WHERE emp_salary ANY (SELECT emp_salary
FROM employee
WHERE emp_salary > 30000);
  • The Subquery returns the following list of salary that make more than 30000:

(38000, 43000, 43000, and 55,000).

  • In the Outer query“> ANY” finds employees whose salaries are greater than the minimum in the list (i.e., 38000).
  • You can use the ALL keyword to test that a comparison is true for all of the returned values by a Subquery.
  • If no rows are returned by the Subquery, a comparison that uses the ALL keyword is always true.
  • If all of the rows returned by the Subquery contain a null value, a comparison that uses the ALL keyword is always false.
  • The following table will help you understand the concept better.

Condition / Equivalent Expression / Description
X > ALL (1, 2) / X > 2 / X must be greater than all of the values returned by the subquery list, which means that it must be greater than the maximumvalue returned by the subquery.
X < ALL (1, 2) / X < 1 / X must be less than all of the values returned by the subquery list, which means that it must be less than the minimum value returned by the subquery.
X = ALL (1, 2) / (X = 1) AND (X = 2) / This condition can evaluate totrue only if the subquery returns a single value or if all the values returned by the subquery are thesame.
X < >ALL (1, 2) / (X > 1) AND (X > 2) / This condition is equivalent to NOT IN (1, 2)

►HANDS-ON: Type, Save and Test this program.

--Subquery8.sql
--Find a list of employees whose salaries are
--greater than the salary of all the employees in department 7.
SELECT emp_ssn, emp_last_name, emp_salary
FROM employee
WHERE emp_salary ALL (SELECT emp_salary
FROM employee
WHERE emp_dpt_number = 7);
  • EXISTS can be used whenever there is a requirement to execute an Outer query based on the result of a Subquerywithout regard to the actual quantity of the result. If a Subquery returns any data (scalar or vector), run the Outer query; otherwise, don’t. In other words, if something (anything) is returned by the subquery, run the outer query. Otherwise, don’t run.
  • If one or more rows are returned by the subquery, EXISTS evaluates to true. Consequently, you can use NOT EXISTS to test if no rows are returned.
  • If the subquery returns NULL to EXIST, the EXISTS evaluates to false.
  • Therefore, EXISTS tests the existence of some data. Since it only tests for existence of data, it does not matter what columns you specify in the SELECT clause of the Subquery. Typically, it is simpler and faster to use SELECT * in the subquery.
  • The EXISTS operator can be used with any type of query, but it is most often used with a Correlated Subquery (explained later).
  • EXISTS (any data)  true

EXISTS (null)  false.

►HANDS-ON: Type, Save and Test this program.

--subquery9.sql
--List all the project assignments, but only if projects exist in the project table
--May not be a very useful query, but it demonstrates the use of EXISTS.
--EXISTS makes very much more sense with a “Correlated Subquery” explained later
SELECT *
FROM assignment
WHERE EXISTS (SELECT *
FROM project);

COMPARISON

To understand Correlated Subqueries better, it is helpful to compare them with a regular Subquery.

Regular Subquery / Correlated Subquery (CorrSub)
a)Executes independently of the Outer query. It could be executed as a query in its own right. / a)It always refers to a value that is provided by a column in the table of Outer query. Therefore, a Correlated Subquery isdependent on the Outer Query.
b)Executes only once. / b)Executes repeatedly, once for each row in the table in Outer query, testing for some kind of a match in the WHERE clause of the CorrSub. Therefore, if the Outer table has 1000 rows, the CorrSub will run 1000 times. And, if the table used in the CorrSub has 500 rows of its own, the total number of executions can be (1000 x 500 = 50000)!
c)Returns one specificresult(scalar or vector) to the Outer query and then it becomes inactive. / c)Since it executes once for each row in the Outer query, and since it uses a value that is provided by a column in the row in outer query in question, and since this value varies depending on the row that is being processed, each execution of the CorrSub returns a different result to the Outer query. This is where the EXISTS operator will come in handy.
d)The tables in the Outer query and the Subquery may be the same or different. / d) The tables in the Outer query and the CorrSub may be the same or different.
e)Even if the tables are the same, no ALIAS is necessary because the Outer query and Subquery runs independently. / e) If the tables are the same, an ALIAS is necessary in one of the tables to remove ambiguity because the outer and inner query runs in a dependent fashion.

PROCESS

The execution process of a Correlated Subquery is similar to the typical loop in a programming language. For example:

FOR X = 1 TO 2

FOR Y = 1 TO 3

PRINT “X = “ X, “Y = “ Y

END

END

The loop will yield the following output:

X = 1 / Y = 1
X = 1 / Y = 2
X = 1 / Y = 3
X = 2 / Y = 1
X = 2 / Y = 2
X = 2 / Y = 3

Total number of iterations = 2 * 3 = 6. Note that the outer loop FOR X = 1 TO 2begins the processing by setting X = 1; then the inner loop FOR Y = 1 TO 3is completed for each X outer loop value.

The CorrSub uses the same sequence to produce results.

a) The DBMS initiates the Outer query by placing the pointer at very first row of the table in the Outer query.

b)For each row of the Outer query, the DBMS executes the CorrSub repeatedly by passing the value of a column in the current row of the table in Outer query to the inner CorrSub. This value is used in the CorrSub for matching purposes.