Featherman’s T-SQL Primer©– Run all queries using the AdventureWorksDW 2012 data warehouse
The goal of business intelligence is to build an IT infrastructure, set of integrated systems, scripted procedures and staff that can reliably and repeatedly to turn mountains of data into actionable information. Often the information is formatted into routine and exception reports, trend reports, dashboards, charts, and scorecards. The easy part is makingthe charts. The work (and therefore higher payscale) is in understanding the data, creating data models and like a sculptor or blacksmith, pounding it into shape until it shines in its insightfulness.
The insights needed to make better decisions starts with data management - capturing, cleaning, filtering, and aggregating the data. This document is designed to provide you an introduction to this process.It’s very fun to gain mastery over data, if you don’t the alternative is a lot of copying and pasting in Excel! Your goal is to gain mastery over data, and become self-sufficient so that the hands-on process of extracting and compiling data does not slow down your inquisitive mind.

We have created database tables, learned about data types, and even implemented primary and foreign keys with different graphical designers. We have learned that there are different ways to generate and run SQL queries:

  • In Access you can use the GUI Query by Example (QBE) facility
  • In SSRS you can use the GUI query designer to build a SQL query
  • In SQL Server Management Studio you can use the Query |Design query in Editor option
  • You can use a Microsoft Query inside Excel’s external data connection in a QBE facility
  • In Access you can write SQL code
  • In SQL Server Management Studio you can write SQL code
  • In a Microsoft Query inside Excel’s external data connection you can write SQL
  • Using an External data connection in Excel, and adding some filters is running SQL behind the scenes

So what’s so great about writing SQL code? It is time-consuming to learn and you must follow the syntax rules, so no one will ever say its easy to learn.
But SQL is powerful, it’s the lingua franca of data, and if you don’t learn how to use it, you are stunting your growth as a BI analyst. So now it is time to learn more SQL tricks and tips. This document is designed to assist you to move to the next step of control and mastery of SQL SELECT statements. Learning SQL is a path to a higher pay grade for the business analyst.
Note: Please google ‘SQL Statements’ to see helpful readings from your friendly online DBA’s. Also a Youtube search for ‘useful SQL Select statements will reap many good resources. Please open up a new query, select the AdventureWorks2012DW database and paste in each query syntax one at a time. Next create your own queries on different tables, using these examples. Practice reduces errors, and increases talent level.

Simplified AdventureWorks 2012 data warehouse logical data model


Retrieving and formatting filtered data – you are asked to copy each query from the first column and run them, then experiment using different field names, filters, etc. It is important to keep the blue capitalized keywords in the order that you see them.

USE [AdventureWorksDW2012];
SELECT [ProductAlternateKey], [EnglishProductName], [StandardCost]
FROM [dbo].[DimProduct] / The SELECTline specifies what columns to retrieve (no typos please). The FROM specifies what table to retrieve the data from.
Let’s take a look at the product line of a bicycle company called Adventureworks. This query on the left selects three columns from the Dim Product table. When you press F5 or the Execute button o the toolbar you should see a list of 509 products.
We should filter the products though, since many of them are not for sale (ie nuts, bolts, decals), rather they are used internally in the production process. The next query solves this problem.
USE [AdventureWorksDW2012];
SELECT [ProductAlternateKey] as [Product ID]
, [EnglishProductName] as [Product Name]
,FORMAT([StandardCost],'N2')as [Product Cost]
FROM [dbo].[DimProduct]
WHERE [StandardCost] ISNOTNULL
ORDERBY [Product ID] / A few improvements are provided
a)We can filter out the rows with NULL values(now 395 products are displayed which are sold to customers).
b)We can provide better field names using the as [field name] syntax. If you are using spaces in field names be sure to encase the field name with brackets. These [field names] will become the column headers if you copy the results to Excel, or create a dataset in SSRS.
c)We also can us the FORMAT(fieldname, format code) statement to format the number. Here a comma notation with 2 decimals is effected with the ‘N2’ syntax.
d)Sorting the output is handled by the ORDER BY statement. You can use different levels of sorting.
USE [AdventureWorksDW2012];
SELECT [ProductAlternateKey] as [Product ID]
, [EnglishProductName] AS [Product Name]
,FORMAT([StandardCost],'N2')as [Product Cost]
FROM [dbo].[DimProduct]
WHERE [StandardCost] ISNOTNULL
AND [ProductSubcategoryKey] = 3
AND [StandardCost] 500
ORDERBY [Product ID]
Notice that you can include fields in the WHERE filter clause, and you do not have to display these fields. You can just use them to filter.Also the more filters you set the fewer records will be retrieved as the search is more specific. / You can add one or many different filters to the WHERE statement, which is pretty important because you rarely want ALL the data rows from a table.
For specifying filters in the where clause, dates and words have to be in 'single quotes' and numbers do not. You have to know the datatype of the column that you are filtering on first, before you craft your WHERE statement. You can do that by using SSMS's Object explorer to find the database and table then right-clicking the table name and columns to see the column names and their datatypes.
The BETWEEN statement can also be used to filter by a range such as BETWEEN 500 AND 1000. This is the same as typing fieldname > 500 AND fieldname < =1000
Try changing the subcategory filter criteria to see that the records retrieved and displayed will change. How many expensive products are in each sub-category?
USE [AdventureWorksDW2012];
SELECT [ProductSubcategorykey]
, [ProductAlternateKey] AS [Part #]
, [EnglishProductName] AS [Product],[Color]
FROM [dbo].[DimProduct] as p
WHEREEnglishProductNameLIKE'%Mountain%'
------you can also substitute
WHEREEnglishProductNameNOTLIKE'%Mountain%'
to see all products notwith the word Mountain in the name / Here the WHERE clause uses a wild card to retrieve all records that are close to a search term. Here any product with the word mountain in it will be selected. You can for example retrieve all items in a product line that has a similar naming convention.
You can also use WHEREEnglishProductNameNOTLIKE'%Mountain%'to get all the other products without the word mountain.
USE [AdventureWorksDW2012];
SELECT [EnglishProductName], [Class]
,Format([ListPrice],'N0')as [List Prce]
FROM [dbo].[DimProduct]
WHERE [EnglishProductName] LIKE'Road%Red%'
---you can also substitute in the following line
WHERE [EnglishProductName] LIKE'%52%' / What if you want to see the product line for the Road bikes?
a)What if you want to just see the red road bikes?
b)What if the customer has been measured and sized for the 52 cm model? Can you show a list of products that they can further examine?
c)Can you retrieve a list of seats? Of bicycle helmets?
Take some time to try out different wild card search terms.
USE [AdventureWorksDW2012];
SELECTCONCAT([FirstName],' ', [LastName])AS Employee
, [Title],[EmailAddress]
,FORMAT([BaseRate],'N2')asBaserate
FROM [dbo].[DimEmployee]
WHERE [LastName] BETWEEN'A%'AND'M%'
ORDERBYLastName / How can HR send an email to approximately half of the employees? Here we pull a list of employees whose last names are in the range A thru L. Notice the use of M as the ending range, the ending range is not inclusive.
The outcome of the query is a list of employees. Presumably the list can be created within a stored procedure that is fed to SMTP server for email distribution. Of course the list could be further sorted by job title or other HR information.
Notice also that you can order (sort) by a column that isn’t in the resultset. Can you retrieve a list of employees whose last name begins with the same letter as your last name?
USE [AdventureWorksDW2012];
SELECT [EnglishProductName], Color, [ListPrice]
FROM [dbo].[DimProduct]
WHERE Color 'silver'ANDListPriceISNOTNULL / Sometimes you want to exclude one (or more) category or value. You can use this fieldname > ‘value’ to accomplish this.
USE [AdventureWorksDW2012];
SELECT [EnglishProductName], Color, [ListPrice]
FROM [dbo].[DimProduct]
WHERE Color IN('Black','Red','Blue')
ANDListPriceISNOTNULL
ORDERBY Color / Rather than write the filter:
WHERE Color ='Black'OR Color ='Red'or Color ='Blue'
You can the most excellent WHERE fieldname IN (criteria list) syntax.
We will use this IN statement later in SSRS reporting. Use the IN statement
to specify several criteria to be used as filters for the same field (here color).
Try a different field to filter on. First look at the values for the field, then build your filter.
USE [AdventureWorksDW2012];
SELECT MIN([ListPrice])AS [Min Sales Price]
,MAX([ListPrice])AS [Max Sales Price]
,AVG([ListPrice])AS [Average Sales Price]
,COUNT([ListPrice])AS [Num Products]
FROM [dbo].[DimProduct]
WHERE [ProductSubcategoryKey] = 3 / You can perform some interesting statistics such as the minimum, maximum values in a column, the column average value and the count of the rows
For the count, you can choose any column or use COUNT(*) from fieldname.
Take a look at the measures for several of the product sub-categories. Wouldn’t it be helpful to see the name of the Product Subcategory rather than just a number?
USE [AdventureWorksDW2012];
SELECT DISTINCTPromotionKey
FROMFactResellerSales
ORDERBYPromotionKey / Sometimes you have a large table and you want to quickly pull out a list of unique values from it. You can use the DISTINCT keyword to do the trick.
Can you retrieve unique values for other fields?

Creating new columns of data from existing columns of data
Youwill eventually need more analytical measures (analytics) than are currently created in your database tables. You can create calculated columns of derived data. The calculation is performed once for every row in the resultset. Note new columns of data are virtual and therefore not stored in a database table, rather calculated ‘on-the-fly’. You can create calculated fields within a report or spreadsheet, but that methodology introduces errors. For governance and consistency, its best to drive the calculations down into the SQL and then used as the datasource for reports and spreadsheets (SQL gets packaged up as a stored procedure, is tested, and reused). Packaging up SQL SELECT statement that performs complex calculations and creates new columns of measures helps with automation of the business analytics creation process. Also the calculation is performed once, rather than recreated in different spreadsheets. Tables never have all the measures and values that are needed by analysts, so creating new columns is very important and common task.

USE [AdventureWorksDW2012];
SELECT [ProductAlternateKey], [EnglishProductName]
,FORMAT([StandardCost],'N0')AS Cost
,FORMAT([ListPrice],'N0')AS [Sales Price]
,FORMAT(((ListPrice-StandardCost)/Standardcost),'P2')AS [ProfitPct]
FROM [dbo].[DimProduct]
WHERE [ProductSubcategoryKey] = 3
AND [StandardCost] BETWEEN 500 and 1000 / The FORMAT clause takes the form FORMAT(expression, ‘formatcode’) where the popular format codes are P2, N0, C0 which are percent with 2 decimals, numeric with 0 decimals and currency with 0 decimals respectively.
Notice the 4th line you can multiply and divide columns to create a new calculated column. Here the profit margin for each product is calculated. Be sure to count the number of left and right parentheses.
Go ahead and experiment by changing the sub-category to a different number in the range 1 – 35. You can also change the standard cost range to see a different set of products in the resultset.
USE [AdventureWorksDW2012];
SELECT ResellerKey,[SalesOrderNumber]
,[SalesOrderLineNumber]AS[Line #]
,(([UnitPrice]*[OrderQuantity])-[DiscountAmount])
AS[Line Total]
FROM[dbo].[FactResellerSales]
WHERE [OrderDate]BETWEEN'1/1/2008'AND'12/31/2009'
ORDER BY[SalesOrderNumber], [SalesOrderLineNumber] / This query is another example of creating a calculated column of derived data. The line total for each sale order is calculated once for each row.
Can you think of other calculated fields that would be insightful for an analyst to use?

Performing Aggregating Functions– to date we have pulled filtered lists of records, and added new columns of data which are helpful to analysis. You have learned so much useful and time saving material. We introduce the very powerful SUM(fieldname), COUNT(fieldname) and AVG(fieldname) aggregating functions. These aggregating functions work together with the GROUP BY() function. So for example a column can be totaled and return one value for an entire group of records (ie average sales price for a product sub-category). You can use different combinations of fields in the GROUP BY statement to total up a column based on different criteria. This functionality is similar to Excel’s SUMIF() and SUMIFS() functions.

USE [AdventureWorksDW2012]; / Can we perform a sales analysis by sales representative?
SELECTrs.EmployeeKey
,COUNT([SalesOrderNumber])as [Num Orders]
,SUM([OrderQuantity])as [Total Units Sold]
,FORMAT(SUM([SalesAmount]),'N0')as [Total Revenue]
FROM [dbo].[FactResellerSales] asrs
GROUPBYrs.EmployeeKey
ORDERBY [Total Revenue] DESC / This is a report that aggregates sales in the reseller network by sales rep. A count of the orders is provided, a count of the number of units moved, and total revenue generated is also calculated.
This first sales by employee list is sorted by revenue with the top seller at the top of the list.
Many improvements are possible. The name of the employee can be included. The results can be sliced by time period, and it would be informative to filter by product category as well. Further there are no benchmarks or sales goals provided to identify if the sales performance is acceptable. Adding calculated columns to compare actual sales to goal is covered later in the course.
The second query adds a geography component. It would also be helpful if the names of the sales territories were included (or even if the data was demonstrated on a map).
Remember that when you add an extra non-aggregating column to the select statement (here SalesTerritoryKey), you must add it to the GROUP BY statement also.
USE [AdventureWorksDW2012];
SELECTrs.EmployeeKey, [SalesTerritoryKey]
,COUNT([SalesOrderNumber])as [Num Orders]
,SUM([OrderQuantity])as [Total Units Sold]
,SUM([SalesAmount])as [Total Revenue]
FROM [dbo].[FactResellerSales] asrs
GROUPBYrs.EmployeeKey, [SalesTerritoryKey]
ORDERBYrs.EmployeeKey
USE [AdventureWorksDW2012];
SELECT [OrderDate]
,COUNT([SalesOrderNumber])as [Num Orders]
,SUM([OrderQuantity])as [Total Units Sold]
,FORMAT(SUM([SalesAmount]),'N0')as [Total Revenue]
FROM [dbo].[FactInternetSales] asi
GROUPBY [OrderDate]
ORDERBY [OrderDate] / Can we calculate the total Internet channel sales by date? Go ahead and run this query, and copy the resultset into Excel and draw some line charts.
Add some improvements to this query and resultset. What else can you filter the results by? What else can you group the totals on? Can you think of another filed to add sub-groups?
USE [AdventureWorksDW2012];
SELECT Top 10 CustomerKey
,STR(SUM(SalesAmount))ASTotals
FROM[dbo].[FactInternetSales]
WHERE [OrderDate]BETWEEN'1/1/2008'AND'12/31/2009'
GROUPBY CustomerKey
ORDER BY Totals DESC / The SELECT TOP N syntax just cuts down the # of rows that are returned in the resultset after the GROUP BY and ORDER BY syntax do their job. It’s probably a good idea to use the TOP N syntax on the same column as the GROUP BY. You can perhaps select the top 2, top 5, top 10 records or any number you wish. You can see that this query would be the datasource for a pareto chart of top 10 manufacturing defects.
This query performs the grouping and aggregating then just keeps the top10. You can also use TOP N Percent (ie TOP 5PERCENT). IF you want the bottom 10 then perform a reverse sort (DESC) in your ORDER BY statement. Each of theGROUP BY queries to date can be further filtered using the TOP function, such as generating lists of top selling products, top performing customers, etc.
TheSTR(field) is an old-school way to apply a little formatting to the number. This is fine for simple printing of resultsets from SSMS or Excel, but it actually converts the field to a string, which renders it useless in some reporting programs for subsequent calculations.
USE [AdventureWorksDW2012];
SELECT ResellerKey,[SalesOrderNumber],STR(SUM(SalesAmount))ASTotals
FROM[dbo].[FactResellerSales]
WHERE [OrderDate]BETWEEN'1/1/2008'AND'12/31/2009'
GROUPBY ROLLUP(ResellerKey,SalesOrderNumber)
ORDERBYResellerKeyDESC / This query adds totals on the end of each grouping, which is useful if you are copying the data to Excel for formatting.
The ROLLUP syntax can provide a subtotal for the terms specified in the ROLLUP term. Therefore the rollup totals can occur once for the resultset, several times, or many times.
Again include all non-aggregating fields from the SELECT line into the GROUP BY line. You probably want to limit the number of these fields to enable useful groupings
USE [AdventureWorksDW2012];
SELECT [SalesOrderNumber]
,FORMAT([OrderDate],'d','en-US')as [Order Date]
,FORMAT(SUM(SalesAmount),'N0')AS Totals
FROM [dbo].[FactInternetSales]
WHERE [OrderDate] BETWEEN'1/1/2007'AND'12/31/2007'
GROUPBY [OrderDate], [SalesOrderNumber]
HAVINGSUM(SalesAmount) 1000
ORDERBY [OrderDate], [SalesOrderNumber] DESC / Ok. After you make the groups and calculate totals for the groups (or counts or averages) you might not want to keep all the groups. There is another type of filtering that you can perform on the groups.
This formula
a)first filters out some of the rows using the WHERE statement,
b)next performs the grouping statement requested, and calculates the aggregating columns
c)After the groups are made thentheHAVINGfilter is applied which filters the groups.
Here a filter was applied based on the same grouping statement, you can research other methods at a later date.

Getting more columns from the database – The data we need to create reports is usually in multiple tables, and often in multiple databases. We use JOIN functions to connect the tables. Usually you will use an INNER JOIN to select only the data from a fact table that has values in the dimension table, for example showing production orders for only those products that had production activity. (Use a LEFT JOIN for example when you want to show all the products whether they had production activity or not).