ML with circles

Mearns Learns

SQL

SQL recently entered my daughter’s vocabulary. She is not using it yet but has an awareness that it exists and an interest in it. I also had a friend that wanted a quick SQL crash course. This article covers the basics of SQL and provides a way to try them.

Setup

The examples use the Northwind database, developed by Microsoft for Access, that has been migrated to a SQLite database.

The Northwind database is for a fictional company called Northwind Traders. They import and export food from around the world. The database tracks their products, suppliers, customers, employees and orders using thirteen tables.

Source: github.com repository of JP White
Source direct link: Download database

The SQL is executed using DB Browser for SQLite.

For my local install, the database engine and browser are saved at: C:\Program Files\SQLite
The database is saved at: C:\Users\ [username] \Northwind

Tables

The tables holding data in the Northwind database are listed below.

Northwind

  • Categories
  • CustomerCustomerDemo
  • CustomerDemographics
  • Customers
  • EmployeeTerritories
  • Employees
  • Order Details
  • Orders
  • Products
  • Regions
  • Shippers
  • Suppliers
  • Territories

Getting started

Open DB Browser for SQLite.
From the toolbar click on Open Database.

SQL SQL

Navigate the folder holding the northwind.db file.

SQL SQL

Select the northwind.db file and click on the OK button.

SQL SQL

Click on the Execute SQL tab.

Enter the sample SQL into the SQL sub-tab.
Highlight the SQL statement that you want to run.
Click on the Execute icon. It looks like a Play button.
Alternately press Ctrl + Enter to execute the SQL.
The results are returned in the sub-tab below.

Basic selection

Select all rows from a table.

SELECT  
	*  
FROM  
	Employees;  

SQL SQL

Basic selection limit to 5 rows

Select a limited number of rows from a table.
This syntax is particular to SQLite.

SELECT 
	*
FROM
	Employees
LIMIT 5;

In other SQL databases, you would use this ANSI compliant SQL.

SELECT 
	*
FROM
	Employees
FETCH FIRST 5 ROWS ONLY;

You tend to only use this command when exploring, to understand what data is in a table, without having to return everything.

SQL SQL

I included this to introduce the fact that database vendors build their databases based on the ANSI SQL standard, adding their own proprietary SQL commands. In some cases, as above, the ANSI SQL commands haven’t been implemented. So SQL that works on one database might need to be tweaked to run on another.

Basic selection, specific fields

Instead of returning all data in a row, you can specify which columns to return.

SELECT
	EmployeeID,
	FirstName,
	LastName
FROM
	Employees;

SQL SQL

Basic selection, specific columns renamed

The column names in a table might not be exactly what you want to display in your results. An alternate column name can be provided to use in the results. If the name includes a space, it needs to be surrounded by double quotes.

SELECT
	EmployeeID,
	FirstName "First Name",
	LastName Surname
FROM
	Employees;

SQL SQL

You can also use the AS command when defining the column header.

SELECT
	EmployeeID,
	FirstName AS "First Name",
	LastName Surname
FROM
	Employees;

Basic selection, with dummy columns

You can include dummy columns to be returned in the results.

SELECT
	EmployeeID,
	FirstName,
	LastName,
	'P' EmployeeType
FROM
	Employees;

SQL SQL

Get the current date

Now a quick change of direction to setup the next topic. You can get todays date from the database.

SELECT 
	date('now');

It returns the current date in a YYYY-mm-dd format.

To change the format, SQLite uses the strftime command.

SELECT 
	strftime('%d/%m/%Y', date('now'));

SQL SQL

Notice how if a column heading isn’t specified it uses the command.

The commands to get the date and format it are specific to SQLite. The ANSI standard SQL to get the current date is CURRENT_DATE.

SELECT CURRENT_DATE;

But usage varies wildly between databases.

SQL server looks like this:

SELECT FORMAT(GETDATE(), 'dd/MM/yyyy');

Oracle looks like this:

SELECT TO_CHAR(SYSDATE, 'DD/MM/YYYY') 
FROM dual;

The simple rule is that Date commands vary depending on the database being used.
As a rule of thumb most date are returned as YYYYMMDD. The largest unit to the smallest unit.

Basic selection, with calculated columns

After that long diversion, onto the next topic. You can include calculated columns to be returned in the results. In this case we are calculating the number of days that the employee has worked for Northwind Traders.

SELECT
  EmployeeID,
  FirstName,
  LastName,
  HireDate,
  date('now') AS Today,
  julianday(date('now')) - julianday(HireDate) AS DaysEmployed
FROM
  Employees;

A column showing todays date doesn’t need to be included in the results. The column isn’t used directly in the calculations.

SQL SQL

The julianday command is specific to SQLite. It counts the numbers of days since noon on 1 January 4713 BCE, which is treated as zero.

Basic selection with a filter

Now we get onto filters. Instead of returning all rows in a table, we can apply one or more filters to reduce the rows returned.

SELECT
	*
FROM
	Employees
WHERE
	Country = 'UK';

SQL SQL

Northwind Traders have four employees working in the UK.

Basic selection with filters

Next, we will go wild and apply multiple filter criteria to show the type of filters that are available.

SELECT
	*
FROM
	Employees
WHERE
	Country = 'UK' AND
	EmployeeID > 5 OR
	HireDate < '2014-01-31' AND
	ReportsTo <> '' AND
	TitleOfCourtesy IN ('Mr.', 'Mrs.', 'Ms.') AND
	PostalCode LIKE '%'

SQL SQL

The criteria are joined with an AND or an OR.
The criteria themselves use equals, not equal to, less than, greater than, in a list or like a wild card. The filter criteria can be grouped by enclosing them in brackets. This provides clearer instructions when running the query and can make it easier to read.

Aggregation and Grouping

The data in tables can be aggregated using functions like COUNT, SUM, MAX, MIN, etc. The aggregation available depends on the data in the column. For example, the Employees.FirstName column cannot be aggregated with SUM because the column is filled with text values. You can COUNT the text values in the column.

In the code we are looking at the Products table and using the COUNT function to determine how many products Northwind Traders sell.

SELECT 
	COUNT(*) AS ProductCount
FROM 
	Products

SQL SQL

Seventy-seven products are sold by NorthWind Traders.

The GROUP BY command allows us to set how the aggregation is applied.
In the code we are going count the number of products in each product category.

SELECT 
	CategoryID, 
	COUNT(*) AS ProductCount
FROM 
	Products
GROUP BY 
	CategoryID;

SQL SQL

The seventy-seven products are divided across eight categories.

Joins

Another change in direction. So far, we have only been retrieving data from a single table. If we are going to retrieve data from other tables, we need to say how the tables are joined with each other.

The most common join that I use is the LEFT OUTER JOIN. It returns all the records from the first table and matching records from the second table.

Expressed as a Venn diagram, it would look like this.

SQL SQL

Select from two tables

Up to now we have only been selecting data from a single table. We are going to select data from two tables. A LEFT OUTER JOIN will be used to join the Employees table to the EmployeeTerritories table. The join will be where the EmployeeID columns match. Plus, we are going to add a filter and only return data for EmployeeID number one, Nancy Davolio.

SELECT
	Employees.EmployeeID, 
	Employees.FirstName,
	Employees.LastName,
	EmployeeTerritories.TerritoryID
FROM
 Employees 
 LEFT OUTER JOIN EmployeeTerritories
 ON Employees.EmployeeID = EmployeeTerritories.EmployeeID
WHERE
	Employees.EmployeeID = '1';

SQL SQL

Nancy has two territories. Notice how Nancy’s details from the first table are repeated.

Select from two tables using alias

Instead of typing the table name in full each time, you can give the table an alias. In this example, we will call Employees, A and EmployeeTerritories, B.

Less typing but in complex queries, I prefer using the table name as it is easier to follow what is happening.

SELECT
	A.EmployeeID, 
	A.FirstName,
	A.LastName,
	B.TerritoryID
FROM
 Employees A
 LEFT OUTER JOIN EmployeeTerritories B
 ON A.EmployeeID = B.EmployeeID
WHERE
	A.EmployeeID = '1';

SQL SQL

Select from three tables using alias

If you can do two tables, the next step is three or more.

In the SQL query, we want to return the name of the territory instead of the ID. So the EmployeeTerritories table is joined with the Territories table.

SELECT
	A.EmployeeID, 
	A.FirstName,
	A.LastName,
	C.TerritoryDescription
FROM
 Employees A
 LEFT OUTER JOIN EmployeeTerritories B 
 ON A.EmployeeID = B.EmployeeID
 LEFT OUTER JOIN Territories C 
 ON B.TerritoryID = C.TerritoryID
WHERE
	A.EmployeeID = '1';

SQL SQL

The same two results as before are returned but with the TerritoryDescription instead of the TerritoryID.

Changing data in tables

Up until now we have been looking at selecting data. The next sections look at changing, adding or deleting data in tables. Generally, you don’t do this, it is handled via the front-end application using the database. If you are going to make changes, make sure that you understand how the data is structured and stored in the database. It is very easy to create inconsistent orphaned records.

Update values in a table

By default, SQLite starts in auto-commit mode. This means when you run the SQL, the database is updated. If you make a mistake, you need to write more SQL to correct it.

What we are going to do is use transactions. This allows you to make an update, review it and then decide if you want to keep the update or throw it away.

To start a transaction use BEGIN TRANSACTION, after the transaction use COMMIT or ROLLBACK.

In this example we are going to update the title of one person.
First, we check the existing value.
Then we change it.
The change is reviewed.
If we are happy with the change and want to keep it, we COMMIT it.

BEGIN TRANSACTION;

	SELECT
		*
	FROM
		Employees
	WHERE
		EmployeeID = '8';
	--Select to show current value

		UPDATE 
			Employees
		SET
			TitleOfCourtesy = 'Dr.'
		WHERE  
			EmployeeID = '8';
	--Update the title from Ms. to Dr.

	SELECT
		*
	FROM
		Employees
	WHERE
		EmployeeID = '8';
	--Select to show new value

ROLLBACK;
--Rollback to undo update.  Always put first so if all commands are accidentally executed, the transaction is automatically rolled back.
COMMIT;
--Commit the transaction.

SQL SQL

The UPDATE, ROLLBACK and COMMIT commands do not generate results to display.

Notice at the end of the statement, the ROLLBACK comes before the COMMIT. This is good practice. If you accidentally click the execute icon with no SQL selected, it will run all the SQL. As the ROLLBACK comes before the COMMIT, all the changes are undone and there is nothing left to commit.

To do a manual reset of the update, run this SQL.

BEGIN TRANSACTION;
	UPDATE 
		Employees
	SET
		TitleOfCourtesy = 'Ms.'
	WHERE  
		EmployeeID = '8'
ROLLBACK;
COMMIT;

Insert values into a table

In this example two new people are added to the Employees table.

BEGIN TRANSACTION;

SELECT
	*
FROM
	Employees
WHERE
	EmployeeID > 9

	INSERT INTO
		Employees
			(LastName, FirstName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode, Country, HomePhone, Extension, Photo, Notes, ReportsTo, PhotoPath) VALUES
	('Smith', 'John', 'Sales Coordinator', 'Mr.', '1980-01-01', '2015-02-01', '999 Albion Ave', 'Bristol', 'British Isles', 'BB1 3AA', 'UK', '(71) 555-2222', '1234',NULL , 'No details', '2', 'http://accweb/employees/smith.bmp'),
	('Smythe', 'Jane', 'Sales Leader', 'Ms.', '1988-01-01', '2016-12-01', '77 Smits Str', 'Manchester', 'British Isles', 'MM1 3BB', 'UK', '(71) 555-3456', '5678', NULL , 'No details', '2', 'http://accweb/employees/smythe.bmp');

SELECT
	*
FROM
	Employees
WHERE
	EmployeeID > 9

ROLLBACK;
--Rollback to undo update.  Always put first so if all commands are accidentally executed, the transaction is automatically rolled back.
COMMIT;
--Commit the transaction.

Notice how the EmployeeID field isn’t specified. The field is setup as AUTOINCREMENT so the database automatically assigns the next number.

SQL SQL

Delete row from a table

In this example we are going to remove the two people added in the previous step.

BEGIN TRANSACTION;

	SELECT 
		*
	FROM
		Employees
	WHERE
		EmployeeID > '9';
	--Check the data before running the delete.

	DELETE FROM
		Employees
	WHERE
		EmployeeID > '9';
		
	SELECT 
		*
	FROM
		Employees
	WHERE
		EmployeeID > '9';

ROLLBACK;
--Rollback to undo delete.  Always put first, before the COMMIT command. So, if all commands are accidentally executed, the transaction is automatically rolled back.
COMMIT;
--Commit the transaction.

SQL SQL

After the DELETE the SELECT statement returns no data for employees with an EmployeeID more than nine.

Create and drop tables

The final piece of the puzzle is creating and deleting tables. I tend to do this very rarely. The exception is creating a temporary table to help simplify very complex SQL.

Create a table

We are going to create a table to store the tax id for employees.

CREATE TABLE TaxID (
	EmployeeID INTEGER NOT NULL,
	TaxAuthority TEXT NOT NULL,
	TaxID TEXT NOT NULL
);

Then try and select data from the table.

SELECT
    * 
FROM 
   TaxID;

As there is no data in the table, only the column headings are returned.

SQL SQL

Delete a table

Next, we are going to delete the table that was created in the previous step.

DROP TABLE 
    TaxID;

Very easy, almost too easy?

Create a table from a query

This is useful to take a backup of a table before you start making changes. It also helps to simplify your SQL statement if it is getting very complex.

In the next SQL statement, a table is create based on the Employees table.

CREATE TABLE 
	tmpEmployeesUK 
AS 
SELECT 
	* 
FROM 
	Employees
WHERE 
	Country = 'UK'

A SQL statement to query the results.

SELECT 
	*
FROM 
	tmpEmployeesUK;

Showing all the UK based employees in the new table.

SQL SQL

Finally, the table is deleted after we no longer need it.

DROP TABLE 
    tmpEmployeesUK;

Miscellaneous

NULL

In tables there can be zeros or empty strings.

This is data.

NULL is simply nothing. The value is unknown and nothing has been assigned.

Comments

You can use comments to add notes. They are also useful to comment out sections of the SQL statement.

-- This is a single line comment
/* 
	This is a 
	multi line comment
*/

Convert

You can convert a field to other values by using a CASE statement. It should only be used for a short number of variables. If it gets too long, it is better to create a table and join it to your query.

	SELECT 
		CASE 
			WHEN  ReportsTo = '2' THEN 'Owner' 
			WHEN  ReportsTo = '5' THEN 'Manager' 
			ELSE 'Not Defined' END ReportsToCategory,
		*
	FROM
		Employees;

SQL SQL

BLOB

The Northwind database contains Binary Large Objects (BLOB), photos of the employees.

DB Browser for SQLite can display the BLOB data as a picture. Not all SQL tools can.

SELECT 
	EmployeeID, Photo
FROM
	Employees
WHERE
	EmployeeID = 1;

The binary code.

SQL SQL

The picture.

SQL SQL

What to look at next

Data types - Defines the type of data that can be entered into a column, for example DATE, INT, VARCHAR, etc.

Constraints - Restrict what data can be entered in a column, for example NOT NULL, UNIQUE, etc.

Primary Keys - A primary key is a column (or set of columns) in a table that uniquely identify each row. The key value must be unique and not have null values.

Foreign Keys - A foreign key is a column in one table that references the primary key in another table. This creates a relationship between the two tables.

Indexes - An index works as a lookup guide, similar to an index in a book. It speeds up data retrieval.

Data Control Language (DCL) - Manage user access and permissions to the data.

Finally, a SQL joke from xkcd, Exploits of a Mom.