Chapter 1 The Single Entity

I want to be alone.

Attributed to Greta Garbo

Learning Objectives

Students completing this chapter will be able to

write queries for a single-table database.

1.1 The relational model

The relational model introduced by Codd in 1970 is the most popular technology for managing large collections of data. In this chapter, the major concepts of the relational model are introduced. Extensive coverage of the relational model is left until Chapter 8, by which time you will have sufficient practical experience to appreciate fully its usefulness, value, and elegance.

A relation, similar to the mathematical concept of a set, is a two-dimensional table arranged in rows and columns. This is a very familiar idea. You have been using tables for many years. A relational database is a collection of relations, where relation is a mathematical term for a table. One row of a table stores details of one observation, instance, or case of an item about which facts are retained—for example, one row for details of a particular student. All the rows in a table store data about the same type of item. Thus, a database might have one table for student data and another table for class data. Similarly, each column in the table contains the same type of data. For example, the first column might record a student’s identification number. A key database design question is to decide what to store in each table. What should the rows and columns contain?

In a relational database, each row must be uniquely identified. There must be a primary key, such as student identifier, so that a particular row can be designated. The use of unique identifiers is very common. Telephone numbers and e-mail addresses are examples of unique identifiers. Selection of the primary key, or unique identifier, is another key issue of database design.

💠 Global legal entity identifier (LEI)

There is no global standard for identifying legal entities across markets and jurisdictions. The need for such a standard was amplified by Lehman Brothers collapse in 2008. Lehman had 209 registered subsidiaries, legal entities, in 21 countries, and it was party to more than 900,000 derivatives contracts upon its collapse. Key stakeholders, such as financial regulators and Lehman’s creditors, were unable to assess their exposure. Furthermore, others were unable to assess the possible ripple on them of the effects of the collapse because of the transitive nature of many investments (i.e., A owes B, B owes C, and C owes D).

The adoption of a global legal entity identifier (LEI), should improve financial system regulation and corporate risk management. Regulators will find it easier to monitor and analyze threats to financial stability and risk managers will be more able evaluate their companies’ risks.

The tables in a relational database are connected or related by means of the data in the tables. You will learn, in the next chapter, that this connection is through a pair of values—a primary key and a foreign key. Consider a table of airlines serving a city. When examining this table, you may not recognize the code of an airline, so you then go to another table to find the name of the airline. For example, if you inspect the next table, you find that AM is an international airline serving Atlanta.

International airlines serving Atlanta

Airline
AM
JL
KX
LM
MA
OS
RG
SN
SR
LH
LY

If you don’t know which airline has the abbreviation AM, then you need to look at the table of airline codes to discover that AeroMexico, with code AM, serves Atlanta. The two tables are related by airline code. Later, you will discover which is the primary key and which is the foreign key.

A partial list of airline codes

Code	Airline
AA	American Airlines
AC	Air Canada
AD	Lone Star Airlines
AE	Mandarin Airlines
AF	Air France
AG	Interprovincial Airlines
AI	Air India
AM	AeroMexico
AQ	Aloha Airlines

When designing the relational model, Codd provided commands for processing multiple records at a time. His intention was to increase the productivity of programmers by moving beyond the record-at-a-time processing that is found in most programming languages. Consequently, the relational model supports set processing (multiple records-at-a-time), which is most frequently implemented as Structured Query Language (SQL).¹

The relational model separates the logical design of a database from its physical storage. This notion of data independence simplifies data modeling and database programming. In this section, we focus on logical database design, and now that you have had a brief introduction to the relational model, you are ready to learn data modeling.

1.2 Getting started

As with most construction projects, building a relational database must be preceded by a design phase. Data modeling, our design technique, is a method for creating a plan or blueprint of a database. The data model must accurately mirror real-world relationships if it is to support processing business transactions and managerial decision making.

Rather than getting bogged down with a theory first, application later approach to database design and use, we will start with application. We will get back to theory when you have some experience in data modeling and database querying. After all, you did not learn to talk by first studying sentence formation; you just started by learning and using simple words. We start with the simplest data model, a single entity, and the simplest database, a single table, as follows.

Share code	Share name	Share price	Share quantity	Share dividend	PE ratio
FC	Freedonia Copper	27.5	10,529	1.84	16
PT	Patagonian Tea	55.25	12,635	2.50	10
AR	Abyssinian Ruby	31.82	22,010	1.32	13
SLG	Sri Lankan Gold	50.37	32,868	2.68	16
ILZ	Indian Lead & Zinc	37.75	6,390	3.00	12
BE	Burmese Elephant	0.07	154,713	0.01	3
BS	Bolivian Sheep	12.75	231,678	1.78	11
NG	Nigerian Geese	35.00	12,323	1.68	10
CS	Canadian Sugar	52.78	4,716	2.50	15
ROF	Royal Ostrich Farms	33.75	1,234,923	3.00	6

1.3 Modeling a single-entity database

The simplest database contains information about one entity, which is some real-world thing. Some entities are physical—CUSTOMER, ORDER, and STUDENT; others are abstract or conceptual—WORK ASSIGNMENT, and AUTHORSHIP. We represent an entity by a rectangle: the following figure shows a representation of the entity SHARE. The name of the entity is shown in singular form in uppercase in the top part of the rectangle.

The entity SHARE

An entity has characteristics or attributes. An attribute is a discrete element of data; it is not usually broken down into smaller components. Attributes are describe the data we want to store. Some attributes of the entity SHARE are share code, share name, share price, share quantity (number owned), share dividend, and PE ratio (price-to-earnings ratio).² Attributes are shown below the entity’s name. Notice that we refer to share price, rather than price, to avoid confusion if there should be another entity with an attribute called price. Attribute names must be carefully selected so that they are self-explanatory and unique. For example, share dividend is easily recognized as belonging to the entity SHARE.

The entity SHARE and its attributes

An instance is a particular occurrence of an entity (e.g., facts about Freedonia Copper). To avoid confusion, each instance of an entity needs to be uniquely identified. Consider the case of customer billing. In most cases, a request to bill Smith $100 cannot be accurately processed because a firm might have more than one Smith in its customer file. If a firm has carefully controlled procedures for ensuring that each customer has a unique means of identification, then a request to bill customer number 1789 $100 can be accurately processed. An attribute or collection of attributes that uniquely identifies an instance of an entity is called an identifier. The identifier for the entity SHARE is share code, a unique identifier assigned by the stock exchange to a firm issuing shares.

There may be several attributes, or combinations of attributes, that are feasible identifiers for an instance of an entity. Attributes that are identifiers are prefixed by an asterisk. The following figure shows an example of a representation of an entity, its attributes, and identifier.

The entity SHARE, its attributes, and identifier

Briefly, entities are things in the environment about which we wish to store information. Attributes describe an entity. An entity must have a unique identifier.

In this book, MySQL is the relational database for teaching SQL. Because SQL is a standard, it does not matter which implementation of the relational model you use as the SQL language is common across both the proprietary and open variants.[^singleentity-3]

SQL statements can be written in any mix of valid upper and lowercase characters. To make it easier for you to learn the syntax, this book adopts the following conventions:

SQL keywords are in uppercase.
Table and column names are in lowercase.

There are more elaborate layout styles, but we will bypass those because it is more important at this stage to learn SQL. You should lay out your SQL statements so that they are easily read by you and others.

The following table shows some of the data types supported by most relational databases. Other implementations of the relational model may support some of these data types and additional ones. It is a good idea to review the available data types in your RDBMS before defining your first table.

Some allowable data types

Category	Data type	Description
Numeric	SMALLINT	A 15-bit signed binary value
	INTEGER	A 31-bit signed binary value
	FLOAT(p)	A scientific format number of p binary digits precision
	DECIMAL(p,q)	A packed decimal number of p digits total length; q decimal spaces to the right of the decimal point may be specified
String	CHAR(n)	A fixed-length string of n characters
	VARCHAR(n)	A variable-length string of up to n characters
	text	A variable-length string of up to 65,535 characters
Date/time	DATE	Date in the form yyyymmdd
	TIME	Time in the form hhmmss
	timesTAMP	A combination of date and time to the nearest microsecond
	time with time zone	Same as time, with the addition of an offset from universal time coordinated (UTC) of the specified time
	timestamp with time zone	Same as timestamp, with the addition of an offset from UTC of the specified time
Logical	Boolean	A set of truth values: TRUE, FALSE, or UNKNOWN

The CHAR and VARCHAR data types are similar but differ in the way character strings are stored and retrieved. Both can be up to 255 characters long. The length of a CHAR column is fixed to the declared length. When values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed. VARCHAR columns store variable-length strings and use only as many characters as are needed to store the string. Values are not padded; instead, trailing spaces are removed. In this book, we use VARCHAR to define most character strings, unless they are short (less than five characters is a good rule-of-thumb).

The data for the share table will be used in subsequent examples. If you have ready access to a relational database, it is a good idea to now create a table and enter the data. Then you will be able to use these data to practice querying the table.

Data for share

*Code	Name	Price	Quantity	Dividend	PE
FC	Freedonia Copper	27.5	10,529	1.84	16
PT	Patagonian Tea	55.25	12,635	2.50	10
AR	Abyssinian Ruby	31.82	22,010	1.32	13
SLG	Sri Lankan Gold	50.37	32,868	2.68	16
ILZ	Indian Lead & Zinc	37.75	6,390	3.00	12
BE	Burmese Elephant	0.07	154,713	0.01	3
BS	Bolivian Sheep	12.75	231,678	1.78	11
NG	Nigerian Geese	35.00	12,323	1.68	10
CS	Canadian Sugar	52.78	4,716	2.50	15
ROF	Royal Ostrich Farms	33.75	1,234,923	3.00	6

1.4 Querying a single-table database

The objective of developing a database is to make it easier to use the stored data to solve problems. Typically, a manager raises a question (e.g., How many shares have a PE ratio greater than 12?). A question or request for information, usually called a query, is then translated into a specific data manipulation or query language. The most widely used query language for relational databases is SQL. After the query has been executed, the resulting data are displayed. In the case of a relational database, the answer to a query is always a table.

There is also a query language called relational algebra, which describes a set of operations on tables. Sometimes it is useful to think of queries in terms of these operations. Where appropriate, we will introduce the corresponding relational algebra operation.

Generally we use a four-phase format for describing queries:

A brief explanation of the query’s purpose
The query in italics, prefixed by •, and some phrasing you might expect from a manager
The SQL version of the query
The results of the query.

1.4.1 Displaying an entire table

All the data in a table can be displayed using the SELECT statement. In SQL, the all part is indicated by an asterisk (*).

List all data in the share table.

SELECT * FROM share;

Table 1.1: Displaying records 1 - 10
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
AR	Abyssinian Ruby	31.82	22010	1.32	13
BE	Burmese Elephant	0.07	154713	0.01	3
BS	Bolivian Sheep	12.75	231678	1.78	11
CS	Canadian Sugar	52.78	4716	2.50	15
FC	Freedonia Copper	27.50	10529	1.84	16
ILZ	Indian Lead & Zinc	37.75	6390	3.00	12
NG	Nigerian Geese	35.00	12323	1.68	10
PT	Patagonian Tea	55.25	12635	2.50	10
ROF	Royal Ostrich Farms	33.75	1234923	3.00	6
SLG	Sri Lankan Gold	50.37	32868	2.68	16

1.4.2 Project—choosing columns

The relational algebra operation project creates a new table from the columns of an existing table. Project takes a vertical slice through a table by selecting all the values in specified columns. The projection of share on columns shrfirm and shrpe produces a new table with 10 rows and 2 columns. The SQL syntax for the project operation simply lists the columns to be displayed.

Report a firm’s name and price-earnings ratio.

SELECT shrfirm, shrpe FROM share;

Table 1.2: Displaying records 1 - 10
shrfirm	shrpe
Abyssinian Ruby	13
Burmese Elephant	3
Bolivian Sheep	11
Canadian Sugar	15
Freedonia Copper	16
Indian Lead & Zinc	12
Nigerian Geese	10
Patagonian Tea	10
Royal Ostrich Farms	6
Sri Lankan Gold	16

1.4.3 Restrict—choosing rows

The relational algebra operation restrict creates a new table from the rows of an existing table. The operation restricts the new table to those rows that satisfy a specified condition. Restrict takes all columns of an existing table but only those rows that meet the specified condition. The restriction of share to those rows where the PE ratio is less than 12 will give a new table with five rows and six columns.

Restrict is implemented in SQL using the WHERE clause to specify the condition on which rows are restricted.

Get all firms with a price-earnings ratio less than 12.

SELECT * FROM share WHERE shrpe < 12;

Table 1.3: 5 records
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
BE	Burmese Elephant	0.07	154713	0.01	3
BS	Bolivian Sheep	12.75	231678	1.78	11
NG	Nigerian Geese	35.00	12323	1.68	10
PT	Patagonian Tea	55.25	12635	2.50	10
ROF	Royal Ostrich Farms	33.75	1234923	3.00	6

In this example, we have a less than condition for the WHERE clause. All permissible comparison operators are listed below.

Operator	Meaning
=	Equal to
<	Less than
<=	Less than orequal to
>	Greater than
>=	Greater than or equal to
<>	Not equal to

In addition to the comparison operators, the BETWEEN construct is available.

The expression a BETWEEN x AND y is equivalent to a >= x AND a <= y.

1.4.4 Combining project and restrict—choosing rows and columns

SQL permits project and restrict to be combined. A single SQL SELECT statement can specify which columns to project and which rows to restrict.

List the name, price, quantity, and dividend of each firm where the share holding is at least 100,000.

SELECT shrfirm, shrprice, shrqty, shrdiv FROM share
  WHERE shrqty >= 100000;

Table 1.4: 3 records
shrfirm	shrprice	shrqty	shrdiv
Burmese Elephant	0.07	154713	0.01
Bolivian Sheep	12.75	231678	1.78
Royal Ostrich Farms	33.75	1234923	3.00

1.4.5 More about WHERE

The WHERE clause can contain several conditions linked by AND or OR. A clause containing AND means all specified conditions must be true for a row to be selected. In the case of OR, at least one of the conditions must be true for a row to be selected.

Find all firms where the PE is 12 or higher and the share holding is less than 10,000.

SELECT * FROM share
  WHERE shrpe >= 12 AND shrqty < 10000;

Table 1.5: 2 records
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
CS	Canadian Sugar	52.78	4716	2.5	15
ILZ	Indian Lead & Zinc	37.75	6390	3.0	12

1.4.6 The power of the primary key

The purpose the primary key is to guarantee that any row in a table can be uniquely addressed. In this example, we use shrcode to return a single row because shrcode is unique for each instance of share. The sought code (AR) must be specified in quotes because shrcode was defined as a character string when the table was created.

Report firms whose code is AR.

SELECT * FROM share WHERE shrcode = 'AR';

Table 1.6: 1 records
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
AR	Abyssinian Ruby	31.82	22010	1.32	13

A query based on a non-primary-key column cannot guarantee that a single row is accessed, as the following illustrates.

Report firms with a dividend of 2.50.

SELECT * FROM share WHERE shrdiv = 2.5;

Table 1.7: 2 records
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
CS	Canadian Sugar	52.78	4716	2.5	15
PT	Patagonian Tea	55.25	12635	2.5	10

1.4.7 The IN crowd

The keyword IN is used with a list to specify a set of values. IN is always paired with a column name. All rows for which a value in the specified column has a match in the list are selected. It is a simpler way of writing a series of OR statements.

Report data on firms with codes of FC, AR, or SLG.

SELECT * FROM share WHERE shrcode IN ('FC','AR','SLG');

The foregoing query could have also been written as

SELECT * FROM share
  WHERE shrcode = 'FC' or shrcode = 'AR' or shrcode = 'SLG';

Table 1.8: 3 records
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
AR	Abyssinian Ruby	31.82	22010	1.32	13
FC	Freedonia Copper	27.50	10529	1.84	16
SLG	Sri Lankan Gold	50.37	32868	2.68	16

1.4.8 The NOT IN crowd

A NOT IN list is used to report instances that do not match any of the values.

Report all firms other than those with the code CS or PT.

SELECT * FROM share WHERE shrcode NOT IN ('CS','PT');

is equivalent to

SELECT * FROM share WHERE shrcode <> 'CS' AND shrcode <> 'PT';

Table 1.9: 8 records
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
AR	Abyssinian Ruby	31.82	22010	1.32	13
BE	Burmese Elephant	0.07	154713	0.01	3
BS	Bolivian Sheep	12.75	231678	1.78	11
FC	Freedonia Copper	27.50	10529	1.84	16
ILZ	Indian Lead & Zinc	37.75	6390	3.00	12
NG	Nigerian Geese	35.00	12323	1.68	10
ROF	Royal Ostrich Farms	33.75	1234923	3.00	6
SLG	Sri Lankan Gold	50.37	32868	2.68	16

❓ Skill Builder

List those shares where the value of the holding exceeds one million.

1.4.9 Ordering columns

The order of reporting columns is identical to their order in the SQL command. For instance, compare the output of the following queries.

SELECT shrcode, shrfirm FROM share WHERE shrpe = 10;

Table 1.10: 2 records
shrcode	shrfirm
NG	Nigerian Geese
PT	Patagonian Tea

SELECT shrfirm, shrcode FROM share WHERE shrpe = 10;

Table 1.11: 2 records
shrfirm	shrcode
Nigerian Geese	NG
Patagonian Tea	PT

1.4.10 Regular expression—pattern matching

Regular expression is a concise and powerful method for searching for a specified pattern in a nominated column. Regular expression processing is supported by languages such as Java, R, and PHP. In this chapter, we introduce a few typical regular expressions and will continue developing your knowledge of this feature in the next chapter.

1.4.10.1 Search for a string

List all firms containing ‘Ruby’ in their name.

SELECT shrfirm FROM share WHERE shrfirm REGEXP 'Ruby';

Table 1.12: 1 records
shrfirm
Abyssinian Ruby

1.4.10.2 Search for alternative strings

In some situations you want to find columns that contain more than one string. In this case, we use the alternation symbol ‘|’ to indicate the alternatives being sought. For example, a|b finds ‘a’ or ‘b’.

List firms containing gold or zinc in their name, irrespective of case.

SELECT shrfirm FROM share WHERE shrfirm REGEXP 'gold|zinc|Gold|Zinc';

Table 1.13: 2 records
shrfirm
Indian Lead & Zinc
Sri Lankan Gold

1.4.10.3 Search for a beginning string

If you are interested in finding a value in a column that starts with a particular character string, then use ^ to indicate this option.

List the firms whose name begins with Sri.

SELECT shrfirm FROM share WHERE shrfirm REGEXP '^Sri';

Table 1.14: 1 records
shrfirm
Sri Lankan Gold

1.4.10.4 Search for an ending string

If you are interested in finding if a value in a column ends with a particular character string, then use $ to indicate this option.

List the firms whose name ends with Geese.

SELECT shrfirm FROM share WHERE shrfirm REGEXP 'Geese$';

Table 1.15: 1 records
shrfirm
Nigerian Geese

❓ Skill builder

List the firms containing “ian” in their name.

1.4.11 Ordering rows

People can generally process an ordered (e.g., sorted alphabetically) report faster than an unordered one. In SQL, the ORDER BY clause specifies the row order in a report. The default ordering sequence is ascending (A before B, 1 before 2). Descending is specified by adding DESC after the column name.

List all firms where PE is at least 10, and order the report in descending PE. Where PE ratios are identical, list firms in alphabetical order.

SELECT * FROM share WHERE shrpe >= 10
  ORDER BY shrpe DESC, shrfirm;

Table 1.16: 8 records
shrcode	shrfirm	shrprice	shrqty	shrdiv	shrpe
FC	Freedonia Copper	27.50	10529	1.84	16
SLG	Sri Lankan Gold	50.37	32868	2.68	16
CS	Canadian Sugar	52.78	4716	2.50	15
AR	Abyssinian Ruby	31.82	22010	1.32	13
ILZ	Indian Lead & Zinc	37.75	6390	3.00	12
BS	Bolivian Sheep	12.75	231678	1.78	11
NG	Nigerian Geese	35.00	12323	1.68	10
PT	Patagonian Tea	55.25	12635	2.50	10

1.4.12 Numeric versus character sorting

Numeric data in character fields (e.g., a product code) do not always sort the way you initially expect. The difference arises from the way data are stored:

Numeric fields are right justified and have leading zeros.
Character fields are left justified and have trailing spaces.

For example, the value 1066 stored as CHAR(4) would be stored as ‘1066’ and the value 45 would be stored as ‘45’. If the column containing these data is sorted in ascending order, then ‘1066’ precedes ‘45’ because the leftmost character ‘1’ is less than ‘4’. You can avoid this problem by always storing numeric values as numeric data types (e.g., integer or decimal) or preceding numeric values with zeros when they are stored as character data. Alternatively, start numbering at 1,000 so that all values are four digits, though the best solution is to store numeric data as numbers rather than characters.

1.4.13 Derived data

One of the important principles of database design is to avoid redundancy. One form of redundancy is including a column in a table when these data can be derived from other columns. For example, we do not need a column for yield because it can be calculated by dividing dividend by price and multiplying by 100 to obtain the yield as a percentage. This means that the query language does the calculation when the value is required.

Get firm name, price, quantity, and firm yield.

SELECT shrfirm, shrprice, shrqty, shrdiv/shrprice*100 AS yield FROM share;

Table 1.17: Displaying records 1 - 10
shrfirm	shrprice	shrqty	yield
Abyssinian Ruby	31.82	22010	4.148334
Burmese Elephant	0.07	154713	14.285714
Bolivian Sheep	12.75	231678	13.960784
Canadian Sugar	52.78	4716	4.736643
Freedonia Copper	27.50	10529	6.690909
Indian Lead & Zinc	37.75	6390	7.947020
Nigerian Geese	35.00	12323	4.800000
Patagonian Tea	55.25	12635	4.524887
Royal Ostrich Farms	33.75	1234923	8.888889
Sri Lankan Gold	50.37	32868	5.320627

You can give the results of the calculation a column name. In this case, a good choice is yield. Note the use of AS to indicate the name of the column in which the results of the calculation are displayed.

In the preceding query, the keyword AS is introduced to specify an alias, or temporary name. The statement specifies that the result of the calculation is to be reported under the column heading yield. You can rename any column or specify a name for the results of an expression using an alias.

1.4.14 Aggregate functions

SQL has built-in functions to enhance its retrieval power and handle many common aggregation queries, such as computing the total value of a column. Four of these functions (AVG, SUM, MIN, and MAX) work very similarly. COUNT is a little different.

1.4.14.1 COUNT

COUNT computes the number of rows in a table. Use COUNT(*) to count all rows irrespective of their content (i.e., null or not null), and use COUNT(columnname) to count rows without a null value for columname. Count can be used with a WHERE clause to specify a condition.

How many firms are there in the portfolio?

SELECT COUNT(shrcode) AS investments FROM share;

Table 1.18: 1 records
investments
10

How many firms have a holding greater than 50,000?

SELECT COUNT(shrfirm) AS bigholdings FROM share WHERE shrqty > 50000;

Table 1.19: 1 records
bigholdings
3

1.4.14.2 AVG—averaging

AVG computes the average of the values in a column of numeric data. Null values in the column are not included in the calculation.

Find the average dividend.

SELECT AVG(shrdiv) AS avgdiv FROM share;

Table 1.20: 1 records
avgdiv
2.031

What is the average yield for the portfolio?

SELECT AVG(shrdiv/shrprice*100) AS avgyield FROM share;

Table 1.21: 1 records
avgyield
7.530381

1.4.14.3 SUM, MIN, and MAX

SUM, MIN, and MAX differ in the statistic they calculate but are used similarly to AVG. As with AVG, null values in a column are not included in the calculation. SUM computes the sum of a column of values. MIN finds the smallest value in a column; MAX finds the largest.

1.4.15 Subqueries

Sometimes we need the answer to another query before we can write the query of ultimate interest. For example, to list all shares with a PE ratio greater than the portfolio average, you first must find the average PE ratio for the portfolio. You could do the query in two stages:

SELECT AVG(shrpe) FROM share;

and

SELECT shrfirm, shrpe FROM share WHERE shrpe > x;

where x is the value returned from the first query.

Unfortunately, the two-stage method introduces the possibility of errors. You might forget the value returned by the first query or enter it incorrectly. It also takes longer to get the results of the query. We can solve these problems by using parentheses to indicate the first query is nested within the second one. As a result, the value returned by the inner or nested subquery, the one in parentheses, is used in the outer query. In the following example, the nested query returns 11.20, which is then automatically substituted in the outer query.

Report all firms with a PE ratio greater than the average for the portfolio.

SELECT shrfirm, shrpe FROM share
  WHERE shrpe > (SELECT AVG(shrpe) FROM share);

Table 1.22: 5 records
shrfirm	shrpe
Abyssinian Ruby	13
Canadian Sugar	15
Freedonia Copper	16
Indian Lead & Zinc	12
Sri Lankan Gold	16

⚠️ The preceding query is often mistakenly written as SELECT shrfirm, shrpe from share WHERE shrpe > avg(shrpe); You need to use a subquery to find the average, so the computed value can be used in the outer query

❓ Skill builder

Find the name of the firm for which the value of the holding is greatest.

1.4.15.1 DISTINCT—eliminating duplicate rows

The DISTINCT clause is used to eliminate duplicate rows. It can be used with column functions or before a column name. When used with a column function, it ignores duplicate values.

Report the different values of the PE ratio.

SELECT DISTINCT shrpe FROM share;

Table 1.23: 8 records
shrpe
13
3
11
15
16
12
10
6

Find the number of different PE ratios.

SELECT COUNT(DISTINCT shrpe) as 'Different PEs' FROM share;

Table 1.24: 1 records
Different PEs
8

When used before a column name, DISTINCT prevents the selection of duplicate rows. Notice a slightly different use of the keyword AS. In this case, because the alias includes a space, the entire alias is enclosed in straight quotes.

Quotes

There are three types of quotes that you can typically use with SQL. Double and single quotes are equivalent and can be used interchangeably. Note that single and double quotes must be straight rather than curly, and the back quote is to the left of the 1 key.

Type of quote	Representation
Single	’
Double	“
Back	`

The following SQL illustrates the use of three types of quotes to find a person with a last name of O’Hara and where the column names are person first and person last.

SELECT `person first` FROM person WHERE `person last` = "O'Hara";

Table 1.25: 1 records
person first
Sheila

Summary

In SQL, queries are written using the SELECT statement. Project (choosing columns) and restrict (choosing rows) are common table operations. The WHERE clause is used to specify row selection criteria. WHERE can be combined with IN and NOT IN, which specify values for a single column. The rows of a report are sorted using the ORDER BY clause. Arithmetic expressions can appear in SQL statements, and SQL has built-in functions for common arithmetic operations. A subquery is a query within a query. Regular expressions are used to find string patterns within character strings. Duplicate rows are eliminated with the DISTINCT clause.

Key terms and concepts

Alias	Instance
AS	MAX
Attribute	MIN
AVG	NOT IN
Column	ORDER BY
COUNT	Primary key

Data modeling	Relational database
Data type
Database	Row
	SELECT
DISTINCT	SQL
Entity	Subquery
Entity integrity rule	SUM
Identifier	Table
IN	WHERE

Exercises

Do the following queries using SQL:
1. List a share’s name and its code.
2. List full details for all shares with a price less than $1.
3. List the names and prices of all shares with a price of at least $10.
4. Create a report showing firm name, share price, share holding, and total value of shares held. (Value of shares held is price times quantity.)
5. List the names of all shares with a yield exceeding 5 percent.
6. Report the total dividend payment of Patagonian Tea. (The total dividend payment is dividend times quantity.)
7. Find all shares where the price is less than 20 times the dividend.
8. Find the share(s) with the minimum yield.
9. Find the total value of all shares with a PE ratio > 10.
10. Find the share(s) with the maximum total dividend payment.
11. Find the value of the holdings in Abyssinian Ruby and Sri Lankan Gold.
12. Find the yield of all firms except Bolivian Sheep and Canadian Sugar.
13. Find the total value of the portfolio.
14. List firm name and value in descending order of value.
15. List shares with a firm name containing “Gold.”
16. Find shares with a code starting with “B.”
Run the following queries and explain the differences in output. Write each query as a manager might state it.
1. SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘s’;
2. SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘S’;
3. SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘s|S’;
4. SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘^S’;
5. SELECT shrfirm FROM share WHERE shrfirm NOT REGEXP ‘s$’;

Officially pronounced as “S-Q-L,” but often also pronounced as “sequel”.↩︎
Attributes are shown in italics.↩︎