DISC 210 Week 4

= Basic Concepts of Relational Databases =

Introduction
A database management system (DBMS) is a software system that:
 * allows a person to store, use, or modify data in a database,
 * controls shared access to a database, and
 * provides mechanisms to help ensure the integrity and security of the shared data.

We will be using the MySQL DBMS, which is used by companies such as Google, Yahoo!, and YouTube. (And your professor)

More specifically, among its functions, a DBMS typically has procedures to
 * 1) set up the storage structures in secondary memory for the data;
 * 2) load the data;
 * 3) accept requests (or queries) from programs or users,
 * 4) format retrieved data so that it appears in a form appropriate for the person or program that requested it;
 * 5) accept and perform updates to the data;  this could include adding new data, deleting data, or changing the values of existing data;
 * 6) perform data backup and recovery;
 * 7) allow concurrent use of the data without having users interfere with one another;
 * 8) deny access to data to users who are not authorized such access.

Relational Databases
Since the mid 1980s the dominant paradigm for database has been the relational database. The term “relational” comes from the mathematical concept of a relation, which comprises the theoretical foundation on which this approach to database design rests (earlier approaches were based on what were known as the “hierarchical” approach or on the “network” approach). We are not going to go into this theory here. Instead we shall simply note that relational databases are organized using tables, although formally these tables are called relations.

The columns of the relation (table) are known as the relation’s attributes, while each row of a relation is known as a tuple. Since a (database) relation is modeled on a mathematical relation, and since a mathematical relation is a set, then a database relation should exhibit the properties of a set. One of the properties of a set is that all of its elements are distinct. Likewise, all of the tuples of a relation are expected to be distinct. What this means is that any pair of tuples in a relation, there must be at least one attribute where the values of these tuples are different. This will become clearer when we look at some examples of relational databases, which we now do.

Examples of simple relational databases
Employee-Project-AssignedTo database (EPA): This database attempts to capture information about a company’s employees, the projects the company is working on, and who is working on each project (or what projects each employee is working on).

Employee Table

Project

AssignedTo

Student-Class-Enrollment database (SCE): This database attempts to capture information about a university’s studentrs, the courses the university offers, and who is enrolled in each course (or what courses each student is taking).

Student

Class

Enrollment

You will note that in both examples a relation plays one of two roles:
 * Some relations provide information about certain things: The Employee, Project, Student, and Class relations all illustrate this.
 * Other relations bring together items from the above types of relations: The AssignedTo and Enrollment illustrate this.

Property of Relations
Here are some properties of relations:


 * 1) Each attribute in a given relation is identified by a name.  While names within a relation must be unique, two relations in the same database can have one or more attributes with the same names.
 * 2) Each attribute has an associated type from which its values come.  Among the most common types used in relational databases are :
 * Integers
 * Reals (which may be further classified as floating point, or double precision)
 * Fixed-length character strings
 * Variable-length character strings
 * Dates
 * 1) Each relation must have one or more attributes whose combined values are sufficient to identify each tuple in the relation.  Such a group of attributes in known as a primary key, or simply the relation’s key.
 * It is not uncommon for there to be several groups of attributes, each of which could serve as a primary key. Each such group is known as a candidate key.
 * It is usually easiest to work with single-attribute keys. Multi-attribute keys are known as composite keys.
 * 1) Some relations may have an attribute whose values are expected to match those of the primary key from another relation.  Such an attribute is known as a foreign key.

How to identify candidate, and foreign keys for a relation

 * 1) If the meaning of each attribute in a relation is “well-understood,” it may be possible to identify candidate and foreign keys based on this alone.
 * 2) If sample data is available “that is representative of all possible data values the relation may hold” it may be possible to identify candidate and foreign keys based on this.
 * Caution: It is important to keep in mind that while a sample of a relation can always rule out an attribute, or set of attributes, as a primary or candidate key it is risky to attempt to designate candidate keys based on sample data alone.  It is essential to have another way to confirm such designations.  Having said this, we will shortly violate this canon where we infer primary, candidate, and foreign key  for the tables in the examples of databases we gave earlier.  We do this only to help you become familiar with the concepts.
 * 1) Candidate and foreign keys may be identified from narratives that are available about the database or its data.

Example: Use the instances of the tables in the two examples of databases given earlier to infer possible primary keys, candidate keys, and foreign keys. (Note, we say “possible” here because it may be the case that our relations simply do not have enough data to indicate otherwise).

Employee-Project-AssignedTo database (EPA)
 * Candidate Keys
 * Foreign Keys

Student-Class-Enrollment database (SCE)
 * Candidate Keys
 * Foreign Keys

How to Choose a Primary Key from Among Several Candidate Keys
When faced with the prospect of designating a primary key from among several candidate keys, the following guidelines may be helpful:
 * 1) Choose the candidate key with fewer attributes.
 * * Rather than be forced to work with a composite key for a relation, database designers will commonly create artificial, single-attribute keys.
 * 1) Choose the candidate key that seems most "natural."
 * 2) Be mindful of privacy issues.  For example, in the U.S., Social Security numbers should not be used as identifiers in relations whose focus is people.

Identifying versus Non-identifying
An identifying relationship is when the existence of a row in a child table depends on a row in a parent table. This may be confusing because it's common practice these days to create a pseudokey for a child table, but not make the foreign key to the parent part of the child's primary key. Formally, the "right" way to do this is to make the foreign key part of the child's primary key. But the logical relationship is that the child cannot exist without the parent.

Example: A Person has one or more phone numbers. If they had just one phone number, we could simply store it in a column of Person. Since we want to support multiple phone numbers, we make a second table PhoneNumbers, whose primary key includes the person_id referencing the Person table.

We may think of the phone number(s) as belonging to a person, even though they are modeled as attributes of a separate table. This is a strong clue that this is an identifying relationship (even if we don't literally include person_id in the primary key of PhoneNumbers).

A non-identifying relationship is when the primary key attributes of the parent must not become primary key attributes of the child. A good example of this is a lookup table, such as a foreign key on Person.state referencing the primary key of States.state. Person is a child table with respect to States. But a row in Person is not identified by its state attribute. I.e. state is not part of the primary key of Person.

A non-identifying relationship can be optional or mandatory, which means the foreign key column allows NULL or disallows NULL, respectively.

(from http://dba.stackexchange.com/questions/9844/whats-the-difference-between-identifying-and-non-identifying-relationships)

Implementations of a Relational Database in a Relational Database Management System
In this section we describe two methods by which databases may be created in a relational database management system. The first method assumes there is a user interface for the database management system that allows one to define the database. The second approach relies on a data definition language (DDL) through which a user can issue commands to create the database.

Database Schemas
A database’s schema simply refers to the organization of the database, including the set of relations, and for each relation the names of the attributes, its primary key, and any foreign keys. For our purposes we give the organization of each table in the following format

TableName(att1, att2, att3,…attn)

The primary key attribute(s) will be underlined, and any foreign keys will appear in bold italics. In the latter case we shall assume that the attribute being referenced by a foreign key is apparent from the foreign key’s name, although of course this may not always be the case. We give n`2o type information in the schema. Full details on each table’s structure, including attribute types and a precise accounting of foreign key references, will come from separate documentation.

Example: We give a schema for a simple relational database, UNIVERSITY, to represent some data commonly used for colleges or universities.

STUDENT( SID, LastName, FirstName, Major, Rank, GPA, AdvisorID)

FACULTY( FID, LastName, FirstName, Rank, Tenured, DepartmentID)

DEPARTMENT( DID, DeptName, OfficeLocation, ChairID)

COURSE( CID, Dept, Title, CreditHrs)

SECTION( SectID, CourseID, SectionNumber, Semester, Year, InstructorID)

CLASS( SectID, StudentID , Grade)

Note that in the relation CLASS the attributes SectID and StudentID are both components of a compound key and foreign keys.

We will be using MySQL workbench as our user interface.

Defining a Database Using the Data Definition Features of SQL
SQL (“structured query language,” pronounced S-Q-L by some, and “sequel” by others) is a language developed for relational database systems that has become the de facto standard for working with relational databases. SQL has one set of constructs for data definition, which includes defining tables, and has another set of constructs for data manipulation, which mostly encompasses extracting information from the database, but which also involves inserting, deleting, or modifying the data already in the database (the data manipulation/query features of the language).

Fundamental Data Types in SQL
The main data types available for attributes in SQL include numeric, character string, date and time types:
 * Numeric types include various sizes of integers (INTEGER, INT, or SMALLINT), various precisions of real numbers (FLOAT, REAL, and DOUBLE PRECISION), and formatted numbers (DECIMAL(i,j), DEC(i,j), or NUMERIC(i,j), where i, the precision, is the total number of decimal digits and j, the scale, is the number of digits after the decimal point. The default scale is 0).
 * Character string data types include fixed length character strings (CHAR(n) or CHARACTER(n), where n is the number of characters), or varying –length((VARCHAR(n), or CHAR VARYING(n), or CHARACTER VARYING(n), where n is the maximum length).
 * For date and time types, the type DATE has ten positions in the form YYYY-MM-DD and the type TIME ahs eight positions in the form HH:MM:SS. There are other types for working with dates and times, but these two will be sufficient for our purposes.  There is also a combined DATETIME type.

Table Definition in SQL
The CREATE TABLE construct is used to specify relations in SQL. In its simplest form one specifies the name of the relation/table and gives its attributes (name and type) as follows:

CREATE TABLE relation-name ( attribute1 type,  attribute2  type, ...  attributen  type );

Example: Consider the relation Student with the table schema given in the example above. We can specify this table in SQL via the statement

CREATE TABLE Student ( SID		CHAR(8),  LastName		VARCHAR(20),  FirstName		VARCHAR(15),  Major		CHAR(4),  Rank		CHAR(2),  GPA		DECIMAL(4,3),  AdvisorID		CHAR(8) );

Attribute and Table Constraints The specification of primary keys, of attributes that do not allow null values, and of foreign keys is done by associating constraints with either an attribute or a table. Later we shall see how to incorporate even more general constraints into a relational model.


 * 1) Attribute constraints:  The following constraints can be applied to an attribute as it is specified:
 * 2) NOT NULL constraint: Since SQL allows null values, a constraint NOT NULL can be specified for an attribute if the attribute should not be allowed to have the value null.  For example, in the table declaration for STUDENT we can specify that the attribute SID not allow null values as follows
 * SID	CHAR(8)  NOT NULL,
 * 1) Default Values:  When an attribute is being specified, one can also specify a default value to be included in any tuple if an explicit value is not provided for the attribute.  If no default value is specified, a default value of null is used. For example, we can specify that GPA be given a default value of 0.000 as follows
 * GPA	DECIMAL(4,3)  DEFAULT 0.000,
 * In cases where a single attribute can serve as a primary key or candidate key, one can also append one of the following constraints, as appropriate
 * 1) PRIMARY KEY constraint: Note, this presumes NOT NULL. In the table declaration for STUDENT we could designate the attribute SID as the table’s primary key as follows
 * SID	CHAR(8)  PRIMARY KEY,
 * Although this is a viable way for designating single-attribute primary keys, the more common way to do is by means of a table constraint, which we describe shortly.
 * 1) UNIQUE constraint: This indicates that no two tuples can have the same value on the associated attribute.  Unlike the primary key constraint, however, the attribute can assume the value null unless specifically excluded by the NOT NULL constraint.
 * SID	CHAR(8) UNIQUE  NOT NULL,
 * As with primary keys, the more common way to do is by means of a table constraint.
 * 1) Table constraints:  Table constraints are constraints that are specified after the attribute declarations.  Among the constraints that can be imposed in this way are primary and candidate key constraints and foreign key constraints.  Although not required, table constraints can be given a name so that it can be identified later in case it is necessary to drop the constraint and replace it with another.  As we suggested above, in general table constraints are more widely used (even where one attribute is involved).
 * 2) Primary key: The primary key for a relation can be specified with a primary key constraint, which assumes one of the following forms, depending on whether one wants to name the constraint or not
 * PRIMARY KEY(PKatt1,...,PKattn) or CONSTRAINT constraint-name PRIMARY KEY(PKatt1,...,PKattn)
 * 1) Candidate keys: A candidate key can be specified with a unique constraint
 * UNIQUE(AKatt1,...,AKattn) or CONSTRAINT constraint-name UNIQUE(AKatt1,...,AKattn)
 * As with the analogous attribute constraint, unique does not preclude the possibility of null values for any of the associated attributed unless specifically indicated with an accompanying NOT NULL attribute constraint.
 * 1) Foreign keys: With the foreign key constraint, one can designate a foreign key, the table and primary key it references.  Note, the table being referenced must already exist.  If this is not the case, then we must wait to add any foreign keys after the table being referenced has been created.
 * FOREIGN KEY (FKatt) REFERENCES table-name(att) or CONSTRAINT constraint-name FOREIGN KEY(FKatt) REFERENCES table-name(att)

Example: Consider the University database schema given earlier. We specify the relational model in SQL as follows (here we use table constraints over attribute constraints):

CREATE TABLE Department ( DID	CHAR(4),  DeptName	VARCHAR(30)	NOT NULL,  OfficeLocation	VARCHAR(50),  ChairID	CHAR(8),

PRIMARY KEY (DID) );

CREATE TABLE Faculty ( FID		CHAR(8),  LastName		VARCHAR(20)	NOT NULL,  FirstName		VARCHAR(15)	NOT NULL,  Rank		CHAR(4),  Tenured		CHAR(1),  DeptID		CHAR(4),

PRIMARY KEY (FID), FOREIGN KEY (DeptID) REFERENCES DEPARTMENT(DID) );

CREATE TABLE Student ( SID		CHAR(8),  LastName		VARCHAR(20)	NOT NULL,  FirstName		VARCHAR(15)	NOT NULL,  Major		CHAR(4),  Rank		CHAR(2),  GPA		DECIMAL(4,3),  AdvisorID		CHAR(8),

PRIMARY KEY (SID), FOREIGN KEY (AdvisorID) REFERENCES FACULTY(FID) );

CREATE TABLE Course ( CID	CHAR(8),  Dept	CHAR(4),  Title	VARCHAR(50),  CreditHrs	SMALLINT,

PRIMARY KEY (CID) );

CREATE TABLE Section ( SectID		INTEGER,  CourseID		CHAR(8)	NOT NULL,  SectionNumber	CHAR(3)	NOT NULL,  Semester		CHAR(4)	NOT NULL,  Year		CHAR(4)	NOT NULL,  InstructorID	CHAR(8),

PRIMARY KEY (SectID), UNIQUE (CourseID, SectionNumber, Semester, Year), FOREIGN KEY (CourseID) REFERENCES COURSE(CID), FOREIGN KEY (InstructorID) REFERENCES FACULTY(FID) );

CREATE TABLE Class ( SectID		INTEGER,  StudentID		CHAR(8),  Grade		CHAR(2),

PRIMARY KEY (SectID, StudentID),

FOREIGN KEY (SectID) REFERENCES SECTION(SectID), FOREIGN KEY (StudentID) REFERENCES STUDENT(SID) );

Note, we were not able to define the foreign key ChairID in the DEPARTMENT relation because the FACULTY table had not yet been created at the time that DEPARTMENT was created. Such constraints can be handled later, however using the

ALTER TABLE …ADD CONSTRAINT

construct of SQL. In this particular case we would write

ALTER TABLE DEPARTMENT ADD CONSTRAINT FK_DEPARTMENT_1 FOREIGN KEY (ChairID) REFERENCES FACULTY(FID)

You can find more information about the alter command at http://www.techonthenet.com/sql/tables/alter_table.php.

Entering Data in Tables

 * 1) Entering data via a user interface provided by the DBMS
 * 2) Entering data via SQL commands. In SQL this is accomplished by the INSERT command.  There are two forms of the INSERT command that are germane to us here:

To add a single tuple to a relation by giving the name of the relation that is to receive the tuple and the complete list of values. The values must be presented in the same order as the one in which the corresponding attributes were listed in the CREATE TABLE command that created the relation. The form of the INSERT command is INSERT INTO table-name VALUES (value1,...,valuen)

We use the keyword NULL to indicate a NULL value.

To add a single tuple to a relation by directly associating attributes and their value. In this case we use the the form

INSERT INTO table-name (attribute-name1,...,attribute-namen) VALUES (value1,...,valuen)

Any attributes not specified in this form are set to their default value or to null.

Examples: 

We will use the University database defined earlier in this section.

1. Add a course DISC210 with the title “Dataset Organization and Management” with 3 credits and associated with the DISC program.

INSERT INTO COURSE VALUES ('DISC210′, ′DISC′, ′Dataset Organization and Management′,3)

2. Add a 3-credit course DISC220 associated with the DISC program but with a title as yet to be determined.

INSERT INTO COURSE VALUES (′DISC220′, ′DISC′,NULL,3)

Or,

INSERT INTO COURSE(CID, Dept, CreditHours) VALUES (′DISC220′, ′DISC′,3)

Or even,

INSERT INTO COURSE(CreditHours, CID, Dept) VALUES (3,′DISC220′, ′DISC′)

Modifying the Tuples in a Table
When we talk about modifying the tuples in a database, we mean one of two operations:

1. Deleting tuples from a relation:  In SQL this is accomplished by the DELETE command. This has only one form

DELETE FROM table-name WHERE condition

It will remove from the specified table all tuples whose attribute values satisfy the given condition (be careful when you use it!).

2. Changing the values of attributes of tuples in the database. In SQL this is accomplished by the UPDATE command. There is one form for this command:

UPDATE table-name SET att1 = value1, att2 = value2,...,attp = valuep WHERE condition

Each tuple in the given table who attribute values satisfy the given condition will have the present values of the specified attributes changed to those given. It is possible to designate null or default as a new value for an attribute.

Examples:

1. Delete all courses associated with the DISC department. DELETE FROM COURSE WHERE Dept=’DISC’

2. Change the affiliation of all DISC courses from DISC to CSCI. UPDATE COURSE SET Dept=’CSCI’ WHERE Dept=’DISC’

Integrity Constraints and the Relational Data Model
In this section we introduce the notion of integrity constraints into relational model. Integrity constraints are rules that the data in a data model are expected to obey. Primarily the constraints are imposed to ensure that the data in the database does not become inconsistent as new data are introduced into the database, as data are removed from the database, or as data are modified.

Key Constraints and Entity Integrity
As we noted earlier, the relational model requires every relation to have at least one candidate key. Such a condition is known as a key constraint. More precisely, in order to satisfy the key constraint a relation must have a set of attributes such that:


 * 1) Two distinct tuples cannot have identical values in all the attributes of the candidate key.
 * 2) No subset of the set of attributes in a key can also serve as a candidate key.

Key constraints serve two purposes:
 * 1) Relations are mathematical sets and sets do not allow duplicate values. The key constraint assures us that relations will reflect this property of sets.
 * 2) Relational data models are used to represent abstractions of real situations; in particular, the tuples in some relations are going to represent objects from that real situation.  The key constraint assures us that the relational model is capable of distinguishing between objects.

As we've seen, a relation may have several candidate keys and in principle any candidate key can be used to single out a particular tuple. In the relational model, however, one of these candidate keys must be selected to serve as the primary key of the relation. This allows the RDBMS to expect tuples to be accessed via this key and to optimize its storage and retrieval operations based on this. In order to fully realize the role of a primary key as a tuple identifier, however, there is another constraint that must be in effect

Entity Integrity Constraint: No attribute of a primary key can accept null values.
 * An attribute of a primary key is called a prime attribute.

As we have already noted, the value null is a special value for an attribute in a tuple that indicates that the value for that attribute of a tuple is either not known yet or that a value is not applicable for the attribute for a given tuple. Since all tuples of a given relation must be distinguishable from one another, and since primary keys are supposed to provide this unique identification in the relational data model, the entity integrity constraint guarantees that the primary key will serve this function. Without such a rule, it would be permissible to store data for a tuple, but not provide enough data to distinguish the tuple other tuples.

Example: Consider the relation Student with attributes SID, StName, Class, and GPA. Here SID is assumed to be its primary key. Without the entity integrity constraint the primary key SID could accept null values; it is then possible for Student to contain tuples such as the following:

101 	Jones 	Senior 	3.620 _	Smith 	Sophomore 	1.653 109 	Allen 	Junior 	2.587 _ 	Smith 	Senior 	4.000 _ 	Smith 	Sophomore 	1.653

Foreign Keys and Referential Integrity
An attribute (or set of attributes) FK in a relation R1 is a foreign key if it satisfies both of the following conditions:
 * 1) (Each of the attributes in) FK has the same domain as (one of the attributes of) the primary key, PK, of another relation R2.
 * 2) A value of (an attribute of) FK in a tuple t1 of R1 either occurs as the value of (an associated attribute of) PK for some tuple t2 in R2, or is null.  In the first case we say that the tuple t1 references the tuple t2.
 * In the relational model the first condition for a foreign key can be enforced by requiring that the domains of the attributes of the foreign key and the domains of the attributes of the primary key of the relation being referenced be the same. To enforce the second condition, however, we have the following constraint:
 * Referential Integrity Constraint: Given two relations R and S, suppose R has a foreign key that references the primary key of S. Then the value of the foreign key of a tuple of R must be equal to the value of the primary key of a tuple of S or else be (entirely) null.
 * This integrity rule is important since it implicitly requires that one specify the actions that must be taken when a tuple whose primary key value is the object of a foreign key reference is updated.
 * Example: In our example above using the Department and Faculty relations, any update that relates a tuple in the Department relation to a Faculty tuple with a non-matching FID value should be rejected. Furthermore, if an attempt is made to delete a tuple from the Faculty relation when that tuple is the object of a foreign key, reference steps must be taken to maintain the integrity of the database. These could include:
 * 1) Delete all tuples containing references to the deleted tuple (which might in turn lead to other tuples being deleted, etc., resulting in a “cascading” deletion.)
 * 2) Do not allow the deletion to take place.
 * 3) Delete the tuple and change the value of any affected foreign keys to null.

The choice of which option to use normally depends on the application.

Referential Integrity and SQL
When foreign keys are created using SQL, there is an option to specify what actions should be taken when modifications are made to the value of an associated primary key.

FOREIGN KEY(FKatt) REFERENCES table-name(att) ON DELETE delete-action ON UPDATE update-action

Here are some of the actions that can be used for a referential integrity violation caused by a delete or update. Note, not all DBMSs support these actions, and others may add some of their won.

SET NULL if a referenced tuple is deleted or the value of its primary key is changed, change the value of the foreign key to null for all referencing tuples

SET DEFAULT if a referenced tuple is deleted or the value of its primary key is changed, change the value of the foreign key its default value for all referencing tuples

CASCADE	if a referenced tuple is deleted, delete all the referencing tuples;

if the value of the primary key of a referenced primary key is changed, change the value of the foreign key to the new key value for all referencing tuples

= Further Reading = http://en.wikipedia.org/wiki/Database_management_system

http://en.wikipedia.org/wiki/Relational_database