ER Model, when conceptualized into diagrams, gives a good overview of entity-relationship, which is easier to understand. ER diagrams can be mapped to relational schema, that is, it is possible to create relational schema using ER diagram. We cannot import all the ER constraints into relational model, but an approximate schema can be generated.
There are several processes and algorithms available to convert ER Diagrams into Relational Schema. Some of them are automated and some of them are manual. We may focus here on the mapping diagram contents to relational basics.
ER diagrams mainly comprise of −
Entity and its attributes
Relationship, which is association among entities.
Mapping Entity
An entity is a real-world object with some attributes.
Mapping Process (Algorithm)
Create table for each entity.
Entity’s attributes should become fields of tables with their respective data types.
Declare primary key.
Mapping Relationship
A relationship is an association among entities.
Mapping Process
Create table for a relationship.
Add the primary keys of all participating Entities as fields of table with their respective data types.
If relationship has any attribute, add each attribute as field of table.
Declare a primary key composing all the primary keys of participating entities.
Declare all foreign key constraints.
Mapping Weak Entity Sets
A weak entity set is one which does not have any primary key associated with it.
Mapping Process
Create table for weak entity set.
Add all its attributes to table as field.
Add the primary key of identifying entity set.
Declare all foreign key constraints.
Mapping Hierarchical Entities
ER specialization or generalization comes in the form of hierarchical entity sets.
Mapping Process
Create tables for all higher-level entities.
Create tables for lower-level entities.
Add primary keys of higher-level entities in the table of lower-level entities.
In lower-level tables, add all other attributes of lower-level entities.
Declare primary key of higher-level table and the primary key for lower-level table.
In relational algebra, several operators are essential due to their unique functionalities. One such operator is the division operator, symbolized by “÷“. This operator is relatively complex but plays a crucial role in solving queries that involve the “all” condition. While many relational algebra operations focus on combining or filtering data, the division operation identifies tuples in one relation that are associated with every tuple in another.
In this chapter, we’ll explore the concept of the division operator in detail, understand its theoretical foundation, and walk through practical examples to grasp its application.
Basics of the Division Operator
The division operation applies to two relations −
Numerator Relation (R) − Represents the main dataset and contains the superset of possible combinations.
Denominator Relation (S) − Represents the subset or the set of conditions that must be satisfied.
The result is a relation containing only those tuples from the numerator that are associated with every tuple in the denominator.
Attributes in Relations
Let’s define the attributes involved in the division operator −
R(Z) is the numerator, where Z = X ∪ Y
S(X) is the denominator
T(Y) is the result
Here, T contains the attributes in R that are not in S. For a tuple to appear in T, it must pair with every tuple in S within R.
To better understand the concept, consider a real-world scenario. Suppose we are organizing a workshop. Participants (R) are linked to sessions (X). We want to find those participants who attended all required sessions (S). The division operator helps identify these individuals by evaluating the “all sessions attended” condition.
Step-by-Step Explanation of the Division Operator
The division operation ensures that each result tuple in T(Y) must appear in R(Z) in combination with every tuple in S(X). Any tuple in R that fails to match all tuples in S is excluded from the result.
Mathematical Representation
The division operator can be expressed using a combination of projection, Cartesian product, and difference −
T1 − Identify all potential result tuples: T1 ← πY(R)
T2 − Find tuples in T1 that do not satisfy the pairing condition with S: T2 ← πY((S × T1) − R)
Final Result − Subtract the unsatisfying tuples from the potential results. T ← T1 − T2
Example: Employees Working on All Projects
Let us take an example to get a clear understanding of how the division operator works.
Query − Find the names of employees who work on all projects that “John Smith” works on.
Relations Involved − Following are the relations involved in this query –
WORKS_ON − Contains tuples of employee IDs (Essn) and project numbers (Pno).
Essn
Pno
Hours
123456789
1
32.5
123456789
2
7.5
666884444
3
40.0
453453453
1
20.0
453453453
2
20.0
333445555
2
10.0
333445555
3
10.0
333445555
10
10.0
333445555
20
10.0
999887777
30
30.0
999887777
10
10.0
987987987
10
35.0
987987987
30
5.0
987654321
30
20.0
987654321
20
15.0
888665555
20
NULL
EMPLOYEE − Contains personal details like Fname, Lname, and Ssn.
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
John
B
Smith
123456789
1965-01-09
731 Fondren, Houston, TX
M
30000
333445555
5
Franklin
T
Wong
333445555
1955-12-08
638 Voss, Houston, TX
M
40000
888665555
5
Alicia
J
Zelaya
999887777
1968-01-19
3321 Castle, Spring, TX
F
25000
987654321
4
Jennifer
S
Wallace
987654321
1941-06-20
291 Berry, Bellaire, TX
F
43000
888665555
4
Ramesh
K
Narayan
666884444
1962-09-15
975 Fire Oak, Humble, TX
M
38000
333445555
5
Joyce
A
English
453453453
1972-07-31
5631 Rice, Houston, TX
F
25000
333445555
5
Ahmad
V
Jabbar
987987987
1969-03-29
980 Dallas, Houston, TX
M
25000
987654321
4
James
E
Borg
888665555
1937-11-10
450 Stone, Houston, TX
M
55000
NULL
1
Now, let’s understand step-by-step how to find the names of employees who work on all projects that “John Smith” works on.
Retrieve John Smith’s Projects
Start by filtering the projects linked to “John Smith.”
SMITH←σFname=′John′ANDLname=′Smith′(EMPLOYEE)
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
John
B
Smith
123456789
1965-01-09
731 Fondren, Houston, TX
M
30000
333445555
5
SMITHPNOS←πPno(WORKSON⋈Essn=SsnSMITH)
Pno
1
2
This gives a relation SMITH_PNOS containing all project numbers “John Smith” is assigned to.
Create All Employee-Project Relationships
Extract a relation showing all employees and their associated projects.
SSNPNOS←πEssn,Pno(WORKSON)
Essn
Pno
123456789
1
123456789
2
666884444
3
453453453
1
453453453
2
333445555
2
333445555
3
333445555
10
333445555
20
999887777
30
999887777
10
987987987
10
987987987
30
987654321
30
987654321
20
888665555
20
Apply Division
Use the division operator to find employees whose project assignments cover all projects in SMITH_PNOS.
SSNS(Ssn)←SSNPNOS÷SMITHPNOS
Ssn
123456789
453453453
Map Employee IDs to Names
Finally, retrieve the names of these employees.
RESULT←πFname,Lname(SSNS⋈EMPLOYEE)
Fname
Lname
John
Smith
Joyce
English
Final output − The final relation contains the names of employees who work on all the projects that “John Smith” works on.
Generalized Example: Products and Suppliers
Scenario − A store sells products (R) supplied by various suppliers (S). We want to identify products available from all suppliers.
Input Relations
R − Contains ProductID and SupplierID
S − Contains just SupplierID
Let’s go through its steps.
Extract potential products −
T1←πProductID(R)
Identify mismatched products −
T2←πProductID((S×T1)−R)
Subtract mismatches from potential products −
T←T1−T2
Output − If only certain products are supplied by every supplier, those product IDs will appear in the result relation T.
Key Observations
Handling Universal Quantification − The division operator is useful for queries that require “for all” conditions. This is like finding students enrolled in all mandatory courses or employees assigned to every project in a list.
Limitations in SQL − SQL does not directly support the division operator. However, similar results can be achieved using NOT EXISTS or complex joins. Although these approaches may lack the power of relational algebra.
Practical Relevance − While it is theoretically significant, division is less common in practical database management systems due to its complexity and rare use cases.
Additional Applications
Matching Preferences − Find customers whose preferences match every feature of a product.
Cross-Department Participation − Identify employees involved in activities across all departments.
We understand the benefits of taking a Cartesian product of two relations, which gives us all the possible tuples that are paired together. But it might not be feasible for us in certain cases to take a Cartesian product where we encounter huge relations with thousands of tuples having a considerable large number of attributes.
Join is a combination of a Cartesian product followed by a selection process. A Join operation pairs two tuples from different relations, if and only if a given join condition is satisfied.
We will briefly describe various join types in the following sections.
Theta (θ) Join
Theta join combines tuples from different relations provided they satisfy the theta condition. The join condition is denoted by the symbol θ.
Notation
R1 ⋈θ R2
R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that the attributes dont have anything in common, that is R1 ∩ R2 = Φ.
Theta join can use all kinds of comparison operators.
Student
SID
Name
Std
101
Alex
10
102
Maria
11
Subjects
Class
Subject
10
Math
10
English
11
Music
11
Sports
Student_Detail −
STUDENT ⋈Student.Std = Subject.Class SUBJECT
Student_detail
SID
Name
Std
Class
Subject
101
Alex
10
10
Math
101
Alex
10
10
English
102
Maria
11
11
Music
102
Maria
11
11
Sports
Equijoin
When Theta join uses only equality comparison operator, it is said to be equijoin. The above example corresponds to equijoin.
Natural Join (⋈)
Natural join does not use any comparison operator. It does not concatenate the way a Cartesian product does. We can perform a Natural Join only if there is at least one common attribute that exists between two relations. In addition, the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in both the relations are same.
Courses
CID
Course
Dept
CS01
Database
CS
ME01
Mechanics
ME
EE01
Electronics
EE
HoD
Dept
Head
CS
Alex
ME
Maya
EE
Mira
Courses ⋈ HoD
Dept
CID
Course
Head
CS
CS01
Database
Alex
ME
ME01
Mechanics
Maya
EE
EE01
Electronics
Mira
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join includes only those tuples with matching attributes and the rest are discarded in the resulting relation. Therefore, we need to use outer joins to include all the tuples from the participating relations in the resulting relation. There are three kinds of outer joins − left outer join, right outer join, and full outer join.
Left Outer Join(R S)
All the tuples from the Left relation, R, are included in the resulting relation. If there are tuples in R without any matching tuple in the Right relation S, then the S-attributes of the resulting relation are made NULL.
Left
A
B
100
Database
101
Mechanics
102
Electronics
Right
A
B
100
Alex
102
Maya
104
Mira
Courses HoD
A
B
C
D
100
Database
100
Alex
101
Mechanics
—
—
102
Electronics
102
Maya
Right Outer Join: ( R S )
All the tuples from the Right relation, S, are included in the resulting relation. If there are tuples in S without any matching tuple in R, then the R-attributes of resulting relation are made NULL.
Courses HoD
A
B
C
D
100
Database
100
Alex
102
Electronics
102
Maya
—
—
104
Mira
Full Outer Join: ( R S)
All the tuples from both participating relations are included in the resulting relation. If there are no matching tuples for both relations, their respective unmatched attributes are made NULL.
In relational algebra we use several set theory operations. Since relational model is based on set theory it borrows several concepts from set theory as well in respect to the operations. These include UNION, INTERSECTION, MINUS (also called SET DIFFERENCE), and also the CARTESIAN PRODUCT. In this article, we will see down each operator and work through examples for a better understanding.
Let us see the following two tables that we will focus in the next examples −
In relational algebra, several operations are derived from set theory. Since the relational model is based on set theory, it adopts key concepts such as UNION, INTERSECTION, MINUS (also called SET DIFFERENCE), and CARTESIAN PRODUCT.
In this chapter, we will examine each of these operators with practical examples for better understanding. Let’s begin by reviewing two tables that will be used in the upcoming examples −
EMPLOYEE Table
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
John
B
Smith
123456789
1965-01-09
731 Fondren, Houston, TX
M
30000
333445555
5
Franklin
T
Wong
333445555
1955-12-08
638 Voss, Houston, TX
M
40000
888665555
5
Alicia
J
Zelaya
999887777
1968-01-19
3321 Castle, Spring, TX
F
25000
987654321
4
Jennifer
S
Wallace
987654321
1941-06-20
291 Berry, Bellaire, TX
F
43000
888665555
4
Ramesh
K
Narayan
666884444
1962-09-15
975 Fire Oak, Humble, TX
M
38000
333445555
5
Joyce
A
English
453453453
1972-07-31
5631 Rice, Houston, TX
F
25000
333445555
5
Ahmad
V
Jabbar
987987987
1969-03-29
980 Dallas, Houston, TX
M
25000
987654321
4
James
E
Borg
888665555
1937-11-10
450 Stone, Houston, TX
M
55000
NULL
1
DEPENDENT Table
Essn
Dependent_name
Sex
Bdate
Relationship
333445555
Alice
F
1986-04-05
Daughter
333445555
Theodore
M
1983-10-25
Son
333445555
Joy
F
1958-05-03
Spouse
987654321
Abner
M
1942-02-28
Spouse
123456789
Michael
M
1988-01-04
Son
123456789
Alice
F
1988-12-30
Daughter
123456789
Elizabeth
F
1967-05-05
Spouse
Set Theory Basics in Relational Algebra
In set theory, each element in a set is unique, and operations like UNION and INTERSECTION combine or compare these elements systematically. Relational algebra applies similar principles by treating each row in a table as a set element.
However, relational tables have more structure than simple sets, so additional rules apply −
Union Compatibility − For operations like UNION and INTERSECTION, the tables must have the same number of attributes, and each attribute must have the same data type.
Duplicate Elimination − These operations inherently remove duplicates, as relational algebra operates on sets, not multisets.
The UNION Operator
The UNION operator combines rows from two relations, producing a result with all unique rows present in either relation-similar to merging two lists while removing duplicates.
Syntax and Representation
In relational algebra, UNION is represented as −
R ∪ S
Here, R and S are two relations (tables) that must be union compatible.
Combining Employee and Manager IDs
Let us consider we want a find of all employees who either work in department 5 or supervise someone in department 5. We use UNION to merge these groups −
Retrieve Social Security numbers (SSN) of employees in department 5 −
RESULT1 ← πSsn(σDno =5(EMPLOYEE))
Ssn
123456789
333445555
666884444
453453453
Retrieve SSNs of managers supervising department 5 employees −
RESULT2 ← πSuper_ssn(σDno =5(EMPLOYEE))
Super_ssn
333445555
888665555
333445555
333445555
Combine both lists −
RESULT ← RESULT1 ∪ RESULT2
Ssn
123456789
333445555
666884444
453453453
333445555
888665555
333445555
333445555
The final table will include SSNs of all employees who meet either condition. For instance, if RESULT1 contains {123, 456} and RESULT2 contains {456, 789}, the UNION result is {123, 456, 789}.
The INTERSECTION Operator
The INTERSECTION operator identifies rows present in both relations. It is like finding the overlap between two groups.
Syntax and Representation
INTERSECTION is denoted as −
R ∩ S
Finding SSN which are common in manager and employee
Consider the table of SSN of employees who work for department 1 and 5 −
RESULT1 ← πSsn(σDno =5^ Dno =1(EMPLOYEE))
Ssn
123456789
333445555
666884444
453453453
888665555
Retrieve SSNs of managers supervising department 5 employees −
RESULT2 ← πSuper_ssn(σDno =5(EMPLOYEE))
Super_ssn
333445555
888665555
333445555
333445555
The common SSN values are −
RESULT ← RESULT1 ∩ RESULT2
Ssn
333445555
888665555
The MINUS (SET DIFFERENCE) Operator
The MINUS operator, also called SET DIFFERENCE, is used to remove rows in one relation that appear in another. It essentially answers the question, “What is in one table but not the other?”
Syntax and Representation
In relational algebra, MINUS is represented as −
R − S
SSN of employees who are not manager
Consider the table of SSN of employees who work for department 1 and 5 −
RESULT1 ← πSsn(σDno =5^ Dno =1(EMPLOYEE))
Ssn
123456789
333445555
666884444
453453453
888665555
Retrieve SSNs of managers supervising department 5 employees −
RESULT2 ← πSuper_ssn(σDno =5(EMPLOYEE))
Super_ssn
333445555
888665555
333445555
333445555
The SSN of non-managers are −
RESULT ← RESULT1 − RESULT2
Ssn
123456789
666884444
453453453
The CARTESIAN PRODUCT (CROSS PRODUCT) Operator
The CARTESIAN PRODUCT operator combines every row of one table with every row of another. This operation produces all possible combinations of rows between the two tables. This can result in a large and often unwieldy table.
Syntax and Representation
In relational algebra, CARTESIAN PRODUCT is represented as −
R × S
Combining Employees with Dependents
Suppose we have a table EMPLOYEE and another table DEPENDENT. To pair every employee with every dependent −
EMP_DEPENDENTS ← EMPLOYEE × DEPENDENT
The resulting table will include every possible combination of rows from EMPLOYEE and DEPENDENT. However, this raw result is rarely useful on its own. Usually, we filter the combined rows using a SELECT operation to create meaningful connections.
Fname
Lname
Ssn
Essn
Dependent_name
Sex
Bdate
…
Alicia
Zelaya
999887777
333445555
Alice
F
1986-04-05
…
Alicia
Zelaya
999887777
333445555
Theodore
M
1983-10-25
…
Alicia
Zelaya
999887777
333445555
Joy
F
1958-05-03
…
Alicia
Zelaya
999887777
987654321
Abner
M
1942-02-28
…
Alicia
Zelaya
999887777
123456789
Michael
M
1988-01-04
…
Alicia
Zelaya
999887777
123456789
Alice
F
1988-12-30
…
Alicia
Zelaya
999887777
123456789
Elizabeth
F
1967-05-05
…
Jennifer
Wallace
987654321
333445555
Alice
F
1986-04-05
…
Jennifer
Wallace
987654321
333445555
Theodore
M
1983-10-25
…
Jennifer
Wallace
987654321
333445555
Joy
F
1958-05-03
…
Jennifer
Wallace
987654321
987654321
Abner
M
1942-02-28
…
Jennifer
Wallace
987654321
123456789
Michael
M
1988-01-04
…
Jennifer
Wallace
987654321
123456789
Alice
F
1988-12-30
…
Jennifer
Wallace
987654321
123456789
Elizabeth
F
1967-05-05
…
Joyce
English
453453453
333445555
Alice
F
1986-04-05
…
Joyce
English
453453453
333445555
Theodore
M
1983-10-25
…
Joyce
English
453453453
333445555
Joy
F
1958-05-03
…
Joyce
English
453453453
987654321
Abner
M
1942-02-28
…
Joyce
English
453453453
123456789
Michael
M
1988-01-04
…
Joyce
English
453453453
123456789
Alice
F
1988-12-30
…
Joyce
English
453453453
123456789
Elizabeth
F
1967-05-05
…
The table is truncated for space otherwise it will be much larger.
Sequences and Combined Operations
Relational algebra becomes useful when we combine operators. Sometimes, the result of one operation feeds into another. This is allowing us to answer complex queries.
Female Employees and Their Dependents
To find dependents of female employees −
Filter female employees −
FEMALE_EMPS ← σSex=′F′(EMPLOYEE)
Retrieve names and SSNs of female employees −
EMP_NAMES ← πFname,Lname,Ssn(FEMALE_EMPS)
Combine with the DEPENDENT table −
EMP_DEPENDENTS ← EMP_NAMES × DEPENDENT
Match dependents to employees by SSN −
ACTUAL_DEPENDENTS ← σSsn=Essn(EMP_DEPENDENTS)
Extract names of employees and their dependents −
πFname,Lname,Dependent_name(ACTUAL_DEPENDENTS)
This sequence demonstrates how CARTESIAN PRODUCT and SELECT work together to filter meaningful relationships from raw combinations.
SQL Equivalents of Set Theory Operators
These relational operators have direct counterparts in SQL −
UNION − Combines results from two queries, removing duplicates unless UNION ALL is used.
INTERSECT − Retrieves rows common to both queries.
EXCEPT (or MINUS) − Retrieves rows in one query but not the other.
CROSS JOIN − Implements CARTESIAN PRODUCT in SQL.
Example in SQL
For the UNION operation −
SELECT Ssn
FROM EMPLOYEE
WHERE Dno =5UNIONSELECT Super_ssn
FROM EMPLOYEE
WHERE Dno =5;
The relational model plays a key role in organizing and managing data. At its core lies relational algebra, a set of operations that enables users to efficiently manipulate and query data. Among these are unary operations such as SELECT and PROJECT, which are essential tools for filtering and organizing data.
In this chapter, we will explore these two unary relational operations (SELECT and PROJECT), understand how they work, and provide examples to make it easier for you to understand.
In all the examples throughout this chapter, we will use the following Sample Data Table called EMPLOYEE:
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
John
B
Smith
123456789
1965-01-09
731 Fondren, Houston, TX
M
30000
333445555
5
Franklin
T
Wong
333445555
1955-12-08
638 Voss, Houston, TX
M
40000
888665555
5
Alicia
J
Zelaya
999887777
1968-01-19
3321 Castle, Spring, TX
F
25000
987654321
4
Jennifer
S
Wallace
987654321
1941-06-20
291 Berry, Bellaire, TX
F
43000
888665555
4
Ramesh
K
Narayan
666884444
1962-09-15
975 Fire Oak, Humble, TX
M
38000
333445555
5
Joyce
A
English
453453453
1972-07-31
5631 Rice, Houston, TX
F
25000
333445555
5
Ahmad
V
Jabbar
987987987
1969-03-29
980 Dallas, Houston, TX
M
25000
987654321
4
James
E
Borg
888665555
1937-11-10
450 Stone, Houston, TX
M
55000
NULL
1
Understanding the SELECT Operation
The SELECT operation allows us to retrieve a subset of rows (or tuples) from a table (or relation) that satisfy a specific condition. It acts like a filter. It is keeping only the rows that meet the specified criteria while discarding the rest. The resulting table keeps all the original attributes of the filtered rows.
Syntax and Representation
In relational algebra, the representation of SELECT operation is very simple:
σcondition(R)
Here:
σ (sigma): Symbolizes the SELECT operation
condition: Specifies the criteria to filter the rows
R: Name of the relation or table being operated upon
Example: Filtering by Department Number
Suppose we have a table named EMPLOYEE with attributes such as department number (Dno) and salary.
To retrieve the rows for employees in department 4, we write:
ÏDno=4(EMPLOYEE)
It will fetch the following records â
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
Alicia
J
Zelaya
999887777
1968-01-19
3321 Castle, Spring, TX
F
25000
987654321
4
Jennifer
S
Wallace
987654321
1941-06-20
291 Berry, Bellaire, TX
F
43000
888665555
4
Ahmad
V
Jabbar
987987987
1969-03-29
980 Dallas, Houston, TX
M
25000
987654321
4
Example: Filtering by Salary
To retrieve employees earning more than $30,000, we can write the query as follows:
σSalary>30000(EMPLOYEE)
It will fetch the following records â
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
Franklin
T
Wong
333445555
1955-12-08
638 Voss, Houston, TX
M
40000
888665555
5
Jennifer
S
Wallace
987654321
1941-06-20
291 Berry, Bellaire, TX
F
43000
888665555
4
Ramesh
K
Narayan
666884444
1962-09-15
975 Fire Oak, Humble, TX
M
38000
333445555
5
James
E
Borg
888665555
1937-11-10
450 Stone, Houston, TX
M
55000
NULL
1
Combining Multiple Conditions
The SELECT operation supports combining multiple conditions using Boolean operators like AND, OR, and NOT.
Example: Complex Filtering
Let us see another example where we merge multiple conditions. To find employees working in department 4 with a salary above 25,000orindepartment5withasalaryabove30,000, we can write the following complex query:
This operation results in a filtered table with only those rows that satisfy at least one of the conditions.
From the corresponding examples, one thing we understood that the SELECT operation is commutative, which means,
σcond1(σcond2(R))=σcond2(σcond1(R))
SELECT does not alter the structure of the table; only the number of rows changes.
Understanding the PROJECT Operation
The PROJECT operation targets columns. It creates a new table with only the specified attributes, discarding all others. This operation is useful when we need to focus on specific fields from a dataset.
Syntax and Representation
In relational algebra, the PROJECT operation can be denoted as:
πattribute_list(R)
Here:
π (pi): symbolizes the PROJECT operation.
attribute_list: specifies the columns to include in the result.
R is the relation being projected.
Example: Retrieving Specific Attributes
Let us see some examples from simpler to complex. If we want to list only the first name, last name, and salary of employees from the EMPLOYEE table, we write the query in the following format.
πFname,Lname,Salary(EMPLOYEE)
The resulting table will contain only the selected columns â
Fname
Lname
Salary
John
Smith
30000
Franklin
Wong
40000
Alicia
Zelaya
25000
Jennifer
Wallace
43000
Ramesh
Narayan
38000
Joyce
English
25000
Ahmad
Jabbar
25000
James
Borg
55000
Duplicate Elimination in PROJECT
A critical aspect of PROJECT is that it eliminates duplicate rows. If two rows in the original table have the same values for the projected columns, only one will appear in the result.
Example: Duplicate Elimination
To list unique combinations of gender (Sex) and salary, use:
πSex,Salary(EMPLOYEE)
This gives that any duplicate rows in the Sex and Salary columns are removed in the output.
Sex
Salary
M
30000
M
40000
F
25000
F
43000
M
38000
M
25000
M
55000
“F 25000” is removed since it is duplicate.
Sequences and Naming in Operations
For more complex queries, we often combine SELECT and PROJECT operations. These sequences can either be written as nested expressions or broken down into steps with intermediate results.
Example: Combining SELECT and PROJECT
To find the first name, last name, and salary of employees in department 5, we can write:
First, apply the SELECT operation:
TEMP⟵σDno=5(EMPLOYEE)
It will fetch the following records â
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
John
B
Smith
123456789
1965-01-09
731 Fondren, Houston, TX
M
30000
333445555
5
Franklin
T
Wong
333445555
1955-12-08
638 Voss, Houston, TX
M
40000
888665555
5
Ramesh
K
Narayan
666884444
1962-09-15
975 Fire Oak, Humble, TX
M
38000
333445555
5
Joyce
A
English
453453453
1972-07-31
5631 Rice, Houston, TX
F
25000
333445555
5
Next, apply the PROJECT operation:
RESULT⟵ÏFname,Lname,Salary(TEMP)
It will fetch the following records â
Fname
Lname
Salary
John
Smith
30000
Franklin
Wong
40000
Ramesh
Narayan
38000
Joyce
English
25000
Alternatively, this can be combined into a single inline expression:
πFname,Lname,Salary(σDno=5(EMPLOYEE))
Renaming the Attributes
Sometimes, the resulting table from a sequence of operations needs attribute names to be changed for clarity. The RENAME operation, denoted by “ρ” (rho), allows us to rename the attributes or the table itself.
Relational database systems are expected to be equipped with a query language that can assist its users to query the database instances. There are two kinds of query languages − relational algebra and relational calculus.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields instances of relations as output. It uses operators to perform queries. An operator can be either unary or binary. They accept relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set different
Cartesian product
Rename
We will discuss all these operations in the following sections.
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ stands for selection predicate and r stands for relation. p is prepositional logic formula which may use connectors like and, or, and not. These terms may use relational operators like − =, ≠, ≥, < , >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is ‘database’.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is ‘database’ and ‘price’ is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is ‘database’ and ‘price’ is 450 or those books published after 2010.
Project Operation (∏)
It projects column(s) that satisfy a given predicate.
Notation − ∏A1, A2, An (r)
Where A1, A2 , An are attribute names of relation r.
Duplicate rows are automatically eliminated, as relation is a set.
For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the relation Books.
Union Operation (∪)
It performs binary union between two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r U s
Where r and s are either database relations or relation result set (temporary relation).
For a union operation to be valid, the following conditions must hold −
r, and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or both.
Set Difference (−)
The result of set difference operation is tuples, which are present in one relation but are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.
Cartesian Product (Χ)
Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
σauthor = 'tutorialspoint'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by tutorialspoint.
Rename Operation (ρ)
The results of relational algebra are also relations but without any name. The rename operation allows us to rename the output relation. ‘rename’ operation is denoted with small Greek letter rhoρ.
Notation − ρx (E)
Where the result of expression E is saved with name of x.
Additional operations are −
Set intersection
Assignment
Natural join
Relational Calculus
In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it tells what to do but never explains how to do it.
Relational calculus exists in two forms −
Tuple Relational Calculus (TRC)
Filtering variable ranges over tuples
Notation − {T | Condition}
Returns all tuples T that satisfies a condition.
For example −
{ T.name | Author(T) AND T.article = 'database' }
Output − Returns tuples with ‘name’ from Author who has written article on ‘database’.
TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀).
For example −
{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}
Output − The above query will yield the same result as the previous one.
Domain Relational Calculus (DRC)
In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in TRC, mentioned above).
Notation −
{ a1, a2, a3, …, an | P (a1, a2, a3, … ,an)}
Where a1, a2 are attributes and P stands for formulae built by inner attributes.
For example −
{ | ∈ TutorialsPoint ∧ subject = 'database'}
Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is database.
Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves relational operators.
The expression power of Tuple Relation Calculus and Domain Relation Calculus is equivalent to Relational Algebra.
The Enhanced Entity-Relationship (EER) model is used to get more advanced features into database design, and help database designers map real-world scenarios more effectively. There are two core concepts in the EER Model – specialization and generalization – that help refine or abstract the entities and help designers create schemas that are both flexible and intuitive.
In this chapter, we will throw some light on these two core concepts, specialization and generalization, and highlight how they are different from each other, using practical examples for a better understanding.
What is Specialization in the Extended ER Model?
Specialization focuses on dividing a general entity type, or superclass, into smaller, more specific categories called subclasses. This process is quite useful when entities in the superclass have distinct characteristics or relationships, and the relationships only apply to certain members.
Specialization is like zooming in on a larger picture to highlight its finer details. Specialization takes a broad category and splits it into meaningful subcategories based on distinguishing attributes or roles. For example, from the EMPLOYEE entity type, we can define subclasses such as SECRETARY, ENGINEER, and TECHNICIAN.
Types of Specialization
Specialization could be either attribute-defined or user-defined –
Attribute-Defined Specialization − In this type of specialization, the membership in subclasses is determined by the value of a specific attribute in the superclass. Like consider the attribute Job_type in EMPLOYEE. If Job_type = “Technician”, then the employee is part of the TECHNICIAN subclass.
User-Defined Specialization − In this type, the subclass membership is manually assigned by users. For example, managers might decide which EMPLOYEES belong to a specific training group.
Examples of Specialization
Let’s consider the company database. EMPLOYEE is specialized into subclasses, such as:
SECRETARY with an attribute Typing_speed
ENGINEER with an attribute Eng_type
TECHNICIAN with attributes like Tgrade
These subclasses allow the database to store data more efficiently by associating unique attributes with only the relevant groups.
Constraints on Specialization
While working on specializations, we face some constraints. Specialization constraints could be –
Disjoint − Entities can belong to only one subclass. Like, an EMPLOYEE cannot simultaneously be a TECHNICIAN and an ENGINEER.
Overlapping − Entities can belong to multiple subclasses. For instance, a salaried engineer could belong to both SALARIED_EMPLOYEE and ENGINEER.
Specializations may also be:
Total − Every entity in the superclass must belong to at least one subclass. For example, all EMPLOYEES are either HOURLY_EMPLOYEES or SALARIED_EMPLOYEES.
Partial − Some entities may not belong to any subclass. For example, not every EMPLOYEE is a SECRETARY, ENGINEER, or TECHNICIAN.
What is Generalization in the Extended ER Model?
Generalization combines multiple entity types into a single, broader entity type by identifying shared characteristics. Generalization is like stepping back to see the bigger picture. It suppresses the differences among entities and emphasizes what they have in common.
For instance, in a vehicle database, the CAR and TRUCK share attributes like Vehicle_id and License_plate_no. These can be generalized into a VEHICLE superclass.
Process of Generalization
Generalization is simple. All that is needed is to identify common attributes or relationships among two or more entity types. So, we need to define a new superclass that captures these shared features, representing the original entity types as subclasses of the new superclass.
In the previous transportation database example, we have seen the following –
Entity Types − CAR and TRUCK
CAR has attributes like No_of_passengers.
TRUCK has attributes like Tonnage and No_of_axles.
Here, the VEHICLE includes shared attributes such as Vehicle_id and License_plate_no.
This approach avoids redundancy by grouping shared data at the superclass level while preserving unique characteristics in the subclasses.
Combining Specialization and Generalization
Specialization and generalization are applied within the same database design. These two processes are not mutually exclusive; they are complementary.
Consider the university database. Let’s see how specialization and generalization are applied here –
Specialization − The PERSON entity type is specialized into STUDENT, EMPLOYEE, and ALUMNUS. EMPLOYEE is further divided into FACULTY, STAFF, and STUDENT_ASSISTANT.
Generalization − FACULTY and STAFF are generalized into EMPLOYEE.
By combining these processes, designers can make a more structured and accurate representation of the data.
Diagrammatic Representation of Specialization and Generalization
In EER diagrams, we visualize specialization and generalization clearly.
Specialization − The superclass connects to subclasses through a circle. The lines indicating subclass relationships. Subclass-specific attributes, such as Typing_speed for SECRETARY, are attached to the respective subclass.
Generalization − The subclasses connect to the generalized superclass. It highlights shared attributes. For instance:
In a specialization diagram, EMPLOYEE connects to subclasses like ENGINEER and SECRETARY.
In a generalization diagram, CAR and TRUCK converge into a VEHICLE superclass.
Differences between Specialization and Generalization
Specialization focuses on highlighting the differences between entities. It breaks down a superclass into specific subclasses. For example: EMPLOYEE → SECRETARY, ENGINEER.
Generalization emphasizes the commonalities between entities. It combines specific subclasses into a generalized superclass. For example: CAR, TRUCK → VEHICLE.
Real-World Applications of Specialization and Generalization
The concepts of specialization and generalization are widely used in various domains:
Example 1: Company Database
Specialization − EMPLOYEE entity type is specialized into categories like SALARIED_EMPLOYEE and HOURLY_EMPLOYEE.
Generalization − TEMP_WORKER and PERMANENT_WORKER are generalized into EMPLOYEE.
Example 2: Vehicle Registration
Specialization − VEHICLE entity type is specialized into PASSENGER_CAR and COMMERCIAL_VEHICLE.
Generalization − CAR and TRUCK are generalized into VEHICLE.
Advantages of Specialization and Generalization
Following are the advantages of using specialization and generalization features in the Extended ER Model –
Data Organization − Ensures logical grouping of attributes and relationships, making the database easier to manage.
Flexibility − Accommodates both unique and shared characteristics of entities.
Efficiency − Reduces redundancy by storing shared attributes in a superclass.
Real-World Representation − Mirrors how entities are structured in real life.
In traditional ER diagrams, we use the entities and their relationships with rectangles and diamonds, respectively. The Enhanced Entity-Relationship (EER) model extends the traditional ER model to support more complex database designs. In EER, we have new concepts such as subclasses, superclasses, and inheritance that are particularly useful in representing real-world entities and their relationships in a structured and hierarchical way. These useful features make EER models suitable for applications like as engineering design, telecommunications, and complex organizational systems.
In this chapter, we will understand in detail the concepts of subclasses, superclasses, inheritance, and their associated processes of specialization and generalization with examples for a better understanding.
Subclasses and Superclasses
A superclass shows a broad category of entities, however its subclasses define more specific groupings within that category. The superclass-subclass relationship helps us model hierarchical data.
For example, consider the EMPLOYEE entity type in a company’s database. It is a broad category that might include subclasses such as:
SECRETARY − Employees who handle administrative tasks.
TECHNICIAN − Employees who provide technical support or services.
ENGINEER − Employees who design and develop systems or products.
The EMPLOYEE entity type is like the superclass, it has the shared characteristics of all employees. The specific roles like ENGINEER or SECRETARY are the subclasses.
The Class / Subclass Relationship
The connection between a superclass and its subclasses is known as the class-subclass relationship. This relationship shows that entities in a subclass are also members of the superclass.
For example, a TECHNICIAN entity is an EMPLOYEE with additional characteristics specific to the TECHNICIAN role. Similarly, a SECRETARY is an EMPLOYEE but with unique attributes, such as Typing_speed. This relationship is often described using an “is-a” relationship:
A TECHNICIAN is an
A SECRETARY is an
EER diagrams illustrate these relationships clearly –
Like ER diagrams, rectangles represent entities (both superclasses and subclasses).
Circles denote specialization or generalization processes.
Subset symbols point from subclasses back to their superclass.
For example, the EMPLOYEE superclass connects to its subclasses through a circle. Specific attributes, like Typing_speed for SECRETARY, are attached to the respective subclass rectangle.
Inheritance in Superclass-Subclass Relationship
Inheritance is one of the key features of the superclass-subclass relationship. Inheritance ensures that entities in a subclass retain the attributes and relationships of their superclass. It is used to enables reuse and prevents redundancy.
Attribute Inheritance
Entities in a subclass inherit −
Superclass Attributes − For instance, all EMPLOYEES might have attributes such as Name, SSN, and Address. These are inherited by subclasses like ENGINEER and TECHNICIAN.
Subclass-Specific Attributes − Each subclass has its unique attributes. For example –
SECRETARY might have Typing_speed.
TECHNICIAN might have Tgrade.
This hierarchical organization helps ensure that common characteristics remain at the superclass level while allowing subclasses to specialize further.
Relationship Inheritance
Subclasses also inherit the relationships of their superclasses. For example, If EMPLOYEE participates in a MANAGES relationship with a PROJECT, all subclasses of EMPLOYEE inherit this relationship.
This flexibility is useful for subclasses to focus on unique relationships while sharing inherited ones.
Specialization: Defining Subclasses
Specialization is the process of dividing a superclass into meaningful subclasses based on specific characteristics. This process is used in designing database schema that reflects real-world distinctions.
Types of Specialization
Predicate-Defined Specialization − Subclass membership can be found in conditions on a superclass attribute. Like Subclasses of EMPLOYEE might be defined based on the attribute named “Job_type”:
SECRETARY if Job_type = “Secretary”.
TECHNICIAN if Job_type = “Technician”.
User-Defined Specialization − Membership in subclasses is manually assigned by users. This is not determined by predefined conditions.
Overlapping vs. Disjoint Specialization
Specializations can be:
Disjoint − Entities belong to at most one subclass. For example, An EMPLOYEE can be either HOURLY_EMPLOYEE or SALARIED_EMPLOYEE, but not both.
Overlapping − Entities may belong to multiple subclasses. For example: A SALARIED_EMPLOYEE might also be a MANAGER.
Total vs. Partial Specialization
Total Specialization − Every entity in the superclass must belong to at least one subclass. Like every EMPLOYEE must be either HOURLY_EMPLOYEE or SALARIED_EMPLOYEE.
Partial Specialization − Some entities in the superclass do not belong to any subclass. Like some EMPLOYEES might not fit into SECRETARY, ENGINEER, or TECHNICIAN subclasses.
These entities have been represented in the following figure –
Generalization: Abstracting Superclasses
Generalization is the reverse process of specialization. Generalization identifies common attributes or relationships among multiple entity types and abstracts them into a single superclass.
For example, consider the following entity types:
CAR: Attributes include License_plate_no and Vehicle_id.
TRUCK: Attributes include Tonnage and No_of_axles.
These entities share common attributes like License_plate_no and Vehicle_id. Through generalization, these common attributes can be combined into a VEHICLE superclass, with CAR and TRUCK as its subclasses.
Combining Specialization and Generalization
In real-world examples, database design is often used in a mix of specialization and generalization –
Specialization refines entity types by breaking them into specific subgroups.
Generalization abstracts multiple entity types into broader categories.
For instance, in a university database:
Specialization might divide PERSON into EMPLOYEE, STUDENT, and ALUMNUS.
Generalization might combine FACULTY and STAFF into an EMPLOYEE superclass.
Example: University Database
Now let us consider a university database to showcase the practical application of these concepts –
Superclass − PERSON, with attributes like Name, SSN, Birth_date, and Address.
Subclasses −
EMPLOYEE − Specialized into FACULTY, STAFF, and STUDENT_ASSISTANT.
STUDENT − Specialized into GRADUATE_STUDENT and UNDERGRADUATE_STUDENT.
Shared Subclasses and Lattices
Sometimes a subclass belongs to multiple superclasses, forming a shared subclass. For example, STUDENT_ASSISTANT is a subclass of both STUDENT and EMPLOYEE. This creates a lattice structure, where entities inherit attributes from multiple paths.
For example:
A GRADUATE_STUDENT inherits attributes from STUDENT and PERSON.
A STUDENT_ASSISTANT inherits from both STUDENT and EMPLOYEE.
Advantages of Subclasses, Superclasses, and Inheritance
Following are the advantages of using subclasses, superclasses, and inheritance in Enhanced ER diagrams –
Efficient Data Representation − Common attributes are stored in the superclass, reducing redundancy.
Flexibility − Subclasses can add specific details while retaining shared attributes and relationships.
Real-World Alignment − The hierarchical structure mirrors real-world scenarios, simplifying database design and usage.
The ER model is used to visually represent the process of designing a database. There are several components in an ER diagram. We can extend this ER model to some extent to make them more versatile.
The Enhanced Entity-Relationship (EER) model extends the basic ER model, by adding advanced concepts to represent more complex data structures. This improvement is particularly valuable for modeling databases in fields that demand greater precision. For example engineering, telecommunications, and geographic information systems.
In this chapter, we will have a look at the main elements of the EER model. In addition, we will touch upon the basics of subclasses, superclasses, inheritance, and specialization. We will cover them in greater detail in the subsequent chapters of this tutorial.
Basics of the Enhanced ER Model
The ER model offers foundational tools for creating database schemas. This is useful in many business and industrial applications. But, as database applications evolved, so did the need for more sophisticated modelling tools. That could be represented through complex relationships and data constraints with higher accuracy. The EER model can meet these needs by adding new features such as subclasses, superclasses, inheritance, and uniontypes.
Subclasses and Superclasses
The EER Model uses concepts such as subclasses and superclasses which are normally used in OOPs programming. A superclass represents a general entity type, while a subclass is a more specific grouping of that entity.
For instance, consider the EMPLOYEE entity type. Within a company, the EMPLOYEE category could include distinct groups such as SECRETARY, ENGINEER, TECHNICIAN, and MANAGER. Now each of these subgroups forms a subclass of EMPLOYEE.
Example Diagram − The EER diagram uses a circle to connect subclasses and their superclass. Lines link these entities, illustrating the relationship. In this case, EMPLOYEE is the superclass, while SECRETARY, ENGINEER, and TECHNICIAN are subclasses.
Here, the relationship between a superclass and its subclasses is often described as an “IS-A” relationship. This type of relationship means that any instance of a subclass is inherently also an instance of the superclass. Like a SECRETARY entity is also an EMPLOYEE entity in the database. This relationship indicates that a subclass can access all attributes and relationships defined in its superclass.
Inheritance of Attributes and Relationships
Inheritance is a popular feature from OOPs programming that allows the entities in a subclass to inherit attributes and relationships from their superclass. For instance, the EMPLOYEE entity type could have common attributes called Name, Ssn, and Address. The SECRETARY subclass may inherit these attributes while adding its own, such as Typing_speed.
Inheritance applies to relationships as well. If the EMPLOYEE entity participates in a PROJECT relationship, the SECRETARY subclass will inherit this participation, ensuring a consistent data structure.
Specialization and Generalization in EER Model
Specialization and generalization are two related processes in the EER model that refine how entities are categorized.
Specialization in EER Model
Specialization shows how to define subsets of an entity type based on distinct attributes or characteristics. In our example, we could create subclasses of EMPLOYEE like HOURLY_EMPLOYEE and SALARIED_EMPLOYEE based on the payment method. This can be done with Pay_scale attribute used for relevant subclass.
So, the subclasses created through specialization may have specific attributes and relationships that do not apply to the superclass.
Generalization in EER Model
Generalization is the opposite of specialization. It combines similar entity types into a broader superclass by identifying shared characteristics. For example, entity types CAR and TRUCK with attributes like License_plate_no and Price could be generalized into a superclass called VEHICLE.
The entity-specific attributes, such as No_of_axles for TRUCK and Max_speed for CAR, would be retained in their respective subclasses while inheriting common attributes from VEHICLE.
Constraints in Specialization and Generalization
Specialization and generalization can include constraints that define how entities belong to subclasses. The two main constraints are explained below –
Disjointness Constraint − The disjointness constraint indicates that an entity can be a member of only one subclass. If subclasses are disjoint, the EMPLOYEE cannot be both a SECRETARY and an ENGINEER. This is represented in diagrams by a “d” in the circle connecting subclasses to the superclass. Take a look at the figure below.
Completeness Constraint − The completeness constraint is used if an entity must belong to at least one subclass (total specialization) or if it can exist without being in any subclass (partial specialization). There could be a double line between the superclass and the circle to denote a total specialization, while a single line indicates a partial specialization.
Example − Take a look at the following figure. In Figure 8.1, the specialization {HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} is disjoint and total. Every EMPLOYEE must be either hourly or salaried, but not both.
Using Specialization and Generalization in EER Diagrams
In practice, database designers often use specialization to start with a general entity type and create specific subtypes to handle particular attributes or relationships. Conversely, they may use generalization to combine existing entity types when commonalities are found.
Advanced Features: Union Types and Categories
Sometimes, a subclass may need to inherit from more than one superclass. This type of structure is known as a category or uniontype. This is used when an entity type shares attributes across multiple parent classes. For instance, an OWNER of a vehicle in a registration database might be a PERSON, a BANK, or a COMPANY. The OWNER subclass would be connected to all these superclasses. In the above figure, we are using the “∪” symbol to represent union relationships.
Conclusion
In this chapter, we learned the Enhanced ER model that extends the traditional ER model by incorporating subclasses, superclasses, type inheritance, and specialization or generalization processes. We also reviewed how these features enable more precise and flexible database schemas. We also touched upon constraints such as disjointness and completeness and looked at union types for complex relationships.
Designing a database becomes easier when we use a graphical representation of the entire database. One of the best tools for this is the Entity-Relationship Diagram (ERD). It helps capture important details about entities, their relationships, and any constraints within a business environment.
In this chapter, we’ll learn how to design an ERD using a real-world example of a COMPANY database. We’ll break down each step, explain the types of entities and relationships, and make the process easy to understand.
Company Database Tables Design
We have the following tables in the COMPANY database −
EMPLOYEE
Fname
Minit
Lname
Ssn
Bdate
Address
Sex
Salary
Super_ssn
Dno
DEPARTMENT
Dname
Dnumber
Mgr_ssn
Mgr_start_date
DEPT_LOCATIONS
Dnumber
Dlocation
PROJECT
Pname
Pnumber
Plocation
Dnum
WORKS_ON
Essn
Pno
Hours
DEPENDENT
Essn
Dependent_name
Sex
Bdate
Relationship
Basics of ER Diagram
To understand the example. Let us see the foundational concepts of an ERD: There are several components.
Entities: Represent objects or items within a system, such as EMPLOYEE, DEPARTMENT, and PROJECT.
Attributes: Characteristics that define an entity. For example, an EMPLOYEE may have attributes like Name, Ssn, Address, and Salary.
Relationships: Connect entities and signify how they interact. For example, an EMPLOYEE may WORKS_FOR a DEPARTMENT.
Visual Representation in ER Diagrams
We use the following shapes for visual representation in ER diagrams −
Rectangles for entities
Diamonds for relationships
Ovals for attributes
Double ovals for multivalued attributes
Double Rectangle: Weak Entity
Double Diamond: Weak Relationship
Real-World Example: COMPANY Database ER Diagram
The COMPANY database shows a complex business structure. Here is how we create an ERD for it, incorporating key entities and relationships:
Identifying the Core Entities
The main entities in the COMPANY database are highlighted below −
EMPLOYEE − Represents staff members with attributes such as Name, Ssn, Salary, and Bdate.
DEPARTMENT − The organizational unit with attributes like Name and Number.
PROJECT − Tasks managed by departments with attributes including Name and Number.
DEPENDENT − Individuals dependent on an employee with attributes like Name, Relationship, and Birth_date.
Defining the Key Relationships
Here are the main relationships connecting these entities:
MANAGES − A relationship between EMPLOYEE and DEPARTMENT. Each department has one manager, but an employee may or may not manage a department.
WORKS_FOR − A relationship between DEPARTMENT and EMPLOYEE. Each department has many employees, and each employee must be assigned to a department.
CONTROLS − A relationship where a DEPARTMENT oversees one or more PROJECTS. A project is always managed by one department.
SUPERVISION − A relationship where one EMPLOYEE supervises others.
WORKS_ON − A relationship, showing that employees can work on multiple projects and vice versa. The Hours attribute is linked here to indicate time spent.
DEPENDENTS_OF − A relationship between EMPLOYEE and DEPENDENT, highlighting that an employee may have one or more dependents.
Structuring the ER Diagram
Let us now understand the step-by-step process of creating the ER diagram:
Draw the Entities − Create rectangles labelled EMPLOYEE, DEPARTMENT, PROJECT, and DEPENDENT. Connect ovals with key attributes like Name, Ssn, and Salary to the EMPLOYEE. Link attributes like Number to DEPARTMENT. We have seen the entity in the previous figures.
Define the Relationships − Use diamonds to represent relationships like WORKS_FOR, MANAGES, etc. Connect these diamonds to relevant entities using straight lines. Use single lines for partial participation and double lines for total participation.
Example Breakdown: The following figure depicts the COMPANY ER Diagram −
MANAGES shows that each DEPARTMENT must have a manager (total participation for DEPARTMENT). However, an EMPLOYEE may not manage any department (partial participation).
WORKS_ON connects EMPLOYEE and PROJECT, indicating an M relationship with the attribute Hours. It is describing the duration an employee works on a project.
DEPENDENTS_OF illustrates EMPLOYEE and DEPENDENT with a 1
ratio, showing that while employees might not have dependents (partial participation), each DEPENDENT must have an associated EMPLOYEE (total participation).
Special Considerations in Drawing the ER Diagram
Make a note of the following special considerations while drawing an ER diagram:
Weak Entities and Identifying Relationships: In the ERD, we can see DEPENDENT is a weak entity. It is identified by its connection to EMPLOYEE. It cannot exist independently and is shown in a double rectangle. We can also notice that the DEPENDENTS_OF relationship is an identifying relationship, shown in a double diamond.
Redundancy and Refinement: Here the attributes should be consolidated into relationships when necessary. For example, Manager and Manager_start_date from DEPARTMENT were moved to the MANAGES relationship to reduce redundancy.
Composite and Multivalued Attributes: We can also see the attributes like Name for EMPLOYEE may be composite (consisting of Fname, Minit, Lname). Multivalued attributes like Locations for DEPARTMENT are shown using double ovals.
Role Names − Add role names to relationships when the same entity participates more than once, such as SUPERVISION, where one EMPLOYEE acts as the supervisor and another as the supervisee.
Conclusion
In this chapter, we learned how to design an ER diagram step by step using a real-world example the COMPANY database. We identified the main entities, defined the relationships between them, marked participation and cardinality, and used special elements like weak entities and composite attributes to add clarity. This structured approach helps in designing ER diagrams for visualizing data models clearly for effective database design.