Skip to content

Data Modeling

Entity-Relationship Diagram, Relational Model - Constraints, Languages, Design, and Programming, Relational Database Schemas, Update Operations and Dealing with Constraint Violations; Relational Algebra and Relational Calculus; Codd Rules.

Entity-Relationship (ER) Diagram

An Entity-Relationship (ER) Diagram is a visual representation of the database structure. It helps in designing the database by illustrating the entities, their attributes, and relationships between them.


Components of an ER Diagram

ComponentDescription
EntityAn object or concept in the database (e.g., Student, Employee). Represented by a rectangle.
AttributesCharacteristics of an entity (e.g., Name, Age). Represented by ellipses.
Primary KeyA unique attribute that identifies an entity (e.g., Student_ID). Underlined inside the ellipse.
RelationshipsLinks between entities (e.g., A Student enrolls in a Course). Represented by a diamond.
CardinalityDefines the number of entity occurrences related (1:1, 1:M, M:N).
Weak EntityAn entity that depends on another entity (e.g., Order depends on Customer). Represented by double rectangles.
Multivalued AttributeAn attribute that can have multiple values (e.g., Phone Numbers). Represented by double ellipses.

Example ER Diagram: University Database

  • Entities: Student, Course, Instructor
  • Attributes: Student (ID, Name, Age), Course (Code, Title, Credits), Instructor (ID, Name, Department)
  • Relationships:
    • Student enrolls in Course (Many-to-Many)
    • Instructor teaches Course (One-to-Many)

Convert ER into table - Tpoint Tech

Relational Model

The Relational Model is the foundation of relational databases and is based on the concept of tables (relations) that store data in a structured format.


Constraints in the Relational Model

Constraints ensure data integrity and consistency. The main types of constraints are:

Constraint TypeDescription
Domain ConstraintEnsures attribute values are from a predefined domain (e.g., Age must be an integer).
Key ConstraintEvery table must have a Primary Key, which uniquely identifies each row.
Entity Integrity ConstraintThe Primary Key cannot be NULL.
Referential Integrity ConstraintA Foreign Key in one table must match a Primary Key in another table (or be NULL).
Not Null ConstraintEnsures an attribute always has a value.
Unique ConstraintEnsures values in a column are unique.

Relational Languages

Relational databases support different query languages:

LanguagePurpose
SQL (Structured Query Language)Standard language for managing relational databases (includes DDL, DML, DCL, TCL).
Relational AlgebraProcedural query language that uses operations like Selection (σ), Projection (π), Join (⨝), Union (∪), Difference (-), etc.
Tuple Relational Calculus (TRC)Non-procedural query language based on predicate logic.
Domain Relational Calculus (DRC)Similar to TRC but focuses on domains of attributes.

Relational Database Design

Good design ensures data consistency, eliminates redundancy, and improves efficiency.

Design PrincipleDescription
NormalizationProcess of organizing tables to minimize redundancy using Normal Forms (1NF, 2NF, 3NF, BCNF, etc.).
Functional DependenciesRules that define relationships between attributes (e.g., A → B means B is functionally dependent on A).
DecompositionSplitting tables into smaller ones to reduce redundancy while preserving dependencies.

Programming with Relational Databases

Relational databases support various programming techniques:

ApproachDescription
Embedded SQLSQL queries written inside application code (e.g., in Java, C, Python).
Stored ProceduresPredefined SQL scripts stored in the database to optimize execution.
TriggersAutomated SQL actions that execute when specific conditions are met.
ODBC/JDBCInterfaces that allow applications to connect to relational databases.

Relational Database Schemas

A Relational Database Schema defines the structure of a relational database, including:

  1. Relations (Tables) – A collection of tuples (rows) representing data.

  2. Attributes (Columns) – Each relation consists of attributes, which have a name and a domain.

  3. Keys – Identifies unique tuples. This includes:

    • Primary Key (PK) – A unique identifier for each row.
    • Foreign Key (FK) – A reference to a primary key in another table to establish relationships.
  4. Constraints – Rules to maintain data integrity (e.g., Not NULL, Unique, Referential Integrity).

  5. Schema Notation – A formal way to define relations:

    Example:

    Student(SID: INT, Name: VARCHAR(50), Age: INT, Dept: VARCHAR(20), PRIMARY KEY(SID))
    Course(CID: INT, Title: VARCHAR(100), Credits: INT, PRIMARY KEY(CID))
    Enrolled(SID: INT, CID: INT, Grade: CHAR(2),
    PRIMARY KEY(SID, CID),
    FOREIGN KEY (SID) REFERENCES Student(SID),
    FOREIGN KEY (CID) REFERENCES Course(CID))

Update Operations in a Relational Database

A relational database supports four key update operations:

OperationDescription
InsertAdds a new row to a table.
DeleteRemoves an existing row.
UpdateModifies attribute values of existing rows.
Modify (Alter Table)Changes the schema (adding/removing columns, changing constraints).

(a) Insert Operation

INSERT INTO Student (SID, Name, Age, Dept)
VALUES (101, 'Alice', 20, 'CS');
  • Inserts a new student record.
  • May violate Primary Key, Domain, or Referential Integrity Constraints.

(b) Delete Operation

DELETE FROM Student WHERE SID = 101;
  • Removes Alice from the database.
  • May violate Referential Integrity (if her SID exists in the Enrolled table).

(c) Update Operation

UPDATE Student SET Age = 21 WHERE SID = 101;
  • Updates Alice’s age.
  • May violate Domain Constraints (if an invalid value is entered).

(d) Modify Table Structure

ALTER TABLE Student ADD COLUMN Email VARCHAR(100);
  • Adds a new column to store student emails.

Dealing with Constraint Violations

Constraint violations occur when update operations do not satisfy predefined rules. Let’s examine how to handle them:

(a) Domain Constraint Violation

Occurs when an attribute value is outside the defined domain.

INSERT INTO Student (SID, Name, Age, Dept) VALUES (102, 'Bob', 'Twenty', 'CS');
  • Age should be an integer, but ‘Twenty’ is a string.
    Solution: Use correct data types.

(b) Primary Key Violation

Occurs when a duplicate key is inserted.

INSERT INTO Student (SID, Name, Age, Dept) VALUES (101, 'Charlie', 22, 'EE');
  • SID 101 already exists.
    Solution: Ensure unique primary key values or use ON DUPLICATE KEY UPDATE (in MySQL).

(c) Referential Integrity Violation

Occurs when a foreign key references a non-existent primary key.

INSERT INTO Enrolled (SID, CID, Grade) VALUES (999, 201, 'A');
  • SID 999 does not exist in the Student table.
    Solution: Ensure referenced values exist before insertion or use ON DELETE CASCADE to automatically remove dependent records.

(d) Not NULL Constraint Violation

Occurs when a mandatory field is left empty.

INSERT INTO Student (SID, Name, Age, Dept) VALUES (103, NULL, 23, 'ME');
  • Name cannot be NULL.
    Solution: Ensure required fields have values before inserting data.

(e) Unique Constraint Violation

Occurs when an attribute marked as unique is duplicated.

ALTER TABLE Student ADD CONSTRAINT unique_name UNIQUE(Name);
INSERT INTO Student (SID, Name, Age, Dept) VALUES (104, 'Alice', 24, 'CS');
  • If ‘Alice’ already exists, the insert fails.
    Solution: Ensure unique values or allow duplicates by removing the constraint.

(f) Check Constraint Violation

Occurs when a condition is not met.

ALTER TABLE Student ADD CONSTRAINT age_check CHECK (Age >= 18);
INSERT INTO Student (SID, Name, Age, Dept) VALUES (105, 'David', 16, 'CS');
  • Age must be ≥ 18, but 16 violates the constraint.
    Solution: Use valid values or remove the check constraint if unnecessary.

Summary Table: Update Operations & Constraint Handling

Update OperationPossible Constraint ViolationSolution
INSERTPrimary Key, Domain, Not NULL, Referential IntegrityEnsure valid values, handle foreign key constraints
DELETEReferential IntegrityUse ON DELETE CASCADE or manually delete dependent records
UPDATEDomain, Unique, Not NULLValidate new values before updating
ALTER TABLEUnique, Domain, Referential IntegrityEnsure data consistency before modification

Relational Algebra, Relational Calculus, and Codd’s Rules


1. Relational Algebra

Relational Algebra is a procedural query language that uses a set of operations to manipulate relations (tables) and retrieve data. It provides a foundation for SQL and other database query languages.

Basic Operations in Relational Algebra

OperationDescriptionNotation
Selection (σ)Retrieves rows that satisfy a condition.σ_condition(R)
Projection (π)Retrieves specific columns (attributes).π_attribute-list(R)
Union (∪)Combines tuples from two relations (duplicates removed).R ∪ S
Set Difference (-)Returns tuples in R but not in S.R - S
Cartesian Product (×)Combines all tuples of R with all tuples of S.R × S
Rename (ρ)Renames relation or attributes.ρ_newName(oldRelation)

Example Queries in Relational Algebra

(a) Selection (σ)

Retrieve all students from the “CS” department:

σ_Dept='CS' (Student)

(b) Projection (π)

Retrieve only student names:

π_Name (Student)

(c) Union (∪)

Retrieve all students from two different tables (Student1 and Student2):

Student1 ∪ Student2

(d) Set Difference (-)

Find students enrolled in Course1 but not in Course2:

Enrolled1 - Enrolled2

(e) Cartesian Product (×)

Pair every student with every course:

Student × Course

2. Relational Calculus

Relational Calculus is a non-procedural query language, meaning it specifies what data to retrieve without specifying how.

Types of Relational Calculus

  1. Tuple Relational Calculus (TRC)
    • Uses tuple variables and logical conditions.
    • Notation: {t | Condition(t)}
    • Example: Find students enrolled in course CS101:
{ t | t ∈ Student ∧ ∃ e ∈ Enrolled (e.SID = t.SIDe.CID = 'CS101') }
  1. Domain Relational Calculus (DRC)
    • Uses domain variables instead of tuple variables.
    • Notation: {<d1, d2, ..., dn> | Condition(d1, d2, ..., dn)}
    • Example: Find student names with age > 20:
{ <Name> | ∃ Age, Dept ( <Name, Age, Dept> ∈ Student ∧ Age > 20) }

Key Differences: Relational Algebra vs. Relational Calculus

FeatureRelational AlgebraRelational Calculus
TypeProceduralNon-procedural
FocusSpecifies how to retrieve dataSpecifies what data to retrieve
OperationsUses selection, projection, join, etc.Uses logical conditions
ImplementationUsed in query executionUsed for query specification

3. Codd’s 12 Rules for RDBMS

Dr. Edgar F. Codd defined 12 rules (often called Codd’s Rules) to qualify a system as a Relational Database Management System (RDBMS).

RuleDescription
Foundation RuleThe system must manage databases only through relational capabilities.
Information RuleAll information must be stored in tables (relations).
Guaranteed AccessEvery data element must be accessible using a combination of table name, primary key, and column name.
Systematic Null HandlingNULL values should be supported for missing or unknown information.
Active Online CatalogMetadata (schema, structure) must be stored in a relational format and accessible using queries.
Comprehensive Data SublanguageThe system must support a language (like SQL) for defining, querying, and modifying data.
View UpdatingViews should be updatable when logically possible.
High-Level Insert, Update, DeleteThe system must allow row-level and set-level operations, not just record-by-record modifications.
Physical Data IndependenceChanges in physical storage should not affect queries.
Logical Data IndependenceChanges in schema (e.g., adding a column) should not affect applications using the database.
Integrity IndependenceIntegrity constraints must be definable within the database, not the application.
Distribution IndependenceThe database should work regardless of whether data is centralized or distributed across multiple locations.
Non-SubversionThe system must prevent bypassing security and integrity constraints via low-level operations.

Significance of Codd’s Rules

  • Ensures ACID compliance (Atomicity, Consistency, Isolation, Durability).
  • Enforces relational principles for better scalability and performance.
  • SQL-based databases like MySQL, PostgreSQL, and Oracle follow these rules.

Final Summary

ConceptDescription
Relational AlgebraA procedural query language using operations like Selection, Projection, Join, and Union.
Relational CalculusA non-procedural query language using logical conditions (Tuple and Domain Calculus).
Codd’s Rules12 rules defining a proper Relational DBMS to ensure data integrity, consistency, and efficiency.