Database System Concepts and Architecture
Data Models
A collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.
- Relational Model
- collection of tables to represent both data and the relationships among those data
- Tables are also known as relations, tuples (rows), and attributes (columns)
- Record-based model
- Entity-Relationship Model
- uses a collection of basic objects, called entities, and relationships among these objects
- Graphical representation
- Semi-structured Data Model
- individual data items of the same type may have different sets of attributes
- e.g.. JSON (JavaScript Object Notation) and Extensible Markup Language (XML)
- Object-Based Data Model
- instead of just storing plain data, we store objects with both data + behavior (methods).
- Methods are defined using
CREATE FUNCTIONin database.
CREATE FUNCTION start_engine(car_id INT) RETURNS TEXT AS $$BEGIN RETURN 'Engine started for Car ID ' || car_id;END;$$ LANGUAGE plpgsql;Data Models Comparison
| Feature | Relational Model | Entity-Relationship Model | Semi-structured Data Model | Object-Based Data Model |
|---|---|---|---|---|
| Definition | Uses tables to store data and relationships. | Uses entities and relationships to model data conceptually. | Allows flexible data representation where attributes may vary. | Stores objects with data and methods. |
| Purpose | Organizing structured data efficiently. | Conceptual design of data relationships. | Handling irregular and hierarchical data. | Storing and manipulating complex data with behavior. |
| Structure | Tables (relations), rows (tuples), columns (attributes). | Graphical model with entities, attributes, and relationships. | Tree or graph-like hierarchical structure. | Objects containing attributes (data) and methods (behavior). |
| Primary Component | Tables, Rows, Columns. | Entities, Relationships, Attributes. | Nodes, Attributes, and Links. | Objects, Classes, Methods. |
| Representation | Tabular format. | ER diagrams. | JSON, XML, NoSQL. | Object-oriented structure (like Java/C++ classes). |
| Data Storage | Stored in tables with fixed schema. | Conceptual, converted into relational or NoSQL structures. | Flexible schema, supports self-describing formats. | Stored as objects in relational or NoSQL databases. |
| Keys | Primary key, Foreign key. | Entity Identifiers, Relationship Keys. | Key-Value pairs, Path-based access. | Object ID, References. |
| Normalization | Ensures minimal redundancy using normalization techniques. | Not applicable directly (converted to relational later). | Schema-less, normalization not required. | Encapsulation reduces redundancy but does not follow strict normalization. |
| Implementation | SQL-based relational databases (MySQL, PostgreSQL). | Used for conceptual design before implementation. | NoSQL databases (MongoDB, CouchDB). | Object-relational databases (PostgreSQL with JSONB, ObjectDB). |
| Use Case | Banking, ERP, CRM, Inventory systems. | Data modeling before implementation in any database. | Web data, APIs, logs, XML/JSON document storage. | Complex applications like CAD, AI/ML models, and multimedia databases. |
Data Abstraction
-
Physical Level
- Uses complex low-level data structures.
- Lowest level, describes how data is actually stored.
- Example: Data stored as blocks of bytes, records in files, indexes for retrieval.
-
Logical Level
- Defines what data is stored and the relationships between data.
- Uses simpler structures compared to the physical level.
- Example: Table schema, relationships (e.g., instructor dept_name must exist in department).
- Provides physical data independence (changes at physical level do not affect logical level).
-
View Level
- Highest level, shows only relevant parts of the database to users.
- Multiple views for different users (e.g., students see course details, but not instructor salaries).
- Enhances security by restricting access to certain data.
Schema (Structure of the Database)
- The schema is the design or blueprint of a database.
- It defines how data is organized in terms of tables, columns, data types, relationships, constraints, and indexes.
- It remains constant over time unless modified.
Example
CREATE TABLE Student ( ID INT PRIMARY KEY, Name VARCHAR(50), Age INT, Dept VARCHAR(20) );This schema defines the structure of the “Student” table, specifying its columns and their data types.
Physical Schema vs. Logical Schema in DBMS
| Aspect | Physical Schema | Logical Schema |
|---|---|---|
| Definition | Describes how data is physically stored on storage devices (disk, SSD, etc.). | Describes the logical structure of the database, including tables, relationships, and constraints. |
| Purpose | Focuses on storage optimization, indexing, and file organization. | Focuses on data organization, relationships, and integrity constraints. |
| Visibility | Hidden from users and application developers; handled by the DBMS internals. | Used by database administrators and developers to define data structure. |
| Changes | Changes when storage mechanisms or performance optimizations are updated. | Changes when database structure or requirements are modified. |
| Example | Data stored as B-Trees, Hash Indexes, File Structures. | CREATE TABLE Student (ID INT, Name VARCHAR(50)); |
| Analogy: |
- Physical Schema → Like the hard disk storage format (how files are structured internally).
- Logical Schema → Like the table structure in SQL (how data is logically organized).
Instance (Actual Data in the Database)
- An instance refers to the actual data stored in the database at a specific moment.
- It changes frequently as data is inserted, updated, or deleted.
- The table now contains an instance with actual records.
Example
INSERT INTO Student VALUES (101, 'John', 21, 'CSE'); INSERT INTO Student VALUES (102, 'Alice', 22, 'ECE');Key Differences Between Schema and Instance
| Aspect | Schema | Instance |
|---|---|---|
| Definition | Structure of the database | Actual data stored |
| Changes | Rarely changes | Changes frequently |
| Persistence | Permanent | Temporary (data changes over time) |
| Example | Table design (CREATE TABLE) | Records stored in the table |
Three-Schema Architecture in DBMS
The Three-Schema Architecture separates a database into three levels to achieve data abstraction and independence:
- Internal Schema (Physical Level)
- Defines how data is stored physically (file structures, indexing, storage allocation).
- Deals with performance optimization and hardware-level data organization.
- Conceptual Schema (Logical Level)
- Defines what data is stored and the relationships between them.
- Includes tables, attributes, constraints, and integrity rules.
- Provides physical data independence (insulating applications from storage changes).
- External Schema (View Level)
- Defines user-specific views of the database.
- Users interact with filtered or restricted views of the database for security and simplicity.
- Provides logical data independence (insulating users from changes in the logical schema).
Data Independence
Data independence ensures that changes at one level do not affect other levels. It is divided into:
- Physical Data Independence
- Changes in physical storage (e.g., indexing, compression, partitioning) do not affect the logical schema.
- Example: Changing a database’s storage from SSD to cloud storage without altering table definitions.
- Logical Data Independence
- Changes in the logical schema (tables, relationships, constraints) do not affect external views or applications.
- Example: Adding a new column to a table without affecting existing user queries or views.
Database Languages
DBMS uses different languages to interact with and manage databases. These include:
- Data Definition Language (DDL)
- Used to define and modify database structures.
- Commands:
CREATE(creates tables, views, indexes)ALTER(modifies existing schema)DROP(deletes tables or databases)TRUNCATE(removes all records without logging individual row deletions)
- Data Manipulation Language (DML)
- Used for inserting, updating, deleting, and retrieving data.
- Types:
- Procedural DML (Requires the user to specify how data should be retrieved)
- Declarative DML (SQL-based) (Only specifies what data is needed, and DBMS decides how to retrieve it)
- Commands:
INSERT,UPDATE,DELETE,SELECT
- Data Control Language (DCL)
- Used for access control and security.
- Commands:
GRANT(assigns privileges)REVOKE(removes privileges)
- Transaction Control Language (TCL)
- Manages transactions to maintain data integrity.
- Commands:
COMMIT(saves changes)ROLLBACK(undoes changes)SAVEPOINT(sets a rollback point within a transaction)
Database Interfaces
DBMS provides different interfaces for users to interact with the database:
- Command-Line Interface (CLI)
- Uses SQL commands to interact with the database.
- Example: MySQL CLI, PostgreSQL psql
- Graphical User Interface (GUI)
- Provides a visual way to interact with the database.
- Example: phpMyAdmin, pgAdmin
- Application Programming Interface (API)
- Allows developers to interact with the database using programming languages.
- Example: JDBC (Java Database Connectivity), ODBC (Open Database Connectivity)
- Natural Language Interface
- Allows users to interact using human-like language.
- Example: AI-powered database queries (ChatGPT-based SQL generation)
- Web-Based Interfaces
- Enables database access through web applications.
- Example: SQL-based reporting tools, cloud-based database dashboards
Centralized DBMS Architecture
- A single database system runs on a central server.
- All data is stored in one location, and users access it via terminals or remote connections.
- Example: A mainframe system with multiple dumb terminals.
Advantages:
✅ Easier data management and maintenance.
✅ Strong security as data is centralized.
✅ No data redundancy.
Disadvantages:
❌ Performance bottlenecks due to a single server handling all requests.
❌ If the server fails, the entire system goes down.
Single-User vs. Multiuser Systems
| Feature | Single-User Systems | Multiuser Systems |
|---|---|---|
| Devices | Mobile, PC | Servers, Data Centers |
| Users | One user at a time | Multiple users remotely connected |
| Processors | Typically 1 CPU (multi-core) | Multiple CPUs |
| Concurrency Control | Minimal | Full transaction support |
| Crash Recovery | Basic or absent | Robust mechanisms |
| Query Support | API-based | SQL queries & API |
| Example | Embedded databases (SQLite) | Server-based DBMS (PostgreSQL, MySQL) |
Client/Server DBMS Architecture
- The database is stored on a server, and users (clients) interact with it via networked applications.
- Clients send requests (SQL queries), and the server processes them and returns results.
Types of Client/Server Architectures
- Two-Tier Architecture:
- Clients directly communicate with the database server using APIs (e.g., JDBC, ODBC).
- Example: A desktop application connecting to a database server.
- Three-Tier Architecture:
- Adds an intermediate application server between the client and database.
- The application server processes client requests, handles business logic, and interacts with the database.
- Example: Web applications with frontend (browser), middleware (API server), and backend (database).
Advantages:
✅ Faster performance with load distribution.
✅ Scalability—more clients can be added easily.
✅ Better security since clients don’t directly access the database.
Disadvantages:
❌ Complex setup and maintenance.
❌ Network latency may affect response times.
Difference Between Centralized and Client/Server Architectures for DBMS
| Feature | Centralized DBMS | Client/Server DBMS |
|---|---|---|
| Architecture | Entire database resides on a single machine (server). | Database is distributed across client and server. |
| Data Storage | Stored and managed in one central location. | Data is stored on the server, while clients access it remotely. |
| Processing | The server handles all processing (queries, transactions). | Clients process requests and send queries to the server. |
| Performance | Can become a bottleneck if many users access it simultaneously. | More scalable as clients handle some processing. |
| Scalability | Limited (depends on server capacity). | Highly scalable, as more clients and servers can be added. |
| Fault Tolerance | Low – if the server fails, the entire system stops working. | Higher – server redundancy and replication improve reliability. |
| Concurrency | Difficult to manage due to a single system handling all users. | Better as transactions are distributed among clients and the server. |
| Network Dependency | Not required (works independently). | Requires a network for communication between client and server. |
| Security | Easier to manage but poses a single point of failure. | More complex security measures needed due to multiple access points. |
| Examples | Single-user databases like Microsoft Access, SQLite. | Enterprise systems like MySQL, PostgreSQL, Oracle DB. |
Parallelism in Database Systems
Parallelism in databases affects performance and scalability:
Coarse-Grained Parallelism
- Few cores (4–8) share main memory.
- Each query runs on a single processor, while multiple queries execute concurrently.
- Increases throughput (more queries per second).
- Example: Traditional server-based databases.
Fine-Grained Parallelism
- Large number of processors available.
- A single query is split into multiple tasks that run in parallel.
- Improves response time for individual queries.
- Example: Parallel database systems (Google BigQuery, Amazon Redshift).