Author: saqibkhan

  • Deadlock

    In a multi-process system, deadlock is an unwanted situation that arises in a shared resource environment, where a process indefinitely waits for a resource that is held by another process.

    For example, assume a set of transactions {T0, T1, T2, …,Tn}. T0 needs a resource X to complete its task. Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2. T2 is waiting for resource Z, which is held by T0. Thus, all the processes wait for each other to release resources. In this situation, none of the processes can finish their task. This situation is known as a deadlock.

    Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions involved in the deadlock are either rolled back or restarted.

    Deadlock Prevention

To prevent any deadlock situation in the system, the DBMS inspects the operations that transactions are about to execute and analyzes whether they can create a deadlock situation. If it finds that a deadlock might occur, that transaction is never allowed to execute.

There are deadlock prevention schemes that use the timestamp ordering of transactions to predetermine a deadlock situation.

    Wait-Die Scheme

    In this scheme, if a transaction requests to lock a resource (data item), which is already held with a conflicting lock by another transaction, then one of the two possibilities may occur −

    • If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then Ti is allowed to wait until the data-item is available.
• If TS(Ti) > TS(Tj) − that is, Ti, which is requesting a conflicting lock, is younger than Tj − then Ti dies. Ti is restarted later with a random delay but with the same timestamp.

    This scheme allows the older transaction to wait but kills the younger one.

    Wound-Wait Scheme

In this scheme, if a transaction requests to lock a resource (data item) that is already held with a conflicting lock by another transaction, one of the two possibilities may occur −

    • If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
    • If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.

This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.

In both cases, the transaction that entered the system later (the younger one) is the one that gets aborted.
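The decision each scheme makes can be summarized in a few lines. The following minimal Python sketch (the function and variable names are only illustrative, not a real DBMS API) compares the timestamps of the requesting transaction and the lock holder:

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is rolled back).
    if ts_requester < ts_holder:
        return "wait"
    return "die"                       # restarted later with the same timestamp

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (aborts) the younger holder; younger requester waits.
    if ts_requester < ts_holder:
        return "wound holder"          # the holder is rolled back and restarted
    return "wait"

# Example: T1 (timestamp 1) and T2 (timestamp 2) request an item held by the other
print(wait_die(1, 2), "/", wound_wait(1, 2))    # wait / wound holder
print(wait_die(2, 1), "/", wound_wait(2, 1))    # die  / wait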

    Deadlock Avoidance

Aborting a transaction is not always a practical approach. Instead, deadlock avoidance mechanisms can be used to detect a deadlock situation in advance. Methods like the wait-for graph are suitable for systems where transactions are lightweight and hold few resource instances; for heavier systems, deadlock prevention techniques may work better.

    Wait-for Graph

    This is a simple method available to track if any deadlock situation may arise. For each transaction entering into the system, a node is created. When a transaction Ti requests for a lock on an item, say X, which is held by some other transaction Tj, a directed edge is created from Ti to Tj. If Tj releases item X, the edge between them is dropped and Ti locks the data item.

    The system maintains this wait-for graph for every transaction waiting for some data items held by others. The system keeps checking if there’s any cycle in the graph.
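As a rough illustration, the following Python sketch keeps the wait-for graph as a dictionary of edges and looks for a cycle with a depth-first search; the transaction names reuse the T0, T1, T2 example from above:

# waits_for[Ti] is the set of transactions that Ti is waiting on.
waits_for = {"T0": {"T1"}, "T1": {"T2"}, "T2": {"T0"}}

def has_cycle(graph):
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.remove(node)
        return False

    return any(dfs(n) for n in graph if n not in visited)

print(has_cycle(waits_for))    # True: T0 -> T1 -> T2 -> T0 is a deadlock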

[Figure: Wait-for graph]

    Here, we can use any of the two following approaches −

• First, do not allow any request for an item that is already locked by another transaction. This is not always feasible and may cause starvation, where a transaction indefinitely waits for a data item and can never acquire it.
• The second option is to roll back one of the transactions. It is not always feasible to roll back the younger transaction, as it may be more important than the older one. With the help of a relative algorithm, a transaction is chosen to be aborted. This transaction is known as the victim, and the process is known as victim selection.
  • Concurrency Control

    In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories −

    • Lock based protocols
    • Time stamp based protocols

    Lock-based Protocols

Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds −

    • Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
    • Shared/exclusive − This type of locking mechanism differentiates the locks based on their uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more than one transaction to write on the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.

    There are four types of lock protocols available −

    Simplistic Lock Protocol

    Simplistic lock-based protocols allow transactions to obtain a lock on every object before a ‘write’ operation is performed. Transactions may unlock the data item after completing the write operation.

    Pre-claiming Lock Protocol

    Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.

[Figure: Pre-claiming lock protocol]

Two-Phase Locking (2PL)

    This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it only releases the acquired locks.

[Figure: Two-phase locking]

    Two-phase locking has two phases, one is growing, where all the locks are being acquired by the transaction; and the second phase is shrinking, where the locks held by the transaction are being released.

    To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
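The growing and shrinking discipline can be illustrated with a minimal, single-threaded Python sketch (the class and method names are invented for the example; a real lock manager also has to handle lock modes and blocking):

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False            # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire locks in the shrinking phase")
        self.locks.add(item)              # growing phase

    def unlock(self, item):
        self.shrinking = True             # first release starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTransaction("T1")
t.lock("X"); t.lock("Y")                  # growing phase
t.unlock("X")                             # shrinking phase begins
try:
    t.lock("Z")                           # violates two-phase locking
except RuntimeError as e:
    print(e)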

    Strict Two-Phase Locking

The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it. Strict-2PL holds all the locks until the commit point and then releases them all at once.

[Figure: Strict two-phase locking]

    Strict-2PL does not have cascading abort as 2PL does.

    Timestamp-based Protocols

The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.

    Lock-based protocols manage the order between the conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.

Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all transactions that come after it. For example, a transaction 'y' entering the system at 0004 is two seconds younger, and priority is given to the older one.

In addition, every data item carries the latest read and write timestamps. This lets the system know when the last read and write operations were performed on the data item.

    Timestamp Ordering Protocol

The timestamp-ordering protocol ensures serializability among transactions in their conflicting read and write operations. It is the responsibility of the protocol that conflicting pairs of operations are executed according to the timestamp values of the transactions.

    • The timestamp of transaction Ti is denoted as TS(Ti).
    • Read time-stamp of data-item X is denoted by R-timestamp(X).
    • Write time-stamp of data-item X is denoted by W-timestamp(X).

The timestamp ordering protocol works as follows (a sketch of these checks in code appears after the list) −

    • If a transaction Ti issues a read(X) operation −
      • If TS(Ti) < W-timestamp(X)
• Operation rejected and Ti rolled back.
      • If TS(Ti) >= W-timestamp(X)
        • Operation executed.
• R-timestamp(X) is updated to the maximum of R-timestamp(X) and TS(Ti).
    • If a transaction Ti issues a write(X) operation −
      • If TS(Ti) < R-timestamp(X)
• Operation rejected and Ti rolled back.
      • If TS(Ti) < W-timestamp(X)
        • Operation rejected and Ti rolled back.
• Otherwise, operation executed and W-timestamp(X) is set to TS(Ti).
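A compact sketch of these checks in Python, assuming each data item carries its read and write timestamps in a small dictionary (the names are illustrative, not a real DBMS API):

def read(ts_ti, item):
    # item holds the data item's read and write timestamps
    if ts_ti < item["w_ts"]:
        return "reject and roll back Ti"
    item["r_ts"] = max(item["r_ts"], ts_ti)
    return "execute read"

def write(ts_ti, item):
    if ts_ti < item["r_ts"] or ts_ti < item["w_ts"]:
        return "reject and roll back Ti"
    item["w_ts"] = ts_ti
    return "execute write"

X = {"r_ts": 0, "w_ts": 0}
print(read(5, X), X)      # execute read  {'r_ts': 5, 'w_ts': 0}
print(write(3, X))        # reject and roll back Ti (a younger transaction already read X)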

    Thomas’ Write Rule

Under the basic timestamp-ordering rule, if TS(Ti) < W-timestamp(X), then the write operation is rejected and Ti is rolled back.

Timestamp-ordering rules can be modified to make the schedule view serializable.

Thomas' write rule relaxes the basic rule: instead of rolling Ti back, the obsolete 'write' operation itself is simply ignored.

  • Transaction

    A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.

Let's take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This very simple and small transaction involves several low-level tasks.

A's Account

    Open_Account(A)
    Old_Balance = A.balance
    New_Balance = Old_Balance - 500
    A.balance = New_Balance
    Close_Account(A)
    

B's Account

    Open_Account(B)
    Old_Balance = B.balance
    New_Balance = Old_Balance + 500
    B.balance = New_Balance
    Close_Account(B)
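
To see why these low-level tasks must succeed or fail as one unit, here is a minimal sketch using Python's built-in sqlite3 module; the table and starting balances are invented for the example, and both updates are committed, or rolled back, together:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

try:
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'A'")
    conn.execute("UPDATE accounts SET balance = balance + 500 WHERE name = 'B'")
    conn.commit()                 # both updates become permanent together
except sqlite3.Error:
    conn.rollback()               # neither update survives a failure

print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
# [('A', 500), ('B', 700)]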
    

    ACID Properties

A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as the ACID properties − in order to ensure accuracy, completeness, and data integrity.

    • Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in a database where a transaction is left partially completed. States should be defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.
    • Consistency − The database must remain in a consistent state after any transaction. No transaction should have any adverse effect on the data residing in the database. If the database was in a consistent state before the execution of a transaction, it must remain consistent after the execution of the transaction as well.
    • Durability − The database should be durable enough to hold all its latest updates even if the system fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database will hold the modified data. If a transaction commits but the system fails before the data could be written on to the disk, then that data will be updated once the system springs back into action.
• Isolation − In a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that each transaction is carried out and executed as if it were the only transaction in the system. No transaction affects the existence of any other transaction.

    Serializability

When multiple transactions are being executed by the operating system in a multiprogramming environment, there are possibilities that the instructions of one transaction are interleaved with those of some other transaction.

• Schedule − A chronological execution sequence of transactions is called a schedule. A schedule can have many transactions in it, each comprising a number of instructions/tasks.
    • Serial Schedule − It is a schedule in which transactions are aligned in such a way that one transaction is executed first. When the first transaction completes its cycle, then the next transaction is executed. Transactions are ordered one after the other. This type of schedule is called a serial schedule, as transactions are executed in a serial manner.

    In a multi-transaction environment, serial schedules are considered as a benchmark. The execution sequence of an instruction in a transaction cannot be changed, but two transactions can have their instructions executed in a random fashion. This execution does no harm if two transactions are mutually independent and working on different segments of data; but in case these two transactions are working on the same data, then the results may vary. This ever-varying result may bring the database to an inconsistent state.

    To resolve this problem, we allow parallel execution of a transaction schedule, if its transactions are either serializable or have some equivalence relation among them.

    Equivalence Schedules

    An equivalence schedule can be of the following types −

    Result Equivalence

    If two schedules produce the same result after execution, they are said to be result equivalent. They may yield the same result for some value and different results for another set of values. That’s why this equivalence is not generally considered significant.

    View Equivalence

Two schedules are said to be view equivalent if the transactions in both schedules perform similar actions in a similar manner.

    For example −

    • If T reads the initial data in S1, then it also reads the initial data in S2.
    • If T reads the value written by J in S1, then it also reads the value written by J in S2.
    • If T performs the final write on the data value in S1, then it also performs the final write on the data value in S2.

    Conflict Equivalence

Two operations are said to be conflicting if they have the following properties −

• They belong to different transactions.
• They access the same data item.
• At least one of them is a 'write' operation.

    Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if and only if −

    • Both the schedules contain the same set of Transactions.
    • The order of conflicting pairs of operation is maintained in both the schedules.

    Note − View equivalent schedules are view serializable and conflict equivalent schedules are conflict serializable. All conflict serializable schedules are view serializable too.
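As a rough illustration of these conditions, the sketch below builds a precedence graph from a made-up schedule of (transaction, operation, item) steps: an edge Ti → Tj is added whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when this graph has no cycle (it uses graphlib, available in Python 3.9+):

from graphlib import TopologicalSorter, CycleError

schedule = [("T1", "R", "X"), ("T2", "W", "X"), ("T2", "R", "Y"), ("T1", "W", "Y")]

# Precedence graph: record, for each transaction, the transactions that must come before it
graph = {t: set() for t, _, _ in schedule}
for i, (ti, op1, x) in enumerate(schedule):
    for tj, op2, y in schedule[i + 1:]:
        # Conflicting pair: different transactions, same item, at least one write
        if ti != tj and x == y and "W" in (op1, op2):
            graph[tj].add(ti)             # edge Ti -> Tj, stored as a predecessor of Tj

try:
    order = list(TopologicalSorter(graph).static_order())
    print("conflict serializable; equivalent serial order:", order)
except CycleError:
    print("not conflict serializable (cycle in the precedence graph)")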

    States of Transactions

    A transaction in a database can be in one of the following states −

[Figure: Transaction states]
    • Active − In this state, the transaction is being executed. This is the initial state of every transaction.
    • Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.
    • Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A failed transaction can no longer proceed further.
    • Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls back all its write operations on the database to bring the database back to its original state where it was prior to the execution of the transaction. Transactions in this state are called aborted. The database recovery module can select one of the two operations after a transaction aborts −
      • Re-start the transaction
      • Kill the transaction
    • Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system.
  • Hashing

For a huge database structure, it can be nearly impossible to search through all the index levels and then reach the destination data block to retrieve the desired data. Hashing is an effective technique to calculate the direct location of a data record on the disk without using an index structure.

    Hashing uses hash functions with search keys as parameters to generate the address of a data record.

    Hash Organization

    • Bucket − A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records.
• Hash Function − A hash function, h, is a mapping function that maps the set of all search keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.

    Static Hashing

In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it generates only 4 possible values. The output address is always the same for a given key, and the number of buckets remains unchanged at all times.

[Figure: Static hashing]

    Operation

• Insertion − When a record is required to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored: Bucket address = h(K). (A sketch of these operations appears after this list.)
    • Search − When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored.
    • Delete − This is simply a search followed by a deletion operation.
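A toy illustration of these three operations, assuming a fixed number of buckets and a simple mod hash function; real systems hash to disk-block addresses rather than Python lists:

NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]      # each list stands in for one bucket/block

def h(key):
    return key % NUM_BUCKETS                    # bucket address = h(K)

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    return [rec for k, rec in buckets[h(key)] if k == key]

def delete(key):
    buckets[h(key)] = [(k, r) for k, r in buckets[h(key)] if k != key]

insert(103, "Alice"); insert(207, "Bob")
print(search(103))       # ['Alice']
delete(103)
print(search(103))       # []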

    Bucket Overflow

    The condition of bucket-overflow is known as collision. This is a fatal state for any static hash function. In this case, overflow chaining can be used.

    • Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called Closed Hashing.
[Figure: Overflow chaining]
    • Linear Probing − When a hash function generates an address at which data is already stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
[Figure: Linear probing]

    Dynamic Hashing

The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extendible (extended) hashing.

In dynamic hashing, the hash function is made to produce a large number of values, of which only a few are used initially.

[Figure: Dynamic hashing]

    Organization

The prefix of an entire hash value is taken as a hash index. Only a portion of the hash value is used for computing bucket addresses. Every hash index has a depth value to signify how many bits are used for computing bucket addresses; these bits can address 2^n buckets, where n is the depth. When all these bits are consumed − that is, when all the buckets are full − the depth value is increased by one and twice as many buckets are allocated.

    Operation

    • Querying − Look at the depth value of the hash index and use those bits to compute the bucket address.
    • Update − Perform a query as above and update the data.
    • Deletion − Perform a query to locate the desired data and delete the same.
• Insertion − Compute the address of the bucket (a sketch of this splitting logic appears after this list).
      • If the bucket is already full.
        • Add more buckets.
        • Add additional bits to the hash value.
        • Re-compute the hash function.
      • Else
• Add data to the bucket.
      • If all the buckets are full, perform the remedies of static hashing.
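The bucket-splitting and directory-doubling behaviour described above can be sketched as follows. This is a simplified in-memory illustration: the bucket capacity, the class names, and the use of Python's built-in hash are all assumptions, not how a real DBMS lays out disk blocks:

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.items = {}

class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]     # indexed by the low global_depth bits

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        bucket = self.directory[self._index(key)]
        if key in bucket.items or len(bucket.items) < self.capacity:
            bucket.items[key] = value
            return
        # Overflow: split the bucket, doubling the directory if necessary
        if bucket.local_depth == self.global_depth:
            self.directory += self.directory        # double the directory
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):      # re-point half of the slots
            if b is bucket and (i & high_bit):
                self.directory[i] = new_bucket
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():              # redistribute the old entries
            self.directory[self._index(k)].items[k] = v
        self.put(key, value)                        # retry the pending insert

eh = ExtendibleHash(bucket_capacity=2)
for k in [1, 2, 3, 4, 5, 6]:
    eh.put(k, f"record-{k}")
print(eh.global_depth, eh.get(5))                   # 2 record-5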

Hashing is not favorable when the data is organized in some order and the queries require a range of data. Hashing performs best when the data is discrete and random.

Hashing has higher implementation complexity than indexing, but individual hash operations are done in constant time.

  • Dynamic Multilevel Indexing with B-Tree and B+ Tree

    Large databases require efficient methods for indexing. It is crucial that we maintain proper indexes to search records in large databases. A common challenge is to make sure the index structure remains balanced when new records are inserted or existing ones are deleted. For this purpose, there are different methods like single level indexing, multi-level indexing, and dynamic multilevel indexing.

    Multilevel indexing can be done using B-Trees and B+ Trees. These advanced data structures adjust themselves automatically, keeping the operations smooth and fast. Read this chapter to learn the fundamentals of dynamic multilevel indexing and understand how B-Trees and B+ Trees work.

    What is Dynamic Multilevel Indexing?

    Dynamic multilevel indexing helps in maintaining an efficient search structure. This is true even when the records in a database keep changing frequently. Unlike static indexing where we can update by rebuilding the index, dynamic indexing updates itself on the fly.

The two most common structures used are B-Trees and B+ Trees. Both work as balanced tree structures. These trees keep search times short by minimizing the number of levels. They handle insertions, deletions, and searches efficiently, even in large datasets.

The Role of B-Trees in Dynamic Indexing

A B-Tree is a balanced search tree where records are stored within its nodes. Each node contains multiple key values and pointers to other nodes or records. The key idea is to keep the tree balanced by splitting and merging nodes as records are inserted or deleted.

How Does a B-Tree Work?

    Let’s understand how a B-Tree works −

    • Nodes and Keys − Each node can have several keys and pointers that form a multi-way search tree.
    • Balanced Structure − The tree is always balanced, which means every leaf node is at the same level.
    • Search Process − The search begins at the root and follows pointers based on key comparisons until the desired record is found.

The following image depicts what a B-Tree looks like −

[Figure: Structure of a B-Tree]

    Key Properties of B-Trees

    Given below are some of the important properties of B-Trees −

    • Every internal node can have up to “p – 1” keys and “p” pointers. Here, “p” is the order of the B-Tree.
    • Keys in each node are arranged in ascending order.
    • Each node must be at least half full, except for the root.
    • Leaf nodes are linked for easier traversal if needed.

    Example of a B-Tree

Let's see an example of a B-Tree for a database with the following order and fan-out −

• Order (p) − 23 (maximum number of pointers per node, that is, at most 22 keys)
    • Fan-out (fo) − 16 (average number of pointers in a node)

    We start with the root node that holds 15 key entries and 16 pointers. As new records are inserted, the tree grows as follows −

    • Level 0 (Root) − 1 node with 15 keys and 16 pointers
    • Level 1 − 16 nodes with 240 keys and 256 pointers
    • Level 2 − 256 nodes with 3840 keys and 4096 pointers
    • Level 3 (Leaf Level) − 4096 nodes holding 61,440 keys

The tree can efficiently organize 65,535 keys using just the root plus three levels below it. It is this efficiency that reduces search times to a great extent.

    B+ Trees: More Efficient than B-Tree

    A B+ Tree is a modified version of a B-Tree. B+ Trees are specifically designed for indexing. In a B+ Tree, all the data records are stored only at the leaf nodes and the internal nodes hold only keys and pointers. This design allows the internal nodes to hold more keys, making the structure shallower and more efficient.

    How Do B+ Trees Work

    In a B+ Tree,

    • Leaf Nodes − Contain records or pointers to records.
    • Internal Nodes − Contain only keys and pointers to lower-level nodes.
    • Linked Leaf Nodes − Leaf nodes are linked, which makes the sequential access easier.

    Key Properties of B+ Trees

    Listed below are some of the important properties of B+ Trees −

    • Every internal node can have up to p pointers and p-1 keys.
    • Leaf nodes hold actual data or pointers to data.
    • Leaf nodes are linked for easy traversal.
    • The tree stays balanced due to automatic splitting and merging during updates.

    Example of a B+ Tree

    Let us see the same example that we used for explaining B-Trees but this time, with B+ Tree logic −

    Assumptions −

    • Key size − 9 bytes
    • Pointer size − 7 bytes (for records), 6 bytes (for blocks)
    • Block size − 512 bytes

Internal Nodes − Maximum of 34 pointers and 33 keys (calculated based on the space available in one block).

    Leaf Nodes − Maximum of 31 data entries (keys and data pointers).

    • Root Node − 1 node with 22 keys and 23 pointers.
    • Level 1 − 23 nodes holding 506 keys and 529 pointers.
    • Level 2 − 529 nodes holding 11,638 keys and 12,167 pointers.
    • Leaf Level − 12,167 nodes holding 255,507 data pointers.

This structure is useful because it can handle 255,507 records efficiently with just the root plus three lower levels. This is why B+ Trees are commonly used in database indexing systems.
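The node capacities above follow from fitting pointers and keys into one disk block. Here is a small back-of-the-envelope check in Python, using the sizes assumed in this example (an internal node of order p holds p pointers and p − 1 keys):

BLOCK = 512        # block size in bytes (as assumed in the example)
KEY = 9            # search-key size
REC_PTR = 7        # record pointer size
BLK_PTR = 6        # block/tree pointer size

# Internal node: p tree pointers and (p - 1) keys must fit in one block
p_internal = (BLOCK + KEY) // (BLK_PTR + KEY)        # 34 pointers, hence 33 keys

# Leaf node: p_leaf (key, record-pointer) pairs plus one next-leaf pointer
p_leaf = (BLOCK - BLK_PTR) // (KEY + REC_PTR)        # 31 data entries

print(p_internal, p_internal - 1, p_leaf)            # 34 33 31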

    Advantages of Dynamic Multilevel Indexing

    Dynamic multilevel indexing offers several advantages as given below −

    • Automatic Balancing − Trees adjust themselves during insertions and deletions.
    • Efficient Searches − Shallow trees mean fewer levels to search through.
    • Faster Updates − Data changes are quick due to rebalancing logic.
    • Scalability − B-Trees and B+ Trees handle massive datasets without performance drops.

    Real-world Applications of B-Trees and B+ Trees

    B-Trees and B+ Trees are widely used in −

    • DBMS − For indexing large tables.
    • File Systems − To manage files in storage systems.
    • Search Engines − To keep search indexes optimized.
    • Operating Systems − For directory management.

    Difference between B-Trees and B+ Trees

    The following table highlights the major differences between B-Trees and B+ Trees −

Feature | B-Tree | B+ Tree
Data Storage | In all nodes | Only in leaf nodes
Data Retrieval | Slower for range queries | Faster due to linked leaf nodes
Tree Depth | Deeper | Shallower
Use Cases | General indexing | Indexing with range queries
  • Multi-level Indexing

Data retrieval in database management systems needs to be fast and efficient. We implement indexing to reduce the search time and facilitate faster data retrieval. As databases grow in size, efficient indexing techniques become our primary means of keeping search times low. Multi-level indexing is one such technique, designed to manage large datasets with minimal disk access. Read this chapter to get a good understanding of what multi-level indexing means, what its structure is, and how it works.

    What is Multi-level Indexing in DBMS?

In database systems, indexing improves the data retrieval speed by organizing records in a way that allows faster searches. A single-level index is a list of key values pointing to corresponding records, and it can be searched with a binary search. However, when we are working with massive datasets, a single-level index becomes inefficient due to its size. This is where multi-level indexing is needed.

    Why Do We Use Multi-level Indexing?

The main reason for using multi-level indexing is to reduce the number of blocks accessed during a search. A binary search divides the search space in half at each step and requires approximately log2(bi) block accesses for an index with bi blocks. With multi-level indexing, the search space is divided by the fan-out at each level instead of by 2, which reduces the number of block accesses substantially.

For example, instead of cutting the search space in half, multi-level indexing splits it further. Each level reduces the search space by a factor equal to the fan-out (fo), the number of index entries that can fit into a single block. When the fan-out is much larger than 2, the search process becomes significantly faster.

    Structure of Multi-level Indexing

    To understand the concept of multi-level indexing, we must know its structures. It is organized into different levels, each representing a progressively smaller index until a manageable size is reached.

    The structure consists of −

    • First Level (Base Level) − This level stores the main index entries. This is also called the base index. It contains unique key values and pointers to corresponding records.
    • Second Level − This level acts as a primary index for the first level. It stores pointers to the blocks of the first level.
    • Higher Levels − If the second level becomes too large to fit in a single block, then additional levels are created. It reduces the index size further.
[Figure: Structure of multi-level indexing]

    How Does Multi-level Indexing Work?

Each level of the multi-level index reduces the number of entries of the previous level by a factor equal to the fan-out (fo). The process continues until the final level fits into a single block, referred to as the top level.

    The number of levels (t) required is calculated as −

t = ⌈log_fo(r1)⌉

where r1 is the number of entries at the first level and fo is the fan-out value.

    From this, it is evident that searching involves retrieving a block from each level and finally accessing the data block. It results in a total of t + 1 block accesses.

    Example of Multi-level Indexing

    Let us take a detailed example to understand multi-level indexing in action.

    The given data is as follows −

    • Blocking factor (bfri) − 68 entries per block (also called the fan-out, fo).
    • First-level blocks (b1) − 442 blocks.

    Step 1: Calculate the Second Level

    We calculate the number of blocks needed at the second level −

b2 = ⌈b1 / fo⌉ = ⌈442 / 68⌉ = 7

    The second level has seven blocks.

    Step 2: Calculate the Third Level

    Similarly, we can calculate the number of blocks needed at the third level −

b3 = ⌈b2 / fo⌉ = ⌈7 / 68⌉ = 1

Since the third level fits into one block, it becomes the top level of the index. This makes the total number of levels t = 3.

    Step 3: Record Search Example

    After making the index, we must search from it. To search for a record using this multi-level index, we need to access −

    • One block from each level − Three levels in total.
    • One data block from the file − The block containing the record.

Total block accesses: t + 1 = 3 + 1 = 4. This is a significant improvement over a single-level index, where about 10 block accesses would have been needed using a binary search.
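The same arithmetic can be reproduced in a few lines of Python, using the numbers from this example:

import math

fo = 68            # fan-out: index entries per block
b1 = 442           # blocks at the first (base) index level

levels = [b1]
while levels[-1] > 1:
    levels.append(math.ceil(levels[-1] / fo))    # each level indexes the one below it

t = len(levels)                                  # 3 levels of 442, 7, and 1 blocks
print(levels, "block accesses per lookup:", t + 1)
# [442, 7, 1] block accesses per lookup: 4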

    Types of Multi-level Indexing

    Depending on the type of records and access patterns, multi-level indexing can be applied in various forms −

    • Primary Index − Built on a sorted key field, which makes it sparse (only one index entry per block).
    • Clustering Index − Built on non-key fields where multiple records share the same value.
    • Secondary Index − Built on unsorted fields, requiring more maintenance but offering flexibility.

    Indexed Sequential Access Method (ISAM)

    Indexed Sequential Access Method (ISAM) is a practical implementation of multi-level indexing. ISAM is commonly used in older IBM systems. It uses a two-level index −

    • Cylinder Index − Points to track-level blocks.
    • Track Index − Points to specific tracks in the cylinder.

Data insertion is managed using overflow files, which are periodically merged with the main file during reorganization.

    Advantages of Multi-level Indexing

    Multi-level indexing offers the following benefits −

    • Faster Searches − Reduces the number of disk accesses.
    • Scalability − Handles large datasets efficiently.
    • Supports Different Index Types − Works with primary, clustering, and secondary indexes.
    • Balanced Access − Ensures near-uniform access times.

One of the major challenges in managing multi-level indexes arises during insertions and deletions: all index levels must be updated, which becomes problematic when updates are frequent.

The solution is dynamic indexing. To address this problem, modern databases use dynamic multi-level indexes such as B-trees and B+ trees. These structures balance the index by reorganizing nodes automatically during insertions and deletions.

  • Indexing

    We know that data is stored in the form of records. Every record has a key field, which helps it to be recognized uniquely.

    Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done. Indexing in database systems is similar to what we see in books.

    Indexing is defined based on its indexing attributes. Indexing can be of the following types −

    • Primary Index − Primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
    • Secondary Index − Secondary index may be generated from a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values.
    • Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.

    Ordered Indexing is of two types −

    • Dense Index
    • Sparse Index

    Dense Index

In a dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store the index records themselves. An index record contains the search key value and a pointer to the actual record on the disk.

[Figure: Dense index]

    Sparse Index

In a sparse index, index records are not created for every search key. An index record here contains a search key and a pointer to the data on the disk. To locate a record, we first follow the index record to reach the vicinity of the data. If the data we are looking for is not where we land by following the index, the system starts a sequential search until the desired data is found.

[Figure: Sparse index]

    Multilevel Index

Index records comprise search-key values and data pointers. The multilevel index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. There is an immense need to keep the index records in main memory so as to speed up search operations, but if a single-level index is used, a large index cannot be kept in memory, which leads to multiple disk accesses.

[Figure: Multi-level index]

    Multi-level Index helps in breaking down the index into several smaller indices in order to make the outermost level so small that it can be saved in a single disk block, which can easily be accommodated anywhere in the main memory.

    B+ Tree

A B+ tree is a balanced multi-way search tree that follows a multi-level index format. The leaf nodes of a B+ tree contain the actual data pointers. A B+ tree ensures that all leaf nodes remain at the same height, and is thus balanced. Additionally, the leaf nodes are linked using a linked list; therefore, a B+ tree can support random access as well as sequential access.

    Structure of B+ Tree

    Every leaf node is at equal distance from the root node. A B+ tree is of the order n where n is fixed for every B+ tree.

[Figure: B+ tree]

    Internal nodes −

    • Internal (non-leaf) nodes contain at least ⌈n/2⌉ pointers, except the root node.
    • At most, an internal node can contain n pointers.

    Leaf nodes −

    • Leaf nodes contain at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
    • At most, a leaf node can contain n record pointers and n key values.
• Every leaf node contains one block pointer P to point to the next leaf node, forming a linked list.

    B+ Tree Insertion

• B+ trees are filled from the bottom, and each entry is made at a leaf node.
    • If a leaf node overflows −
      • Split node into two parts.
      • Partition at i = ⌊(m+1)/2⌋.
      • First i entries are stored in one node.
      • Rest of the entries (i+1 onwards) are moved to a new node.
      • ith key is duplicated at the parent of the leaf.
    • If a non-leaf node overflows −
      • Split node into two parts.
• Partition the node at i = ⌈(m+1)/2⌉.
      • Entries up to i are kept in one node.
      • Rest of the entries are moved to a new node.

    B+ Tree Deletion

    • B+ tree entries are deleted at the leaf nodes.
    • The target entry is searched and deleted.
      • If it is an internal node, delete and replace with the entry from the left position.
• After deletion, underflow is tested.
  • If underflow occurs, distribute entries from the node to its left.
• If distribution is not possible from the left, then
  • Distribute from the node to its right.
• If distribution is not possible from either the left or the right, then
  • Merge the node with its left or right sibling.
  • Ordered and Unordered Records

    In database management, there are plenty of different techniques to store files for easy access and optimized use-cases. Two of the most common types of file organizations are unordered records (heap files) and ordered records (sorted files), each with their own strengths, weaknesses, and use cases.

    Read this chapter to learn in detail the concepts of unordered and ordered files, explore their differences, and see real-world examples of how they are used.

    Unordered Records: The Heap File

    Unordered records or heap files are nothing but a dump. Heap files are the simplest form of file organization. The records are stored in the order they are inserted. When a new record is added, it is placed at the end of the file. This process makes the insertion quick and straightforward.

    How Do Heap Files Work?

    Let’s see how heap files work −

    • When a record is inserted, the last disk block of the file is loaded into memory. Here the new record is appended. The block is written back to the disk.
    • The address of the last block is maintained in the file header. It gives quick access for new inserts.

    Advantages of Using Heap Files

    Following are the advantages of using heap files −

    • Quick Insertions − Adding a record is very efficient, since no sorting or restructuring is applied.
    • Simplicity − The structure is straightforward and it requires minimal overhead.

    Limitations of Heap Files

    Heap files are simple and easy to maintain, but they have their own limitations −

• Linear Search for Retrieval − To find a specific record, we must rely on a linear search through all blocks. On average, half the blocks must be searched, which makes retrieval slow for large files.
    • Wasted Space − Deleting a record leaves unused space in the block unless the file is reorganized periodically.
    • Inefficient Updates − Updates such as modifying some variable-length records, deleting an old record and inserting a new one can further fragment the file.

    Handling Deletions in Heap Files

    One way to handle deletions in heap files is by using a deletion marker. Here, each record has a small flag (a byte or a bit) to indicate whether it is active or deleted. When a record is marked as deleted, it remains in the file but is ignored during searches. Periodically, the file is reorganized to reclaim space and remove deleted records.

    Ordered Records: The Sorted File

Ordered records are sorted and organized based on the values of a specific field, known as the ordering field. If this field is unique for each record, it is called the ordering key. This type of file organization makes searches and sequential access faster; however, it brings challenges for insertions and deletions.

    How Do Ordered Records Work?

    Records are stored in ascending or descending order of the ordering field. For example, an employee database might be sorted alphabetically by employee names. The records are placed in blocks, and the blocks are organized contiguously on the disk.

    Advantages of Ordered Records

    Given below are some of the advantages of using ordered records −

    • Efficient Searches − We can use binary search or other fast search techniques, as the system can quickly locate a record based on the ordering field. For example, if the file has 100 blocks, a binary search requires only about 7 block accesses on average.
    • Easy Sequential Access − It’s easy to access ordered records since the blocks are stored contiguously.
    • Sorting for Queries − Ordered records simplify certain queries, such as finding all employees whose names start with a specific letter.

    Limitations of Ordered Records

    While retrieval of data is efficient in ordered records, updates and modifications are not so easy. Given below are some other notable limitations of using ordered records −

    • Insertion Complexity − To insert a new record, the correct position in the file must be located. It requires shifting many records to make space, which is time-consuming for large files.
    • Costly Deletions − Deleting a record leaves a gap. It requires reorganization to maintain order. Using a deletion marker can delay reorganization but does not eliminate the overhead.
• Overflow Files − To avoid insertion delays, new records are often stored in a temporary unordered file (an overflow file). Periodically, the overflow file is merged with the main file, which is a resource-intensive process.

    Example: Binary Search in a Sorted File

Imagine an ordered file of employee records, sorted by names. The file has 100 blocks. We want to find the employee named Amit Mondal. Using a binary search, we can do the following (a sketch follows the list) −

    • The system starts with the middle block, say block 50, and checks if Amit’s name falls before or after the names in that block.
    • If Amit’s name is alphabetically before, the search narrows to blocks 1–49; otherwise, it checks blocks 51–100.
    • This process continues, halving the range with each step, until the record is found or all possibilities are exhausted.
    • This method requires only log2(100) ≈ 7 block accesses, which is much faster than the linear search required for heap files.
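A rough sketch of this block-level binary search in Python, where each "block" is just a sorted list of names standing in for one disk block:

blocks = [["Amit Mondal", "Anita Rao"], ["Bala K", "Chris Lee"], ["Divya S", "Farhan A"]]

def find(name):
    lo, hi, accesses = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]              # one block access
        accesses += 1
        if name < block[0]:
            hi = mid - 1                 # target lies in an earlier block
        elif name > block[-1]:
            lo = mid + 1                 # target lies in a later block
        else:
            return (name in block, accesses)
    return (False, accesses)

print(find("Amit Mondal"))               # (True, 2) -> found in 2 block accesses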

    Differences between Unordered and Ordered Files

    The following table highlights the key differences between ordered and unordered files −

Aspect | Unordered Records (Heap Files) | Ordered Records (Sorted Files)
Insertion | Faster; new records are added to the end of the file | Slower; inserting requires finding the correct position
Search | Linear search; slow for large files | Binary search; much faster on the ordering field
Deletion | Leaves gaps; periodic reorganization needed | Gaps also require reorganization, or use of overflow files
Modification | Simple for fixed-length records | Complex if the ordering field changes
Sequential Access | Inefficient; requires sorting first | Very efficient due to physical ordering

    Overflow Files

    To speed up insertions, new records are stored in an overflow file. For example −

    • The main file contains records sorted by names; the overflow file, on the other hand, holds unsorted new records.
    • Periodically, the overflow file is merged with the main file, which is needed to ensure the overall file remains ordered.

    External Sorting

For very large files, sorting them entirely in memory is impractical. External sorting techniques divide the file into smaller chunks, sort each chunk, and then merge the chunks together. This keeps the file ordered without overwhelming system resources.

    When to Use Unordered vs. Ordered Files

    Unordered files are suitable for applications where insertions are frequent, and searches are infrequent or involve scanning the entire file (e.g., collecting log data for later analysis).

    Ordered files are ideal when efficient searches based on a specific field are needed, or when sequential access is common (e.g., payroll processing by employee names).

  • Placing File Records on Disk

Storing data means more than just saving it somewhere. It is about organizing the data efficiently so that it can be retrieved and used easily. In a DBMS, this means figuring out how to place file records on a disk. Although it may seem like a simple task, it involves some clever techniques to handle different types of records, save space, and make the database faster.

    Read this chapter to learn how file records are placed on a disk. We will have specific examples to understand the methods used to store both fixed-length and variable-length records.

    Records and File Types

    A record in a DBMS is a collection of data values. We often tie them to a specific entity. Think of it like a detailed entry in a contact list. For instance, an EMPLOYEE record may have different fields such as name, employee ID, department, and salary. Each of these fields holds a piece of information about a particular employee.

    Placing file records on disk enables better data indexing and searching capabilities. With an organized structure, it becomes easier to locate specific files or retrieve relevant information without wasting time and resources.

    Techniques for Placing File Records on Disk

    There are several techniques for placing file records on disk, including −

    • Fixed-Length Records − Every record is the same size. Each field has a predetermined length, which makes it easier to locate data because the position of each field is consistent.
    • Variable-Length Records − Here the records can differ in size. It happens when some fields hold varying amounts of data. For example, a name that might be 5 characters long for one person and 20 for another.

    Let’s now discuss each of these techniques in detail.

    Fixed-Length Records on Disk

    Fixed-length records are straightforward to store because of their uniform size. The following example shows how it works −

    Example of Fixed-Length Records

    Suppose we have a fixed-length record for employees. It contains the following fields −

    • Name − 30 characters
    • Social Security Number (SSN) − 9 characters
    • Salary − 4 bytes (integer)
    • Job Code − 4 bytes (integer)
    • Department − 20 characters

Each field's length is fixed. If we add them up, the total record size will be (30 + 9 + 4 + 4 + 20) = 67 bytes. It is this uniformity that makes it simple to calculate where a specific field lies within a record. For instance, the salary field starts at byte offset 39, counting from 0 (the first 30 bytes hold the name, followed by 9 bytes for the SSN).
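One quick way to see this layout is Python's struct module; the format string below uses the field sizes from the example (standard sizes, no padding), and the sample values are invented:

import struct

# 30-byte name, 9-byte SSN, 4-byte salary, 4-byte job code, 20-byte department
EMPLOYEE = struct.Struct("=30s9sii20s")
print(EMPLOYEE.size)                       # 67 bytes per record

rec = EMPLOYEE.pack(b"Smith", b"123456789", 45000, 7, b"Computer Dept")
name, ssn, salary, job, dept = EMPLOYEE.unpack(rec)
print(rec[39:43])                          # the 4 salary bytes start at offset 39
print(salary, dept.rstrip(b"\x00"))        # 45000 b'Computer Dept'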

    Limitations of Fixed-Length Records

    Fixed-length records are easy to handle, but they can waste space. For example, if a department name is only 5 characters long, the remaining 15 bytes are unused. For thousands of records, this wasted space adds up.

    Another issue arises with optional fields. Sometimes some records do not have values for certain fields. Space is reserved for those fields as well. Let us say not every employee has a job code. Even so, 4 bytes will be reserved for that field in every record.

    Variable-Length Records on Disk

    Variable-length records save space by allowing fields to take up only as much space as they need. But, how do we manage records when their sizes are unpredictable?

    Using Separators − We can use separator characters like pipe (|) to separate the fields in a record. So, a record might look like this:

    Smith|123456789|45000|Computer Department|

    Separators make it clear where each field ends, even if the field sizes vary. This format works well but requires extra processing to find the data.

Storing Field Lengths − Another method is to store the length of each field at the beginning of that field. For example,

5 Smith 9 123456789 5 45000 19 Computer Department

Here, the numbers before each field indicate its size. The system reads the length, then grabs the corresponding number of bytes.
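A small sketch of reading such a length-prefixed record, assuming each field is preceded by a two-digit length; the exact encoding is an assumption, and real systems often keep field offsets in a record header instead:

raw = "05Smith09123456789054500019Computer Department"

def parse(record):
    fields, pos = [], 0
    while pos < len(record):
        length = int(record[pos:pos + 2])          # read the 2-digit field length
        pos += 2
        fields.append(record[pos:pos + length])    # grab exactly that many characters
        pos += length
    return fields

print(parse(raw))   # ['Smith', '123456789', '45000', 'Computer Department']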

    Practical Example: Handling Optional Fields

    Let us say our EMPLOYEE records include an optional field for a middle name. For some employees, this field may be empty. With variable-length records, we can save space by only including the field when it has a value.

    In a file with such records −

    • Record A − Smith|123456789|45000|Computer Department|
    • Record B − Jones|987654321|52000|HR|Michael

    In Record A, the middle name is skipped. This flexibility makes variable-length records more space-efficient, however it complicates how records are processed.

    Mixed Records and Real-Life Applications

    Sometimes, files contain a mix of record types. If we consider a university database with two types of records −

    • Student Records − Fields for name, ID, courses, and grades.
    • Course Records − Fields for course name, instructor, and schedule.

    If related student and course records are stored together, their sizes will vary. This is common in real-world databases where different entities need to be linked efficiently.

    Example of Mixed Records

    In one block, we might have −

    • Student Record − John Doe|12345|Math: A, History: B|
    • Course Record − Math|Prof. Smith|MWF 10:00 AM|

    The database system keeps track of the record type and adjusts accordingly.

    Organization of Records on Disk

    When records are placed on a disk, they are grouped into blocks. Blocks are small chunks of data that the disk reads and writes. The way the records are packed into blocks affects the performance.

    • Unspanned Records − In this method, a record must fit entirely within one block. If a block has extra space left after storing several records, that space remains unused. This approach is simple but wastes some disk space.
    • Spanned Records − For larger records, the spanned approach allows a single record to stretch across multiple blocks. Here, the pointer at the end of one block tells the system where the rest of the record is stored. This method is more space-efficient but slightly more complex to handle in real life scenario.

    Example: Spanned vs. Unspanned

    Take a look at the following examples of spanned and unspanned records −

    • Unspanned Block − Record 1 | Record 2 | Record 3 | Empty Space
    • Spanned Block − Record 1 | Part of Record 2 (next block has the rest of Record 2)

    Optimizing Record Placement

    We need to optimize the placement of records to best utilize the disk space and improve the speed. For example, if an employee’s records are frequently accessed alongside their department details, these can be placed on the same block.

    There is another type of optimization called indexing. By creating an index that points to the location of records, the system reduces the time it takes to find specific data.

  • Buffers and Disk Blocks

    Efficient data handling techniques in DBMS ensure that operations like reading and writing data are fast and reliable. Buffers and disk blocks play an important role in efficient data handling by bridging the gap between the slower secondary storage devices and the faster main memory. Read this chapter to learn the basic concepts of buffering and disk blocks, and how they work together in data base management.

    Buffering in DBMS

    A buffer is a temporary holding area in memory where the data is placed before it is processed or written to the disk. Buffering is a technique used to manage the transfer of data between the slower secondary storage and the faster main memory. This process increases the processing speed by capitalizing on the difference in speed between these storage layers.

    Importance of Buffering

    As an analogy of buffering, imagine you are pouring water from a large jug into a small glass. You cannot pour it all at once. The buffer works in a similar way and ensures that the data flows smoothly without overwhelming the system.

    Data transfer in a DBMS is not instantaneous. Buffering helps by allowing the system to overlap data processing and transfer operations. While one buffer is being filled with new data from the disk, another buffer can be processed by the CPU. This simple method significantly boosts the efficiency of the system.

    What is Double Buffering?

In double buffering, two buffers are used alternately. While one is being filled, the other is being processed. This ensures that the CPU is never left idle waiting for data.

    Let us see an example to make it more relatable. Consider a conveyor belt in a factory. While one worker loads the goods onto the belt, another worker packs them. They work simultaneously to keep the process smoothly running. Similarly, double buffering allows the CPU and disk I/O operations to run in parallel.

    In the context of DBMS −

    • A disk block is read from the secondary storage and placed in buffer A.
    • While buffer A is being processed by the CPU, buffer B is filled with the next block of data.
    • The process continues alternately, minimizing the time the CPU spends waiting for data.

    This method is particularly useful for reading a continuous stream of data blocks from the disk.

    Concepts of Disk Blocks

    The data in secondary storage is stored in units known as blocks. A block is the smallest unit of data transfer between the disk and the memory. Each block can hold multiple records. The size of a block is typically fixed during disk formatting.

    Instead of transferring one record at a time, we use blocks of data that group several records together. This process reduces the number of I/O operations and thereby improves the overall data transfer efficiency. It is like buying groceries in bulk rather than making multiple trips to the store for individual items. Buying in bulk saves both time and effort.

    Buffering and Blocks in Action

    When blocks of data are transferred from the disk to the main memory, they are placed in buffers for processing. To understand this, let us see how we can use two buffers, A and B −

    • Data from the disk is read into buffer A.
    • While the CPU processes data in A, the next block is read into buffer B.
    • As soon as the CPU finishes processing A, it moves to B, and the next block is loaded into A.

    Let’s elaborate this overlapping operation with a practical example. Suppose the time required to process a block in memory is less than the time needed to read the next block from the disk. By using two buffers −

    • The CPU can start processing data as soon as the first block is transferred to memory.
    • Next, the disk I/O system prepares the next block in the second buffer.

    This process avoids delays, because the CPU does not have to wait for the next block to be read. This technique keeps both the CPU and disk busy, making the process more efficient.
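Here is a minimal sketch of this overlap using two buffers and a background reader thread; the disk blocks and timings are simulated, and a real DBMS would use asynchronous disk I/O rather than sleep calls:

import threading, queue, time

blocks = [f"block-{i}" for i in range(5)]        # stand-in for blocks on disk
buffers = queue.Queue(maxsize=2)                 # at most two blocks in memory: A and B

def reader():
    for b in blocks:
        time.sleep(0.01)                         # simulated disk read time
        buffers.put(b)                           # fill the free buffer
    buffers.put(None)                            # signal end of data

threading.Thread(target=reader, daemon=True).start()

while True:
    b = buffers.get()                            # take the filled buffer
    if b is None:
        break
    time.sleep(0.005)                            # simulated CPU processing of the block
    print("processed", b)                        # ...while the reader fills the other buffer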

    Advantages of Buffering

    Buffering in DBMS offers several benefits, including the following −

• Reduced Waiting Time − By overlapping operations, buffering minimizes the time the CPU spends waiting for data.
    • Continuous Data Flow − Double buffering allows data to be processed and transferred seamlessly.
    • Improved Performance − The system can handle larger workloads without slowing down. Buffering also ensures the tasks are distributed more effectively.

    Limitations of Buffering

    Buffering also has its limitations, which are listed below −

    • Complexity − Implementing buffering mechanisms like double buffering requires careful management to avoid errors.
    • Memory Usage − Buffers take up space in the main memory, which could be a limitation for systems with restricted memory capacity.
    • Varied Workloads − In cases where data access patterns are unpredictable, buffering might not always deliver optimal performance.

    Real-World Applications of Buffering

    Buffers and blocks play an important role in applications where large volumes of data need to be processed efficiently. For example −

    • Online Databases − Systems like e-commerce platforms rely on buffering to handle millions of user queries and transactions without delays.
    • Data Analytics − Blocks and buffering techniques enable us to process huge datasets quickly.
    • Backup Operations − During database backups, buffering ensures that the data is written to storage devices in an organized manner.

    Buffering and blocks are also used in video streaming services, where buffering gives uninterrupted playback experience by loading data in advance.