Serializability in DBMS

What does DBMS serializability mean?
In database management systems, serializability is a property that describes how concurrent transactions may use shared data. A schedule is serializable if its outcome is the same as if the transactions had been carried out one after another, with no overlap in execution, in some sequential order. A common way for a database management system (DBMS) to enforce this is to lock data so that other transactions cannot access it while it is being read or updated.

Database systems offer different isolation levels to provide concurrent processing while preserving data integrity. Serializable, the most restrictive level, is commonly implemented with strict two-phase locking (2PL). In the first (growing) phase, a transaction acquires a lock on each data object before operating on it and releases none. In the second (shrinking) phase, locks are released, and under strict 2PL this happens only after the transaction has completed. This ensures that every transaction sees a consistent view of the database and that no operations clash.
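The locking discipline can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions: the names `LockTable`, `Transaction` and `LockConflict` are hypothetical, a conflicting request raises an error instead of waiting as a real DBMS would, and all locks are released together at commit, which is the simplest way to satisfy strict 2PL.

```python
class LockConflict(Exception):
    """Raised when a requested lock is held in an incompatible mode."""


class LockTable:
    """Hypothetical in-memory lock table: item -> {transaction: mode}."""

    def __init__(self):
        self.holders = {}

    def acquire(self, txn, item, mode):
        """Grant a shared ("S") or exclusive ("X") lock, or raise LockConflict."""
        held = self.holders.setdefault(item, {})
        others = {t: m for t, m in held.items() if t != txn}
        # S is compatible only with S; X is compatible with nothing.
        if others and (mode == "X" or "X" in others.values()):
            raise LockConflict(f"{txn} cannot lock {item} in mode {mode}")
        # Keep the stronger mode if the transaction already holds X.
        if held.get(txn) != "X":
            held[txn] = mode

    def release_all(self, txn):
        """Shrinking phase: drop every lock held by txn in one step."""
        for held in self.holders.values():
            held.pop(txn, None)


class Transaction:
    """Hypothetical transaction that locks each item before touching it."""

    def __init__(self, name, table, db):
        self.name, self.table, self.db = name, table, db

    def read(self, item):
        self.table.acquire(self.name, item, "S")  # growing phase
        return self.db[item]

    def write(self, item, value):
        self.table.acquire(self.name, item, "X")  # growing phase
        self.db[item] = value

    def commit(self):
        self.table.release_all(self.name)  # locks are released only here
```

In this sketch two transactions can both read the same item, but neither can write it until the other has committed and released its shared lock.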

Although 2PL offers robust assurances, lock contention and the overhead of obtaining and releasing locks may reduce performance. To improve performance, systems do not insist on strictly serial execution. The most popular strategy is to allow operations that do not conflict to run simultaneously. Interleaving operations in this way improves efficiency while still guaranteeing that the end result is identical to that of some sequential execution.

All things considered, serializability is a property that characterises how concurrent transactions use shared data. A database management system can achieve it by locking data while it is being read or written to prevent access by other transactions. By permitting actions that do not conflict to run concurrently, a serializable system improves efficiency while ensuring that the end result is equivalent to some sequential execution.

What is a Serializable Schedule?

In a database management system (DBMS), a serializable schedule is a sequence of database actions (read and write operations) that does not violate the serializability property. This property ensures that each transaction appears to execute atomically and is isolated from other transactions' effects. For a schedule to be serializable, it must be equivalent to some serial schedule of the same transactions.

Ensuring data consistency in databases often involves checking whether a transaction schedule is serializable. Two main tests help with this:

  1. Conflict Serializability:

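    This checks for conflicting operations, where two transactions access the same data item and at least one of them writes it. A schedule with no conflicts is always serializable. A schedule with conflicts is conflict serializable only if the conflicting operations can be ordered consistently with some serial schedule.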

  2. View Serializability:

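    This focuses on dependencies between transactions, where one transaction reads a value that another has written. A schedule is view serializable if every read sees the same value, and every data item receives the same final write, as in some serial schedule. A schedule with no such dependencies passes the read check trivially, while one with dependencies may or may not pass.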

Both tests assess potential issues in transaction scheduling to maintain data integrity.

Conflict serializability can be checked using the precedence graph algorithm, which looks for cycles among the transactions' precedence relationships. A precedence relationship from one transaction to another exists when an operation of the first conflicts with, and comes before, an operation of the second, so the first must precede the second in any equivalent serial schedule. If the graph has no cycles, the schedule is conflict serializable, and any topological order of the graph gives an equivalent serial schedule. If there is a cycle, the schedule is not conflict serializable, although it may still be view serializable.
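The precedence graph test can be sketched in a few lines of Python. The representation of a schedule as a list of `(transaction, operation, item)` tuples and the function names are assumptions made for this illustration, not part of any particular DBMS.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Build edges Ti -> Tj for each conflicting pair where Ti's operation comes first.

    `schedule` is a list of (transaction, operation, item) tuples, with
    operation "R" or "W". This tuple layout is an assumption of the sketch.
    """
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges


def has_cycle(edges, nodes):
    """Depth-first search with three colours to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in nodes}

    def visit(n):
        colour[n] = GREY
        for m in edges.get(n, ()):
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


def is_conflict_serializable(schedule):
    """A schedule is conflict serializable exactly when its graph is acyclic."""
    nodes = {txn for txn, _, _ in schedule}
    return not has_cycle(precedence_graph(schedule), nodes)


serial_like = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
interleaved = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(is_conflict_serializable(serial_like))   # True: the only edge is T1 -> T2
print(is_conflict_serializable(interleaved))   # False: T1 -> T2 and T2 -> T1 form a cycle
```

The pairwise scan is quadratic in the length of the schedule, which is fine for illustration but not how a production scheduler would track conflicts.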

Types of Serializability in DBMS

In a database management system (DBMS), serializability requires that transactions appear to happen in some serial order, even if they execute concurrently. Schedules that are not serializable may produce incorrect results. There are different types of serializability in DBMS, each with its own advantages and disadvantages. Here, we'll look closely at the two most common types: conflict serializability and view serializability.

 

1. Conflict Serializability

Conflict serializability is a type of serializability in which conflicting operations on the same data items are executed in an order that preserves database consistency. A schedule is conflict serializable if it can be turned into a serial schedule by repeatedly swapping adjacent operations that do not conflict. Every pair of conflicting operations therefore appears in the same relative order as in that serial schedule. For example, consider a database with two tables: Customers and Orders. A customer can have multiple orders, but each order can only be associated with one customer.

Two operations conflict when all of the following conditions hold:

  • Both operations belong to different transactions
  • Both operations are on the same data item
  • At least one of the two operations is a write operation

If two transactions were to execute concurrently, one adding an order for customer A and the other adding an order for customer B, conflict serializability would ensure that the result matches running one transaction entirely before the other, in some order. It does not dictate which transaction comes first. Operations that touch different data items do not conflict and may interleave freely, while conflicting operations on the same data item must keep a consistent order. This prevents inconsistencies in the database, such as an order being associated with the wrong customer.
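The three conditions for a conflict translate directly into a predicate. The sketch below is illustrative only: the `Op` type and the item names `orders:A`, `orders:B` and `next_order_id` are hypothetical stand-ins for rows or counters that the two order transactions might touch.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str    # transaction name, e.g. "T1"
    kind: str   # "R" for read, "W" for write
    item: str   # data item identifier (hypothetical naming)


def conflicts(a, b):
    """Different transactions, same data item, at least one write."""
    return a.txn != b.txn and a.item == b.item and "W" in (a.kind, b.kind)


def conflicting_pairs(schedule):
    """Every ordered pair (earlier, later) of conflicting operations."""
    return [
        (a, b)
        for i, a in enumerate(schedule)
        for b in schedule[i + 1:]
        if conflicts(a, b)
    ]


# Orders for different customers touch different items: no conflicts.
independent = [Op("T1", "W", "orders:A"), Op("T2", "W", "orders:B")]

# Both transactions use a shared counter: the read and the write conflict.
shared = [
    Op("T1", "R", "next_order_id"),
    Op("T2", "R", "next_order_id"),
    Op("T1", "W", "next_order_id"),
]
```

Here `conflicting_pairs(independent)` is empty, while `conflicting_pairs(shared)` contains the pair formed by T2's read and T1's later write.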

 

2. View Serializability

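View serializability is a type of serializability in which each transaction sees and produces results that are equivalent to some well-defined sequential execution of all transactions in the system. Unlike conflict serializability, which constrains the order of every pair of conflicting operations, view serializability only requires that each transaction gets a consistent view of the database: it reads the same values, and the database ends in the same final state, as in some serial schedule. Every conflict-serializable schedule is also view serializable, but not the other way around.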

To better understand view serializability in a database management system, consider two schedules, S1 and S2, over the same transactions, say T1 and T2. For these schedules to be view equivalent, the following three conditions must be met.

The first condition concerns initial reads. If a transaction reads the initial value of data item A in schedule S1, meaning no transaction has written A before that read, then it must also read the initial value of A in schedule S2.

The second condition concerns updated reads. If transaction T2 reads a value of data item A that was written by transaction T1 in schedule S1, then T2 must also read the value of A written by T1 in schedule S2. In other words, every read must see the result of the same write in both schedules.

The last condition concerns final writes. If transaction T1 performs the final write on data item A in schedule S1, then T1 must also perform the final write on A in schedule S2. For example, if T1 writes A last in S1 but T2 writes A last in S2, the schedules are not view equivalent. If all three conditions are met, the schedules are view equivalent, and a schedule that is view equivalent to some serial schedule is view serializable.
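View equivalence can be made concrete with a short sketch based on the standard textbook checks: which write each read sees (the initial value counts as "no writer") and which transaction writes each item last. The tuple layout and function names are assumptions of this example, and the sketch assumes each transaction writes a given item at most once. The serializability check tries every serial order, so it is only practical for a handful of transactions.

```python
from collections import Counter
from itertools import permutations


def view_profile(schedule):
    """Return (reads_from, final_writes) for a list of (txn, op, item) tuples."""
    last_writer = {}   # item -> transaction that wrote it most recently
    seen = Counter()   # (txn, item) -> number of reads so far
    reads_from = set()
    for txn, op, item in schedule:
        if op == "R":
            seen[(txn, item)] += 1
            # A writer of None means the read sees the initial value.
            reads_from.add((txn, item, seen[(txn, item)], last_writer.get(item)))
        else:
            last_writer[item] = txn
    # After the loop, last_writer holds the final write for each item.
    return reads_from, last_writer


def view_equivalent(s1, s2):
    """Same initial reads, same reads-from pairs, same final writes."""
    return view_profile(s1) == view_profile(s2)


def is_view_serializable(schedule):
    """Brute force: compare against every serial order of the transactions."""
    txns = sorted({txn for txn, _, _ in schedule})
    for order in permutations(txns):
        serial = [step for t in order for step in schedule if step[0] == t]
        if view_equivalent(schedule, serial):
            return True
    return False


# T2 and T3 write A without reading it first (blind writes).
blind_writes = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]
# Both transactions read Y before either writes it (a lost update).
lost_update = [("T1", "R", "Y"), ("T2", "R", "Y"), ("T1", "W", "Y"), ("T2", "W", "Y")]
print(is_view_serializable(blind_writes))  # True: equivalent to T1, T2, T3
print(is_view_serializable(lost_update))   # False: no serial order matches
```

The first schedule is a typical case of a schedule that is view serializable without being conflict serializable, because its blind writes make the order of the intermediate writes unobservable.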

 

Testing of Serializability in DBMS with Examples

Serializability is the property of a schedule whereby each transaction appears to execute atomically and independently, even though the transactions actually execute concurrently. In other words, when several transactions are executed concurrently, the result should be the same as if they had been executed sequentially.

Let's consider an example to understand better how serializability works in a database management system (DBMS). Suppose two users, Alice and Bob, are each executing two transactions: T1 and T2 for Alice and T3 and T4 for Bob. Further, suppose that T1 reads and writes data item X and then data item Y, and T2 reads and writes data item Y. T3 reads and writes data item Z and then data item W, and T4 reads and writes data item W.

Now let's say that the operations of these transactions are as follows:

  • `T1: Read X → Write X → Read Y → Write Y`
  • `T2: Read Y → Write Y`
  • `T3: Read Z → Write Z → Read W → Write W`
  • `T4: Read W → Write W`

First, let's see why this schedule is not conflict serializable.

The usual first test for a schedule is conflict serializability. In our example, suppose the operations interleave so that Transaction 1 (T1) and Transaction 2 (T2) both read data item Y before either writes it. T1's read of Y then comes before T2's write of Y, so T1 must precede T2, and T2's read of Y comes before T1's write of Y, so T2 must precede T1. These two requirements form a cycle in the precedence graph. Therefore, the given schedule is not conflict serializable.

There is another, weaker type of serializability called view serializability, which some schedules satisfy even though they are not conflict serializable, typically schedules containing blind writes (writes with no preceding read of the same item). Our example does not satisfy it either: in any serial order of T1 and T2, the second transaction would read the value of Y written by the first, whereas here both read the initial value. Transactions that share no data items, such as Transaction 2 (T2) and Transaction 4 (T4), cannot see each other's updates, so they never conflict and their relative order does not affect serializability.

It's important to note that conflict serializability is a stronger property than view serializability: every conflict-serializable schedule is also view serializable, but not the other way around. Conflict serializability fixes the relative order of every pair of conflicting operations, whereas view serializability only requires that each transaction reads the same values and that each data item receives the same final write as in some serial schedule. Conflict serializability is also far cheaper to test, which is why practical systems rely on it.

All in all, each property is sufficient to ensure the correctness of concurrent transactions in a database management system, and conflict serializability is the one most commonly enforced in practice.

Benefits of Serializability in DBMS

Now that we understand serializability in DBMS, let's look at some of its benefits.

  1. Predictable Executions:

    Since transactions behave as if they were executed one at a time, there are no surprises. All data items are updated as expected, and no updates are lost or corrupted by interleaving.

  2. Easier to Reason about & Debug:

    As each transaction appears to execute alone, it is easier to reason about what each transaction is doing and why. This can make debugging much easier, since you don't have to consider every possible interleaving with other transactions.

  3. Reduced Costs:

    Serializability in DBMS can help reduce software development costs by making it easier to reason about code and reducing the need for extensive testing of concurrent interleavings. Any effect on hardware costs is indirect and depends on the workload.

  4. Increased Performance:

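    Compared with strictly serial execution, serializable schedules allow operations that do not conflict to be interleaved, which can improve throughput. Compared with weaker isolation levels, serializable execution usually costs some performance, so the gain here is relative to running transactions one at a time.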