University of London / MSc Computer Science: Data management (Part 2)

October 2, 2023

I am taking the MSc Computer Science: Data management module at the University of London.

These are my personal study notes on the lecture content.

This post covers weeks 6 to 12 of the 12-week module. (Week 6 started on 14 August 2023; week 12 ended on 2 October 2023.)

Week 6: Host Language support in SQL and Transaction Management 1 – Recovery and Logging #

Host Language support for DBMS #

Host Language support for SQL principles #

Why host language support

  • SQL constructs seen so far do not offer the full power of a programming language such as Python or Java.
  • DBMS vendors have extended SQL with programming constructs
    • e.g. T-SQL in SQL Server (Microsoft) and PL/SQL in Oracle.
  • These are proprietary, non-standard extensions.

Alternative: host language support

  • Use SQL from within a conventional programming language.
  • For example, employ standard SQL application programming interfaces (APIs).
    • Python DB-API (Python), JDBC (Java), DBI (Perl), PDO (PHP).
  • The principle of these APIs is to provide a common interface between our application and any SQL database.

Host Language support with Python #

-- Mechanisms --

Python source code -> Python DB-API -> DB Connector python driver -> Database

Example Code for insert

import mysql.connector

dbconfig = { 'host': '127.0.0.1',
             'user': 'root',
             'password': 'codio',
             'database': 'codioDB', }

conn = mysql.connector.connect(**dbconfig)

# Create a cursor

cursor = conn.cursor()

# Create the parameterised SQL query

_SQL = """insert into log
                 (phrase, letters, ip, browser_string, results)
                 values
                 (%s, %s, %s, %s, %s)"""

# Execute and commit ('req' is the incoming web request object and 'res'
# the computed result, both defined elsewhere in the surrounding web app)
cursor.execute(_SQL, (req.form['phrase'],
                      req.form['letters'],
                      req.remote_addr,
                      req.user_agent.browser,
                      res, ))

conn.commit()

# Tidy up
cursor.close()

conn.close()

Example Code for select

# ...

query1 = "SELECT * FROM SUPPLY WHERE PID = 'P1' ORDER BY QUANTITY"
cursor.execute(query1)

for x in cursor:
  print(x)



query2 = "SELECT * FROM SUPPLY WHERE PID = %s ORDER BY QUANTITY"
pid = ('P1', )  # note the trailing comma: query parameters must be a tuple
cursor.execute(query2, pid)

while True:
  row = cursor.fetchone()
  if row is None:
    break
  print(row)

# ...

Transaction Management 1 #

Transaction Management principles and ACID properties #

Database transactions

A transaction is a unit of work that must either succeed in its entirety or not at all.

ACID

The transaction manager must ensure that the ACID properties of transactions are preserved.

  • A: Atomicity = either all changes made by a transaction succeed or none do
  • C: Consistency = a transaction must preserve the consistency of a database with no violation of integrity constraints
  • I: Isolation = even though transactions are executed concurrently, it must appear to users that transactions are being executed one at a time
  • D: Durability = if a transaction has been committed, its changes must persist even if there is a subsequent system failure

Transaction stages

The start of a transaction is marked either by:

  • any executable SQL statement encountered when a transaction is not already in progress or
  • a particular DML statement explicitly used to begin a transaction (e.g. SET TRANSACTION)

The end of a transaction is marked either by:

  • The COMMIT statement which is typically used to mark the successful completion of a transaction.
  • The ROLLBACK statement which is typically used to mark the unsuccessful completion (failure) of a transaction.
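
As a concrete illustration, here is a minimal sketch of these stages using the Python DB-API setup from above (the account table, its columns and the transfer amounts are hypothetical; dbconfig is the dictionary defined earlier). The first executable statement implicitly starts the transaction, COMMIT marks success and ROLLBACK marks failure:

import mysql.connector

conn = mysql.connector.connect(**dbconfig)  # dbconfig as defined above
cursor = conn.cursor()

try:
    # The first executable statement implicitly starts a transaction
    cursor.execute("UPDATE account SET balance = balance - 100 WHERE id = %s", (1,))
    cursor.execute("UPDATE account SET balance = balance + 100 WHERE id = %s", (2,))
    conn.commit()      # successful completion of the transaction
except mysql.connector.Error:
    conn.rollback()    # unsuccessful completion: undo all changes
finally:
    cursor.close()
    conn.close()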

Transaction Management logging and recovery #

Recovery: Rollback and roll forward processing

A DBMS must be able to recover from different types of failure.

There are three types of failure:

  1. Failure of an individual transaction.
  2. Failure of all transactions active at the time of a general system failure, e.g. a power failure
  3. Corruption of part of the stored database, e.g. disk head has crashed

To deal with “1. Failure of an individual transaction.”:

  • Rollback
    • To deal with the failure of an individual transaction, the updates made to the database by the failed transaction must be undone or rolled back.
    • A DBMS maintains a sequential “log file” of “before image” records for current transactions.
    • The database may be restored by processing the log file in reverse order.

A DBMS typically maintains a sequential log file, also called a recovery file or “journal”, of before-image records for current transactions. A before-image record is written sequentially to the log file for each update made to the database. A before image contains enough information to enable a record in the database that has been updated to be restored to its state before the update. The database may be restored by processing in reverse order the before images relating to the transaction which has failed. Processing in reverse order ensures that a record updated more than once during a transaction is correctly restored to its original state at the beginning of the transaction.

To deal with “2. Failure of all transactions active at the time of a general system failure.”:

  • Roll forward
    • To deal with a general system failure, all uncompleted transactions must be rolled back when the system is brought up again.
    • Extra concern: a modified database or log record can be stored in main memory but not yet written to physical storage.
    • Loss of the in-memory cache leads to lost transaction updates in the database.
    • Log file of “after image” records.

With a system failure, there is an additional issue to consider. The writing of a record to the database or a log record to the log file is not an atomic operation. A cache of buffers is maintained in main memory relating to the database, with further buffers relating to the log file. The writing of a record results initially in a write to a main memory buffer in the cache. Only subsequently will the buffer itself, and hence the written database or log record, be physically written to the database or log file on disk. Hence, the writing of a record is asynchronous: a program does not wait for an updated record to be physically written to the database before executing further statements. So, with a system failure and the loss of the cache of main memory buffers, updated database records for a transaction will be lost if they are at that point still in the cache, the buffers having not yet been written out to the stored database on disk. If such updates belong to a committed transaction, the database must nevertheless reflect those updates. To ensure this happens, a committed transaction is redone or rolled forward at system restart. To support the redo or roll forward of a transaction, a DBMS typically maintains a sequential log file of after-image records. An after-image record is written sequentially to the log file for each update made to the database. The after image contains enough information to enable a database record that has been updated to be restored to its state after the update. The database may be restored after a general system failure to correctly reflect the updates of committed transactions by processing in forwards order the after images relating to each committed transaction.

To deal with “3. Corruption of part of the stored database.”:

  • Media recovery
    • a backup copy of the database is made at regular intervals.

A database may be restored to the position at the last commit before the corrupting event by reapplying to the latest backup the after-image records of transactions that had successfully completed since the time of that backup.

Write-ahead log protocol

The write-ahead log protocol has two properties to guarantee recoverability as follows:

  1. A “before image” log record must be physically written to the log file before the corresponding updated database record is physically written to the database.
    • This prevents the situation where an update has been physically written to the database but the transaction fails, with loss of the main memory buffer containing the before-image log record before that log record is physically written to the log file. In such a situation, recovery would be impossible since no information would exist which would allow restoration of the original state of the database record.
  2. All “after image” log records for a transaction must be physically written to the log file before that transaction may successfully commit.
    • This prevents the situation where a transaction has committed, but neither has a database record updated by the transaction been physically written to the database nor has the corresponding after-image log record been physically written to the log file. In such a situation, recovery would be impossible since no information would exist which would allow the updated state of the database records to be physically written to the database.

The first property is needed to ensure that failed transactions can always be rolled back, and the second property is needed to ensure that a committed transaction can always be recovered by rolling forward in the event of a general system failure.

Checkpointing

  • How can the DBMS identify the transactions which must be recovered in the event of a system failure?
  • One effective mechanism is “checkpointing” i.e. flushing all in-memory buffers to physical storage at regular intervals.

Given the asynchronous nature of writes, there is no guarantee that the updated database records of a committed transaction were physically written to the database before the system failure. One solution is to use the technique of checkpointing. A simple form of this is as follows. At certain intervals, all main memory database buffers containing updated records are physically written out and a checkpoint record is physically written to the log; this record lists all transactions in progress at that point. In the event of a system failure, only transactions in progress at the time of the last checkpoint and later need to be considered. Since main memory database buffers are physically written at the time of a checkpoint, any transaction already committed at that point must have had all its updates physically written to the database.

Recovery example

             |               |
-------------|---------------|----> TIME
T1  o--->    |               |
T2  o--------|------>        |
T3  o--------|---------------|-->
T4           |  o------>     |
T5           |  o------------|-->
             |               |
         CHECKPOINT        SYSTEM
                           FAILURE

After the system failure, at system restart we restore the system to the state it was in at the last checkpoint, and then we consider each of the transactions in turn and decide what needs to be done with them.

  • T1 needs no action at all because it had completed at the points of the last checkpoint.
  • T2 and T4 are going to have to be “redone” because they committed after the last checkpoint, so their updates may not yet have been physically written to the database.
  • T3 and T5 are going to have to be “undone” because they were in progress at the time of the system failure.

We now use the log, maintained under the write-ahead log protocol, to recover the system. We do that by creating two lists of transactions.

An “undo list”, which is the list of transactions recorded in the last checkpoint record (here T2 and T3), i.e. the transactions still in progress at the point of the checkpoint; and a “redo list”, which is initially empty.

Then we work forwards through the log from the checkpoint record. If a log record for a starting transaction is found, add that transaction to the undo list, which in this particular case is T4 and T5. If a commit log entry is found, move the committed transaction from the undo list to the redo list, which is T2 and T4.

Then we work backwards through the log, undoing the transactions in the undo list, which is now T3 and T5 and we restore the before images of those transactions to the database. Then we work forwards through the log, redoing the transactions in the redo list, which is now T2 and T4 by restoring the after images for those transactions to the database.

At that point, we can restart the database. It is now in a recovered state and we can start normal database processing again.

SUMMARY

  • If a log record for a starting transaction is found, add that transaction to the UNDO list (here T4 and T5).
  • If a COMMIT log entry is found, move the committed transaction from the UNDO list to the REDO list (here T2 and T4).
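
A minimal sketch of this restart procedure in Python (assuming a simplified in-memory log of tuples written after the last checkpoint; the entry format, record names and the database dict are hypothetical, not from the lecture):

def restart_recovery(log, checkpoint_active, database):
    """Rebuild the database state after a system failure using the log.

    log: entries written after the last checkpoint, in order, e.g.
         ('T4', 'START'), ('T2', 'UPDATE', 'R1', before, after), ('T2', 'COMMIT')
    checkpoint_active: transactions listed in the checkpoint record.
    """
    undo_list = set(checkpoint_active)   # e.g. {'T2', 'T3'}
    redo_list = set()

    # Work forwards through the log from the checkpoint record
    for entry in log:
        txn, action = entry[0], entry[1]
        if action == 'START':
            undo_list.add(txn)
        elif action == 'COMMIT':
            undo_list.discard(txn)
            redo_list.add(txn)

    # Work backwards through the log, restoring before images (undo)
    for entry in reversed(log):
        if entry[1] == 'UPDATE' and entry[0] in undo_list:
            _, _, record, before, _ = entry
            database[record] = before

    # Work forwards through the log, restoring after images (redo)
    for entry in log:
        if entry[1] == 'UPDATE' and entry[0] in redo_list:
            _, _, record, _, after = entry
            database[record] = after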

Supplementary notes

Rollback (UNDO) restores the database from its current state back to the state it was in at some earlier point in time. The before images in the log are used to wind the changes back.

Roll forward (REDO) restores the database to the state just before the failure. After the previously saved database dump has been copied back to disk, the after images in the log are applied.

The log is written out before the transaction’s updates are applied (WAL, Write-Ahead Log). That way the log survives even if a failure occurs part-way through.

Updates to the database are made in main memory, and that information is flushed to disk at regular intervals. That flush is the checkpoint.

Week 7: Transaction Management 2 – Concurrency #

Principles of Concurrency Control in the Transaction Manager #

Concurrency Control phenomena #

A DBMS has a mechanism to ensure concurrency control, that is, a mechanism to enable multiple concurrent transactions to safely access the same data at the same time.

Concurrency

Ensure multiple concurrent transactions safely access the same data at the same time.

Without concurrency control, three phenomena may occur:

  1. The lost update problem.
  2. The uncommitted dependency problem.
  3. The inconsistent analysis problem.

Concurrency control is used to avoid these problems.

1. The lost update problem

The update made by TA is lost: TB overwrites TA’s update.

[TA]          [TB]
:             :
READ R;       :
R := R * 2;   :
:             :
:             READ R;
:             R := R * 3;
:             :
WRITE R;      :
:             :
:             WRITE R;
:             :

If the isolation property of these transactions were maintained, what would happen is that we would consider TA as having run first, then TB or vice versa.

2. The uncommitted dependency problem (dirty read)

Reading uncommitted data in this way is also sometimes called a “dirty read”.

[TA]          [TB]
:             :
READ R;       :
R := R * 2;   :
WRITE R;      :
:             :
:             READ R;
:             :
ROLLBACK;     :
:             :

3. The inconsistent analysis problem.

We can see that transaction A reads R1 before this transfer takes place in TB, but then it reads R2 after the transfer has taken place. The sum produced by transaction A is inconsistent, even though it only reads R2 after TB has committed the change. This is where it differs from the uncommitted dependency problem.

[TA]              [TB]
:                 :
READ R1;          :
Sum := R1;        :
:                 :
:                 READ R1;
:                 READ R2;
:                 R1 := R1 - 10;
:                 R2 := R2 + 10;
:                 WRITE R2;
:                 WRITE R1;
:                 COMMIT;
:                 :
READ R2;          :
Sum := Sum + R2;  :
:                 :

Locking #

A simple Locking algorithm #

Locking

  • The most common solution to address these problems is locking; a mechanism to manage access to shared resources
  • The DBMS manages locks on:
    • logical objects, such as relational tables or individual table rows, or
    • physical objects, such as records, pages or files
  • An individual transaction may acquire a lock on an object

Types of locks

  • “Share” or “S-locks” (also known as a read or read-only lock) ensure that another transaction may read but not write the object.
  • “Exclusive” or “X-locks” (also known as a write or no-read lock) ensure that another transaction may neither read nor write the object.

Simple locking algorithm

The following locking algorithm might be used to deal with the concurrency problems identified:

  • A transaction that wishes to read a record must first obtain an S lock on that record.

  • A transaction that wishes to write a record must first obtain an X lock on the record.

  • A transaction can only obtain a lock on a record already locked by another transaction if both locks are S locks.

  • If a transaction is unable to obtain a lock it goes into a wait state until the lock is released.

  • Any lock obtained by a transaction is released at the end of that transaction.

Compatibility of S- and X-Locks

  • The table indicates whether a request for a lock on an object by one transaction can be satisfied if a lock is already held on the object by another transaction
  • A “Y” indicates that the request may be granted, an “N” indicates a conflict with the second transaction going into a wait state.
|   | S | X |
|---|---|---|
| S | Y | N |
| X | N | N |
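
A minimal sketch of how this compatibility check might look in Python (a hypothetical lock-manager fragment, not from the lecture; a real DBMS would also queue waiting transactions):

# Can a requested lock be granted, given a lock already held by another transaction?
COMPATIBLE = {
    ('S', 'S'): True,   # shared + shared: both may read
    ('S', 'X'): False,  # a writer must wait for readers
    ('X', 'S'): False,  # a reader must wait for the writer
    ('X', 'X'): False,  # writers exclude each other
}

class LockManager:
    def __init__(self):
        self.locks = {}   # record -> list of (transaction, mode)

    def request(self, txn, record, mode):
        """Return True if the lock is granted, False if txn must wait."""
        for holder, held_mode in self.locks.get(record, []):
            if holder != txn and not COMPATIBLE[(held_mode, mode)]:
                return False                 # conflict: go into a wait state
        self.locks.setdefault(record, []).append((txn, mode))
        return True

    def release_all(self, txn):
        """Release every lock held by txn, at the end of the transaction."""
        for record in self.locks:
            self.locks[record] = [h for h in self.locks[record] if h[0] != txn]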

Locking and the Lost Update Problem

We’ve avoided the lost update problem, but we now have a new problem: each transaction is waiting for the release of a lock held by the other. This is termed “deadlock”.

[TA]                        [TB]
:                           :
READ R;                     :
  (S lock on R obtained)    :
R := R * 2;                 :
:                           :
:                           READ R;
:                             (S lock on R obtained)
:                           R := R * 3;
:                           :
WRITE R;                    :
  (X lock on R requested)   :
wait                        :
wait                        WRITE R;
wait                          (X lock on R requested)
wait                        wait
wait                        wait

Locking and the Uncommitted Dependency Problem

In this case, transaction B sees the original value of R because TA has rolled back. If TA had committed instead, TB would see the updated value of record R. So we can see that we have avoided the uncommitted dependency problem by using our simple locking algorithm with this schedule.

[TA]                        [TB]
:                           :
READ R;                     :
  (S lock on R obtained)    :
R := R * 2;                 :
WRITE R;                    :
  (X lock on R obtained)    :
:                           :
:                           READ R;
:                             (S lock on R requested)
:                           wait
:                           wait
ROLLBACK;                   wait
  (lock on R released)      wait
:                             (S lock on R obtained)
:                           :

Locking and the Inconsistent Analysis Problem

In a similar way to the lost update problem, we’ve ended up with deadlock.

[TA]                          [TB]
:                             :
READ R1;                      :
  (S lock on R1 obtained)     :
Sum := R1;                    :
:                             :
:                             READ R1;
:                               (S lock on R1 obtained)
:                             READ R2;
:                               (S lock on R2 obtained)
:                             R1 := R1 - 10;
:                             R2 := R2 + 10;
:                             WRITE R2;
:                               (X lock on R2 obtained)
:                             WRITE R1;
:                               (X lock on R1 requested)
:                             wait
READ R2;                      wait
  (S lock on R2 requested)    wait
wait                          wait
wait                          wait

Deadlock and Livelock #

Deadlock

When two transactions are waiting on each other to release a lock we end in a deadlock.

If we don’t handle deadlocks, we’re going to end up in situations where transactions just never end, and therefore, application code will potentially hang.

Dealing with deadlocks:

  • Deadlocks can be avoided by linearly ordering the objects which may be locked.
    • This strategy aims to avoid deadlock.
    • This is sometimes also called total ordering or simple ordering.
  • If deadlocks are allowed to occur, they may be detected by using a wait-for graph.
    • This strategy aims to detect deadlock.
    • This is the more common strategy in transaction managers.
    • To create a wait-for graph, we create a node for each transaction, and an arc indicating where a transaction is awaiting a lock that another transaction holds. We then identify cycles in the graph: if there are any cycles, the transactions on those cycles are deadlocked. We can draw on graph theory here, since there are well-known algorithms for identifying cycles in graphs. Once we have identified a deadlock in this way, we can roll back one of the transactions in the cycle to break the deadlock. The locks of the rolled-back transaction are released, thereby allowing some other transaction to proceed (see the sketch after this list).
  • Alternatively, a timeout mechanism can be used to roll back transactions.
    • Rather than detecting deadlocks, some DBMSs use a timeout mechanism to roll back any transaction which has done no work for a specified period on the assumption that it is deadlocked.
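
A minimal sketch of deadlock detection over a wait-for graph (the graph here is hypothetical example data; a real transaction manager would build it from its lock tables):

def find_deadlock(wait_for):
    """Return a cycle of transactions in the wait-for graph, or None.

    wait_for: dict mapping a transaction to the transactions it is waiting on,
              e.g. {'TA': {'TB'}, 'TB': {'TA'}}
    """
    visited, on_path = set(), []

    def dfs(txn):
        if txn in on_path:                       # found a cycle back to txn
            return on_path[on_path.index(txn):]
        if txn in visited:
            return None
        visited.add(txn)
        on_path.append(txn)
        for blocker in wait_for.get(txn, ()):
            cycle = dfs(blocker)
            if cycle:
                return cycle
        on_path.pop()
        return None

    for txn in wait_for:
        cycle = dfs(txn)
        if cycle:
            return cycle                         # roll back one member to break it
    return None

# TA waits for TB and TB waits for TA: deadlock
print(find_deadlock({'TA': {'TB'}, 'TB': {'TA'}}))   # ['TA', 'TB']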

Livelock

Livelock may occur where transactions, though not waiting in a deadlock, are unable to complete successfully.

For example, if a transaction is rolled back through use of a timeout mechanism to break deadlock, a livelock will result if each time a new replacement transaction starts, it itself waits on a locking conflict, and rollbacks repeatedly occur.

As with deadlock, livelock arises when multiple users request access to resources that are locked by concurrency control and each busy-waits for the resources to be released. In deadlock, no user’s processing to acquire the resources makes any progress; in livelock, the acquisition processing keeps running, yet no user ever manages to acquire the resources.

For example, two pedestrians meeting in a narrow path may each step aside in the direction the other has moved, so they fail to pass; they then both step the other way and fail again. If this keeps happening, they never manage to get past each other.

Serializability #

Schedules and serializability #

  • A schedule refers to the order in which the statements of a set of transactions are executed.
  • A schedule is serial if there is no interleaving of transactions within the schedule.
    • Serial schedules are essentially schedules without concurrency.
    • Each transaction executes one at a time, with one transaction completing before the next one starts.
  • A schedule is serializable if its effect is equivalent to some serial schedule.

Serial and Serializable

A schedule is serializable if its effect is equivalent to some serial schedule. This doesn’t mean that a serializable schedule is a serial schedule; it is just equivalent to some serial schedule.

The definition of a serial schedule is that there is no interleaving of transactions.

Not Serial

In the lost update problem below, there is interleaving in the schedule, so it cannot be serial. Also, the effect of TA followed by TB, or TB followed by TA, is different from the effect of this schedule.

For the uncommitted dependency problem and the inconsistent analysis problem, there’s interleaving of transactions, therefore, this is not serial.

[TA]          [TB]
:             :
READ R;       :
R := R * 2;   :
:             :
:             READ R;
:             R := R * 3;
:             :
WRITE R;      :
:             :
:             WRITE R;
:             :

Regarding the schedules to which we applied the locking algorithm: the lost update problem with locking again has interleaved transactions, so it is not serial. Similarly, the other two examples with locking have interleaved transactions, so none of these schedules is serial.

[TA]                        [TB]
:                           :
READ R;                     :
  (S lock on R obtained)    :
R := R * 2;                 :
:                           :
:                           READ R;
:                             (S lock on R obtained)
:                           R := R * 3;
:                           :
WRITE R;                    :
  (X lock on R requested)   :
wait                        :
wait                        WRITE R;
wait                          (X lock on R requested)
wait                        wait
wait                        wait

Serializable

We have seen that none of those schedules is “serial”. Now let’s consider whether any of the schedules are “serializable”. Remember that the definition of a serializable schedule is that its effect is equivalent to some serial schedule.

We know that none of the schedules without locking is serializable, because the effect of TA followed by TB, or TB followed by TA, is different from the effect of the interleaved schedule.

And it turns out that all three of the schedules with locking are indeed serializable.

Locking and the Uncommitted Dependency Problem

This is the schedule for the uncommitted dependency problem with locking. This schedule is serializable: it is equivalent to a serial order of TA followed by TB, or a serial order of TB followed by TA.

[TA]                        [TB]
:                           :
READ R;                     :
  (S lock on R obtained)    :
R := R * 2;                 :
WRITE R;                    :
  (X lock on R obtained)    :
:                           :
:                           READ R;
:                             (S lock on R requested)
:                           wait
:                           wait
ROLLBACK;                   wait
  (lock on R released)      wait
:                             (S lock on R obtained)
:                           :

Locking and the Lost Update Problem & Locking and the Inconsistent Analysis Problem

Let’s now consider whether the schedules for locking and the lost update problem and for locking and the inconsistent analysis problem are serializable. In both these examples, a deadlock arose, which could be broken by either TA or TB being rolled back and a replacement transaction started. Let’s consider, for the lost update problem, the scenario where TB was rolled back.

[TA]                        [TB]
:                           :
READ R;                     :
  (S lock on R obtained)    :
R := R * 2;                 :
:                           :
:                           READ R;
:                             (S lock on R obtained)
:                           R := R * 3;
:                           :
WRITE R;                    :
  (X lock on R requested)   :
wait                        :
wait                        WRITE R;
wait                          (X lock on R requested)
wait                        wait
wait                        wait

When TB was rolled back, a new replacement transaction would then be started. Let’s call that TC. The schedule would then be equivalent to a serial schedule of TA followed by TC. If TA were rolled back and a new replacement transaction TC started to replace TA, the schedule would be equivalent to a serial schedule of TB followed by TC.

So, again, this schedule is “serializable”.

Let’s finally consider locking and the inconsistent analysis problem. This is exactly the same as for the lost update problem: we’ve got a deadlock here, so one or other of these transactions will have to be rolled back and replaced with a replacement transaction that we can call TC.

[TA]                          [TB]
:                             :
READ R1;                      :
  (S lock on R1 obtained)     :
Sum := R1;                    :
:                             :
:                             READ R1;
:                               (S lock on R1 obtained)
:                             READ R2;
:                               (S lock on R2 obtained)
:                             R1 := R1 - 10;
:                             R2 := R2 + 10;
:                             WRITE R2;
:                               (X lock on R2 obtained)
:                             WRITE R1;
:                               (X lock on R1 requested)
:                             wait
READ R2;                      wait
  (S lock on R2 requested)    wait
wait                          wait
wait                          wait

In the event that TB were rolled back, a new replacement transaction, TC, started, the schedule will be equivalent to a serial schedule of TA followed by TC, and conversely if TA were rolled back, a new replacement transaction, TC, started to replace TA, the schedule will be equivalent to a serial schedule of TB followed by TC.

NOTE

We’ve seen that the three example schedules here with locking were serializable, but note that in the general case, a schedule with locking is “not” necessarily serializable even though it avoids the three problems that we’ve discussed.

Let’s look at another example schedule and see whether it’s serializable or not. Let’s assume that the records R1, R2 and R3 originally contain the value two and that no locking is taking place. With the schedule shown here, the final value of R1 is 4, R2 is 8, and R3 is 4. Therefore we can determine that the schedule is serializable, because it is equivalent to a serial schedule of TB followed by TA. The fact that TA actually started before TB started and finished before TB finished is irrelevant. Note that a schedule has to be equivalent to some serial order in order to be serializable; it is not necessary for the schedule to be equivalent to a specific serial order, such as the order in which the transactions start or finish.

    [TA]             [TB]
    :                :
    READ R1;         :
    R1 := R1 + 2;    :
    WRITE R1         :
    :                :
    :                READ R2;
    :                R2 := R2 + 2;
(I) :                WRITE R2;
    :                :
(J) READ R2;         :
    R2 := R2 * 2;    :
    WRITE R2;        :
    :                :
    :                READ R3;
    :                R3 := R3 + 2;
    :                WRITE R3;
    :                :

Something to note about this schedule here is, if we reverse steps “I” and “J”, the schedule would no longer be serializable.

This example also illustrates another point, even though the schedule as shown is serializable, the locking protocol that we’ve given in the previous lectures, the simple locking algorithm, would actually prevent this schedule. Concurrency mechanisms do generally prevent some schedules, which in fact would’ve turned out to be serializable.

Conflict serializability #

Serializability

  • Recall that a schedule is serializable if its effect is ’equivalent’ to some serial schedule
    • The notion of ’equivalent’ in that definition was not defined precisely.
  • There are alternative interpretations of the notion of ’equivalence’ in this context
  • The concept of “conflict equivalence” underpins the definition of “conflict serializable schedules”

Conflict serializable schedules

What do we mean by conflicting operations in the context of conflict serializability?

  • Two operations by different transactions on a record conflict if at least one of them is a “write” operation
    • in a “write-read conflict”, TB reads a record previously written by TA
    • “read-write” and “write-write” conflicts are defined similarly
  • Two schedules are conflict equivalent if they order pairs of conflicting operations of committed transactions in the same way
  • A schedule is conflict serializable if it is conflict equivalent to some serial schedule

We can say that two schedules are conflict equivalent if they order pairs of conflicting operations of committed transactions in the same way. A schedule is conflict serializable if it is conflict equivalent to some serial schedule.

Conflict Serializability example

To illustrate this concept of conflict serializability, let’s consider the following schedule.

We want to establish whether this schedule, which has interleaved operations, is conflict serializable or not. We have two transactions, transaction A and transaction B. Transaction A reads R1 and then writes it; TB then reads the same record and writes it. TA then reads a second record, R2, and writes it, and TB then reads that second record and writes it.

[TA]          [TB]
:             :
READ R1;      :
WRITE R1;     :
:             READ R1;
:             WRITE R1;
READ R2;      :
WRITE R2;     :
:             READ R2;
:             WRITE R2;
:             :

Let’s now annotate our schedule with conflicting operations. Remember that for two operations to conflict, they must belong to separate transactions. In this particular case, one must belong to TA and one must belong to TB, but this principle is extensible, so it applies to schedules with many transactions, potentially hundreds. The second criterion for conflicting operations is that both operations must be operating on the same record. The third criterion is that at least one of the operations must be a write.

Let’s represent conflicts on the schedule as directed edges.

Identify operations which conflict:

- TA's READ  R1;  ->  TB's WRITE R1;  = read-write
- TA's WRITE R1;  ->  TB's READ  R1;  = write-read
- TA's WRITE R1;  ->  TB's WRITE R1;  = write-write
- TA's READ  R2;  ->  TB's WRITE R2;  = read-write
- TA's WRITE R2;  ->  TB's READ  R2;  = write-read
- TA's WRITE R2;  ->  TB's WRITE R2;  = write-write

These operations are not conflicts:

- TA's READ R1;  ->  TB's READ R1;  = read-read
- TA's READ R2;  ->  TB's READ R2;  = read-read

This approach is extensible to any number of transactions, so, we could have hundreds or thousands of transactions, and we would have to compute these conflicts to determine whether a particular schedule is conflict serializable or not.

Swapping non-conflicting operations

The principle we now follow is that, as long as the order of operations within each transaction is maintained, the outcome of a schedule depends only on the order of the conflicting operations. We can therefore swap other, non-conflicting operations around to establish whether we have a serializable schedule or not.

  • As long as the order of operations within a transaction are maintained, the outcome of a schedule depends only on the order of conflicting operations.
  • Using the property that if O(a) and O(b) are two non-conflicting operations in transactions A and B of some schedule S, and O(a) < O(b) (i.e. O(a) is executed before O(b)), we can swap O(a) and O(b) without affecting the schedule.

What we’re going to do is swap non-conflicting operations and see if we can intuitively arrive at a serial schedule by doing that, such that at the end of this process the order of the operations within each of the transactions is still the same.

For the previous schedule, we can do:

  • Swap positions of READ R2 in TA and READ R1 in TB
  • Swap positions of WRITE R2 in TA and WRITE R1 in TB

The first step we’re going to swap the positions of read R2 in transaction A and read R1 in transaction B, then we do the same thing for write R2 in transaction A and write R1 in transaction B. They’re non-conflicting since they’re operating on different records, so we can swap like this.

BEFORE                --> AFTER
[TA]        [TB]       |  [TA]        [TB]
:           :          |  :           :
READ R1;    :          |  READ R1;    :
WRITE R1;   :          |  WRITE R1;   :
:           READ R1;   |  READ R2;    :
:           WRITE R1;  |  WRITE R2;   :
READ R2;    :          |  :           READ R1;
WRITE R2;   :          |  :           WRITE R1;
:           READ R2;   |  :           READ R2;
:           WRITE R2;  |  :           WRITE R2;
:           :          |  :           :

Now we have arrived at a serial schedule.

Since there is no interleaving of transactions in the transformed schedule, the original schedule is “conflict serializable”.

We need to note that the converse is not necessarily true. Not every serializable schedule is conflict serializable.

Serialization graphs

Now, the example we’ve just looked at was simple enough that we were able to check the operations by hand to determine whether it was conflict serializable or not.

But, in the general case, the transaction manager is going to need to be able to compute whether a schedule involving many hundreds of transactions is conflict serializable or not. One approach we can use for this is that of a serialization graph. There are well-known algorithms in graph theory that enable us to establish this from arbitrarily complex graphs.
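
A minimal sketch of that idea (using a hypothetical (transaction, operation, record) schedule format, not the lecture’s notation): build an edge TA -> TB whenever a conflicting operation of TA precedes one of TB, then test the graph for cycles.

def precedence_graph(schedule):
    """Build txn -> set(txn) edges from a schedule of (txn, op, record) steps."""
    edges = {}
    for i, (t1, op1, rec1) in enumerate(schedule):
        for t2, op2, rec2 in schedule[i + 1:]:
            if t1 != t2 and rec1 == rec2 and 'W' in (op1, op2):
                edges.setdefault(t1, set()).add(t2)
    return edges

def has_cycle(edges):
    """The schedule is conflict serializable iff this graph has no cycle."""
    state = {}                                    # txn -> 'visiting' or 'done'

    def dfs(node):
        if state.get(node) == 'visiting':
            return True
        if state.get(node) == 'done':
            return False
        state[node] = 'visiting'
        if any(dfs(nxt) for nxt in edges.get(node, ())):
            return True
        state[node] = 'done'
        return False

    return any(dfs(node) for node in list(edges))

# The interleaved example above: TA and TB each read then write R1, then R2
schedule = [('TA', 'R', 'R1'), ('TA', 'W', 'R1'), ('TB', 'R', 'R1'), ('TB', 'W', 'R1'),
            ('TA', 'R', 'R2'), ('TA', 'W', 'R2'), ('TB', 'R', 'R2'), ('TB', 'W', 'R2')]
g = precedence_graph(schedule)
print(g, 'conflict serializable' if not has_cycle(g) else 'not conflict serializable')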

The two-phase Locking protocol #

Broadly, there are three different approaches to concurrency control, all of which ensure that schedules are “conflict serializable”.

We saw how our simple locking algorithm works to avoid the problems. We’re now going to formalize our simple locking algorithm into what is termed a “two-phase locking algorithm”.

Two-phase locking

  • Requires that in any transaction, all locks are obtained before any lock is released
  • Transactions following this protocol are said to be two-phase:
    • first phase: lock acquisition phase
    • second phase: lock release phase

This protocol requires that in any transaction all locks are obtained before any lock is released. Transactions obeying this protocol are said to be two-phase. The first phase is the lock acquisition phase, where the transaction acquires the locks it requires. The second phase is the lock release phase, where all the locks of a particular transaction are released, all at the same time.

This protocol is particularly important since it can be proved that for any schedule S of two-phase transactions, S is “conflict serializable”. (But, we’re not going to cover the proof of this in this module.)

Because every schedule of two-phase transactions is conflict serializable, we can state that any set of transactions obeying the two-phase protocol produces a “serializable” schedule.

The variant in which every lock is held until the end of the transaction, as in our simple locking algorithm, is also called the “strong”, “strict” or “rigorous” two-phase locking protocol.
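
A minimal sketch of the two-phase rule itself (a hypothetical trace format; it simply checks that no lock is acquired after any lock has been released):

def is_two_phase(lock_events):
    """Check a single transaction's ordered ('acquire'|'release', record) events."""
    released = False
    for action, _record in lock_events:
        if action == 'release':
            released = True
        elif released:           # an acquire after a release violates two-phase locking
            return False
    return True

print(is_two_phase([('acquire', 'R1'), ('acquire', 'R2'),
                    ('release', 'R1'), ('release', 'R2')]))   # True: two-phase
print(is_two_phase([('acquire', 'R1'), ('release', 'R1'),
                    ('acquire', 'R2')]))                      # False: not two-phase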

Timestamping #

A Timestamping algorithm #

We’re going to look at an alternative approach.

Timestamp ordering

  • An alternative to locking to ensure only serializable schedules arise in concurrent transactions is to use timestamps:
    • each transaction receives a unique timestamp, e.g. the system clock time when a transaction starts or, alternatively, when it first accesses a record, and
    • the DBMS ensures that access to records is compatible with the serial ordering of transactions following the timestamp ordering

How the DBMS arrives at a timestamp for a transaction is not actually important for the working of this algorithm. The important thing is that timestamps are unique so that transactions have a unique ordering given by their timestamps.

The timestamping algorithm allows serializable schedules which are equivalent to the serial schedule that reflects the timestamp ordering. Recall that for the locking algorithm, the equivalent serial schedule is the one reflected by the commit order. That is a fundamental difference between timestamping and locking in terms of which serial schedule the result is equivalent to.

Algorithm

We associate a unique timestamp with each transaction. If we have two transactions, TA and TB, we’d have a timestamp for TA and a timestamp for TB. For the purpose of the algorithm we’ll assume this is assigned when the transaction starts; it could equally be when it first accesses a record. The algorithm works correctly either way.

We also associate timestamps with each record. If we have records R1, R2 and R3, for example, accessed by the transactions in our schedule, we associate with each of those records the greatest timestamp of any transaction to read it, T(R), and the greatest timestamp of any transaction to write it, T(W). So each record has two timestamps associated with it, T(R) and T(W).

When we say the greatest timestamp, we mean the most recent timestamp. Then we proceed through the schedule in chronological order.

There are two parts of the algorithm, one for when a transaction is attempting to read a record and one when it’s attempting to write a record.

READ:

Now, let’s consider the scenario where we’re reading a record. We’re going to represent our algorithm using pseudo code here.

If the write timestamp of the record that we’re reading is more recent than the timestamp at the transaction, then we roll back the transaction and start a new replacement transaction. That’s since a later transaction has already written the record.

Otherwise, we allow the read to continue and we update that read timestamp for that record to be the timestamp of the transaction if the read timestamp of the record is older than the timestamp of the transaction.

If a transaction with timestamp 'T' attempts to read a record:

if T(W) > T:
  then rollback T and start a new replacement transaction

else:
  allow the read and set timestamp T(R) = T if T(R) < T

WRITE:

Now, let’s consider the scenario where we’re writing a record.

If a transaction with timestamp 'T' attempts to write a record:

if T(R) > T:
  then rollback T and start a new replacement transaction

else if T(W) <= T:
  then allow the write and set T(W) = T if T(W) < T

else if T(W) > T:
  then do nothing

There are three different paths through this part of the algorithm.

The first path is where the read timestamp of the record we’re writing is more recent than the timestamp of the transaction. In that scenario, we roll back the transaction and start a new replacement transaction. That’s since a later transaction has already read that record.

Otherwise, if the write timestamp of the record we’re trying to write is the same as or older than the timestamp of the transaction, we allow the write to proceed and we update the write timestamp at that record to the timestamp of the transaction if the write timestamp of the record is older than the timestamp of that transaction.

The final path through our algorithm, if neither of these conditions applies (that is, the write timestamp of the record we’re trying to write is more recent than the timestamp of the transaction), is to do nothing at all.
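
A minimal runnable sketch of these read and write rules (the Record class and the rollback handling are hypothetical simplifications; a real DBMS would also restart the rolled-back transaction):

class Rollback(Exception):
    """Raised when the transaction must be rolled back and restarted."""

class Record:
    def __init__(self):
        self.t_read = 0    # greatest timestamp of any transaction to read it, T(R)
        self.t_write = 0   # greatest timestamp of any transaction to write it, T(W)

def read(record, t):
    if record.t_write > t:
        raise Rollback('a later transaction has already written this record')
    record.t_read = max(record.t_read, t)

def write(record, t):
    if record.t_read > t:
        raise Rollback('a later transaction has already read this record')
    if record.t_write <= t:
        record.t_write = t       # allow the write
    # else: a later transaction already wrote it - do nothing (Thomas Write Rule)

r = Record()
read(r, t=2)          # TB (timestamp 2) reads R
try:
    write(r, t=1)     # TA (timestamp 1) then tries to write R
except Rollback:
    print('TA rolled back')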

Cascade rollback problem

  • When a transaction rolls back, the records it wrote may have been read by later transactions - any such transactions must be rolled back too.
  • So a “cascade” effect can occur with this method, since the rollback of these further transactions may itself necessitate yet more rollbacks.

We’ve avoided deadlock but we’ve got this disadvantage with this approach of these potential cascades.

Thomas Write Rule

Let’s consider the case where we’re writing records. The “do nothing” operation shown below is known as the “Thomas Write Rule”.

...

else if T(W) > T:
  then do nothing
  • The “do nothing” action for certain writes - known as the Thomas Write Rule - requires further consideration.
  • If a “later” transaction has already written the record, then it can be considered to have overwritten the current transaction T’s write, which can therefore be ignored.
  • The only problem would be if a transaction had read the record in the intervening period. But such a transaction could not have a timestamp earlier than T since the rule for reads would prevent such a read, and if its timestamp is later than T it has correctly seen the T write.

Example 1

Let’s consider how timestamping algorithm works.

    [TA]          [TB]
    :               :
    READ R;         :
    R := R * 2;     :
    :               READ R;
    :               R := R * 3;
(1) WRITE R;        :
(2) :               WRITE R;
    :               :

We’ve got our two transactions, TA and TB. The first question is: is TA’s timestamp lower than TB’s or higher than TB’s? Well, we don’t actually know, because this is just a fragment of the schedule.

Let’s consider the two different scenarios.

Firstly, where TA’s timestamp is lower than TB’s timestamp. In this scenario, TA will roll back at step (1), since the later transaction TB has already read R and so the record’s read timestamp will be greater than TA’s timestamp.

Then the second scenario: if TB’s timestamp is lower than TA’s timestamp, then TB will roll back at step (2), since the later transaction TA has already read R and so the record’s read timestamp will be greater than TB’s timestamp.

We can see we’ve avoided deadlock but we’ve got these rollbacks occurring that could cause this cascading effect that we described a few moments ago.

Example 2

Let’s now consider the uncommitted dependency problem.

    [TA]          [TB]
    :               :
    READ R;         :
    R := R * 2;     :
    WRITE R;        :
(1) :               READ R;
    :               R := R * 3;
(2) ROLLBACK;       :
    :               :

Again, two scenarios depending on whether TA’s timestamp is greater than TB’s or less than TB’s.

Let’s consider the scenario first of all where TA’s timestamp is lower than TB’s timestamp, then TB, as well as TA, will roll back at step (2) since TB is a later transaction which has read a write of a failed transaction.

Then we consider the scenario where TB’s timestamp is lower than TA’s timestamp. Then TB will roll back at step (1) since the later transaction, TA, has already written record R and so the record’s write timestamp will be greater than TB’s timestamp. In the scenario of the uncommitted dependency problem, both transactions are going to roll back.

Example 3

Lastly, let’s consider the inconsistent analysis problem with timestamping.

    [TA]              [TB]
    :                 :
    READ R1;          :
    Sum := R1;        :
    :                 READ R1;
    :                 READ R2;
    :                 R1 := R1 - 10;
    :                 R2 := R2 + 10;
    :                 WRITE R2;
(1) :                 WRITE R1;
    :                 COMMIT;
(2) READ R2;          :
    Sum := Sum + R2;  :
    :                 :

Again, the two scenarios depending on whether TA’s timestamp is more recent or older than TB’s timestamp.

Let’s start off with when TA’s timestamp is lower than or older than TB’s timestamp, then TA will roll back at step (2). This is since the later transaction, TB, has already written R2 and so the record’s write timestamp will be greater than TA’s timestamp.

In the second scenario, where TB’s timestamp is older (lower) than TA’s timestamp, TB will be rolled back at step (1), since the later transaction TA has already read record R1 and so the record’s read timestamp will be greater than TB’s timestamp.

Multiversioning #

A Multiversioning algorithm #

We’ve already looked at two approaches: two-phase locking and timestamping. The third approach is multiversioning. A multiversioning algorithm is a mixed variant combining timestamping and two-phase locking.

Consider with uncommitted dependency problem

Remember that TB performs a dirty read here.

Dirty read was avoided with both the locking and time stamping algorithms.

But transaction TB was forced to wait with the locking approach, and was rolled back with the timestamping approach.

[TA]          [TB]
:             :
READ R;       :
R := R * 2;   :
WRITE R;      :
:             READ R;
ROLLBACK;     :
:             :

Multiversioning is an approach that ensures that the dirty read is prevented but also prevents the wait that we have with the locking approach and the rollback with the timestamping approach.

  • It is possible to avoid the problems identified with locking and timestamping.
  • Prevent TB from reading the latest version of R as written by TA, but allow TB to read the earlier version of R in its state before TA wrote it.
  • This method relies on earlier versions of records being available
  • This is likely in practice because “before images” are typically maintained to enable recovery

Algorithm

  • On start, each transaction specifies whether it is read-only or read-write.
  • A read-only transaction is assigned a timestamp before its first read.
  • A read-write transaction is assigned a timestamp when it commits.
  • Since time stamps are assigned in increasing order, a read-only transaction’s timestamp is smaller than the timestamp that any concurrently executing read-write transaction receives.
  • Each time a record is written, its timestamp ’t(w)’ is set to the timestamp of the transaction that wrote it. Its previous version (before image) is maintained.
  • Read-write transactions obey the locking protocol and are unaffected by previous versions of records.
  • When a read-only transaction with timestamp ’t’ attempts to read a record, it reads the version of the record with the highest ’t(w)’ < ’t’. This version is guaranteed to have been written by a transaction that was committed before ’t’ was assigned.

Observation

  • The algorithm employs the ’t(w)’ of record versions to determine which copy is read, but its value is not known until the transaction which writes the version commits.
  • This presents no problem:
    • If a version of record has not yet received its ’t(w)’ timestamp because the writing transaction has yet to commit, then this version shall not be read by a concurrent read-only transaction.
    • Hence, a read-only transaction need only identify the version which already has the highest ’t(w)’ < ’t'.
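
A minimal sketch of the read-only path, i.e. how a read-only transaction chooses which version of a record to read (the version-list structure is hypothetical):

def read_version(versions, t):
    """Return the value a read-only transaction with timestamp t should see.

    versions: list of (t_write, value) pairs for one record.
    Versions written by uncommitted transactions have no t_write yet,
    so they simply do not appear in the list.
    """
    candidates = [(tw, value) for tw, value in versions if tw < t]
    if not candidates:
        raise LookupError('no committed version older than this transaction')
    return max(candidates)[1]       # version with the highest t_write < t

# R initially holds 10 (t_write 0); a read-write transaction later committed 20 at timestamp 5
versions_of_R = [(0, 10), (5, 20)]
print(read_version(versions_of_R, t=3))   # 10: the version committed before t=3
print(read_version(versions_of_R, t=7))   # 20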

Example 1

Let’s consider how multiversioning algorithm works.

This is lost update problem.

You can see we’ve now annotated each of the transactions as to whether they’re read-only or read-write.

Since both transactions in this case are read-write, the locking protocol is applied; therefore, in this scenario, the multiversioning algorithm gives no advantage. The result is the same as with the locking algorithm.

[TA](read-write)  [TB](read-write)
:                 :
READ R;           :
R := R * 2;       :
:                 READ R;
:                 R := R * 3;
WRITE R;          :
:                 WRITE R;
:                 :

Example 2

Now, turning to the uncommitted dependency problem.

We have a read-write transaction and a read-only transaction.

At point (1), TB will correctly read the original version, since the version written by TA does not have a timestamp lower than TB’s timestamp.

In fact, TA rolls back and does not get a timestamp, but had it committed, its timestamp would have been higher than TB’s. So here we see a situation where the multiversioning algorithm benefits us: transaction TB correctly runs without waiting or rolling back.

    [TA](read-write)  [TB](read-only)
    :                 :
    READ R;           :
    R := R * 2;       :
    WRITE R;          :
(1) :                 READ R;
    :                 R := R * 3;
    ROLLBACK;         :
    :                 :

Example 3

Then, turning to the inconsistent analysis problem.

We have a read-only transaction, TA, and a read-write transaction, TB.

We can see that TA’s timestamp will be lower than TB’s timestamp. Hence TA will correctly read the original version of R1 at (1) and the original version of R2 at (2). It will not read the version of R2 written by TB, since that version’s write timestamp will be greater than TA’s timestamp.

    [TA](read-only)   [TB](read-write)
    :                 :
(1) READ R1;          :
    Sum := R1;        :
    :                 READ R1;
    :                 READ R2;
    :                 R1 := R1 - 10;
    :                 R2 := R2 + 10;
    :                 WRITE R2;
    :                 WRITE R1;
    :                 COMMIT;
(2) READ R2;          :
    Sum := Sum + R2;  :
    :                 :

Week 8: Database design, Normalisation and SQL Data Definition Language(DDL) #

Normalisation(正規化) #

  • Normalization is the application of a set of common-sense rules to organise data into tables so that the results of using the database are always unambiguous and as intended.
  • Normalization theory defines a series of normal forms which precisely capture the features of a relation that give rise to these kinds of problems, and it provides a method for arriving at a design without such problems.

Database normalization aims to minimize data redundancy. Why data redundancy can create problems in database?

  • The first problem it can create is that it increases the size of the database, perhaps unnecessarily, since the same data is repeated in many different tables.
  • The second problem that data redundancy can create in databases is inconsistency. This arises with insert, delete, and update operations on data that is repeated in many tables.

Normal Form(正規形) #

  • First Normal Form = 第1正規形
  • Second Normal Form = 第2正規形
  • Third Normal Form = 第3正規形

(Notes on the procedure for normalising to each normal form; details omitted because I was already familiar with the content.)

Constraints #

(Notes on table constraints; details omitted because I was already familiar with the content.)

Views and Privileges #

Views

  • Views are tables derived from one or more tables (base tables or views) by the evaluation of a SELECT query expression
  • The defining query expression of a view is typically stored (in the data dictionary/catalog)
  • The view table rows are formed when required as a matter of query evaluation

Advantages of using views:

  • Logical data independence
    • The application will depend on the view rather than the base tables. We can potentially change the base tables and then we will only have to update the view in order to make sure our application works as before.
  • Security mechanism
    • We can control what data any particular application or user within the application can see.
  • Simplified application view of the content of a database
    • A view can provide the subset of the schema that the application needs.
CREATE VIEW RED_PARTS (PID, PNAME, WEIGHT, CITY) AS
  SELECT PID, PNAME, WEIGHT, CITY -- or *
  FROM PART
  WHERE COLOUR = 'RED';

To execute a query on the view, the DBMS can merge the query with the view definition to produce an equivalent modified query, so

SELECT *
FROM RED_PARTS
WHERE WEIGHT > 13;

-- The query above will be rewritten as follows by DBMS

SELECT PID, PNAME, WEIGHT, CITY
FROM PART
WHERE COLOUR = 'RED' AND WEIGHT > 13;

Privileges

  • The person who creates and so “owns” an object has certain privileges enabling them to use the object
  • Privileges include SELECT, DELETE, INSERT and UPDATE
  • The owner of an object may use the GRANT and REVOKE statements to grant and revoke privileges on an object to and from other users
    • e.g. GRANT SELECT, INSERT ON TABLE PART TO TOM, ANNE
    • e.g. REVOKE SELECT ON TABLE PART FROM TOM

Use of roles can simplify the administration of privilege. Rather than granting each privilege separately to each user, a suitable role could be created e.g. a ‘developer’ role and the privileges granted to the role.

CREATE ROLE DEVELOPER;

GRANT SELECT, DELETE, INSERT, UPDATE ON TABLE SUPPLIER TO DEVELOPER;

GRANT DEVELOPER TO TOM, ANNE, MARY, BILL;

Week 9: DBMS architectures, implementations and Query Optimisation #

DBMS architectures and implementation #

Storage Structures #

Overview

  • Relational DBMS implementations typically use conventional file organization techniques to store database data.
  • The precise details are dependent on the particular DBMS, but the storage structures are usually similar based on indexes, pointer-based structures, hash-based structures and others.
  • Similar techniques are used independent of the type of DBMS including relational, non-relational, and object-oriented.

Most relational database management system implementations use the file organization techniques offered by the operating system of that particular server to store the database data.

For more detailed information on storage structures, you should look at the documentation for the specific DBMS you’re using.

In this lecture, we’re going to consider three storage structures, and we’ll see how they form a hierarchy.

1 Tablespace : N Segments, 1 Segment : N Extents

Tablespaces

The first of these is the tablespace.

  • Storage space is divided into logical units called tablespaces.
  • Tablespaces consist of one or more files - files may be added to existing tablespaces.

There can be one or more tablespaces per database instance.

  • A tablespace is logically contiguous sequence of fixed-size pages (e.g. 1K for small systems to 32K for data warehouses)

Segments

We now introduce the second of our storage structures which is a segment.

  • Tablespaces contain a variety of different kinds of data:
    • base table rows, index nodes, rollback ‘undo’ or ‘before image’ data, temporary results generated while processing SQL statements involving large amounts of data
  • A “segment” is a series of blocks allocated for a particular use.

Segments are contained within tablespaces. (= N segments in 1 tablespace)

A segment is a series of blocks allocated for a particular use. This is where it differs from a tablespace, which contains many different kinds of data.

Example:

  • Each table has a single data segment i.e. a series of blocks allocated to hold the table’s row data.
  • Each index has a single index segment to hold the index nodes.

Extents

We now turn to the third and final storage structure, extents.

  • Groups of blocks called extents are allocated dynamically to a segment as required to accommodate the space requirements of a table as inserts and updates occur.
  • The blocks of an extent are logically contiguous and may not span files.
  • When a table is created, an initial extent of logically contiguous blocks is allocated.
  • If the space requirements of the segment exhaust the space available in the initial extent, e.g. as rows are inserted into the table, further extents of logically contiguous blocks will be allocated (see the sketch below).
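
As a rough illustration of how this hierarchy surfaces in DDL, here is an Oracle-style sketch (the names and sizes are illustrative, and exact syntax and defaults vary by DBMS and tablespace type): a tablespace is created backed by one file, and a table is then placed in it, with its data segment starting from an initial extent and growing in further extents.

-- Hypothetical Oracle-style example
CREATE TABLESPACE app_data
  DATAFILE 'app_data01.dbf' SIZE 100M;    -- the tablespace's first (and so far only) file

CREATE TABLE SUPPLIER (
  SID    CHAR(3) PRIMARY KEY,
  SNAME  VARCHAR2(20),
  STATUS NUMBER,
  CITY   VARCHAR2(20)
) TABLESPACE app_data                     -- the table's data segment lives in app_data
  STORAGE (INITIAL 64K NEXT 64K);         -- size of the initial and subsequent extents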

Indexes #

  • Indexes provide efficient physical access paths for processing DML statements.
  • Typical DBMS implementations support B-tree indexes on base tables.
  • Indexes may be created or deleted without modifying application code, thus giving good physical data independence.
-- PX provides efficient access to rows in PART with
-- a given PNAME value or a combination of PNAME, COLOUR values.

CREATE UNIQUE INDEX PX ON PART (PNAME, COLOUR)

An index can be created on one or more columns. Each column may be indexed in ascending or descending order. In this particular example, both are in ascending order.

The normal behavior in relational database management systems such as MySQL and Oracle is to automatically create an index on any column (or combination of columns) specified as a candidate key with PRIMARY KEY or UNIQUE.

So, if we create our PART table and identify the ‘PID’ column as the primary key, the database will automatically create an index on that attribute, as sketched below.
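
A minimal MySQL-flavoured sketch (the column definitions are illustrative): no CREATE INDEX statement is issued, yet declaring PID as the primary key causes an index to be built, which SHOW INDEX then lists.

CREATE TABLE PART (
  PID    CHAR(3) PRIMARY KEY,   -- an index on PID is created automatically
  PNAME  VARCHAR(20),
  COLOUR VARCHAR(10),
  WEIGHT INT,
  CITY   VARCHAR(20)
);

SHOW INDEX FROM PART;           -- lists the automatically created PRIMARY index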

Consideration

  • If multiple indexes are defined, they can consume more space than the table’s rows themselves.
  • Indexes introduce maintenance overheads as rows are modified.
    • Tree-balancing algorithms must run each time rows are inserted, updated or deleted (see the drop-index sketch below).
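
Because of these overheads, an index that no longer earns its keep can simply be dropped, again without touching application code. A MySQL-style sketch (Oracle omits the ON clause):

DROP INDEX PX ON PART;   -- remove the index; queries still run, just without this access path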

Query Optimisation #

Query Trees and Transformations #

We’re going to look at some typical approaches a query optimizer might use in a relational database management system such as MySQL or Oracle.

DBMS query optimiser

  • SQL defines queries expressed purely in terms of conceptual operations on relational tables.
  • No reference is made to the physical storage structures and access paths available such as indexes.
  • The DBMS “query optimizer” must compute an efficient strategy for executing the query.

Query processing

There are five stages in query processing. To process a query, a DBMS must:

  1. Scan and parse the query, and check the user has the required privileges to execute the query.
  2. Convert the query into an internal form such as a query tree representation.
  3. Convert the query tree using transformation rules into an equivalent tree which will be more efficient to execute.
  4. Generate possible query plans for executing the query tree and choose the one with the lowest estimated cost.
  5. Generate code to execute the selected query plan.

We’re going to focus on stages two, three and four in this lecture, because the first stage is no different from what a Java compiler or the Python interpreter does with Java or Python code, and the final stage is very DBMS-specific.

Query trees (stage 2)

A query tree is the output of stage two of the query processing process.

  • A query tree represents a query in a form suitable for manipulation during the query optimization.
  • The query tree references abstract operations rather than SQL (or other relational DML) syntax.
  • The query tree has no reference to physical access paths which are determined at the query plan stage.
  • Formally, a query tree is a tree structure that corresponds to a relational algebra expression by representing:
    • input relations as leaf nodes, and
    • relational algebra operators as internal nodes.

Using transformation rules (stage 3)

  • The query optimizer transforms a query tree reducing the size of intermediate results by:
    • performing selection operations as early as possible to reduce the number of rows, and
    • performing projection operations as early as possible to reduce the number of columns.
  • The selection heuristic is likely to lead to a greater reduction in the size of intermediate results.
  • Where there is a choice between two or more selection operations, the more restrictive one (i.e. the one likely to result in the fewest rows) should be executed first. An example of pushing a selection down the tree is sketched below.
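
An illustrative example using the PART and SUPPLY tables from earlier (the trees are a sketch of how an optimizer might represent and transform the query, not the output of any particular DBMS):

SELECT P.PNAME, S.QUANTITY
FROM PART P, SUPPLY S
WHERE P.PID = S.PID
  AND P.COLOUR = 'RED';

-- Initial query tree (selection applied after the join):
--
--      π PNAME, QUANTITY
--             |
--      σ COLOUR = 'RED'
--             |
--      ⋈ P.PID = S.PID
--        /         \
--     PART        SUPPLY
--
-- Transformed tree (selection on PART pushed below the join,
-- so fewer rows take part in the join):
--
--      π PNAME, QUANTITY
--             |
--      ⋈ P.PID = S.PID
--        /         \
--  σ COLOUR = 'RED'  SUPPLY
--        |
--      PART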

Query plans (stage 4)

  • The optimized query tree produced by query optimization still has no reference to low-level procedures implementing the operators, nor to physical access paths available in the database.
  • The query optimizer must produce a series of query plans to implement the abstract operations of the query tree and choose the one with the lowest estimated cost.

There are two main approaches to generating and selecting query plans, rule-based and cost-based.

  • Rule-based optimization uses a predefined ranked order of all possible access paths and algorithms supported by the DBMS to guide the generation and selection of candidate query plans.
  • Cost-based optimization considers all of the candidate query plans and uses statistics maintained by the DBMS relating to the current database state in order to estimate the cost of each query plan.

Cost-based optimization is generally at least as effective as rule-based optimization, and is usually better because it makes use of database statistics.

  • In estimating the cost of a query plan, a query optimizer is likely to use heuristics regarding:
    • preferred access paths
    • database statistics on quantity and distribution of actual data in base tables and indexes
  • Query plan generation:
    • Aim for best throughput i.e. the minimum elapsed time to process all rows accessed by a statement
    • Aim for best response time i.e. the minimum elapsed time to process the first row accessed by a statement
  • The optimizer can employ:
    • data dictionary information on what access paths exist (indexes, clusters…)
    • data distribution statistics for tables, clusters and indexes (can also be stored in the data dictionary)
    • estimates of I/O, CPU and memory requirements (see the MySQL sketch below)
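
For example, in MySQL (a sketch assuming MySQL 8.0; the tables are those used earlier, and older versions support plain EXPLAIN without FORMAT=TREE), ANALYZE TABLE refreshes the statistics that the cost-based optimizer relies on, and EXPLAIN shows the query plan it has chosen:

ANALYZE TABLE PART, SUPPLY;     -- refresh optimizer statistics for these tables

EXPLAIN FORMAT=TREE             -- display the chosen query plan as a tree
SELECT P.PNAME, S.QUANTITY
FROM PART P
JOIN SUPPLY S ON P.PID = S.PID
WHERE P.COLOUR = 'RED';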

Week 10: Enhanced database capabilities, non-relational DBMS and Distributed Databases #

Enhanced database capabilities #

Procedural extensions for SQL (SQL/PSM) #

SQL is not a computationally complete (i.e. Turing-complete) language. It doesn’t include some of the control-flow structures that we are familiar with from Python and Java, such as if statements and while and for loops.

*Turing complete: able to describe and execute any computation.

To address this, SQL standards support something called SQL Persistent Stored Modules or SQL/PSM, which adds these additional programming constructs to the SQL language.

Now, adherence to SQL/PSM in the relational database field is actually very poor. Many commercial databases have chosen their own proprietary extensions: Oracle uses PL/SQL and Microsoft SQL Server uses T-SQL (Transact-SQL). This poor adherence to the standard is an issue if you choose to use procedural SQL extensions in your application, the main problem being portability: if you decide to change your database platform, you may have to change a lot of your code.

Example 1: PROCEDURE

Stored procedure: a saved set of statements with no return value. A stored procedure cannot be called from within other SQL statements; it is invoked from the console (with CALL).

DELIMITER //

CREATE PROCEDURE smallest_n (IN n INT)
BEGIN
  DECLARE uka VARCHAR(40);
  DECLARE uke INT(11);
  DECLARE end_point INT DEFAULT 0;
  DECLARE c1 CURSOR FOR
    SELECT UKC.UKAREA, UKELECTORS
    FROM UKCONSTS UKC
    ORDER BY UKC.UKELECTORS ASC;
  OPEN c1;
  my_loop: LOOP
    SET end_point = end_point + 1;
    FETCH c1 INTO uka, uke;
    SELECT CONCAT('Area ', uka, ' Number of Electors ', uke); -- just print to console.
    IF end_point = n THEN
      LEAVE my_loop;
    END IF;
  END LOOP;
  CLOSE c1;
END //

DELIMITER ;
-- Run from the console
CALL smallest_n(3);

Example 2: FUNCTION

Stored function: a saved set of statements that returns a value. A stored function is called from within other SQL statements.

DELIMITER //

CREATE FUNCTION orderSupply(stockLevel INT, inStock INT)
RETURNS INT
BEGIN
  DECLARE myOrder INT;
  SET myOrder = stockLevel - inStock;
  RETURN myOrder;
END //

DELIMITER ;
-- Called from within another SQL statement
SELECT orderSupply(10000, QUANTITY) FROM SUPPLY WHERE SID = 'S1';

Example 3: CASE, WHEN

SELECT SID, SNAME,
CASE CITY
  WHEN 'LONDON' THEN 'UK'
  WHEN 'PARIS' THEN 'EUROPE CENTRAL'
  WHEN 'ATHENS' THEN 'EUROPE SOUTH'
  WHEN 'ROME' THEN 'EUROPE SOUTH'
  WHEN 'OSLO' THEN 'EUROPE NORTH'
  ELSE 'NOT KNOWN'
END AS REGION
FROM SUPPLIER;

SELECT SID, SNAME,
CASE
  WHEN STATUS < 15 THEN 'LOW'
  WHEN STATUS > 25 THEN 'HIGH'
  ELSE 'MEDIUM'
END AS STATUS_CLASS
FROM SUPPLIER;

Triggers and Active Databases #

Triggers are mechanisms that recognize application-defined events in the database and automatically perform some application-defined action when those events occur. Databases incorporating triggers are sometimes called active databases.

Example

delimiter //

CREATE TRIGGER LOG_EMP_DELETED
AFTER DELETE
ON EMP FOR EACH ROW
BEGIN
  INSERT INTO EMP_LOG(LOG_DATE, EMPNO, ACTION)
    VALUES(SYSDATE(), OLD.empno, 'deleted');
END //

delimiter ;
delimiter //

CREATE TRIGGER LOG_ALL_EMP_DELETED
AFTER DELETE
ON EMP FOR EACH ROW
BEGIN
  DECLARE emp_count INT;
  SET emp_count = (SELECT COUNT(*) FROM EMP);
  IF emp_count = 0 THEN
    INSERT INTO EMP_LOG(LOG_DATE, ACTION)
      VALUES(SYSDATE(), 'All employees deleted');
  END IF;
END //

delimiter ;

NoSQL (Not only SQL) Database #

(Overview of NoSQL; details omitted.)

  • MongoDB was used as the example of a document-oriented DBMS.
  • Neo4J was used as the example of a graph DBMS.

Distributed DBMS #

Up until now, we have only discussed centralized database management systems, where the DBMS software, including the query processor and the transaction manager, runs on a single server. A distributed database system consists of several databases stored at different sites on a computer network, often called nodes. The data at each site is managed by a database server running some DBMS software, and these servers can co-operate in executing global queries and global transactions.

What is a distributed database?

  • In a client-server architecture, the processing is distributed: DBMS server software manages a database, while client applications request information from the server.
  • Even with a client-server architecture, applications still access data held in a single database.
  • A distributed database is a collection of distinct databases running on multiple nodes which appears to the client applications as a single database.

Motivation

  • Distributed databases can mirror the structure of organisations that are themselves distributed - data can be organised logically by department and stored physically at separate sites.
  • Distributed databases offer technical advantages such as reducing reliance on the operation of a single server.
  • Multiple and relatively cheap machines rather than one expensive machine can be used resulting in lower cost.
  • A distributed database may provide improved performance since data needed locally can be stored locally, thus reducing communication costs.
  • But extra costs may arise from the extra technical complexity involved.
  • The user has no knowledge that a distributed database is being accessed, thus achieving transparency.

Example of distributed query processing

Consider the case where the UKCONSTS table is stored locally, while UK_RESULTS is stored remotely.

SELECT AREA
FROM UKCONSTS C,
     UK_RESULTS@remote_node R
WHERE C.UKNUM = R.UKNUM
AND PARTY = 'eco';

remote_node is “Oracle-specific” terminology to refer to a database link that describes a path from one database to another.
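
For context, a database link in Oracle can be defined roughly as follows (a sketch; the user name, password and connect string are hypothetical placeholders):

CREATE DATABASE LINK remote_node
  CONNECT TO election_user IDENTIFIED BY secret   -- credentials on the remote database
  USING 'remote_db_alias';                        -- network alias of the remote database

Once the link exists, table_name@remote_node in a query refers to a table at the remote site, as in the example above.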

Transaction

  • A DBMS must guarantee that all statements in a transaction either succeed or fail in their entirety.
  • This must also hold true in the case of a distributed database.
  • In particular, the situation must not arise where one node’s transaction manager commits the effects of a transaction at that node, while at another node the effects of the same transaction are rolled back by the transaction manager at that other node.
  • The “two-phase commit” mechanism is used to ensure all database servers participating in a distributed transaction either all commit or all rollback.

Two-phase commit roles:

  • The “coordinator transaction manager” at one node is responsible for deciding whether to commit or rollback a particular transaction.
  • This decision must be followed by all “subordinate nodes” participating in the distributed transaction no matter what site or communication failures occur.

Two-phase commit overview:

[COORDINATOR]                [SUBORDINATES]
                  Prepare
             ---------------->
                Ready/Abort
             <----------------
              Commit/Rollback
             ---------------->
                Acknowledge
             <----------------
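
As a concrete illustration, MySQL exposes the participant side of this protocol through its XA statements; the coordinator logic that collects the votes and decides commit or rollback across all nodes lives in the application or middleware. A sketch, with 'txn1' as an arbitrary transaction identifier:

-- Phase 1, on each participating server:
XA START 'txn1';
UPDATE SUPPLY SET QUANTITY = QUANTITY - 100
WHERE SID = 'S1' AND PID = 'P1';
XA END 'txn1';
XA PREPARE 'txn1';       -- the participant votes "ready"

-- Phase 2, once the coordinator has collected all the votes:
XA COMMIT 'txn1';        -- or XA ROLLBACK 'txn1' if any participant could not prepare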

Week 11, 12 #

Final coursework period. The assignment presents a range of database-related problems to be answered.