GRANULARITY
Granularity is the level of depth represented by the data in a fact or dimension table in a data warehouse.
High granularity means a fine, sometimes atomic, level of detail, often at the level of the individual transaction.
What is the level of granularity of a fact table?
A fact table is usually designed at a low level of granularity.
This means that we need to find the lowest level of information that can be stored in a fact table.
For example, overall employee performance is a very high level of granularity, while daily or weekly employee performance can be considered lower levels of granularity.
The granularity is the lowest level of information stored in the fact table; the depth of the data level is known as the granularity. In a date dimension, the level of granularity could be year, month, quarter, period, week, or day.
The process consists of the following two steps:
- Determining the dimensions that are to be included
- Determining the level within each dimension's hierarchy at which the information will be placed
The determining factors depend on the requirements.
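As a small illustration of grain and roll-up, the following sketch (assuming pandas is available; the schema with sale_date, employee, and amount is invented for the example) stores facts at day grain and aggregates them to the coarser month grain:

```python
# A minimal sketch, assuming pandas; the schema (sale_date, employee,
# amount) is invented for illustration, not taken from a real warehouse.
import pandas as pd

# Fact data kept at the lowest (day) grain: one row per employee per day.
facts = pd.DataFrame({
    "sale_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-02-01"]),
    "employee":  ["A", "A", "B"],
    "amount":    [100.0, 150.0, 200.0],
})

# Rolling the same facts up to month grain -- a coarser granularity.
monthly = (facts
           .groupby(["employee", pd.Grouper(key="sale_date", freq="MS")])
           ["amount"]
           .sum()
           .reset_index())
print(monthly)
```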
How Is It Measured?
The granularity of a parallel section of an algorithm is generally classified by one of three relative values: fine, medium, or coarse.
Notice that I refer to a parallel section of an algorithm instead of the algorithm itself when determining granularity.
An algorithm may contain many different grain sizes and in fact, even a section of an algorithm may have one grain size nested within another.
Granularity is determined by three characteristics of the algorithm and the hardware used to run the algorithm.
The structure of the problem
- As stated in the opening paragraph, the size of a task can vary tremendously.
- In data parallel programming, a few operations or possibly a single operation are performed on many pieces of data.
- These operations are performed in parallel over the data set, often with each processing element (PE) communicating with its neighboring PEs.
- Referring to the definition of granularity, this task would be considered to have a small granularity (fine-grained).
- On the other hand, if large subroutines of an algorithm are independent of one another, they can all be executed in parallel fashion.
- These subroutines require many calculations with little communication and are coarse-grained (a rough sketch contrasting the two cases follows this list).
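The following rough sketch contrasts the two cases, assuming only Python's standard library (both work functions are invented for illustration):

```python
# A rough sketch of fine- vs coarse-grained parallel sections, assuming
# only Python's standard library; both work functions are invented.
from multiprocessing import Pool

def increment(x):
    # Fine grain: a single tiny operation per data element.
    return x + 1

def big_independent_job(seed):
    # Coarse grain: a long independent computation with no communication
    # until the single result is returned.
    total = 0
    for i in range(1_000_000):
        total += (seed * i) % 7
    return total

if __name__ == "__main__":
    with Pool(4) as pool:
        # Fine-grained: many tiny tasks; scheduling/communication overhead
        # dominates because each task does almost no work.
        fine = pool.map(increment, range(10))
        # Coarse-grained: a few large independent tasks; the overhead is
        # amortized over far more computation.
        coarse = pool.map(big_independent_job, [1, 2, 3, 4])
    print(fine, coarse)
```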
The size of the problem
Assume an algorithm where 10 numbers are to be incremented. With 10 PEs, the algorithm should require 1 clock cycle.
Now assume the problem size is increased and 100 numbers are to be incremented.
Each PE now has 10 numbers to increment and so the size of its task has increased.
- The larger task size implies a coarser granularity.
The number of processors available
This argument corresponds directly to the previous one.
- If the number of processors is reduced while holding the problem size constant, the task size increases and the granularity becomes coarser. (Mapping a large number of data elements to fewer processors requires the use of virtual processors; see the sketch below.)
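A back-of-the-envelope sketch of both effects, assuming an even block distribution of items over the PEs:

```python
# A back-of-the-envelope sketch: per-PE task size under an even block
# distribution. The numbers mirror the incrementing example above.
def task_size(n_items, n_pes):
    # Each PE handles ceil(n_items / n_pes) items (extra items are
    # handled via virtual processors when n_items > n_pes).
    return -(-n_items // n_pes)  # ceiling division

print(task_size(10, 10))   # 1 item per PE   -> fine grain
print(task_size(100, 10))  # 10 items per PE -> coarser grain
print(task_size(100, 5))   # fewer PEs, same problem -> coarser still (20)
```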
Why Is It Important?
A study of granularity is important if one is going to choose the most efficient paradigm of parallel hardware for the algorithm at hand.
SIMD machines are the best bet for very fine-grained algorithms.
These machines are built for efficient communication, usually with neighboring PEs.
MIMD machines are less effective on fine-grained algorithms because the message passing system characteristic of these machines causes much time to be wasted in communication.
These machines perform best with larger-grained algorithms.
Another parallel paradigm is a network of workstations.
This paradigm is characterized by very slow communication.
It is recommended for coarse-grained algorithms only.
In fact, it is often more efficient to utilize fewer workstations than are available, thereby reducing the amount of communication; the toy model below illustrates this.
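A toy cost model (purely an assumption for illustration, not a benchmark of any real network) shows why: compute time shrinks as workstations are added, but communication cost grows with their number, so the total is minimized at a modest node count.

```python
# A toy cost model (an assumption for illustration, not a benchmark):
# total time = compute spread over p workstations + communication that
# grows with the number of workstations.
def total_time(work, p, t_compute=1e-6, t_comm=0.05):
    return (work / p) * t_compute + p * t_comm

work = 10_000_000  # abstract units of computation
for p in (2, 4, 8, 16, 32):
    print(p, round(total_time(work, p), 3))
# The total falls until p = 16 and rises again at p = 32: past the
# minimum, adding more slowly-communicating workstations slows the run.
```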
Being able to recognize the parallelism within an algorithm and analyze its granularity will guide a programmer to the best parallel paradigm for the task at hand.
Multiple Granularity Locking Protocol
What is granularity?
Granularity is the size of the data items that may be locked.
Multiple Granularity:
Multiple granularity means hierarchically breaking the database up into lockable portions and keeping track of what is locked, and at what level, so that the system can quickly decide whether a data item can be locked or unlocked.
Example of multiple granularity:
Suppose a database is divided into files
Files are divided into pages
Pages are divided into records.
If there is a need to lock a record, then a transaction can easily lock it.
But if there is a need to lock an entire file and only record-level locks are available, the transaction has to lock all the records in that file one after another.
So there is also a need to provide a mechanism for locking files, which is what multiple granularity provides.
Why Is There a Need to Provide a Mechanism for Locking Files as well as Records?
- If we allow a mechanism for locking records only, then to lock a file, the transaction will have to lock all the records in that file (say 10,000 of them) one after another, which wastes time.
- If we allow a mechanism for locking files only, then for a transaction to lock only five records, it will have to lock the whole file, and therefore no other transaction will be able to use that file.
Granularity of Locking
GRANULARITY OF DATA ITEMS AND MULTIPLE GRANULARITY LOCKING
All concurrency control techniques assume that the database is formed of a number of named data items.
A database item could be chosen to be one of the following:
• A database record.
• A field value of a database record.
• A disk block.
• A whole file.
• The whole database.
The granularity can affect the performance of concurrency control and recovery.
Granularity Level Considerations for Locking:
The size of data items is often called the data item granularity.
Item sizes are classified into three types:
Fine granularity.
Medium granularity.
Coarse granularity.
Fine granularity refers to small item sizes, coarse granularity refers to large item sizes, and medium granularity lies in between.
Several tradeoffs must be considered in choosing the data item size.
We shall discuss data item size in the context of locking, although similar arguments can be made for other concurrency control techniques.
First, consider that the larger the data item size is, the lower the degree of concurrency permitted.
For example, if the data item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that contains B because a lock is associated with the whole data item (block).
Now, if another transaction S wants to lock a different record C that happens to reside in the same block X in a conflicting lock mode, it is forced to wait.
If the data item size was a single record, transaction S would be able to proceed, because it would be locking a different data item (record).
On the other hand, the smaller the data item size is, the larger the number of items in the database. Because every item is associated with a lock, the system will have a larger number of active locks to be handled by the lock manager.
More lock and unlock operations will be performed, causing higher overhead. In addition, more storage space will be required for the lock table.
For timestamps, storage is required for the read TS and write TS for each data item, and there will be similar overhead for handling a large number of items.
What is the best item size? The answer is that it depends on the types of transactions involved.
If a typical transaction accesses a small number of records, it is advantageous to have the data item granularity be one record. On the other hand, if a transaction typically accesses many records in the same file, it may be better to have block or file granularity so that the transaction will consider all those records as one (or a few) data items.
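A minimal sketch of this concurrency effect, using a toy lock table with exclusive locks only (the item and transaction names mirror the block/record example above and are otherwise invented):

```python
# A toy lock table with exclusive (X) locks only -- a sketch of why
# coarse items reduce concurrency, not a real lock manager.
locks = {}  # data-item id -> transaction holding an X lock on it

def try_x_lock(item, tid):
    holder = locks.get(item)
    if holder is not None and holder != tid:
        return False            # conflicting lock held by another txn
    locks[item] = tid
    return True

# Block granularity: records B and C share block X, so transactions
# T and S collide even though they touch different records.
print(try_x_lock("block-X", "T"))   # True
print(try_x_lock("block-X", "S"))   # False -- S must wait

# Record granularity: the same two transactions proceed independently.
locks.clear()
print(try_x_lock("record-B", "T"))  # True
print(try_x_lock("record-C", "S"))  # True
```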
Multiple Granularity Level Locking
Since the best granularity size depends on the given transaction, it seems appropriate that a database system support multiple levels of granularity, where the granularity level can be different for various mixes of transactions.
Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones.
This can be represented graphically as a tree (but do not confuse it with the tree-locking protocol).
When a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendants in the same mode.
Granularity of locking (level in tree where locking is done):
- Fine granularity (lower in tree): high concurrency, high locking overhead
- Coarse granularity (higher in tree): low locking overhead, low concurrency
Granularity Hierarchy
The levels, starting from the coarsest (top) level, are:
database
area
file
record
Intention Lock Modes
In addition to S and X lock modes, there are three additional lock modes with multiple granularity:
- intention-shared (IS): indicates explicit locking at a lower level of the tree, but only with shared locks.
- intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks.
- shared and intention-exclusive (SIX): the subtree rooted at that node is locked explicitly in shared mode, and explicit locking is being done at a lower level with exclusive-mode locks.
Intention locks allow a higher-level node to be locked in S or X mode without having to check all descendant nodes.
Multiple Granularity Locking Scheme
Transaction Ti can lock a node Q, using the following rules:
1. The lock compatibility matrix must be observed.
2. The root of the tree must be locked first, and may be locked in any mode.
3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode.
4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode.
5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase).
6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.
Rule 1 simply states that conflicting locks cannot be granted. Rules 2, 3, and 4 state the conditions when a transaction may lock a given node in any of the lock modes. Rules 5 and 6 of the MGL protocol enforce 2PL rules to produce serializable schedules.
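The rules can be made concrete with a short sketch. This is a single-threaded illustration only: there is no blocking or deadlock handling, rules 5 and 6 (the two-phase part) are not enforced, and the node names are invented. The compatibility matrix is the standard one for the five modes.

```python
# A minimal sketch of the multiple granularity locking rules above.
# Single-threaded illustration only: no blocking, no deadlock handling,
# and rules 5-6 (two-phase locking) are not enforced. Node names invented.

# Standard compatibility matrix: COMPAT[held][requested] is True when the
# requested mode is compatible with a lock already held by another txn.
COMPAT = {
    'IS':  {'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

PARENT_FOR_S_IS = {'IS', 'IX'}        # rule 3
PARENT_FOR_X_SIX_IX = {'IX', 'SIX'}   # rule 4

class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent
        self.locks = {}  # transaction id -> mode held on this node

    def can_grant(self, tid, mode):
        # Rule 1: the request must be compatible with every lock held
        # by *other* transactions on this node.
        return all(COMPAT[held][mode]
                   for t, held in self.locks.items() if t != tid)

def lock(tid, node, mode):
    # Rule 2: the root may be locked in any mode. Rules 3-4: a non-root
    # node needs a suitable lock on its parent, held by the same txn.
    if node.parent is not None:
        parent_mode = node.parent.locks.get(tid)
        needed = PARENT_FOR_S_IS if mode in ('S', 'IS') else PARENT_FOR_X_SIX_IX
        if parent_mode not in needed:
            raise RuntimeError(f"{tid}: needs {needed} on {node.parent.name}")
    if not node.can_grant(tid, mode):
        raise RuntimeError(f"{tid}: {mode} on {node.name} conflicts")
    node.locks[tid] = mode

# Hierarchy: database -> file -> record.
db = Node('DB')
f1 = Node('F1', parent=db)
r1 = Node('R1', parent=f1)

lock('T1', db, 'IX')   # T1 intends to write somewhere below the root
lock('T1', f1, 'IX')
lock('T1', r1, 'X')    # explicit exclusive lock on one record

lock('T2', db, 'IS')   # IS is compatible with T1's IX on DB
lock('T2', f1, 'IS')
# lock('T2', r1, 'S')  # would raise: S conflicts with T1's X on R1
```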
Disadvantage of Granularity at a High Level:
Lower degree of concurrency.
Advantages of Granularity at a High Level:
A single lock at a high level covers many items.
Less complex.
Easy to implement.
Types of granularity:
There are three types of granularity:
Coarse granularity.
Medium granularity.
Fine granularity.
1. Coarse granularity:
Each process holds a large number of sequential instructions and takes a substantial amount of time to execute.
2. Medium granularity:
A middle ground where communication overhead is reduced.
3. Fine granularity:
Each process contains only a few sequential instructions.
In practice, mostly medium or coarse granularity is used.
SERVICE-RELATED GRANULARITY:
There are different levels of granularity.
Service Granularity.
Constraint Granularity.
Capability Granularity.
Data Granularity.
GENERAL ARCHITECTURE OF THE GRANULARITY: