CS222P Fall 2017: Principles of Data Management, Quiz 4

Fall 2017, ICS # UCI, Prof. Chen Li

1. Answer the following short questions.

a. Why do we store sibling pointers on B+ tree leaf nodes.

b. Explain the benefits of linear hashing compared to extensible hashing.

c. Explain the advantages of B+ tree over hash indexes.

2. Consider the following B+ tree

Draw the result B+ tree after deleting key 50. Assume we need to redistribute data entries if possible. If there is no ambiguity, you can just draw the part that is changed.

3. Consider the linear hashing technique. Suppose you keep inserting entries into a hash index using linear hashing, which initially has 16 buckets. Buckets are splitted in a round-robin way as discussed in lecture. What is the maximum possible number of overflow buckets when this index reaches level 1? Explain your answer.

1.

a. We store them to facilitate range scans, since we can easily find the next leaf page. We also use them during delete/insert when we re-distribute entries among pages.

b. Linear hashing grows slower than extensible hashing; it also does not have an explicit directory structure.

c. B+ tree supports range search and equality search, while hash index only supports equality search. B+ tree itself is balanced, while hash index has overflow pages.

2.

3. Image an adversary workload which inserts all entries into the first bucket. The first bucket is split once, and all the rest become overflow buckets. Thus, before we reach level 1, we can have 16 - 1 = 15 overflow buckets.