Exam0

COSC 6340 (Data Management)

Feb. 13, 2001

Spring 2001

Your Name:

Your SSN:

For Spring 2001, I agree that my COSC 6340 grades and homework scores are posted using the last 4 digits of my ssn:

………………….(signature, if you like us to post your grades)

Problem 1 [19]:

Problem 2 [13]:

Problem 3 [8]:

Problem 4 [8]:

Problem 5 [20]

Problem 6 [8]

:

Grade:

The exam is “open books” and you have 75 minutes to complete the exam. The exam is slightly too long; I expect you to complete 90% of the problems in the available time.

1) Designing E/R Schemas [19]

Design an Entity-Relationship Diagram that models the following objects and relationships in the world of football (NFL): teams, players, games, managers and contracts. Each (NFL-) team has a

unique team name, and a city it plays in. Each person being part of the NFL-world has a unique ssn and a name. Additionally, for players their weight, height, position and birth dates are of importance. Players have a contract with at most one team and receive a salary for their services, and teams have at least 24 and at most 99 players under contract. Each team has one to three managers; managers can work for at most 4 teams and receive a salary for each of their employments. Players cannot be managers. A game involves a home-team and visiting-team; additionally, the day of the game, and the score of the game are of importance; teams play each other several times in a season (not on the same day!). Moreover, for each game played we like to know which players participated in the game and how many minutes they played.

Indicate the cardinalities for each relationship type; assign roles (role names) to each relationship if there are ambiguities! use sub-types, if helpful to express constraints!


























































2) B+-Trees [13]

Assume a relation R(A,B,C) with primary key A is stored using a B+-tree using attribute

C as the search key of the B+-tree (the intermediate nodes only contain attribute C and node pointers, whereas the leafs contain the tuples of the relation R). Attribute A requires

4 Byte of storage, attribute B requires 4 Byte of storage and attribute C requires 8 byte of

storage; B+-tree node pointers require 2 Byte of storage; R contains 1000000 (1 million) tuples, and we assume that one block can store 4096 Byte.

a. What parameters p and m would you choose for the B+-tree? [4]

p*2 + (p-1)*8 <= 4096 p = 410

m = 4096/(4+4 + 8) m = 256

b. *Now assume that the following query Q1 is given that returns 2000 answers:

Select A

from R

where C < 5523 and C > 2345

How would you implement Q1 trying to take advantage of the B+-tree? Using the results of question a, compute the number of block accesses of the best implementation of the query as precisely as possible, and explain your computations! [9]

Assume that nodes are completely filled. Use attribute C as search key of the B+-tree, start from the root, search for an entry with C > 2345, then go to the next level until leaf. The number of blocks accessed is the sum of the intermediate nodes and the leaf nodes accessed.

Because p * m = 410 x 256 < 1000000

p * p * m = 410 * 410 * 256 > 1000000

the height of the tree is 3; number of intermediate nodes accessed is 2.

number of blocks accessed = 2 + 2000 / 256 = 10

3) Extendible Hashing [8]

Consider an extensible hash structure where buckets can hold only two records is used. Initially the structure is empty. Assume the records, listed below, are inserted using the extensible hashing algorithm (we indicate the hashed key in parenthesis in binary format). Show how the hash-table and directory looks like after these records have been inserted (also indicate the local depth of each bucket)! Assume that the least significant bits are used for hashing records to buckets.

a [000010]

b [111110]

c [010111]

d [000000]

e [101011]

f [010111]

g [101100]

h [011011]

i [011010]

j [001110]

000

001

010

011

100

101

110

111
4) Questions concerning files and disks [8]

a) What is the most important difference between a tape and a disk? [2]

tape: sequential access only

disk: direct (random) access

b) What is the main drawback of using ordered files? [2]

certain operations such as insertion, deletion, et al, are expensive because of reordering the files

c) Disk arrays improve their performance through the use of data striping. What is data striping and why does it improve the performance of disk arrays? [4]

Data striping distributes data over several disks to give the impression of a single, very fast disk.

Data striping allows to access disks in parallel, which increases the I/O bandwidth.

5) SQL and Relational Algebra [20]

All the following queries you have to write refer to the Sailor, Boat, Reserves relations of page 121 of the textbook.

a)*Give a relational algebra expression that correctly implement the following query: “Give the names(s-name) of all sailors that reserved a green boat for 11/06/98 and a red boat for 11/06/98.” [9]

Temp1, sid ((color = 'green' Boats ) |X| (day = '11/06/98' Reserves)))

Temp2, sid ((color = 'red' Boats ) |X| (day = '11/06/98' Reserves)))

sname ((Temp1 ń Temp2) |X| Sailors)

b)Write SQL-queries that satisfy the following information requirements:

B1) “Give the identifier(sid) and names of all sailors that have a reservation for a green boat” [3]

Select S.sid, S.sname

from Sailors S, Boats B, Reserves R

where S.sid = R.sid AND R.bid = B.bid AND B.color = 'green';

B2) “Give the name, and color of all boats that have reservations for 11/14/98(that have been reserved by at least one sailor for this day) [2]

Select B.bname, B.color

from Boats B, Reserves R

where B.bid = R.bid AND R.day = '11/14/98';

B3) “Give name, and sid (sailor#) for all sailors that have reservations for 11/14/98 and for 11/15/98. [6]

Select S.sname, S.sid

from Sailors S, Reserves R1, Reserves R2

where S.sid = R1.sid AND S.sid = R2.sid And AND R1.day = '11/14/98' AND R2.day = '11/15/98';

6) Mapping E/R Diagrams to relational schemas [8]

Assume the following E/R diagram, given below is given. Map the E/R diagram to a relational database schema. Indicate primary keys of the relational schema by underlining the attributes that are part of the primary key. Specify foreign keys by using arrows.








U

PERSON (ssn, name)

EMPLOYEE (ssn, salary) COMPANY (cname)

HEALTH-INSURANCE-PROVIDER (pname)

HEALTH-INSURANCE (ssn, cname, pname, max-coverage)