<!Doctype html public "-//W3O//DTD W3 HTML 2.0//EN">

<HTML>

<Body>

<!-- The XML DTD schema for the Movies Database -->

<!ELEMENT main (maindirectors*)>

<!ELEMENT maindirectors (mainheader, films*) >

<!ELEMENT mainheader (dirid, dirfirst, dirname, format) >

<!ELEMENT films (film_id, title, year, director, producers, studios, fname, prc*, cat*, awards*, loc*, notes*)>

<!ELEMENT film_id=<Comment: An internally generated id for the film. It is the key of the

relation and is unique. It is composed of the director_id and a sequence number.

<Comment+: All movies of a director are listed together in sequence, but all movies have been entered for only some directors.

<Comment+: The sequence numbers often have gaps to allow later insertions, since it is common that not all movies of a director were known at the time of entry. </Comment>
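The numbering scheme above can be sketched in Python. This is a minimal illustration, not the database's actual tooling: the director_id `XY`, the step size of 10, and the helper names are hypothetical, and a film_id is assumed to be the plain concatenation of director_id and sequence number.

```python
# Hypothetical sketch: film_id = director_id + sequence number, with gaps.
# The step of 10 and the helper names are assumptions for illustration.

def make_film_id(director_id: str, seq: int) -> str:
    """Concatenate a director_id and a sequence number into a film_id."""
    return f"{director_id}{seq}"

def initial_sequence(n_films: int, step: int = 10) -> list[int]:
    """Number the films known at entry time, leaving step - 1 free slots between them."""
    return [step * (i + 1) for i in range(n_films)]

def insert_between(before: int, after: int) -> int:
    """Choose a free sequence number between two existing ones."""
    if after - before < 2:
        raise ValueError("no gap left between these films; renumbering needed")
    return (before + after) // 2

seqs = initial_sequence(3)            # films known at entry time: 10, 20, 30
seqs.append(insert_between(10, 20))   # a film found later, placed between them: 15
print([make_film_id("XY", s) for s in sorted(seqs)])  # ['XY10', 'XY15', 'XY20', 'XY30']
```

Gaps keep existing film_ids stable: a newly discovered film only needs a free number between its neighbours, so references from CASTS or REMAKES never have to be rewritten.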

<!ELEMENT title=<Comment: The film's title. It is preceded by T: or Tn: depending on the source of the data. This field is not necessarily unique. If unknown, it is coded NKT for “No Known Title”</Comment>

<!ELEMENT year=<Comment: Year the movie was released. This is assumed to be an event (i.e., to take zero time)</Comment>

<!ELEMENT director=<Comment: Director of the movie, preceded by D:. The standardized id-name is used. All directors must appear as DIRECTORS in people.html, so that we have a proper reference constraint.

<Comment+: If there are multiple candidate directors, the primary one, or the one who finished the movie, is chosen, and the other candidates are given in the notes field as CoD().</Comment>

<!ELEMENT producers=<Comment: Producer(s) of the movie, preceded by P: if shown in people.html and hence referenceable by id-name. <Comment+: P: alone shows that there was no specific producer. <Comment+: If prefaced by PN:, the full name(s) are given; if prefaced by PS:, the spelling is uncertain. In both cases no reference to people can be expected.

PN: is common, since few id-names for producers exist yet in the people.html files, except for producers who were also

DIRECTORS.

<Comment+: PU: alone means the producer is unknown to me.

<Comment+: Multiple producers are permitted and common.</Comment>
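A minimal Python sketch of how a single producer entry could be classified by its prefix. It assumes each entry is one string such as `PN:Jane Doe`, that the unknown-producer marker is written `PU:`, and that the names shown are hypothetical.

```python
# Sketch: classify one producer entry by its prefix code (P:, PN:, PS:, PU:).
# Assumes each entry is a single string such as "PN:Jane Doe"; names are hypothetical.

PRODUCER_PREFIXES = {
    "PN:": "full name given, no reference to people.html",
    "PS:": "spelling uncertain, no reference to people.html",
    "PU:": "producer unknown",
    "P:": "id-name, references people.html",
}

def classify_producer(entry: str) -> tuple[str, str]:
    """Return (meaning, value) for a producer entry."""
    entry = entry.strip()
    for prefix, meaning in PRODUCER_PREFIXES.items():
        if entry.startswith(prefix):
            value = entry[len(prefix):].strip()
            if prefix == "P:" and not value:
                # A bare "P:" means there was no specific producer.
                return ("no specific producer", "")
            return (meaning, value)
    return ("unrecognized", entry)

print(classify_producer("P:Doe"))        # ('id-name, references people.html', 'Doe')
print(classify_producer("PN:Jane Doe"))  # ('full name given, no reference to people.html', 'Jane Doe')
print(classify_producer("P:"))           # ('no specific producer', '')
```

Only the `P:` case yields a value that may be checked against the id-name field of people.html; the other prefixes deliberately produce free text.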

<!ELEMENT studios=<Comment: Studio(s) where the movie was filmed. Common studio names appear in STUDIOS. If the studio is not known or is uncommon, its location may be given as SL:{COUNTRY-CODE}.

<Comment+: Unknown studios are prefixed by SU:

<Comment+: Sometimes the distributing studio, where the distributor differs from the production studio, is shown prefixed by SD:.

</Comment>

<!ELEMENT prc=<Comment: Process used to make the movie (e.g., black and white as `bnw', or `col'). Color processes may be specified as \COLOR-CODES. The code `cld' is used for black-and-white movies that have been colorized. Unknown is coded prc.</Comment>

<!ELEMENT cat=<Comment: Category of the film (e.g., suspense, mystery), as given in the

list of CATEGORIES. Unknown is coded Ctxx.</Comment>

<!ELEMENT awards=<Comment: Awards received by the film, separated by commas. The awards are

listed in AWARD (optionally followed by keywords such as 'Special')

and include actual awards as well as (mostly favorable) mentions

in compendia such as Halliwell's and Roger Ebert's books, with the

appropriate number of stars. A + symbol denotes a half star, and

a - after the awardee code indicates a negative mention.

Unknown is coded aw. -H means not in Halliwell [4].</Comment>

<!ELEMENT lc=<Comment: Location where the film is set. Multiple locations are separated

by semicolons (;), and multiple levels in any location hierarchy are

separated by commas; e.g., `high-school, csd, CA' indicates that the movie

is set in a California high school in the countryside. The codes used are

listed in the preamble of main.html. For countries other than the

USA the country name is given as well. Alternatives for country

names are `space' or `xxx ocean'. Unknown is coded lc.

If the period of the film is significant, it is given as

T([[dd]mmm]yyyy).</Comment>
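The location conventions above lend themselves to a small parser. This Python sketch is illustrative only; it assumes the T(...) period appears inside the lc field itself and that `mmm` is a three-letter month abbreviation.

```python
import re
from typing import Optional

# T(yyyy), T(mmmyyyy) or T(ddmmmyyyy); a day is only allowed together with a month.
PERIOD = re.compile(r"T\((?:(\d{1,2})?([A-Za-z]{3}))?(\d{4})\)")

def parse_location(field: str) -> tuple[list[list[str]], Optional[dict]]:
    """Split an lc field into location hierarchies and an optional period."""
    period = None
    match = PERIOD.search(field)
    if match:
        day, month, year = match.groups()
        period = {"day": int(day) if day else None, "month": month, "year": int(year)}
        field = field[:match.start()] + field[match.end():]
    locations = []
    for location in field.split(";"):  # ';' separates locations
        levels = [p.strip() for p in location.split(",") if p.strip()]  # ',' separates levels
        if levels:
            locations.append(levels)
    return locations, period

print(parse_location("high-school, csd, CA T(1958)"))
# ([['high-school', 'csd', 'CA']], {'day': None, 'month': None, 'year': 1958})
```

Each location comes back as a list ordered from the most specific level to the country, which is the form a relational model would need to store the hierarchy in separate columns.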

<!ELEMENT notes=<Comment: A variety of notes is kept here. The preferred order is

chronological, e.g., Book before Writer before Cost before Rating,

but this order has not been followed consistently.

<Comment+: All entries have a FIELD-IDENTIFIER

designator, as in W(writer), R(rating), etc.

<Comment+: Fields such as writers can have multiple entries, separated by commas.

If an award is associated with an entry, such as an Academy Award for the

writers of a movie, it follows the name(s) after a semicolon (;).

For authors (also music directors), the title is specified as

B(author:book: "title").

<Comment+: Alt(T:title; reason) lists alternate titles, with a reason or date for the change, if known.

<Comment+: There is a general Notes field (Nt), which mainly records firsts,

such as the first sound movie.

<Comment+: Er() means a possible error in the record, to be checked sometime in the

future against some source.

<Comment+: These note fields can be used to demonstrate the flexibility of

object-based structures, but are best placed in distinct fields in relational

models.

<Comment+: Note entries SEEN and VT are private, indicating `when seen' or `have video

tape' information.</Comment>
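As the text notes, these embedded fields are best split out for a relational model. A minimal Python sketch of that split follows; it assumes entries look like `Ident(body)` with no nested parentheses, treats text after a semicolon as a qualifier (an award for W(), a reason for Alt()), and uses hypothetical names and an illustrative award code.

```python
import re

# Ident(body) where body contains no nested parentheses (an assumption;
# entries with nested parentheses would need a real parser).
NOTE = re.compile(r"([A-Za-z]+)\(([^()]*)\)")

def parse_notes(field: str) -> dict[str, list[dict]]:
    """Group note entries by FIELD-IDENTIFIER."""
    notes: dict[str, list[dict]] = {}
    for ident, body in NOTE.findall(field):
        # Text after ';' is kept as a qualifier (e.g. an award for W(), a reason for Alt()).
        values, _, qualifier = body.partition(";")
        notes.setdefault(ident, []).append({
            "values": [v.strip() for v in values.split(",") if v.strip()],
            "qualifier": qualifier.strip() or None,
        })
    return notes

print(parse_notes("W(A. Writer, B. Writer; AA) R(PG) Nt(first sound movie)"))
```

Splitting on commas is only safe for name lists; a title containing a comma inside B(...) or Alt(...) would need quoting-aware handling.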

<H4>2.2 -- ACTORS -- </H4> The file

<A href=" has

6813 `tr/td’ table row entries (July 1999) for many of the actors appearing in CASTS.<BR>

Also <A href=" part</A> and

<A href=" part </A> of actors.html, not maintained. <BR>

The key of the relation is

"stagename", and there are intervals indicating the dates that the

actor worked and the actor's lifetime. Other information in this relation is

the actor's real name, background, and the type of roles he/she typically

plays. References to images are kept here too.

<H4>2.3 -- DIRECTORS -- </H4> in file

<A href="

The file lists all directors, as well as some other movie people, such as producers and cinematographers

(a total of 3290 `tr/td’ table row entries, including 3011 `@’ directors, as of July 1999). <BR>

The directors table is similar to the actors table in that it contains

intervals for when the director worked and when he/she lived. The key

of the relation is the field "name", which is the name under which the

director directed. Directors' key names contain no blanks.

Typically the last name is used, prefixed by an initial when needed.

A secondary unique key of up to three letters is defined for each director,

based on the initial letters of the first, middle, and last names.

This key provides HTML HREF linkages among many of the files.

As with the actors table, this table also includes the real name of

the director among its fields ("lastname" and "firstname"). It also

contains important producers, cinematographers, musicians and composers, etc.

<H4>2.4 -- STUDIOS -- </H4> in file

<A href="

are important studios only (203 `tr/td’ entries, sparse information).<BR>

The key of the studios relation is the name of the studio. The temporal

information that is included is an interval indicating the years the

studio was (or is) in operation, represented by the fields "startdate" and

"enddate". This is a history relation.
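Because STUDIOS is a history relation, the typical temporal query is a timeslice: which studios were operating in a given year. This Python sketch assumes `startdate` and `enddate` are stored as years and that a still-operating studio has no `enddate` (represented here as `None`); the studio names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Studio:
    name: str
    startdate: int            # first year of operation
    enddate: Optional[int]    # None = still in operation (assumed encoding)

def operating_in(studios: list[Studio], year: int) -> list[str]:
    """Timeslice query: studios whose [startdate, enddate] interval contains year."""
    return [
        s.name
        for s in studios
        if s.startdate <= year and (s.enddate is None or year <= s.enddate)
    ]

studios = [Studio("Studio A", 1912, 1958), Studio("Studio B", 1935, None)]
print(operating_in(studios, 1950))  # ['Studio A', 'Studio B']
print(operating_in(studios, 1920))  # ['Studio A']
```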

<H4>2.5 -- CASTS -- </H4> in file

<A href="

This is a large (too large?) file of who acted as what in which movie

(46,009 `tr/td’ entries as of July 1999; only partial for movies and roletypes).

Casts is an association relation, linking actors with movies. The key of the

relation is the catenation of the two fields "film_id" and "actor"; no

temporal information is included in this relation. <BR>

Because this file was too big for Netscape in 1996, five working subsets

were also made available; however, these are not kept up to date.<BR>

<H4>2.6 -- REMAKES -- </H4> in file

<A href="

(1278 `tr/td’ entries in July 1999).<BR>

This table (which is not extensively used in the temporal DB paper) gives

information about movies that are remakes of other movies. It is very

useful for testing recursion in databases.
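The recursion in question is a transitive closure: finding every film that remakes a given film, directly or through intermediate remakes. In SQL this is usually written as a recursive common table expression (WITH RECURSIVE); the Python sketch below computes the same closure. The REMAKES columns are not described in this section, so the (remake, original) film_id pairs and the ids themselves are assumptions.

```python
from collections import defaultdict

def all_remakes(pairs: list[tuple[str, str]], film_id: str) -> set[str]:
    """Transitive closure: every film that remakes film_id, directly or indirectly.

    pairs holds (remake_film_id, original_film_id) rows (an assumed layout).
    """
    remade_as = defaultdict(list)
    for remake, original in pairs:
        remade_as[original].append(remake)
    found: set[str] = set()
    stack = [film_id]
    while stack:                      # iterative depth-first search
        for remake in remade_as[stack.pop()]:
            if remake not in found:   # also guards against cycles in bad data
                found.add(remake)
                stack.append(remake)
    return found

rows = [("B20", "A10"), ("C30", "B20"), ("D40", "A10")]
print(sorted(all_remakes(rows, "A10")))  # ['B20', 'C30', 'D40']
```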

<H4>2.10 -- AWARDS-RECEIVED -- </H4> is no longer a distinct file.

Awards received for special occasions are listed with individual entries

in the files for ACTORS (actors.html) or MOVIE PEOPLE (people.html).

Regular awards associated with a particular movie are given in

<A href=" (main.html)</A>

and with a particular performance are listed in

<A href=" (casts.html)</A>.

<H4>2.11 -- REFERENCES -- </H4>

Books that provided material for this database are listed within this documentation file as

<A href=" A</A>.

<H4>2.12 -- GEOGRAPHY -- </H4>

Codes for countries and origins are listed within this documentation file as Section 4.3:

<A href=" GEO</A>.

<H4>2.13 -- CATEGORIES -- </H4>

Codes for movie categories are listed within this documentation file as Section 4.4:

<A href=" CATS</A>.

<H4>2.14 -- COLOR-CODES -- </H4>

Codes for color processes used for movies are listed within this documentation file as Section 4.5:

<A href=" COLS</A>.

<H4>2.15 -- ROLE-TYPES -- </H4>

Codes that specify role-types for actors

are listed in the preamble for

<A href="

ROLES</A>.

<H4>2.16 -- FIELD-IDENTIFIERS -- </H4>

Codes that identify subfields

in various files are listed within this documentation file as Section 4.2:

<A href="

FIELDS</A>.

<H4>2.17 -- AWARD TYPES -- </H4>

Lists the <A href=" types</A>

used in MAIN, ACTORS, and PEOPLE,

with the organizations that award them, and the span of years in which they were awarded.

<H4>2.19 -- IMAGES -- </H4>

There is a small collection of .tiff files for actors and directors.

They are kept individually in an images subdirectory.

<H4>2.20 -- ICONS -- </H4>

There are about a dozen icons to be used to identify

subfiles. Some of them come from the New Yorker Magazine, Jan. 1993.

They are kept individually in an icons subdirectory.

<HR>

<H2><A NAME="Sec3">3. Schema Definition</A> for the Movies Database</H2>

Here we give a detailed description of the schema of the movies

database, which is used for all examples in this paper and was used to

implement the temporal SQL additions. General descriptions are given

in Section 2, above.<BR>

This file is being updated to describe the HTML version. Where

updates were made, the old material is in curly {brackets}.

<H4>3.1 The MOVIES Table</H4>

Col-Name = Description<BR>

There is a distinct table for each director (Hitchcock has multiple tables:

one for early silent films, one for British, one for American, and one for TV movies).<BR>

The tables are grouped by the year of each director's first known film. There are some

break and header records for each year.<BR>

Each director table has two types of records:

<OL>

<Comment+: one header record for the director, giving the director id as shown in people,

the first year known for movies by that director, prefixed by an @ symbol as in the

people entry, and the standard name of the director, also matching the people entry.<BR>

The remainder of the header record shows the format of the data records that follow.

The note field is often used to describe the set of detail records.<BR>

For movies whose director is not known there is a dummy entry, either by topic (Unknown) or

by year (UnYear), as shown in the people file.

<Comment+: any number of records, one per film, formatted as shown below.

</OL>

<H4>3.2. The ACTORS Table</H4>

There is one record for each actor listed, but not all actors listed in CAST are documented.

There are also break and header records for each letter of the alphabet.<BR>

Col-Name = Description

<P>

<!ELEMENT stagenm=<Comment: Stagename of the actor. This is nearly the key of the table.

When an actor has used multiple names the last one used is preferred.

There are a few actors with identical names. Then the birthyear

(dob) becomes important.</Comment>

<!ELEMENT dowstrt=<Comment: Beginning of the "dates of work" interval: year of first movie</Comment>

<!ELEMENT dowend =<Comment: End of "dates of work" interval.</Comment>

<!ELEMENT birthnm=<Comment: Original last name.</Comment>

<!ELEMENT firstnm=<Comment: Original first name. Nick-names or other assumed names in ().</Comment>

<!ELEMENT gender=<Comment: coded as M or F, X for unknown, G for group, and A for animal.</Comment>

<!ELEMENT dob=<Comment: Date of Birth. If not found in [ref]. If found, but date unknown *.</Comment>

<!ELEMENT dod=<Comment: Date of Death; if unknown or still alive, coded as \UN. Year+ indicates that the actor was still alive in that year, used mainly for oldies.</Comment>
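A small Python sketch of how the dod codes could be interpreted. It is illustrative: the date format of a recorded death is not specified here, so such values are returned unchanged, and the code is matched case-insensitively because `\Un` also appears elsewhere in this schema.

```python
from typing import Optional

def parse_dod(value: str) -> tuple[str, Optional[str]]:
    """Interpret a dod entry: \\UN, a 'Year+' marker, or a recorded date."""
    v = value.strip()
    if v.upper() == "\\UN":
        return ("unknown or alive", None)
    if v.endswith("+") and v[:-1].isdigit():
        return ("alive in", v[:-1])   # e.g. "1995+" = still alive in 1995
    return ("died", v)                # date kept as written

print(parse_dod(r"\UN"))   # ('unknown or alive', None)
print(parse_dod("1995+"))  # ('alive in', '1995')
print(parse_dod("1980"))   # ('died', '1980')
```

The `Year+` form is effectively an open interval end: it bounds the lifetime from below without closing it, which matters when checking that dates of work fall inside the lifetime.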

<!ELEMENT type=<Comment: Types of roles played by the actor; e.g., leading man, hero.</Comment>

<!ELEMENT origin=<Comment: Country of origin using COUNTRY-CODES</Comment>

<!ELEMENT photo=<Comment: Photos in reference books may be cited as [book.page(s)]</Comment>

<!ELEMENT notes=<Comment: Used mainly for Marriages(Mt), Lived-with(Lw), and Worked-with(W).

<Comment+: A code Cit(n) indicates how frequently the actor is cited in

CASTS.html. This field is used for maintenance, as a weight

of importance for completion of the data.</Comment>

<H4>3.3 The PEOPLE Table</H4>

Directors are the major subset of the general people.html table.

Other entries are significant producers, writers, art directors and some authors.

Being a director is indicated in the Pcode field, and has some effect on

other fields.

There are also break and header records for each letter of the alphabet.<BR>

Col-Name = Description

<P>

<!ELEMENT id-name=<Comment: The name of the movie person in standardized form. These names are made to be unique. Initials may be prepended, and special character codes omitted. This field is referenced by the

"director" field and by P:{references} in the MOVIES table. </Comment>

<!ELEMENT Pcode=<Comment: Code {PDWACGV} indicating that the movie person is a Producer, Director, Writer, Actor, Cinematographer, choreoGrapher, or Visual (art) director. Being only an actor does not justify an entry here; for those,

see the ACTORS table. </Comment>
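The Pcode letters map one-to-one onto roles, so decoding is a table lookup. This Python sketch assumes a Pcode is stored as a string of the uppercase letters listed above, such as `PD`.

```python
PCODE_ROLES = {
    "P": "Producer",
    "D": "Director",
    "W": "Writer",
    "A": "Actor",
    "C": "Cinematographer",
    "G": "choreoGrapher",
    "V": "Visual or art director",
}

def decode_pcode(pcode: str) -> list[str]:
    """Expand a Pcode such as 'DW' into role names, rejecting unknown letters."""
    unknown = set(pcode) - set(PCODE_ROLES)
    if unknown:
        raise ValueError(f"unknown Pcode letters: {sorted(unknown)}")
    return [PCODE_ROLES[letter] for letter in pcode]

def is_director(pcode: str) -> bool:
    """True if the person is a director and therefore has a Did entry."""
    return "D" in pcode

print(decode_pcode("PD"))  # ['Producer', 'Director']
```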

<!ELEMENT Did=<Comment: If the person is a director (Pcode includes D), then this field contains an internally defined, unique 1, 2, or 3 letter identification code for the director, the director_id. It is made up of one or two letters of the first name, zero or one letter of the middle name, and zero, one, or two letters of the family name of the director. Because of the high frequency of `John', it is encoded as `I'. This code is used as a prefix to generate unique film_id's for all films directed by this director. </Comment>
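The Did rule admits several codes per name, so uniqueness has to be enforced by choosing among candidates. The Python sketch below enumerates the codes the rule allows and picks the first one not yet taken. The preference order, the handling of `John' only in the first name, and the example names are assumptions; the document does not say how the actual codes were chosen.

```python
from itertools import product

def director_id_candidates(first: str, last: str, middle: str = "") -> list[str]:
    """Codes allowed by the Did rule, in a hypothetical order of preference."""
    # `John' is encoded as `I'; otherwise one or two letters of the first name.
    firsts = ["I"] if first == "John" else list(dict.fromkeys([first[:1], first[:2]]))
    middles = [middle[:1], ""] if middle else [""]           # zero or one letter
    lasts = list(dict.fromkeys([last[:1], last[:2], ""]))     # zero, one or two letters
    candidates: list[str] = []
    for f, l, m in product(firsts, lasts, middles):
        code = f + m + l
        if 1 <= len(code) <= 3 and code not in candidates:
            candidates.append(code)
    return candidates

def assign_director_id(first: str, last: str, taken: set[str], middle: str = "") -> str:
    """Pick the first candidate not already used by another director."""
    for code in director_id_candidates(first, last, middle):
        if code not in taken:
            taken.add(code)
            return code
    raise ValueError("no free code; assign one by hand")

taken = {"ABC"}
print(director_id_candidates("Ann", "Carr", "Beth")[:3])  # ['ABC', 'AC', 'ACa']
print(assign_director_id("Ann", "Carr", taken, "Beth"))   # AC
```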

<!ELEMENT yearstart=<Comment: First year of work; for directors, the first year he/she directed a movie, preceded by an @ (start of the years interval).</Comment>

<!ELEMENT yearend=<Comment: Last year of work or, for directors, the last year in which the director directed.</Comment>

<!ELEMENT lastnm=<Comment: Given last name of the movie-person, may be spelled more precisely here than in the id-name field.</Comment>

<!ELEMENT firstnm=<Comment: Given first name of the movie-person. Nick-names or other assumed names in ().</Comment>

<!ELEMENT dob=<Comment: Date of Birth. If not found in [ref]. If found, but date unknown *</Comment>

<!ELEMENT dod=<Comment: Date of Death, or 190x</Comment>

<!ELEMENT backgrd=<Comment: The director's birth country. If unknown \Un.</Comment>

<!ELEMENT notes=<Comment: This field is as in "actors". Female movie-people are identified as Ge(F), as a partial index. Special awards (not associated with a film) are shown as Aw().</Comment>

<H4>3.4 The STUDIOS Table</H4>

Col-Name = Description


<P>

<!ELEMENT name=<Comment: Short name of the studio, may be standardized for reference.</Comment>

<!ELEMENT company=<Comment: Company that owns the studio.</Comment>

<!ELEMENT city=<Comment: City where the studio is located.</Comment>

<!ELEMENT country=<Comment: The studio's country.</Comment>

<!ELEMENT fddate=<Comment: Date the studio was founded or first opened.</Comment>

<!ELEMENT enddate=<Comment: Last date represented by the studio.</Comment>

<!ELEMENT founder=<Comment: The studio's founder.</Comment>

<!ELEMENT successor =<Comment: The fate of the studio.</Comment>

<!ELEMENT notes=<Comment: Co-founders, etc.</Comment>

<H4>3.5 The CASTS Table</H4>

The large CASTS file is broken up into sections by the initial letter(s) of the director's identifying code.

Each section contains a number of directors, ordered by code. There is a distinct table for each director.<BR>

Col-Name = Description

<OL> Each director table has two types of entries

<Comment+: one header record giving the director's id, name, and format information.

<Comment+: multiple records for each movie and listed actor. There are no headers for distinct movies.

</OL>

<P>

<!ELEMENT film_id=<Comment: Identifier of the film. All film_ids used here appear in MOVIES, and can be used as references.</Comment>

<!ELEMENT title=<Comment: Title of the movie, prefixed by T.

<Comment+: A prefix of TZ is used when the entry is uncertain, e.g., the actor's name.

<Comment+: The title field is actually redundant here, because it

is also given in MOVIES with the film_id. It is used to reduce

the chances of errors and the need for "joins".</Comment>

<!ELEMENT actor=<Comment: Name of the actor in this role, always using the standardized

stage_name if the actor is listed in ACTORS.html.

<Comment+: This field presents only a partial reference, illustrating dangling pointers.

<Comment+: If unknown, but role is important, then `sa' is used for `some actor'.</Comment>
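The partial references in CASTS can be measured directly. This Python sketch, with hypothetical ids and names, reports film_ids missing from MOVIES (which should never occur) and actor names with no ACTORS entry (the expected dangling pointers). It assumes `sa` is a placeholder rather than a reference and skips it.

```python
def check_casts(casts: list[tuple[str, str]], movie_ids: set[str], actor_names: set[str]):
    """Return (missing_films, dangling_actors) for (film_id, actor) rows."""
    missing_films = sorted({film for film, _ in casts if film not in movie_ids})
    # `sa' ("some actor") is a placeholder, not a reference, so it is skipped.
    dangling = sorted({actor for _, actor in casts if actor != "sa" and actor not in actor_names})
    return missing_films, dangling

rows = [("XY10", "Star A"), ("XY10", "sa"), ("XY15", "Extra B"), ("ZZ99", "Star A")]
print(check_casts(rows, {"XY10", "XY15"}, {"Star A"}))  # (['ZZ99'], ['Extra B'])
```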

<!ELEMENT roletype=<Comment: Type of the role. Similar to the "type" field in the ACTORS

table, but always encoded by a ROLE-TYPE. \Und means unassigned.</Comment>

<!ELEMENT role=<Comment: short description of the role prefixed by R:

<Comment+: If the role is uncertain, RZ: is used as the prefix.

<Comment+: If the name used in the role is significant (as in Biographical Movies),

this role name follows in “quotes”, as R:king “Henry V”

<Comment+: If only the role name is known, then the prefix is RN:

<Comment+: If the role is unknown, then only RU: is entered.

</Comment>
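The role prefixes and the quoted role name can be separated mechanically. This Python sketch accepts straight or curly quotes and assumes that text after RN: without quotes is the role name itself; both are assumptions about how entries are written.

```python
import re
from typing import Optional

ROLE_PREFIXES = {"RZ:": "uncertain", "RN:": "name only", "RU:": "unknown", "R:": "known"}
QUOTED = re.compile(r"[“\"]([^”\"]*)[”\"]")   # role name in straight or curly quotes

def parse_role(field: str) -> dict[str, Optional[str]]:
    """Split a role entry into its status, description and (optional) role name."""
    field = field.strip()
    for prefix, status in ROLE_PREFIXES.items():
        if field.startswith(prefix):
            rest = field[len(prefix):].strip()
            match = QUOTED.search(rest)
            name = match.group(1) if match else None
            description = QUOTED.sub("", rest).strip() or None
            if status == "name only" and name is None:
                # Assumed: after RN: the bare text is the role name itself.
                name, description = description, None
            return {"status": status, "description": description, "name": name}
    raise ValueError(f"unrecognized role prefix in {field!r}")

print(parse_role("R:king “Henry V”"))
# {'status': 'known', 'description': 'king', 'name': 'Henry V'}
```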

<!ELEMENT awards=<Comment: Awards given to this actor for this role. Optional field.</Comment>

<!ELEMENT notes=<Comment: Rarely used; only for something exceptional in the performance, such as `Nt(Garbo laughs)' or Debut.</Comment>

<H4>3.6 The REMAKES Table</H4>

Col-Name = Description