Temporal DB

1. Valid time – the time in which information is true .

2. Transaction time – time associated with transaction that inserted this record.

- [ts, te] is associated with each record in DB

[start time, end time] – time range.

- continuous view of temporal data /printouts of heartbeats/

- timestamps – equal intervals (time series data).

Types of Queries Vd = [tsd, ted] - valid time for tuple d.

1)  Intersection query q: tuple d is retrieved if Vd Ç Vq ¹ Æ

2)  Inclusion query q: tuple d is retrieved if tsq ≤ tsd ≤ ted ≤ teq

3)  Containment query: tuple d is retrieved if tsd ≤ tsq ≤ teq ≤ ted

4)  Point query : tuple d is retrieved if tsd = tsq = teq = ted

(tuple has to be valid at a particular point in time)

Four types of DB:

1)  Snapshot – no support for temporal attribute

2)  Transaction time only given

3)  Valid time only given

4)  Bitemporal (both transaction and valid time given)

now – current time value.

Modeling Temporal Events:

1)  Markov Model (MM) – directed graph [V,A], V={v1, v2,…, vn} – states,

A = {<i,j>: vi, vj Î V }- arcs (show transitions between states).

Each arc is labeled with probability pij of transitioning from vi to vj .

At time t, one state is designated as current state vt , and probability of any future transitions depends only on vt . Transition probabilities are learned in training phase.

2) Hidden Markov Model (HMM) - directed graph [V,A], V={v1, v2,…, vn} –

states, A = {<i,j>: vi, vj Î V }- arcs (show transitions between states). Also, we assume:

Initial state distribution used to determine starting state.

Arc <i,j> is labeled with probability pij transitioning from vi to vj (this value is fixed).

O{o1, o2,…, ok} set of possible observations (given).

Each vi contains set of probabilities for each observation {pi1,…, pik}.

Example: Tossing two coins. First coin [ p(H)=1/2, p(T)=1/2]. Second coin [ p(H)=1/3, p(T)=2/3] is not fair. p11 = p12 = p21 = p22 = ½.

3) Time series for attribute A: {<t1, a1>, <t2, a2>,…, <tn, an>}. If points in time are well defined, we take < a1, a2,…, an> vector.

If <y1, y2,…, yn> - time series, then its subsequence is called time subseries.

Problems: Similarities between different time series. Predicting future value of an attribute.

Time Series Analysis (finding patterns in data):

Trends – systematic nonrepetitive changes to attribute values over time.

Cycles – observed behavior is cycle.

Seasonal – detected pattern based on time of a year.

Outliers.

Trend Analysis.

Smoothing – finding moving averages of attribute values (local average in a window is computed).

Correlation between two attributes with time series X, Y and means X, Y (Pearson’s coefficient)

R = [ å(xi - X )(yi - Y )]/sqr[å(xi - X )2 å(yi - Y ) 2].

Value of R close to 1 – attributes strongly correlated.

Value of R close to 0 – attributes not correlated at all.

Pattern Detection (time series): KMP, Boyer-Moore algorithms.

Sequences – ordered list of itemsets {s1, s2 ,…, sn}, where si Î I. (set of items).

Subsequence T = {ti1 ,…, tim} of S if (" j)( $ k)[( tij ≤ sk) & ij ≤ ij+1)].

Customer-sequence: sequence of itemsets purchased by customer.

Example:

Customer / Time / Itemset
C1 / 10 / AB
C1 / 20 / BC
C1 / 30 / D
C2 / 15 / ABC
C2 / 20 / D
C3 / 15 / ACD

S = {{A},{C}} – sequence.

Support of sequence S – percentage of total customers whose customer-sequence contains S. s(S)= 1/3.

Confidence of S Þ T : ratio of the number of customer-sequences that contain both S and T to the number that contain S.

SPADE Algorithm.

A,B,C,D – attributes.

{A}

Customer / Time
C1 / 10
C2 / 15
C3 / 15

{B}

Customer / Time
C1 / 10
C1 / 20
C2 / 15

{C}

Customer / Time
C1 / 20
C2 / 15
C3 / 15

{D}

Customer / Time
C1 / 30
C2 / 20
C3 / 15

Candidates (2-sequences):

{AB}

Customer / Time
C1 / 10
C2 / 15

({A}, {B})

Customer / Time
C1 / 10, 20

{AD}

Customer / Time
C3 / 15

({A},{D})

Customer / Time
C1 / 10, 30
C2 / 15, 20

{AC}

Customer / Time
C2 / 15
C3 / 15

({A},{C})

Customer / Time
C1 / 10, 20

{BC}

Customer / Time
C1 / 20
C2 / 15

({B},{C})

Customer / Time
C1 / 10, 20

Example explaining the next step:

({B}, {BC}, {DE}), ({AB}, {BC}, {D}) – have the overlap ({B}, {BC}, {D}) and because of this, two candidate sequences are generated:

({AB}, {BC}, {DE}), ({AB}, {BC}, {D}, {E}).

In our example, from {AB}, ({B}, {C}) we generate:

({AB},{C})

Customer / Time
C1 / 10, 20

From ({A}, {B}), {BC} we generate:

({A},{BC})

Customer / Time
C1 / 10, 20