Temporal DB
1. Valid time – the time in which information is true .
2. Transaction time – time associated with transaction that inserted this record.
- [ts, te] is associated with each record in DB
[start time, end time] – time range.
- continuous view of temporal data /printouts of heartbeats/
- timestamps – equal intervals (time series data).
Types of Queries Vd = [tsd, ted] - valid time for tuple d.
1) Intersection query q: tuple d is retrieved if Vd Ç Vq ¹ Æ
2) Inclusion query q: tuple d is retrieved if tsq ≤ tsd ≤ ted ≤ teq
3) Containment query: tuple d is retrieved if tsd ≤ tsq ≤ teq ≤ ted
4) Point query : tuple d is retrieved if tsd = tsq = teq = ted
(tuple has to be valid at a particular point in time)
Four types of DB:
1) Snapshot – no support for temporal attribute
2) Transaction time only given
3) Valid time only given
4) Bitemporal (both transaction and valid time given)
now – current time value.
Modeling Temporal Events:
1) Markov Model (MM) – directed graph [V,A], V={v1, v2,…, vn} – states,
A = {<i,j>: vi, vj Î V }- arcs (show transitions between states).
Each arc is labeled with probability pij of transitioning from vi to vj .
At time t, one state is designated as current state vt , and probability of any future transitions depends only on vt . Transition probabilities are learned in training phase.
2) Hidden Markov Model (HMM) - directed graph [V,A], V={v1, v2,…, vn} –
states, A = {<i,j>: vi, vj Î V }- arcs (show transitions between states). Also, we assume:
Initial state distribution used to determine starting state.
Arc <i,j> is labeled with probability pij transitioning from vi to vj (this value is fixed).
O{o1, o2,…, ok} set of possible observations (given).
Each vi contains set of probabilities for each observation {pi1,…, pik}.
Example: Tossing two coins. First coin [ p(H)=1/2, p(T)=1/2]. Second coin [ p(H)=1/3, p(T)=2/3] is not fair. p11 = p12 = p21 = p22 = ½.
3) Time series for attribute A: {<t1, a1>, <t2, a2>,…, <tn, an>}. If points in time are well defined, we take < a1, a2,…, an> vector.
If <y1, y2,…, yn> - time series, then its subsequence is called time subseries.
Problems: Similarities between different time series. Predicting future value of an attribute.
Time Series Analysis (finding patterns in data):
Trends – systematic nonrepetitive changes to attribute values over time.
Cycles – observed behavior is cycle.
Seasonal – detected pattern based on time of a year.
Outliers.
Trend Analysis.
Smoothing – finding moving averages of attribute values (local average in a window is computed).
Correlation between two attributes with time series X, Y and means X, Y (Pearson’s coefficient)
R = [ å(xi - X )(yi - Y )]/sqr[å(xi - X )2 å(yi - Y ) 2].
Value of R close to 1 – attributes strongly correlated.
Value of R close to 0 – attributes not correlated at all.
Pattern Detection (time series): KMP, Boyer-Moore algorithms.
Sequences – ordered list of itemsets {s1, s2 ,…, sn}, where si Î I. (set of items).
Subsequence T = {ti1 ,…, tim} of S if (" j)( $ k)[( tij ≤ sk) & ij ≤ ij+1)].
Customer-sequence: sequence of itemsets purchased by customer.
Example:
Customer / Time / ItemsetC1 / 10 / AB
C1 / 20 / BC
C1 / 30 / D
C2 / 15 / ABC
C2 / 20 / D
C3 / 15 / ACD
S = {{A},{C}} – sequence.
Support of sequence S – percentage of total customers whose customer-sequence contains S. s(S)= 1/3.
Confidence of S Þ T : ratio of the number of customer-sequences that contain both S and T to the number that contain S.
SPADE Algorithm.
A,B,C,D – attributes.
{A}
Customer / TimeC1 / 10
C2 / 15
C3 / 15
{B}
Customer / TimeC1 / 10
C1 / 20
C2 / 15
{C}
Customer / TimeC1 / 20
C2 / 15
C3 / 15
{D}
Customer / TimeC1 / 30
C2 / 20
C3 / 15
Candidates (2-sequences):
{AB}
Customer / TimeC1 / 10
C2 / 15
({A}, {B})
Customer / TimeC1 / 10, 20
{AD}
Customer / TimeC3 / 15
({A},{D})
Customer / TimeC1 / 10, 30
C2 / 15, 20
{AC}
Customer / TimeC2 / 15
C3 / 15
({A},{C})
Customer / TimeC1 / 10, 20
{BC}
Customer / TimeC1 / 20
C2 / 15
({B},{C})
Customer / TimeC1 / 10, 20
Example explaining the next step:
({B}, {BC}, {DE}), ({AB}, {BC}, {D}) – have the overlap ({B}, {BC}, {D}) and because of this, two candidate sequences are generated:
({AB}, {BC}, {DE}), ({AB}, {BC}, {D}, {E}).
In our example, from {AB}, ({B}, {C}) we generate:
({AB},{C})
Customer / TimeC1 / 10, 20
From ({A}, {B}), {BC} we generate:
({A},{BC})
Customer / TimeC1 / 10, 20