Tracking the Evolution of News Events

Christopher C. Yang

Department of Systems Engineering and Engineering Management

The ChineseUniversity of Hong Kong

Xiaodong Shi

Department of Systems Engineering and Engineering Management

The ChineseUniversity of Hong Kong

Abstract

In topic detection and tracking, news stories are monitored and automatic techniques are used to spot new events and track the progress of previously spotted events. However, the traditional document clustering techniques only organize news stories into a flat hierarchical structure like uncorrelated clusters that represent news events or topics. Such an organization is not capable of presenting the complex relationships between news events. Users are unable to capture how the events start, evolve and end without a graphical evolution presentation. An event evolution graph is effective in presenting the rich underlying structure of events and allows efficient and meaningful information browsing. As a result, users are not only able to retrieve news stories of the same topic but also able to capture their evolution.

In this paper, we define three rules of event evolution relationships. The event evolution includes event threading and event joining, where an event may evolve to several events or an event may be evolved from several events. The events and event evolution relationship will be presented in an event evolution graph. The construction of event evolution graph has two phases, (i) event clustering, and (ii) event evolution relationship identification. In this paper, we focus on the second phase. Several techniques are investigated and examples are presented.

Keywords: Event evolution, Event threading,Topic detection,Tracking

1. Introduction

Due to the popularity of the Internet and electronic commerce, news documents are available at the news information providers’ Web sites. Users can easily go to any newswires such as CNN, BBC, CBS, etc. to download the news articles any time and anywhere. There are infomediaries available to search across multiple newswires for any queries made by users. For example, users may make a query of “Hurricane Katrina” at Google. Several hundred thousands of news articles from a few dozens of newswires will be returned as result in less than a second (Figure 1). It is extremely convenience for users to retrieve documents for a particular query; however, the tremendous volume of news documents makes it impossible for the users to capture the flow of the stories (i.e. evolution of events) efficiently and effectively.

Figure 1 Searching news documents related to “Hurricane Katrina” at Google News.

Research in topic detection and tracking (TDT) supports the monitoring of news stories and automatically spots new events and tracks the progress of previously spotted events. The tasks in TDT [4] include

  • Story segmentation
  • Topic detection
  • First story detection
  • Link detection
  • Topic tracking

Story segmentation determines the boundaries for cohesive text fragments. It is used for segmenting stories obtained form audio sources but it is not required for newswire sources. Topic detection clusters stories of the same topic. First story detection determines if the new document reports a new topic that has not been reported yet in the current identified topics. Link detection determines if two documents are of the same topic. Topic tracking tracks the topic to find all following stories. Allan et al. [5, 6] and Yang et al. [14, 15, 16]have investigated several techniques such as group-average clustering, single-pass clustering, kNN, and natural language processing. However, TDT only organize news stories in a flat hierarchical structure like uncorrelated clusters that represent news events of the same topics. As a result, users are only given lists of news stories belonging to different events of the news topic. Users are unable to capture how events start, evolve and end in a news topic. It is difficult to capture the underlying complex structure of news events within a news topic without identifying the evolution relationship between events.

Figure 2 Flat hierarchical structure of organization of documents in TDT.

Temporal text mining (TTM) is a research area concern with the discovery of temporal patterns in text information over time based on the time stamps of the text streams. Mei and Zhai [11] has proposed a new approach to discover the evolutionary theme patterns of news stream and present the result as a them evolution graph. A theme is defined as a probabilistic distribution of words that characterizes a semantically coherent topic or subtopic. The text stream is partitioned into n non-overlapping sliced intervals. The theme of each interval is identified and the evolution of theme between successive intervals is extracted. An example of a theme evolution graph is given in Figure 3. However, such theme evolution graph only presents the transition of themes across successive time intervals. It does not capture the events of the news topic nor extract the event evolution relationship among the events. Users can only obtain the information about the theme for a predefined interval and realize if there is a theme transition between consecutive time intervals.

Figure 3 An example of a theme evolution graph.

Event evolution is a concept developed recently. Makkonen [10] was the first to conduct investigation on event evolution as a subtopic of TDT. The news documents within a topic are temporally linearly ordered. A narrative begins when the first story of the topic is detected. A seminal event may lead to several other events. The events at the beginning may have more influence on the events coming immediately after than the events at the later time. As we go through the event in the temporal order, we may see the evolution of events within a topic. For example, the out break of a war may evolve to the economic crisis and then evolve to the problem of refugees. The events and the event evolution relationship can be represented as a graph structure (Figure 4). Nallapati et al. [12] has investigated the dependencies among events and developed a few simple models to determine the event threading relationships. However, the existing model is rather too simple to capture the complex relationships among the events in a large collection of documents within a topic. The concept of event evolution has not been well defined. In this work, we formally define the event evolution by three logical rules. We also introduce some techniques to capture the complicated event threading and event joining relationships.

Figure 4 An example of event evolution graph.

2. Definitions

The definition of event evolution is based on the definitions of story, event, and topic in TDT. Without a clear understanding of story, event and topic, there will be many possible interpretations of event evolution. In this section, we shall revisit the definitions of story, event and topic with some modifications to support the definition of event evolution.

Story – A story is a news article delivering some information to users (i.e. reporting a particular event).

Each news document describes a unique story and each story discusses a single event. As a result, a story is the smallest atomic unit in the hierarchy of [topic / event / story]. Practically, each story is a news document downloaded from the newswire. Each story includes a few components such as title, timestamp, source, abstract, full text, and images.

Event – An event is something that happens at some specific time but may or may not be at a specific place.

In the previous definitions of event, it always specifies a place that the event occurs. It is true for most cases. However, there are some events that do not have any specific location. For example, the event about the reaction of the people after the event of the death of the Pope John XII describes the mourning of the people all around the world. It does not have any specific location of the event. Each event is composed of a set of stories. Practically, there is usually more than one story from the newswire that is describing an event. For example, a story may describe the mourning of the people in New York while another story may describe the mourning of the people in Italy. The stories within an event are semantically coherent.

Topic – A topic is a set of events that are strongly interconnected with each other.

In general, a topic is started by a novel seminal event that is followed by other related events.

Event evolution is a concept that is developed recently. Makkonen [10] described event evolution as the changing nature of a topic. Nallapati et al. [12] defined event threading as detecting events within a topic and capturing the dependencies among the events. In general, the evolution of events is describing some kinds of relationships (or dependencies) between the events within a topic where these relationships are narrating the changes of events from the seminal events to the final events along the timeline. There are variations of the interpretations of event evolution although there are some initial techniques to determine the relationships between events. Without a formal definition of event evolution, it is impossible to determine the objectives and evaluate and benchmark the proposing techniques. In this work, we define the event evolution as follow.

Event Evolution – When an event evolves to another event within a topic, there is an event evolution relationship between these two events.

In addition, we formally define the event evolution relationship with three rules.

Event Evolution Relationship – An event evolution relationship is the directional logical dependencies or relatedness between two events within a topic. The event evolution relationship from event A to event B must follow the three rules in below.

  1. Event A must temporally precede event B.
  2. Event A must be the necessary and/or the sufficient condition of event B.
  3. The event evolution relationship must coincide with the user information needs.

The event evolution relationship must be directional. There must be a starting event and an ending event. The event evolution relationship between any two events is not symmetric. “Hurricane Katrina hits New Orleans” evolves to “Flooding caused severe damage in New Orleans” but “Flooding caused severe damage in New Orleans” cannot evolve to “Hurricane Katrina hits New Orleans”.

The above three rules describe the conditions to validate the existence of the event evolution relationships. In the first rule, it describes the constraint of the temporal order between event A and event if event A evolves to event B. Event A must occur before event B. In the next section, we shall further discuss the temporal preceding relationship in terms of event timestamp and Allen’s temporal relationships [8].

In the second rule, when event A evolves to event B, there are some necessary and/or sufficient conditions that event A possesses so that event B to occur. In some cases, event B will not occur unless event A occurs. In these situations, event A is considered as the necessary condition for event B to occur. In some other cases, event A occurs, and therefore, event B or other events may occur. In this satiation, event A is considered as a sufficient condition for event B to occur. The dependences between event A and event B can be a causal relationship or corollary relationship. For example, “Columbia space shuttle crash” evolving to “NASA conducted investigations into the shuttle crash” is a casual relationship. “Columbia space shuttle crash” can be considered as the necessary condition for “NASA conducted investigations into the shuttle crash”. NASA will not conduct any investigations unless there is a shuttle crash. Considering another example, “Yasser Arafat passed away” evolves “The new leader of Palestinian Liberation Organization was elected”. “Yasser Arafat passed away” is a sufficient condition of “The new leader of Palestinian Liberation Organization was elect”. Election of a new leader is a natural consequence of the death of the current leader; however, it is not a necessary condition. Election of a new leader can also be the consequence of other events, for example, the term of the current leader is ended.

In the third rule, it eliminates any unrelated relationship according to the information needs of users. In some situations, two events may follow the first two rules of event evolution relationship. However, the dependency between theses events does not provide any information value for the users to understand the evolution of the events within the news topic. Including such relationship will only create excessive information overheads but it is not helpful to the users.

3. Temporal Relationships

Event timestamps are used to determine the temporal relationships of the corresponding events. The event timestamp of an event can be determined from the stories that the event contains. Event timestamp is implemented as a time interval, which is the duration of time between two spot times along a linear timeline with one ahead or equal to another. The anterior spot time is the start time (s) and the posterior spot time is the end time (e) of the timestamp (t). The temporal distance between two timestamps, t1 = [s1,e1] and t2 = [s2,e2], is denoted as d(t1,t2). d(t1,t2) is computed as the difference between e1and s2 if e1 is before or equal to s2; otherwise, d(t1,t2) equals to 0. The computation of temporal distance is illustrated in Figure 5.

Allen [3] has presented 13 possible temporal relationships. These relationships are presented and illustrated in Table 1. According to the first rule of our definition on event evolution relationship, “Event A must temporarily precede event B if event A evolves to event B”, there are only 8 temporal relationships in Table 1 obey this rule if the timestamps of event A and event B are t1 and t2 respectively. These temporal relationships are t1 < t2, t1 m t2, t1 o t2, t1 di t2, t1 s t2, t1 si t2, t1 fi t2, and t1 = t2.

Figure 5. Computation of temporal distance between t1 and t2.

Table 1. Allen’s Temporal Relationships

Temporal Relationships / Illustration / Obey 1st rule of event evolution relationship?
1 / t1 < t2(t1 before t2) / s1 e1 s2 e2
/ Yes
2 / t1 > t2(t1 after t2) / s2 e2 s1 e1
/ No
3 / t1 m t2(t1 meets t2) / s1 e1
s2 e2 / Yes
4 / t1 mi t2(t1 met by t2) / s1 e1
s2 e2 / No
5 / t1 o t2(t1 overlaps t2) / s1 e1
s2 e2 / Yes
6 / t1 oi t2(t1 overlapped t2) / s1 e1
s2 e2 / No
7 / t1 d t2(t1 during t2) / s1 e1
s2 e2 / No
8 / t1 di t2(t1 contains t2) / s1 e1
s2 e2 / Yes
9 / t1 s t2(t1 starts t2) / s1 e1
s2 e2 / Yes
10 / t1 si t2(t1 started by t2) / s1 e1
s2 e2 / Yes
11 / t1 f t2(t1 finishes t2) / s1 e1
s2 e2 / No
12 / t1 fi t2(t1 finished by t2) / s1 e1
s2 e2 / Yes
13 / t1 = t2(t1 equal to t2) / s1 e1
s2 e2 / Yes

The other 5 temporal relationships will be utilized to filter those pairs of events that have invalid event evolution relationships.

4. Event Similarity

After filtering the pairs of evens that do not follow the first rule of event evolution relationship, we make use of the event similarity to identify the pairs of events that follow the second and third rules of event evolution relationship. When there are some types of dependency between two events, the content of these events usually have certain level of similarity. For example, the persons involved in these events are most likely the same. The location where these events occur is the same. The stories contained in the second event usually describe briefly what has happened in the event that it evolves from. On the other hand, if two events are independent from each other, their similarity is usually low. As a result, we utilize the event similarity to determine if there is an event evolution relationship between any two events.

In this work, we utilize the cosine similarity of terms spaces that are representing two events to measure the event similarity. A k-term feature vector, ωi = {ωi1,ωi2, …, ωik}, is used to represent a story i. The traditional TF-IDF function is utilized to compute ωik.

where tfik is the frequency of term k in story i. N is the total number of documents in that topic and dfk is the number of stories which contains term k in that topic.

We compute the event term vector of event j using the average of the document term vectors of stories that belong to event j. We define the event term vector for event j as ω’j = {ω’j1,ω’j2, …, ω’jk} where,

where njis the number of stories in S that belongs to event j.

The event content similarity is computed as follow:

5. Event Evolution Graph

Given the temporal relationships and event similarity, we build an event evolution graph to provide a graphical presentation of how events evolve within a news topic to the users. An event evolution graph is a directed acyclic graph,G,consisting of events as the nodes and event evolution relationships as the directed edges between nodes.

Given a set ofn distinctnews stories S = {s1, s2,∙∙∙, sn} on a given news topic,we have a set of m events E = {e1, e2, ∙∙∙, em} and their event timestamps T = {t1, t2, ∙∙∙, tm}. We can build a one-to-one mapping function τ that maps each event to its corresponding event timestamp:

τ(ek) = tk (1 ≤k≤m)

Each of the n stories is assigned to one of the m events.

ƒ(si) = ek iff siek (1 ≤i≤n, 1 ≤k≤m)

We denote the event evolution relationship from event ei to ej, if existing and valid, as (ei, ej) (i≠j). A directed edge from vertex ei to ejis created in the event evolution graph accordingly. Event ei the parent of event ej and event ejis the child of event ei.

The set of valid event evolution relationships betweenm events is represented a set of directed edges L = {(ei, ej)} in the event evolution graph where ei, ejE. Combining the set of m events E as the nodes and the set of event evolution relationships L as the directed edges between the nodes, we have a directed acyclic graph G = {E, L}.

There are two steps in verifying the validity of an event evolution relationship between any pair of events, ei and ej, in E.

  1. ei evolves to ej if and only if ei and ej follow any one of the eight temporal relationships as described in Section 0. i.e. ti < tj, ti m tj, ti o tj, ti di tj, ti s tj, ti si tj, ti fi tj, and ti = tj.
  2. ei evolves to ej if and only if cos_sim(ei,ej) > threshold.

6. Examples

In this section, we use two examples to illustrate the construction of event evolution graphs.

Example 1