Matthew Carroll

INLS 172: Losee

From my tests with the two information retrieval (IR) models for this homework, I conclude that the Two Poisson model is better for ranking when the collection contains both relevant and non-relevant documents that share key terms. The binary model does not discern term frequency, but neither does it inadvertently punish documents that have fewer occurrences of relevant terms, assuming that you do not adjust for document length.
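The Binary Independence weights printed in the runs below are consistent with the standard Robertson–Sparck Jones term weight taken with a base-2 log. This is an assumption reverse-engineered from the output, not a confirmed description of Losee's implementation; a minimal sketch:

```python
import math

def bi_weight(p, q):
    """Binary Independence term weight (RSJ form, base-2 log — an
    assumption inferred from the printed weights, not Losee's code).

    p: probability the term occurs in a relevant document
    q: probability the term occurs in a non-relevant document
    """
    return math.log2(p * (1 - q) / ((1 - p) * q))

# Reproducing the set (a) BI run: p1 = 0.999, q1 = 0.001 and p2 = 1/2, q2 = 0.001.
print(round(bi_weight(0.999, 0.001), 4))  # 19.9287
print(round(bi_weight(0.5, 0.001), 4))    # 9.9643
```

The recovered values match the t1 and t2 weights the system prints for set (a).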

In my first set, (a), Two Poisson Independence (TPI) and Binary Independence (BI) have the same ASL, even though document two has many occurrences of a relevant term while document one has only one. As long as both documents' relevance was stated as True, the search lengths were always the same.

In my second set, (b), I used the same model setup but changed the second document to be non-relevant. It seems from this test that document relevance does not matter when comparing ASL between TPI and BI: as long as the term occurrences and relevance arguments are the same, the search lengths appear to be the same here as well. The term weights do vary, however, but with these two models the differing term weights do not change the rankings.
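The document weights and ASL values in these runs are consistent with an inner-product score (term frequency times term weight) and an ASL equal to the mean rank of the relevant documents. A sketch under those assumptions, checked against the set (b) TPI run (tie handling is a guess and may differ from the system's expected search length):

```python
def doc_score(tf_vector, weights):
    # Inner-product similarity: sum of term frequency x term weight.
    return sum(tf * w for tf, w in zip(tf_vector, weights))

def asl(scores, relevant):
    # Average Search Length: mean rank of the relevant documents after
    # sorting by descending score (assumption: ties left in given order).
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    positions = [ranked.index(i) + 1 for i in range(len(scores)) if relevant[i]]
    return sum(positions) / len(positions)

# Set (b) TPI run: weights 1.0 and -9.96578; docs {100,2,0} and {200,1,1}.
weights = [1.0, -9.96578]
scores = [doc_score([2, 0], weights), doc_score([1, 1], weights)]
print(scores[0])                    # 2.0
print(round(scores[1], 5))          # -8.96578
print(asl(scores, [True, False]))   # 1.0
```

The relevant document ranks first, giving the ASL of 1 that both models report for set (b).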

In my third set, (c), when terms appear multiple times across the two documents, and one document is relevant while the other is not, the TPI model seems to work better. It both assigns a higher weight to the relevant document and yields a shorter ASL than BI.
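The TPI weights printed below are consistent with the base-2 log of the ratio of the term's average (Poisson mean) frequency in relevant versus non-relevant documents. Again, this formula is an assumption inferred from the printed output; a sketch reproducing the set (c) TPI numbers:

```python
import math

def tpi_weight(lam_rel, lam_nonrel):
    """Two-Poisson term weight as log2 of the ratio of Poisson means in
    relevant vs non-relevant documents (assumption, reverse-engineered
    from the printed weights)."""
    return math.log2(lam_rel / lam_nonrel)

# Set (c) TPI run: avg rel 1 = 3, avg nonrel 1 = 2; avg rel 2 = 1, avg nonrel 2 = 3.
w1 = tpi_weight(3, 2)   # ~ 0.584963
w2 = tpi_weight(1, 3)   # ~ -1.584963
# Inner-product document scores reproduce the run's doc weights:
print(round(3 * w1 + 1 * w2, 6))  # 0.169925
print(round(2 * w1 + 3 * w2, 5))  # -3.58496
```

Because TPI keeps the raw frequencies, the relevant document {100, 3, 1} stays above the non-relevant one, while BI collapses both to identical binary vectors and a tie.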

(a)

TPI and BI have the same ASL, even though document 2 has many occurrences of a relevant term.

In[1]:= <myirbi.m

Losee's Nyltiac IR System Version 0.8

Similarity=InnerProduct Query uses weighting True for type bitdv

Feature Vector: {t1, t2}

Query={{1, 1}}

Documents={{{100, 1, 0}, {200, 1, 1}}}

Relevance={{True, True}}

p1 = 0.999 q1 = 0.001

p2 = 1/2 q2 = 0.001

t1 Weight= 19.9287

t2 Weight= 9.96434

{100, 1, 0} Doc Weight= 19.9287 Relevant= True

{200, 1, 1} Doc Weight= 29.893 Relevant= True

{Eprd (Average Search Length) =, 1.5}

In[2]:= <myirtpi.m

Losee's Nyltiac IR System Version 0.8

Similarity=InnerProduct Query uses weighting True for type tpitdv

Feature Vector: {t1, t2}

Query={{1, 1}}

Documents={{{100, 1, 0}, {200, 6, 1}}}

Relevance={{True, True}}

avg rel 1 = 7/2 avg nonrel 1 = 0.001

avg rel 2 = 1/2 avg nonrel 2 = 0.001

t1 Weight= 11.7731

t2 Weight= 8.96578

{100, 1, 0} Doc Weight= 11.7731 Relevant= True

{200, 6, 1} Doc Weight= 79.6046 Relevant= True

{Eprd (Average Search Length) =, 1.5}

(b)

ruby(mcarroll): ~/IRClass -> math

Mathematica 4.1 for Sun Solaris

Copyright 1988-2000 Wolfram Research, Inc.

-- Terminal graphics initialized --

In[1]:= <xmytpi.m

Losee's Nyltiac IR System Version 0.8

Similarity=InnerProduct Query uses weighting True for type tpitdv

Feature Vector: {t1, t2}

Query={{1, 1}}

Documents={{{100, 2, 0}, {200, 1, 1}}}

Relevance={{True, False}}

avg rel 1 = 2 avg nonrel 1 = 1

avg rel 2 = 0.001 avg nonrel 2 = 1

t1 Weight= 1.

t2 Weight= -9.96578

{100, 2, 0} Doc Weight= 2. Relevant= True

{200, 1, 1} Doc Weight= -8.96578 Relevant= False

{Eprd (Average Search Length) =, 1.}

In[1]:= <xmybi.m

Losee's Nyltiac IR System Version 0.8

Similarity=InnerProduct Query uses weighting True for type bitdv

Feature Vector: {t1, t2}

Query={{1, 1}}

Documents={{{100, 1, 0}, {200, 1, 1}}}

Relevance={{True, False}}

p1 = 0.999 q1 = 0.999

p2 = 0.001 q2 = 0.999

t1 Weight= 0.

t2 Weight= -19.9287

{100, 1, 0} Doc Weight= 0. Relevant= True

{200, 1, 1} Doc Weight= -19.9287 Relevant= False

{Eprd (Average Search Length) =, 1.}

(c)

In[1]:= <xmytpi.m

Losee's Nyltiac IR System Version 0.8

Similarity=InnerProduct Query uses weighting True for type tpitdv

Feature Vector: {t1, t2}

Query={{1, 1}}

Documents={{{100, 3, 1}, {200, 2, 3}}}

Relevance={{True, False}}

avg rel 1 = 3 avg nonrel 1 = 2

avg rel 2 = 1 avg nonrel 2 = 3

t1 Weight= 0.584963

t2 Weight= -1.58496

{100, 3, 1} Doc Weight= 0.169925 Relevant= True

{200, 2, 3} Doc Weight= -3.58496 Relevant= False

{Eprd (Average Search Length) =, 1.}

In[1]:= <xmybi.m

Losee's Nyltiac IR System Version 0.8

Similarity=InnerProduct Query uses weighting True for type bitdv

Feature Vector: {t1, t2}

Query={{1, 1}}

Documents={{{100, 1, 1}, {200, 1, 1}}}

Relevance={{True, False}}

p1 = 0.999 q1 = 0.999

p2 = 0.999 q2 = 0.999

t1 Weight= 0.

t2 Weight= 0.

{100, 1, 1} Doc Weight= 0. Relevant= True

{200, 1, 1} Doc Weight= 0. Relevant= False

{Eprd (Average Search Length) =, 2.}