Knowledge Domain Visualization NWB Portal Interface Design Specification (KDVis-PISpec)

Katy Borner, Indiana University

October 21st, 2005

Katy responded to Shashi’s comments on Nov 22nd, 2005

This specification explains the interface intended for non-experts who are interested in designing simple knowledge domain visualizations for specific areas of science or geospatial regions. Based on this interface spec, a services model will be detailed, and subsequently a software design specification will be developed.

A user of this portal will go through five major steps:

·  Specify information need (select specific analyses/visualizations)

·  Upload or select dataset

·  Examine and confirm dataset (based on simple statistics)

·  Request specific analysis and visualization

·  Download results and documentation

Note that this portal uses the IVCDB and the IVCSF.

None of the specified functionality requires research. We know exactly how to provide these services; all that needs to be done is to implement and offer them in an easy-to-use way.

All user input needs to be checked for completeness and, insofar as possible, for correctness. I will specify this in more detail in the future.

Conventions used:

[Label] Button with Label

{Choice} Pull Down Menu

j Toggle Button

e Check Box

… Input Field

IF (condition) GOTO xx // do if condition holds

Comments that are not shown on the user interface are given in green.

Subsequently, I specify the interfaces that a user would get to see when using this portal.

0. Specify Information Need

a) Simple Statistics

I am interested in

e author(s)

e institution(s)

e geospatial region(s)

e scientific domain(s)

Top n Authors with many

e papers

e citations

e co-authors

e patents

e grants

e co-PIs

Institutions with many

e papers

e citations

e patents

e grants

Geospatial regions with many

e papers

e citations

e patents

e grants

Knowledge domains with many

e papers

e citations

e patents

e grants

[Select all] – selects all check boxes
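The "Top n ... with many ..." selections amount to counting and ranking. A minimal sketch for "Top n Authors with many papers" follows, assuming each paper is a dict with an `authors` list; the record layout and the function name `top_n_authors` are illustrative assumptions, not part of the IVCDB.

```python
from collections import Counter

def top_n_authors(papers, n=10):
    """Return the n authors with the most papers as (author, count) pairs."""
    counts = Counter()
    for paper in papers:
        # Count each author once per paper, even if the name is listed twice.
        counts.update(set(paper["authors"]))
    # Sort by count (descending), then by name, so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

The same counting pattern would apply to institutions, geospatial regions, and knowledge domains by swapping the field that is counted.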

b) Evolution over time

any of the above over time

e Bursts of Activity

(sudden changes of relative word frequency over time)
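To make "sudden changes of relative word frequency over time" concrete, here is a minimal sketch that computes each word's share of all title words per year and flags a word when its share grows by a given factor from one year to the next. This is a naive ratio test chosen for illustration, not a specific burst-detection algorithm; the names `find_bursts`, `factor`, and `min_share` are assumptions.

```python
from collections import Counter

def relative_frequencies(titles_by_year):
    """Map each year to {word: share of all words used in that year}."""
    result = {}
    for year, titles in titles_by_year.items():
        words = [w.lower() for title in titles for w in title.split()]
        counts = Counter(words)
        total = len(words)
        result[year] = {w: c / total for w, c in counts.items()} if total else {}
    return result

def find_bursts(titles_by_year, factor=2.0, min_share=0.05):
    """Return sorted (year, word) pairs where a word's share jumps.

    A word is flagged in a year if its share is at least min_share and at
    least factor times its share in the previous year (new words count as
    having a previous share of zero).
    """
    rel = relative_frequencies(titles_by_year)
    years = sorted(rel)
    bursts = []
    for prev, cur in zip(years, years[1:]):
        for word, share in rel[cur].items():
            before = rel[prev].get(word, 0.0)
            if share >= min_share and share >= factor * before:
                bursts.append((cur, word))
    return sorted(bursts)
```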

c) Cross Maps (by S. Morris)

d) Networks

All networks can be presented ordered over time or topics as well as overlaid over geospatial or semantic space.

e Co-Author Networks (Evolving? j yes j no)

Click on image leads to sample vis and short description of what can be learned.

{See http://iv.slis.indiana.edu/ref/iv04contest/Ke-Borner-Viswanath.gif for an example}

e Paper-Citation Networks (Evolving? j yes j no)

The images here are too rich – we need simpler ones that make clear that the same data can be displayed using different substrate maps.

All subsequent user selections will be matched against the user’s information needs. The user will be informed if the data s/he selected does not support the generation of any of the selected analyses and visualizations.
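The matching of selections against information needs could be sketched as a lookup of the data fields each analysis requires. The field names and the `REQUIRED_FIELDS` table below are hypothetical; the real requirements would come from the services model.

```python
# Hypothetical field requirements per analysis (illustration only).
REQUIRED_FIELDS = {
    "co-author network": {"authors"},
    "paper-citation network": {"paper_id", "references"},
    "top authors by citations": {"authors", "times_cited"},
}

def split_supported(selected_needs, available_fields):
    """Separate the selected needs the data supports from those it does not.

    Returns (supported, unsupported), where unsupported maps each need to
    the sorted list of fields that are missing from the dataset.
    """
    available = set(available_fields)
    supported, unsupported = [], {}
    for need in selected_needs:
        missing = REQUIRED_FIELDS[need] - available
        if missing:
            unsupported[need] = sorted(missing)
        else:
            supported.append(need)
    return supported, unsupported
```

The `unsupported` mapping gives the portal enough detail to tell the user which fields are missing for each analysis that cannot be generated.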

[Next]

IF (next was selected) GOTO 1

1. Get Dataset

j Upload Dataset j Select Dataset

(must be in ISI, bibtex, or EndNote format)

IF (upload dataset was selected) GOTO 2.1

IF (select dataset was selected) GOTO 2.2

2. Select Dataset

2.1 Upload Dataset

……………………… [Browse]

Multiple files can be uploaded.

System automatically detects the file type.
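A minimal sketch of how the file type of an upload might be detected from its first non-blank line. It assumes that ISI exports begin with an `FN` or `PT` field tag, BibTeX entries with `@`, and EndNote exports in the tagged style with `%0` or `%A`; these markers are heuristics to be verified against real files, and `detect_format` is a hypothetical name.

```python
def detect_format(text):
    """Guess 'isi', 'bibtex', or 'endnote' from the first non-blank line.

    Returns None if the line matches none of the assumed markers, in which
    case the portal would have to ask the user.
    """
    for line in text.splitlines():
        stripped = line.lstrip("\ufeff").strip()  # drop a byte-order mark
        if not stripped:
            continue
        if stripped.startswith("@"):
            return "bibtex"
        if stripped.startswith(("%0 ", "%A ")):
            return "endnote"
        if stripped.startswith(("FN ", "PT ")):
            return "isi"
        return None  # first non-blank line matched no known marker
    return None  # empty input
```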

[Next]

[Back]

IF (next was selected) GOTO 3

IF (back was selected) GOTO 1

2.2 Select Dataset

Specify

Author Set

Institution Set

Geospatial Area {Worldwide | US | US States}

Knowledge Domain {Major knowledge domains: Math | Physics | Chemistry | CS} multiple domains can be selected

[Show available data]

IF (show available data was selected) then list datasets available in the IVCDB for selection:

Datasets on Physics in US Years Covered
e Physical Review xxxx-xxxx

e United States Patents xxxx-xxxx

e NSF Grants xxxx-xxxx

e NIH Grants xxxx-xxxx

Etc.

‘Physics in US’ depends on selection in 1.

Years: …………

[Next]

[Back]

IF (next was selected) GOTO 3

IF (back was selected) GOTO 1

3. Examine and Confirm Dataset

Simple statistics of the dataset are provided, analogous to the IVCDB statistics. In general, the numbers of unique documents, unique authors, unique institutions, etc. are given for each dataset selected in 2.
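The simple statistics could be computed along these lines. This is a sketch that assumes each record is a dict with an `id` and optional `authors`, `institutions`, and `year` entries; the record layout and key names are assumptions.

```python
def dataset_statistics(records):
    """Count unique documents, authors, and institutions in a dataset."""
    documents, authors, institutions, years = set(), set(), set(), set()
    for record in records:
        documents.add(record["id"])
        authors.update(record.get("authors", []))
        institutions.update(record.get("institutions", []))
        if record.get("year") is not None:
            years.add(record["year"])
    return {
        "unique_documents": len(documents),
        "unique_authors": len(authors),
        "unique_institutions": len(institutions),
        # Inclusive range of years covered, or None if no year is known.
        "years_covered": (min(years), max(years)) if years else None,
    }
```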

The dataset can be downloaded in tab delimited format for closer examination, cleaned by hand if necessary and uploaded again as needed. For licensing reasons, people might not be able to download the complete dataset selected from IVCDB.

If a dataset was uploaded, its format needs to be examined; available fields need to be identified, data cleaning needs to be performed, etc. I hope we can largely use the code that Weimao and Colin worked on.

[Next]

[Back]

IF (next was selected) GOTO 4

IF (back was selected) GOTO 2

4. Analysis and Visualization

The user sees an interface that acts as a portal to all analyses and visualizations that s/he specified under 0 and that can be computed given the data available.

Let’s assume s/he is interested in getting an evolving co-author network and a paper-citation network. In this case s/he would see:

Selected Information Needs

Evolving co-author network [Generate] IF selected GOTO 4.1

Paper-citation network [Generate] IF selected GOTO 4.2

[Next]

[Back]

IF (next was selected) GOTO 5

IF (back was selected) GOTO 3

4.1 Generate Evolving Co-Author Network

Specify Reference System

j Ordered in Time j Geospatial Map j Semantic Space

(provide sample visualizations here)

Specify Number of Time Steps

Start Year: {years covered}

End Year: {years covered}

Years per time slice: {1, 2, 3, …}

j Disjoint slices j Accumulating slices j Overlapping slices (# years for overlap: …)
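The three slicing options can be sketched as follows. The function returns inclusive (first year, last year) pairs; the name `time_slices`, its parameters, and the choice to shorten the last slice so it ends at the end year are assumptions about how the options would be interpreted.

```python
def time_slices(start, end, years_per_slice, mode="disjoint", overlap=0):
    """Return inclusive (first_year, last_year) pairs covering start..end.

    mode is 'disjoint', 'accumulating' (every slice begins at start), or
    'overlapping' (consecutive slices share `overlap` years).
    """
    if years_per_slice < 1:
        raise ValueError("years_per_slice must be at least 1")
    if mode == "overlapping":
        if not 0 < overlap < years_per_slice:
            raise ValueError("overlap must be between 1 and years_per_slice - 1")
        step = years_per_slice - overlap
    elif mode in ("disjoint", "accumulating"):
        step = years_per_slice
    else:
        raise ValueError("unknown mode: " + mode)

    slices = []
    first = start
    while first <= end:
        # The final slice is cut short if it would run past the end year.
        last = min(first + years_per_slice - 1, end)
        slices.append((start if mode == "accumulating" else first, last))
        if last == end:
            break
        first += step
    return slices
```

For example, with start year 2000, end year 2005, and two years per slice, disjoint slicing yields 2000-2001, 2002-2003, 2004-2005, while accumulating slicing yields 2000-2001, 2000-2003, 2000-2005.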

Define Data Mapping

Node size: {options}

Node color: {options}

Edge width: {options}

Edge color: {options}

Labels: {Names, numbers, and in what format}

Label size: ……..

Label type font: {options}

Label color: ………….

(provide default values as much as possible)

[Next]

[Back]

IF (next was selected) GOTO 4.1b

IF (back was selected) GOTO 4

4.1b Generate Visualization

The visualization, together with a legend and a copyright notice for the KDVis-Portal, is given as an image, accompanied by text describing what data was cleaned, analyzed, and laid out, and in what way.

[Back]

IF (back was selected) GOTO 4

4.2 Generate Paper-Citation Network

Specify Reference System

j Ordered in Time j Geospatial Map j Semantic Space

(provide sample visualizations here)

Specify Number of Time Steps

Start Year: {years covered}

End Year: {years covered}

Years per time slice: {1, 2, 3, …}

j Disjoint slices j Accumulating slices j Overlapping slices (# years for overlap: …)

Define Data Mapping

Node size: ……….

Node color: ………

Edge width: ……..

Edge color: ……..

Labels: {Names, numbers, and in what format}

Label size: ……..

Label type font:……….

Label color: ………….

(provide default values as much as possible)

[Next]

[Back]

IF (next was selected) GOTO 4.2b

IF (back was selected) GOTO 4

4.2b Generate Visualization

The visualization, together with a legend and a copyright notice for the KDVis-Portal, is given as an image, accompanied by text describing what data was cleaned, analyzed, and laid out, and in what way.

[Back]

IF (back was selected) GOTO 4

5. Documentation

A protocol of all visualizations generated (with the option to download them in diverse formats) and all explanatory text is generated automatically. Citation references and pointers to documentation in the IVC SW and LM part are given.

[Back]

[Save]

[Start Over]

IF (back was selected) GOTO 4

IF (save was selected) then save protocol in html format

IF (start over was selected) GOTO 0
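The IF ... GOTO rules in this specification can be collected into a single transition table, which may help when checking the flow for dead ends. The sketch below follows my reading of the rules; the action strings and the names `TRANSITIONS` and `next_screen` are assumptions, and unknown actions simply leave the user on the current screen.

```python
# (current screen, user action) -> next screen, following the GOTO rules.
TRANSITIONS = {
    ("0", "next"): "1",
    ("1", "upload dataset"): "2.1",
    ("1", "select dataset"): "2.2",
    ("2.1", "next"): "3",  # section 3 has no sub-sections
    ("2.1", "back"): "1",
    ("2.2", "next"): "3",
    ("2.2", "back"): "1",
    ("3", "next"): "4",
    ("3", "back"): "2",
    ("4", "generate evolving co-author network"): "4.1",
    ("4", "generate paper-citation network"): "4.2",
    ("4", "next"): "5",
    ("4", "back"): "3",
    ("4.1", "next"): "4.1b",
    ("4.1", "back"): "4",
    ("4.1b", "back"): "4",
    ("4.2", "next"): "4.2b",
    ("4.2", "back"): "4",
    ("4.2b", "back"): "4",
    ("5", "back"): "4",
    ("5", "save"): "5",  # saving the protocol keeps the user on screen 5
    ("5", "start over"): "0",
}

def next_screen(screen, action):
    """Return the screen that follows `action`, or stay put if undefined."""
    return TRANSITIONS.get((screen, action), screen)

def walk(actions, start="0"):
    """Apply a sequence of actions and return the list of screens visited."""
    screens = [start]
    for action in actions:
        screens.append(next_screen(screens[-1], action))
    return screens
```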