NBII: General Structure and Understanding

As seen in the repgen.xml document sent by Lisa Zolly on March 10, 2009, the NBII thesaurus XML document has only a three-level hierarchy.

Level 1: Thesaurus

Level 2: Concept

Level 3: -- Descriptor or Non-Descriptor

-- UF, USE

-- BT, RT, NT

-- SC, SN

-- STA

-- TYP

-- INP, UPD

Understanding NBII

Level 1:

Thesaurus – root element (wrapper) for the entire thesaurus; contains many Concept elements.

Level 2:

Concept – one thesaurus contains multiple Concept elements.

Level 3:

All remaining tags occur within the Concept tag and are flat (all on the same level), but a Concept comes in one of two kinds: a descriptor or a non-descriptor. The Level 3 elements are interesting because, although they are all on the same level, there seem to be relationships and patterns among them, and they appear to occur in a consistent order. Below is an example of how these tags divide:

Example 1: Preferred concepts

Concept
-- Descriptor
-- UF
-- BT
-- RT
-- NT
-- SC
-- SN
-- STA
-- TYP
-- INP
-- UPD

Example 2: Non-preferred concepts

Concept
-- Non-Descriptor
-- USE
-- SC
-- SN
-- STA
-- TYP
-- INP
-- UPD
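Because every Level 3 element is a direct, flat child of Concept, one convenient way to load the file is to turn each Concept into a mapping from tag name to a list of values (lists, because elements such as RT and SC can repeat) and then classify it by whether it carries a Descriptor or a Non-Descriptor. The sketch below uses Python's standard xml.etree.ElementTree. It is only a sketch: the THESAURUS root tag name is an assumption (the notes show only CONCEPT and its children), the uppercase tag names follow the CONCEPT records quoted from repgen.xml in these notes, and parse_concepts and kind are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample shaped like repgen.xml. The THESAURUS root tag is an
# assumption; only CONCEPT and its children appear in these notes.
SAMPLE = """<THESAURUS>
  <CONCEPT>
    <DESCRIPTOR>Zygotes</DESCRIPTOR>
    <UF>Ookinetes</UF>
    <BT>Ova</BT>
    <RT>Hemizygosity</RT>
    <RT>Reproduction</RT>
    <TYP>Descriptor</TYP>
  </CONCEPT>
  <CONCEPT>
    <NON-DESCRIPTOR>Ookinetes</NON-DESCRIPTOR>
    <USE>Zygotes</USE>
    <TYP>Non-descriptor</TYP>
  </CONCEPT>
</THESAURUS>"""


def parse_concepts(xml_text):
    """Return one dict per CONCEPT mapping tag -> list of text values."""
    root = ET.fromstring(xml_text)
    concepts = []
    for concept in root.iter("CONCEPT"):
        fields = defaultdict(list)
        for child in concept:  # Level 3 elements are flat, direct children
            fields[child.tag].append((child.text or "").strip())
        concepts.append(dict(fields))
    return concepts


def kind(concept):
    """Classify a parsed concept by which term element it carries."""
    if "DESCRIPTOR" in concept:
        return "preferred"
    if "NON-DESCRIPTOR" in concept:
        return "non-preferred"
    return "unknown"


for c in parse_concepts(SAMPLE):
    term = (c.get("DESCRIPTOR") or c.get("NON-DESCRIPTOR") or ["?"])[0]
    print(term, kind(c), sorted(c))
```

Running it prints each term, its kind, and the tags it carries. Keeping every value in a list means repeated elements such as RT are never silently overwritten.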

BT – Broader Term

Descriptor – Used for approved (preferred) terms

INP – Date (expressed numerically, YEAR-MO-DA) when concept was input into the system

Non-Descriptor – Used for non-preferred terms; points to the preferred term through USE

NT – Narrower Term

RT – Related Term

SC – Source (organization, controlled vocabulary) of term?

SN – ? Possibly Scope Note, the usual meaning of SN in thesauri (see SN note in next section)

STA – Status

TYP – Type of Concept

UF – Used for

UPD – Date (expressed numerically, YEAR-MO-DA) when concept was updated

USE – Use this concept instead
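For scripting, the glossary can be encoded as a lookup table so that raw tags print with readable labels and the INP/UPD dates become real date values. A minimal sketch under stated assumptions: LABELS, DATE_TAGS and describe are hypothetical names; the labels use the standard thesaurus expansions (Narrower Term, Used For); "Scope Note" for SN is only the conventional reading and is not confirmed for NBII; and because YEAR-MO-DA matches the ISO 8601 calendar-date format, Python's date.fromisoformat can parse it.

```python
from datetime import date

# Tag labels based on the glossary. SC and SN are not confirmed; "Scope Note"
# for SN is only the usual thesaurus reading, assumed here.
LABELS = {
    "BT": "Broader Term",
    "NT": "Narrower Term",
    "RT": "Related Term",
    "UF": "Used For",
    "USE": "Use (preferred term)",
    "SC": "Source (unconfirmed)",
    "SN": "Scope Note (assumed)",
    "STA": "Status",
    "TYP": "Type of Concept",
    "INP": "Input date",
    "UPD": "Update date",
}
DATE_TAGS = {"INP", "UPD"}


def describe(tag, value):
    """Return (label, value); INP/UPD values are parsed into datetime.date."""
    label = LABELS.get(tag, tag)  # fall back to the raw tag if unknown
    if tag in DATE_TAGS:
        # YEAR-MO-DA, e.g. 2007-08-14, is the ISO 8601 calendar-date format
        return label, date.fromisoformat(value.strip())
    return label, value


print(describe("NT", "Oocysts"))      # ('Narrower Term', 'Oocysts')
print(describe("UPD", "2007-08-14"))  # ('Update date', datetime.date(2007, 8, 14))
```

Parsing the dates up front means a malformed INP or UPD value raises an error immediately instead of slipping into HIVE as an unsortable string.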


HIVE and NBII: Relationships to Consider and Questions

· Descriptor and Non-Descriptor distinguish preferred from non-preferred terms. These tags are paired with the UF and USE elements, respectively.

· UF is the reciprocal of USE: when a non-preferred concept names a preferred term in its USE element, that preferred concept should list the non-preferred term in its UF element. For example, Ookinetes has USE Zygotes, and Zygotes has UF Ookinetes (see example below).

Example from the XML:

Preferred Term: Zygotes

<CONCEPT>
  <DESCRIPTOR>Zygotes</DESCRIPTOR>
  <UF>Ookinetes</UF>
  <BT>Ova</BT>
  <NT>Oocysts</NT>
  <RT>Hemizygosity</RT>
  <RT>Reproduction</RT>
  <RT>Zygosity</RT>
  <SC>ASF Aquatic Sciences and Fisheries</SC>
  <SC>LSC Life Sciences</SC>
  <STA>Approved</STA>
  <TYP>Descriptor</TYP>
  <INP>2007-08-14</INP>
  <UPD>2007-08-14</UPD>
</CONCEPT>

Non-preferred Term: Ookinetes

<CONCEPT>
  <NON-DESCRIPTOR>Ookinetes</NON-DESCRIPTOR>
  <USE>Zygotes</USE>
  <SC>LSC Life Sciences</SC>
  <STA>Approved</STA>
  <TYP>Non-descriptor</TYP>
  <INP>2007-08-14</INP>
  <UPD>2007-08-14</UPD>
</CONCEPT>
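The UF/USE rule in the bullet above can be checked mechanically across the whole file: collect (preferred, non-preferred) pairs once from DESCRIPTOR plus UF and once from NON-DESCRIPTOR plus USE, then compare the two sets. A minimal sketch using Python's xml.etree.ElementTree on a trimmed copy of the Zygotes/Ookinetes records; the THESAURUS root tag is an assumption and uf_use_mismatches is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two records above, wrapped in an assumed THESAURUS root.
SAMPLE = """<THESAURUS>
  <CONCEPT><DESCRIPTOR>Zygotes</DESCRIPTOR><UF>Ookinetes</UF></CONCEPT>
  <CONCEPT><NON-DESCRIPTOR>Ookinetes</NON-DESCRIPTOR><USE>Zygotes</USE></CONCEPT>
</THESAURUS>"""


def uf_use_mismatches(xml_text):
    """List every UF/USE link that is not mirrored on the other side."""
    root = ET.fromstring(xml_text)
    uf_pairs = set()   # (preferred, non_preferred) from DESCRIPTOR + UF
    use_pairs = set()  # (preferred, non_preferred) from NON-DESCRIPTOR + USE
    for concept in root.iter("CONCEPT"):
        pref = concept.findtext("DESCRIPTOR")
        nonpref = concept.findtext("NON-DESCRIPTOR")
        if pref is not None:
            uf_pairs.update((pref, (uf.text or "").strip())
                            for uf in concept.findall("UF"))
        if nonpref is not None:
            use_pairs.update(((use.text or "").strip(), nonpref)
                             for use in concept.findall("USE"))
    return {
        "USE without matching UF": sorted(use_pairs - uf_pairs),
        "UF without matching USE": sorted(uf_pairs - use_pairs),
    }


print(uf_use_mismatches(SAMPLE))
# {'USE without matching UF': [], 'UF without matching USE': []}
```

Removing the UF line from the Zygotes record would make the first list report ('Zygotes', 'Ookinetes'), which is exactly the kind of gap HIVE would want flagged before import.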

· BT, NT, and RT are all present in NBII and express relationships between preferred terms.

· SN (which serves a definitional purpose) does not occur in every Concept; it appears to be optional and may be linked in some way to the information in SC.

· In the document I have, I could not find an STA element with any status other than “Approved”. Presumably other status values exist?

· Relationship between TYP and Descriptor/Non-Descriptor: this relationship seems redundant. The TYP element is present in every Concept and simply repeats whether the first Level 3 element is Descriptor or Non-Descriptor. Is there something about this relationship that I am missing?

· Although USGS uses INP and UPD internally for tracking and updating, these elements are still relevant to HIVE. Since USGS has offered to send us updates, these elements (and their data) could be used to update terms in HIVE through scripting. UPD in particular could be very valuable.
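If NBII follows the usual thesaurus convention that BT and NT are reciprocal and RT is symmetric, the BT/NT/RT relationships between preferred terms noted above can be audited for missing back-links. A minimal sketch, assuming that convention holds for NBII (not confirmed by repgen.xml); the term names come from the Zygotes example, but the Ova and Oocysts records are invented for illustration, and missing_reciprocals is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-thesaurus: names come from the Zygotes example, but the
# Ova and Oocysts records are invented for illustration.
SAMPLE = """<THESAURUS>
  <CONCEPT><DESCRIPTOR>Zygotes</DESCRIPTOR><BT>Ova</BT><NT>Oocysts</NT><RT>Zygosity</RT></CONCEPT>
  <CONCEPT><DESCRIPTOR>Ova</DESCRIPTOR><NT>Zygotes</NT></CONCEPT>
  <CONCEPT><DESCRIPTOR>Oocysts</DESCRIPTOR></CONCEPT>
</THESAURUS>"""

# Assumed reciprocal of each relationship (usual thesaurus convention).
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(xml_text):
    """Return sorted (term, tag, target) links whose reciprocal is absent."""
    root = ET.fromstring(xml_text)
    links = set()
    for concept in root.iter("CONCEPT"):
        term = concept.findtext("DESCRIPTOR")
        if term is None:
            continue  # BT/NT/RT are expected only on preferred concepts
        for tag in INVERSE:
            for e in concept.findall(tag):
                links.add((term, tag, (e.text or "").strip()))
    return sorted((t, tag, target) for t, tag, target in links
                  if (target, INVERSE[tag], t) not in links)


print(missing_reciprocals(SAMPLE))
# [('Zygotes', 'NT', 'Oocysts'), ('Zygotes', 'RT', 'Zygosity')]
```

Here Zygotes BT Ova is mirrored by Ova NT Zygotes, so only the two one-way links are reported.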
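The STA, SN and TYP questions in the bullets above can be answered for the whole file with a short audit: tally every STA value, count the Concepts that carry SN, and flag any Concept whose TYP disagrees with its first Level 3 element. A sketch under stated assumptions: the THESAURUS root tag is assumed, the expected TYP spellings are taken from the two examples and compared case-insensitively, and audit and EXPECTED_TYP are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: two concepts trimmed from the Zygotes/Ookinetes example.
SAMPLE = """<THESAURUS>
  <CONCEPT><DESCRIPTOR>Zygotes</DESCRIPTOR><STA>Approved</STA><TYP>Descriptor</TYP></CONCEPT>
  <CONCEPT><NON-DESCRIPTOR>Ookinetes</NON-DESCRIPTOR><STA>Approved</STA><TYP>Non-descriptor</TYP></CONCEPT>
</THESAURUS>"""

# TYP value expected for each first element, taken from the two examples.
EXPECTED_TYP = {"DESCRIPTOR": "descriptor", "NON-DESCRIPTOR": "non-descriptor"}


def audit(xml_text):
    """Tally STA values, count SN coverage, and flag TYP/first-element mismatches."""
    root = ET.fromstring(xml_text)
    status_counts = Counter()
    with_sn = total = 0
    typ_mismatches = []
    for concept in root.iter("CONCEPT"):
        total += 1
        children = list(concept)
        status_counts.update((e.text or "").strip() for e in concept.findall("STA"))
        if concept.find("SN") is not None:
            with_sn += 1
        first = children[0] if children else None
        first_tag = first.tag if first is not None else None
        typ = (concept.findtext("TYP") or "").strip().lower()
        if EXPECTED_TYP.get(first_tag) != typ:
            term = first.text if first is not None else None
            typ_mismatches.append((term, first_tag, typ))
    return {
        "STA values": dict(status_counts),
        "concepts with SN": f"{with_sn}/{total}",
        "TYP mismatches": typ_mismatches,
    }


print(audit(SAMPLE))
# {'STA values': {'Approved': 2}, 'concepts with SN': '0/2', 'TYP mismatches': []}
```

Run against the full repgen.xml, an empty mismatch list would confirm that TYP is fully redundant, and any STA value other than "Approved" would show up in the tally.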
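As the last bullet suggests, UPD makes incremental updates possible: HIVE could record the date of each load and, when USGS sends a new file, pick out only the Concepts whose UPD is later than that date. A minimal sketch; changed_since is a hypothetical helper, the sync-date bookkeeping is an assumption about how HIVE might consume the updates, the THESAURUS root tag is assumed, and the Oocysts record and its date are invented for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sample; the Oocysts record and its UPD date are invented.
SAMPLE = """<THESAURUS>
  <CONCEPT><DESCRIPTOR>Zygotes</DESCRIPTOR><UPD>2007-08-14</UPD></CONCEPT>
  <CONCEPT><DESCRIPTOR>Oocysts</DESCRIPTOR><UPD>2009-02-01</UPD></CONCEPT>
</THESAURUS>"""


def changed_since(xml_text, last_sync):
    """Return terms whose UPD date is later than last_sync (a datetime.date)."""
    root = ET.fromstring(xml_text)
    changed = []
    for concept in root.iter("CONCEPT"):
        upd = concept.findtext("UPD")
        if upd is None:
            continue  # no update date recorded; only a full reload would catch it
        if date.fromisoformat(upd.strip()) > last_sync:
            term = concept.findtext("DESCRIPTOR") or concept.findtext("NON-DESCRIPTOR")
            changed.append(term)
    return changed


print(changed_since(SAMPLE, date(2009, 1, 1)))  # ['Oocysts']
```

Concepts without a UPD element are skipped here, and concepts deleted by USGS would simply be absent from the new file, so a periodic full comparison would still be needed alongside this shortcut.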