Class 2 – Information systems as boundary objects

Last week we began investigating the core concepts of information infrastructures, information systems, information institutions and information organization. This week we are going to explore in more detail the relationships between information systems, users, documents and institutions by exploring a case study of HTML documents used to deliver text-based encyclopedic information. In this exercise we will experiment with some of the technical elements that currently influence information visualization and use and explore the connection between the intellectual content of documents and their organizational structures.

Instructions:

Work individually or in groups to complete the worksheet. When you get to a section that requires you to select a resource to explore, pick one resource (please don’t always choose the first one!). When asked to ‘discuss as a group’, consider your response and continue completing the worksheet.

We’re going to work with computer code today, so here is an important note to keep in mind as you follow the exercises. Computer code is shown on numbered lines and is enclosed in boxes. The line numbers are only a reference aid during instruction and should not be copied into your program. For example, a line that reads 56. p { visibility:hidden; } should simply be typed in as p { visibility:hidden; }

Suggested Readings

  1. Mitchell, E. (2015). Chapter 1 in Metadata Standards and Web Services in Libraries, Archives, and Museums. Libraries Unlimited. Santa Barbara, CA.
  2. Listen: With modern makeovers, America’s libraries are branching out.
  3. Listen: Computers Are The Future, But Does Everyone Need To Code? NPR News Story 1/25, 2014.
  4. Listen: Do we really need Libraries? (2015 – NPR story).
  5. Kernighan, B. (2011). D is for Digital. Chapter 6: Software systems, Chapter 2: Bits, Bytes and Representation of Information.
  6. Read: “User-centered models of information retrieval.” Introduction to modern information retrieval. Pp 249-261.
  7. Explore: DCC Curation Lifecycle Model. (2012).
  8. Explore: Records and Information Life Cycle Management.

Optional readings

  1. Buckland, M. K. (1997). What is a “document”? Journal of the American Society for Information Science, 48(9), 804-809.

Discussion of readings

Information needs/seeking, retrieval and use

On the surface it may appear easy to design and implement information systems: all it takes is creating information resources and making them available to a user. Behind the scenes, however, there are complex systems that store the documents, store representations of those documents to facilitate retrieval and use, and create indexes that help match a user's information query. These systems run on computers that are designed to serve thousands of users at a time. Combined, the software and the hardware can be considered an "information system."

These systems have to be designed with users in mind and need to take into consideration the client platform (e.g. a laptop, smartphone or tablet), knowledge level (e.g. information literacy) and information need (e.g. find a book) of the user. Two of these items, the user's knowledge level (or cognitive state) and information need, inform the information seeking behavior of the user. The final piece, the client technology, influences the interaction between the human and the computer (the subject of Human Computer Interaction, or HCI).

Let's group each of these items into a simple visual model:

In the above model we have a hardware environment on top of which a set of software is deployed. The entire information system is developed to implement the abstract model that matches a user's information need with available documents through the interaction between the user's query and an information system. The core abstract model is pulled from an information retrieval model that we will be reading more about later in the semester.

The interaction between each of these elements can be visualized differently. For example, in our reading you will find Saracevic's Information Stratification model. The model expands on our simple model by introducing situational factors and by breaking down the component parts of an information seeking activity (e.g. query characteristics, interface elements, system engineering).

At the moment you might find some of the acronyms and systems mentioned in these models confusing. Don't get too caught up in understanding each system in depth yet. In this class we will focus on three of these elements: computing infrastructure, information seeking behavior and digital document structure. Let's start by exploring the technology building blocks of an information system.

Explore computing infrastructure

Before we move on to understanding documents and document representations and the roles they play in our information system, let's briefly consider the physical building blocks of computers and the software stack that together make up an information system.

In class 1 we learned about the building blocks of computers including hard drives, RAM, motherboards, CPUs and peripherals. It turns out that, for the most part, the main differences between the servers that Google uses and the laptop on your desk are the form factor of the machine (e.g. it fits neatly in a rack mount), the amount of RAM, the power of the CPU and the number of machines combined to make up a server farm. Other differences include the robustness of the power supply and the number of redundant parts (e.g. extra hard drives, power supplies and RAM for failure tolerance). In general, however, information systems often get their robustness by using lots of smaller computers rather than one really big computer. This approach is known as "commodity computing" and is sometimes referred to as "computing as a utility."

In addition to the physical hardware, information systems have a base operating system (e.g. Windows 7, OSX, Linux, Unix), general software that is designed to be reusable across the computer (e.g. libraries) and applications that serve specific purposes (e.g. web servers, word processors, indexing software). The model below provides a view of the relationship between these four building blocks.

Exploring your computer

Let's spend a moment exploring our own computer, including its hardware, operating system and running applications. To explore your computer, open Task Manager (Windows: press Ctrl-Alt-Del and click on Task Manager), Activity Monitor (OSX: search for Activity Monitor) or About This Mac (OSX: Apple menu, then About This Mac). Look around the Activity Monitor/Task Manager application. Can you find out how fast your CPU is? How much RAM do you have?

Table 1 Activity monitors for Windows (Left) and OSX (Right)

Question 1. CPU speed and type

Question 2. Amount of RAM in system

Question 3. Operating system

Question 4. Running applications

Turning your computer from a word-processing focused machine into a web server or document indexing server is actually quite easy. In later classes we will use virtual machines to explore these types of applications. While it may seem complex, the world of system administration revolves around tuning machines and software using tools similar to the application monitoring tools we just explored.
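Some of the facts shown in Task Manager or Activity Monitor can also be read by a short program. The sketch below is only an illustration and assumes Python 3 is installed on your machine; the function name describe_machine is made up for this worksheet. It uses only Python's standard library, which does not report the amount of RAM or the CPU clock speed in a portable way, so those two answers still need to come from the monitoring tools.

```python
import os
import platform


def describe_machine() -> dict:
    """Collect a few basic facts about this computer (standard library only)."""
    return {
        # e.g. "Windows", "Darwin" (the name OSX reports) or "Linux"
        "operating_system": platform.system(),
        "os_version": platform.release(),
        # e.g. "x86_64" or "arm64"
        "cpu_architecture": platform.machine(),
        # Number of logical CPUs; may be None if it cannot be determined
        "logical_cpus": os.cpu_count(),
    }


if __name__ == "__main__":
    for name, value in describe_machine().items():
        print(f"{name}: {value}")
```

The values printed will differ from machine to machine, which is the point: the same program describes a laptop or a rack-mounted server.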

Understanding the logical structure of a hard-drive

In addition to understanding the building blocks of your computer, it is also worth understanding the logical structure of your hard drive. Modern operating systems tend to interface with the user using skeuomorphic principles. In skeuomorphic design "a derivative object retains ornamental design cues to a structure that was necessary in the original." Skeuomorphic design is a popular design principle because it helps users infer functionality by recognizing objects that they are familiar with.

Take for example the toolbar of Word:

The toolbar shows a new document (physical paper), a file folder to indicate opening a document, a floppy disk to indicate the "save" function, a clipboard to represent "copy" functions and a paint brush to represent formatting options. Critics of skeuomorphic design counter that objects lose relevance over time and are meaningless to users who, for example, have never seen a floppy disk. In addition, critics assert that using these constructs in design is a barrier to a deeper or different understanding of system functionality.

This explanation of skeuomorphic design is a long-winded way of pointing out the rather obvious fact that our hard drives do not actually have "folders" and "documents" on them. As we learned in our reading this week, files and folders are ultimately represented as bits on the physical media on our disk drives. Take a few minutes and browse through Finder (OSX) or My Computer (Windows).

Question 5. What skeuomorphic design elements do you see used to represent the information stored on your hard drive?

Question 6. Reflect back on the classification structures we touched on in Class 1 (e.g. Enumerative, Analytico-Synthetic, Faceted). Is there a classification structure that seems to fit the folder structure of your hard drive? Why?

Step 1: Take a few minutes and revisit Kernighan's discussion of bits and bytes (p28-30) and answer the following questions.

Question 7. A byte is composed of how many bits of information?

Question 8. How many bytes are in a kilobyte?

Question 9. How many bytes are in a megabyte?

Question 10. How many bytes are in a gigabyte?

Question 11. A hard drive has 10,000 images, each 30 megabytes in size. How many of those files will fit on a 1 gigabyte flash drive?

Question 12. How much space (in gigabytes) do the images in the previous question take up on the hard drive?
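If you would like to check your arithmetic for the last two questions, the kind of calculation involved can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers below are hypothetical, and every size is given in plain bytes so that converting from kilobytes, megabytes or gigabytes is still left to you.

```python
def files_that_fit(file_size_bytes: int, drive_size_bytes: int) -> int:
    """Return how many whole files of one size fit on a drive."""
    if file_size_bytes <= 0:
        raise ValueError("file size must be a positive number of bytes")
    # Floor division: a file that only partly fits does not count.
    return drive_size_bytes // file_size_bytes


def total_size(file_count: int, file_size_bytes: int) -> int:
    """Return the space taken up by file_count files of equal size."""
    return file_count * file_size_bytes


# Hypothetical values, deliberately different from the questions above.
print(files_that_fit(700, 2000))  # 2 (a third file would need 2100 bytes)
print(total_size(5, 700))         # 3500
```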

Explore information seeking behavior

Step 2: Chowdhury looked at a number of information behavior and information seeking models. While the reading does a good job of describing the models, it offers few visual models to help. To enhance your understanding, read the Chowdhury reading alongside the ppt slides for this class. Rather than focusing on every model, I would recommend selecting one or two.

Models

  1. Wilson’s problem solving model
  2. Dervin’s sense-making approach
  3. Ellis’s information seeking process
  4. Kuhlthau’s information seeking model
  5. Ingwersen’s model
  6. Belkin’s ASK model
  7. Saracevic’s stratified interaction model

Questions

  1. Briefly review the model’s components and be prepared to answer the question “what process does this model describe?” Can you think of a real-world situation in which this model applies?
  2. Is your model focused on user behavior, cognition, an information seeking process or the interaction between a user and a system?
  3. Does this model fit with your own view of how you seek information? Why or why not?

Understanding document structure

In later classes we will explore the indexing and querying processes in more detail. In order to understand documents and the role that their representations play in information systems, let us focus on a particular type of digital document: HTML documents. How we structure documents is central to our use of them. For example, recipes tend to be structured in a specific way to help us differentiate between ingredients and cooking instructions. Nearly every text or multimedia document has its own model, or general structure, that helps us recognize how to use it.

Explore the recipe – circle and label different types of information (e.g. quantities, procedures, ingredients, etc.).
Consider the structure of this recipe and answer the following questions:

  1. What are the main sections of the recipe?
  2. What terms/formatting are used to indicate each section?
  3. What assumptions does the recipe make about your knowledge level?

Although this recipe is relatively simple, it is actually rather complex to represent in a digital document form. We need ways to group sections including ingredients, directions, submission information and prep data. We need a way to include a picture with the document, give it a title and need a presentation model that makes sense to cooks. In fact, there are two equal problems here. First, we need the intellectual content of the recipe to be captured and made available. Second, we need that content to be presented in such a way as to be useable by our readers.

Consider the impact of some of the information seeking models that we explored. How would a clear layout help with the sense-making process? How might the design and presentation of the content influence your level of need satisfaction in Belkin’s ASK model? This screen shot accomplishes all of this through a suite of technologies including HTML, CSS, data modeling and programming. In this class we are primarily focused on HTML, so let's spend a bit more time exploring the HTML standard.

Case study in HTML

HTML (HyperText Markup Language) is the primary document encoding scheme of the web. HTML is a text-based document format that serves as the foundation of every webpage. HTML has seen quite a few versions and is managed by a consortium known as the World Wide Web Consortium (W3C). In the remainder of this worksheet we will explore the structure of HTML documents and consider their relationship to information organization.

Document structure overview

HTML documents are primarily composed of elements and attributes. For each element/attribute there is a name (e.g. element name, attribute name) and a value (e.g. the value of the element/attribute). These elements and attributes are arranged in a hierarchical manner. Exact element and attribute names and the rules governing their values are defined by HTML standards maintained by the W3C. Figure 3 shows us an example HTML page. Take a moment to review the HTML document and acquaint yourself with the following concepts:

1. Elements: In HTML, element names are surrounded by angle brackets (< and >). An element must be “opened” (e.g. <html>) and “closed” (e.g. </html>) and must follow hierarchical rules (more on this below). Take a look at line 2. The element defined on line 2 is <html>. For this element the name is “html” and the value is all of the sub-elements (and their values) of the html element.

2. Attributes: Attributes are enclosed within element declarations (e.g. the attribute title is attached to the element meta). An example of this is seen on line 5: the element <div> has an attribute “id”.

3. Values: Attribute values are the text in quotes after the = sign (e.g. title="Sample page"). Element values are the text and all of the child elements contained between the opening and closing tags of an element. For example, let's look at line 7 and find the element <p>. The value of the element <p> is “This is a very simple webpage”.

In the software development world these name/value combinations are also called variables. A variable is a name/value combination such as title="Sample page". Although not represented explicitly, this also applies to elements. For example, with regard to line 7, <p> = “This is a very simple webpage”. Likewise, on line 5 the element <div> has an attribute id which has the value header (id="header").
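To see these name/value combinations pulled out of a document by a program, here is a minimal sketch in Python. It is an illustration only: the class name NameValueCollector is made up for this worksheet, it relies on the HTMLParser class from Python's standard library, and it looks at the footer markup from lines 13 to 15 of the example document written on a single line.

```python
from html.parser import HTMLParser


class NameValueCollector(HTMLParser):
    """Record attribute name/value pairs and the text held by each element."""

    def __init__(self):
        super().__init__()
        self.attributes = []   # (element name, attribute name, attribute value)
        self.text_values = []  # (element name, text inside that element)
        self._open = []        # elements that are currently open

    def handle_starttag(self, tag, attrs):
        self._open.append(tag)
        for name, value in attrs:
            self.attributes.append((tag, name, value))

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            # Text belongs to the innermost element that is still open.
            self.text_values.append((self._open[-1], text))


collector = NameValueCollector()
collector.feed('<div id="footer"><p>2011</p></div>')
collector.close()
print(collector.attributes)   # [('div', 'id', 'footer')]
print(collector.text_values)  # [('p', '2011')]
```

Note that the sketch only records the text directly inside each element. The full value of the <div> element, as defined above, would also include its child element <p>.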

Question 13. Briefly review line 5 in Figure 3 to identify the element, attribute, attribute value and element value and fill out Table 2.

Table 2: Map of element / attribute names and values

Element Name / Attribute Name / Attribute Value / Element Value

Figure 3: Example HTML document

  1. <!DOCTYPE html>
  2. <html>
  3. <head><title>Sample page</title></head>
  4. <body>
  5. <div id="header">Hello World</div>
  6. <div id="body">
  7. <p>This is a very simple webpage</p>
  8. <ul> <li>It has just a few basic elements</li>
  9. <li>It has a meta tag to provide descriptive metadata</li>
  10. <li>It has div elements to facilitate styling with cascading stylesheets</li>
  11. </ul>
  12. </div>
  13. <div id="footer">
  14. <p>2011</p>
  15. </div>
  16. </body>
  17. </html>

One additional important feature of HTML documents is that they follow a hierarchy. The hierarchical nature of HTML may be familiar from our work last week with hierarchical classification systems.

To review, some of the features of hierarchy include (adapted from Kwasnik, 1999):

  1. Inclusiveness: The top element of the hierarchy contains all sub-classes
  2. Super / Sub – class distinctions: Elements interact via a super/sub class distinction. Also known as parent/child/sibling relationships, super/sub-class distinctions describe “is-a” relationships (e.g. head “is a” child of html)
  3. Inheritance: Sub-elements are members not only of their parent class but also of every super class above it.

In HTML documents these rules have some special implications. First, HTML document elements must respect the idea of inclusiveness. This means that each element that is opened (e.g. <html>) must also be closed (e.g. </html>). The concept of super/sub-class distinctions means that an element can have only one parent. This is represented by opening and closing a child element inside of the parent element (e.g. <html><head></head></html>). Finally, as we will see in our next worksheet, the concept of inheritance is applied when HTML documents are processed by web browsers.
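The open/close and single-parent rules described above can be checked mechanically with a stack: each opening tag is pushed onto the stack, and each closing tag must match the tag on top of it. The Python sketch below is a simplified illustration under the rules of this worksheet, not a real HTML validator. The names HierarchyChecker and check are made up here, and the sketch would wrongly complain about elements that HTML allows to stay unclosed (for example <meta>).

```python
from html.parser import HTMLParser


class HierarchyChecker(HTMLParser):
    """Track parent/child relationships and report badly nested elements."""

    def __init__(self):
        super().__init__()
        self.stack = []    # elements that are open right now
        self.parents = []  # (child element, parent element or None)
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # The parent is whichever element is currently on top of the stack.
        parent = self.stack[-1] if self.stack else None
        self.parents.append((tag, parent))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check(markup):
    """Return (parent relationships, list of problems) for a piece of markup."""
    checker = HierarchyChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"<{tag}> was never closed" for tag in checker.stack]
    return checker.parents, checker.errors + unclosed


print(check("<html><head></head></html>"))
# ([('html', None), ('head', 'html')], [])
```

Feeding the checker markup in which head is never closed, such as <html><head></html>, produces a non-empty list of problems instead.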

Using Figure 3 as a guide, answer the following questions.

Key Questions

Question 14. What is the top element of the hierarchical document in Figure 3?

Question 15. What element is the parent of the element <li>?

Question 16. In the <div> elements, what attributes are defined?

Question 17. On line 6, what is the value of the attribute “id”?

Question 18. On line 5, what is the value of the element <div>?

HTML document structure

You may have already noticed that HTML documents are a bit odd. For example, while we say that each element can have only one parent, you may have noticed that the <p> element occurs under two parents (see lines 7 and 14 and find the parents of these two elements). The HTML standard allows the repeating of elements but still requires each element to exist under a parent element.

While not comprehensive, Table 3 provides us with a quick overview of how the HTML standard uses each of the elements defined in Figure 3.

Table 3: Brief table of HTML elements

HTML element / Function
html / This is the root element of HTML documents. This element helps the web-browser understand that the document follows the HTML standard
head / The head element stands for “header” and contains information about the page (e.g. style sheets, JavaScript, document meta tags)
body / The body element is designed to contain all of the HTML contents that will be shown in your web-browser when the page is displayed
div / The div element has little default functionality but is often used as a container for other elements (more on this later!)
ul / The ul (or unordered list) element helps create a bullet list of content in your HTML document. This element is used in conjunction with multiple li (list item) elements to show individual bullets. Another element that behaves similarly to the unordered list is ol (ordered list). When you use an ol element in place of a ul element, the list is created with numbers instead of bullets. Note: Both <ul> and <ol> elements represent individual items with repeating <li> elements.
li / See the ul element for use. The li element is used to contain the text that is shown in individual bullets
p / The p (or paragraph) element is used to contain larger blocks of text that are typically represented in paragraph form. While the <p> element has some default behavior, it, like all other elements, can have its behavior modified through the use of cascading style sheets.

The role of web-browsers

Although HTML documents are text-based, they create graphical user interface (GUI) web-sites when they are displayed in a web-browser like Chrome, Firefox or Internet Explorer. In the software development world the role that the web-browser performs is known as an “interpreter.” There are interpreters for many different programming languages. This semester we will be working with a few different interpreters, each designed to work with a different type of document.
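To make the idea of an interpreter more concrete, here is a toy sketch in Python that reads HTML and decides how to display a handful of elements as plain text. It is only an illustration and nothing like a real web-browser: the names PlainTextRenderer and render are made up for this worksheet, and the sketch only knows about head, ul, ol and li. It assumes, as the element table above describes, that the contents of head are not displayed, that ul items get bullets and that ol items get numbers.

```python
from html.parser import HTMLParser


class PlainTextRenderer(HTMLParser):
    """Toy interpreter: turn a few HTML elements into plain-text lines."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._lists = []   # stack of [list type, items seen so far]
        self._in_head = 0  # greater than 0 while inside <head>
        self._prefix = ""  # bullet or number waiting for the next text

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head += 1
        elif tag in ("ul", "ol"):
            self._lists.append([tag, 0])
        elif tag == "li" and self._lists:
            current = self._lists[-1]
            current[1] += 1
            # Bullets for unordered lists, numbers for ordered lists.
            self._prefix = "* " if current[0] == "ul" else f"{current[1]}. "

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head -= 1
        elif tag in ("ul", "ol") and self._lists:
            self._lists.pop()
        elif tag == "li":
            self._prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text and not self._in_head:
            self.lines.append(self._prefix + text)
            self._prefix = ""


def render(markup):
    """Return the display lines for a piece of HTML markup."""
    renderer = PlainTextRenderer()
    renderer.feed(markup)
    renderer.close()
    return renderer.lines


page = (
    "<html><head><title>Sample page</title></head><body>"
    "<p>This is a very simple webpage</p>"
    "<ul><li>It has just a few basic elements</li></ul>"
    "</body></html>"
)
for line in render(page):
    print(line)
```

Changing the ul element in the page to ol makes the same list item print with a number instead of a bullet, which mirrors what a web-browser does with the two list types.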