CAST, Inc.; 40 Harvard Mills Square #3; Wakefield, MA 01880
(781) 245-8547; TTY (781) 245-9320; Fax (781) 245-5212;
NIMAS
DAISY and NIMAS in HTML
A Guide to Accessible HTML Production for DAISY 3 and NIMAS 1.1
Updated March 2013
Prepared by Valerie Hendricks
This report was updated with support from the AIM and NIMAS centers, cooperative agreements between CAST and the U.S. Department of Education, Office of Special Education Programs (OSEP), cooperative agreement nos. H327T090001 and H327P090001. The opinions expressed herein do not necessarily reflect the policy or position of the U.S. Department of Education, Office of Special Education Programs, and no official endorsement by the Department should be inferred.
Table of Contents
Introduction
DAISY, NIMAS, and HTML
Comparing the Standards
Document Structure
Mark-Up: Elements and Attributes
Mark-Up: Separation of Content and Presentation
The Need for Accessible HTML Materials
HTML Production Using DAISY and NIMAS
Review Source Content
Edit Source Content
Presentation: Formatting, Navigation, Rendering
Testing and Review of HTML Conversion
References
Resources
Acknowledgments
Introduction
The National Instructional Materials Accessibility Standard (NIMAS) is a technical standard used to produce source files in XML format that may be used to develop multiple specialized outputs in a variety of formats for students with print disabilities. The XML and image source files of a NIMAS fileset can be used to create Braille, large print, HTML, DAISY digital talking books (DTBs) using human voice audio or text-to-speech synthetic audio, and more. (The NIMAS applies to instructional materials published [available for purchase or in print] on or after 7/19/06.)
The NIMAS is a sub-set of the DAISY Standard. DAISY stands for Digital Accessible Information SYstem and refers to an international standard for creating a variety of DTBs: digital books that are a combination of synchronized text and audio. The DAISY specification is made up of internationally agreed-upon rules and requirements necessary to create digital and audio books, including XML and SMIL file requirements and structural and other aspects.
This guide is based on the DAISY 2005-3 and NIMAS 1.1 technical specifications.
Both DAISY and the NIMAS use XML and therefore benefit from XML’s most important asset: the separation of content from its presentation. Using the structure and components of the content as a source, a variety of outputs or products with very different formats, layouts, presentations, and features can be created. The results may then be used to support a diverse group of learners and users with print disabilities.
This guide is intended to assist publishers and other producers of accessible instructional materials (AIM) to create accessible HTML outputs sourced from NIMAS filesets. Several of the elements of the DAISY Standard and the NIMAS technical specification do not have HTML equivalents. This guide was prepared to address the question of how DAISY/NIMAS elements without direct corresponding elements can be converted into HTML and to cover important aspects of a NIMAS to HTML conversion project.
DAISY, NIMAS, and HTML
Comparing the Standards
While DAISY, the NIMAS, and HTML share many fundamental characteristics, they also differ in significant ways. Requirements for DAISY files, NIMAS filesets, and HTML documents and some useful information for comparison are outlined below.
HTML
An HTML document consists of one or more files: one or more HTML files and one or more optional additional files, such as a stylesheet, images, or JavaScript files. HTML elements are usually grouped into three main kinds:
- those that describe the structure and the organization of content components (such as paragraphs, headings)
- those that describe the format and layout of the content (such as emphasis [<i>, <b>], spacing [<colgroup>, <sup>])
- those that link one part of content to another or to outside content (such as link to a glossary, link to a web page)
Semantic HTML refers to HTML code that is devoted to content rather than presentation. Since HTML was originally intended to be semantic, using semantic HTML is a best practice (and is consistent with the NIMAS, which requires the separation of content and presentation), especially as HTML practice increasingly returns to semantic mark-up. HTML is most compatible with multiple browsers and with assistive technology (AT) when it strictly follows the HTML standards and the semantic definitions of the HTML elements.
DAISY
A DAISY fileset consists of four or more files: an (optional) XML text content file, an OPF file (metadata), an NCX file (navigation), a SMIL file (synchronization), and (optionally) one or more audio files. Image files are often present as well.
Within the XML text content file, DAISY elements are classified as major structural, block, and inline elements. Major structural elements include levels (<level1> through <level6>), <frontmatter>, <bodymatter>, and <rearmatter>. Examples of block elements include <list>, <p>, and <sidebar>. Examples of inline elements include <a>, <pagenum>, and <span>.
DAISY filesets often contain many files for two fundamental reasons: to support enhanced navigation, and to support synchronization of text and audio. The NCX, for example, can contain information that supports navigation by heading and by page, as well as information to allow direct navigation to tables, figures, sidebars, and other content within the book. DAISY filesets can contain textual content only, audio content only, or a mixture of the two. In the latter instance, a mechanism is needed to allow text to be displayed by a DAISY reader while an audio version of that same text is being played. Readers capable of processing DAISY DTBs provide synchronized playback of mixed text/audio books using information in SMIL files; others can provide audio versions of textual content using only automatically generated synthesized speech.
NIMAS
A NIMAS fileset consists of three or more files: an XML source file, an OPF file (metadata), one or two PDF files (print work title and copyright page[s]), and any image files present in the print source work.
The NIMAS Technical Specification contains a Baseline Element Set that is made up of a relatively short list of elements meant to cover all of the basic, but no more than the basic, needs of publishers and other producers of educational works, primarily textbooks. All of the remaining elements available in the DAISY standard are included in the NIMAS as optional elements. Producers are encouraged to use optional elements where appropriate.
When creating an HTML conversion from a DAISY or NIMAS source, typically the only files used will be the XML source file and all of the image files.
Document Structure
In many print works, particularly textbooks and other structured works, it is often important to designate certain segments of text or other content as having significant aspects independent of their fundamental content type (such as “text” or “image”); for example, a common meaning, a relationship to each other, or a shared appearance. Properly structured documents are critical for accessibility. In DAISY, the NIMAS, and HTML, these aspects of a work are addressed by structure and mark-up. (Presentation aspects such as format, layout, and style are added later.) Creating a structure and mark-up that accurately reflect the content of the work in question is the goal of both DAISY and NIMAS source files and of HTML outputs.
Structural Elements
In DAISY, level elements are perhaps the most fundamental of structural elements. The following example provides some additional consideration of this basic element.
<level1 class="part">
  <h1>Part 1</h1>
  <p>This is an introductory paragraph for Part 1.</p>
  <level2 class="chapter">
    <h2>Chapter 1</h2>
    <p>This paragraph is part of Chapter 1.</p>
  </level2>
  <level2 class="chapter">
    <h2>Chapter 2</h2>
    <p>This paragraph is part of Chapter 2.</p>
  </level2>
</level1>
The example above shows a top-level section of a textbook identified using <level1 class="part">, which has two sections identified using <level2 class="chapter">. These levels could have other content components (nested properly) within, such as additional chapters, sidebars, blockquotes, etc.
The example illustrates a structural organization of content components (a part and its chapters) and one possible corresponding mark-up in DAISY or NIMAS XML.
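HTML has no direct equivalent of the DAISY level elements, so a conversion has to choose a replacement. The following Python sketch shows one possible approach, not a prescribed one: it assumes each <level1> through <level6> (or generic <level>) element is rewritten as a <div> whose class attribute records both the original element name and any existing class value. The function name to_html and the sample data are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Simplified sample based on the part/chapter example above (no namespace, no DTD).
SOURCE = """<level1 class="part">
  <h1>Part 1</h1>
  <p>This is an introductory paragraph for Part 1.</p>
  <level2 class="chapter">
    <h2>Chapter 1</h2>
    <p>This paragraph is part of Chapter 1.</p>
  </level2>
</level1>"""


def local_name(tag):
    """Drop a '{namespace}' prefix so the sketch works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def to_html(elem):
    """Return a copy of elem in which every level element becomes a <div>."""
    name = local_name(elem.tag)
    if re.fullmatch(r"level[1-6]?", name):
        out = ET.Element("div")
        # Assumption: keep the original element name next to any class, e.g. "level1 part".
        out.set("class", " ".join([name] + elem.get("class", "").split()))
    else:
        # Other elements are copied unchanged in this sketch.
        out = ET.Element(name, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(to_html(child))
    return out


html_root = to_html(ET.fromstring(SOURCE))
print(ET.tostring(html_root, encoding="unicode"))
```

Keeping the level name in the class attribute means a stylesheet can still target parts and chapters separately after conversion.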
Mark-Up: Elements and Attributes
DAISY, the NIMAS, and HTML share a number of common document structures (structural components), but may implement them differently. Two representative examples where DAISY/NIMAS and HTML have conceptual differences are outlined as follows:
Titles
HTML has a <title> element, for use in designating the title of a document. It is intended to signify the title of the work as a whole and is placed in the <head> section at the top of an HTML document.
DAISY and the NIMAS have the elements <doctitle> and <covertitle>. The <doctitle> element serves to indicate the official, formal title of the entire document and is placed at the top of an XML document. The <covertitle> element contains the title that appears on the cover of the work, which often differs from a work’s full title.
<doctitle>Jane Eyre: An Autobiography</doctitle>
<covertitle>Jane Eyre</covertitle>
<doctitle>Through the Looking Glass, and What Alice Found There</doctitle>
<covertitle>Through the Looking Glass</covertitle>
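Because HTML has only one title element, a conversion has to decide what to do with <covertitle>. The Python sketch below illustrates one option under stated assumptions: the <doctitle> text becomes the HTML <title>, and the <covertitle> text is kept in the body as a paragraph with a class of "covertitle". The function name, the class name, and the un-namespaced sample are all hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Simplified, un-namespaced sample built from the first title example above.
FRONTMATTER = """<frontmatter>
  <doctitle>Jane Eyre: An Autobiography</doctitle>
  <covertitle>Jane Eyre</covertitle>
</frontmatter>"""


def title_markup(frontmatter_xml):
    """Return HTML strings for the head and body built from the two title elements."""
    root = ET.fromstring(frontmatter_xml)
    doctitle = root.findtext("doctitle", default="").strip()
    covertitle = root.findtext("covertitle", default="").strip()
    markup = {"head": f"<title>{escape(doctitle)}</title>", "body": ""}
    if covertitle:
        # Assumption: the cover title is retained as a classed paragraph.
        markup["body"] = f'<p class="covertitle">{escape(covertitle)}</p>'
    return markup


titles = title_markup(FRONTMATTER)
print(titles["head"])
print(titles["body"])
```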
Captions
HTML uses the <caption> element to indicate the caption of a table. It appears only once per table and must be placed at the top of the table mark-up, just after the opening <table> tag.
DAISY and the NIMAS also use the <caption> element to indicate the caption of a table, and, when so used, it must be coded in the same way. However, <caption> is also used within the <imggroup> element. When used with images, additional options and placements are available.
<imggroup>
  <img id="u01.c04.011" src="./images/unit1/chapter4/img001.jpg" alt="Photo of a panda eating bamboo"/>
  <caption imgref="u01.c04.011">The diet of the Giant Panda consists of up to 90% bamboo</caption>
</imggroup>
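Since the HTML <caption> element is limited to tables, an image caption from an <imggroup> needs different mark-up in the HTML output. The Python sketch below shows one possible mapping, assuming HTML5 <figure> and <figcaption> are acceptable in the target environment and that each <imggroup> holds a single caption; neither assumption comes from the DAISY or NIMAS specifications. The imgref attribute is dropped because the figure grouping already associates caption and image.

```python
import xml.etree.ElementTree as ET

# Un-namespaced sample matching the image caption example above.
IMGGROUP = """<imggroup>
  <img id="u01.c04.011" src="./images/unit1/chapter4/img001.jpg" alt="Photo of a panda eating bamboo"/>
  <caption imgref="u01.c04.011">The diet of the Giant Panda consists of up to 90% bamboo</caption>
</imggroup>"""


def imggroup_to_figure(imggroup_xml):
    """Build an HTML <figure> element from a DAISY/NIMAS <imggroup> string."""
    group = ET.fromstring(imggroup_xml)
    figure = ET.Element("figure")
    for child in group:
        if child.tag == "img":
            # Copy only the attributes that also exist on the HTML <img> element.
            kept = {k: child.get(k) for k in ("id", "src", "alt") if child.get(k) is not None}
            ET.SubElement(figure, "img", kept)
        elif child.tag == "caption":
            figcaption = ET.SubElement(figure, "figcaption")
            # Flatten any inline mark-up inside the caption to plain text.
            figcaption.text = "".join(child.itertext())
    return figure


figure = imggroup_to_figure(IMGGROUP)
figure_html = ET.tostring(figure, encoding="unicode", method="html")
print(figure_html)
```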
Mark-Up: Separation of Content and Presentation
One of the main reasons that a DAISY or a NIMAS fileset is a source file, appropriate for use in creating a wide variety of outputs, is the fact that content and its presentation are separated. This separation is crucial to the usefulness of a source file for the creation of specialized formats. However, it is still almost always necessary for output conversions to provide a styled and formatted finished product. One of the most common ways to include information about display details for HTML outputs is the use of CSS. The DAISY Consortium distributes a basic, default stylesheet for use with DAISY files meant to be used online, with a browser. Such a stylesheet may be modified for HTML display purposes. Note that NIMAS filesets, and, in many cases, DAISY filesets do not properly contain stylesheets and the use of CSS is described here as an appropriate addition for HTML conversions.
The Need for Accessible HTML Materials
HTML is a format widely used around the world and supported by a vast array of hardware, software, and web-based applications. Much of the available freeware, shareware, public domain content, and other no-cost products and resources support the HTML format. In addition, virtually everyone with access to computers and to the Internet is thoroughly familiar with and comfortable using HTML. Given HTML’s worldwide distribution, use of this format to provide accessible instructional materials is logical and efficient, and it allows an enormous variety of delivery and use. Additional considerations include use of many of HTML’s advantages, associating additional supplementary materials, integrating additional components such as MathML or MusicML content, and secondary concerns such as user preference for HTML and the relative ease of HTML conversion from a DAISY or NIMAS source.
However, HTML in and of itself is not necessarily an accessible medium. Creation of an accessible HTML document requires the inclusion of additional components, such as information that allows effective navigation (such as hyperlinks), the provision of alternative text (alt text/alt tags and long descriptions) for images, and ensuring that the structure of a source work is adequately preserved and portrayed. (One or more of these additional components may already be present in DAISY or NIMAS source files, allowing for an easier conversion to HTML.)
Web sites/pages devoted to accessible HTML:
- Creating Accessible HTML
- HTML Techniques for Web Content Accessibility
- Web Standards Project Accessible HTML/XHTML Forms
For additional information regarding the instructional need for accessible HTML, please refer to the following resources:
- Accessible Media: Text
- Accessible Textbooks in the K–12 Classroom II
- All About AIM
HTML Production Using DAISY and NIMAS
Thanks to the power of XML, used for mark-up of content by both standards, the conversion process from DAISY or NIMAS to HTML is fairly straightforward. The process includes the following:
- Review the source content
  - check to be sure the source files are valid XML, conformant to either the DAISY standard or to the NIMAS, and use consistent mark-up that is accurate to the print source throughout (identify any structural or content flaws for correction)
  - check the source files for missing components (for example, missing images)
  - check the structure and components of the source for unique or seldom-used organization, parts, or sequence
- Edit the source content if needed and/or desired
  - correct any errors or inconsistencies found during review
  - address any missing parts issues
  - address unusual or challenging pieces
  - choose alternate mark-up for elements without an HTML equivalent
  - create any additional navigation to be included in the work (for example, attributes or hyperlinks)
- Convert the source content
  - convert to HTML format
  - create any visual presentation additions for the work (for example, a CSS stylesheet)
- Test and review the HTML conversion
  - verify content, structure, styling, navigation, accessibility, etc.
  - check to be sure that the conversion can be used with the applications, tools, etc., for which it was made
  - validate HTML code (see below for validation resources)
In the sections that follow, each of these steps is discussed in more detail, with information on specific issues and strategies for resolving them. It’s important to note that not all of the issues identified in these sections may be present or need to be addressed in any one particular HTML conversion, nor is coverage exhaustive. However, using these guidelines will help produce an HTML output that is readable, navigable, and useful.
Review Source Content
Prior to conversion, it’s important to review source content to ensure that it can be used to produce accessible HTML. There are a number of checks and enhancements that can be performed prior to the actual conversion that will help ensure the resulting HTML conversion is of high quality and is useful to the widest variety of users.
Complete, Accurate Filesets
Check to be sure the source files are valid XML and are conformant to either the DAISY standard or to the NIMAS. Many XML editors include a validation feature that will check an XML file for well-formedness and against a DTD listed in the document declaration. Several XML editors that can validate a source file are XMLSpy, Dreamweaver, oXygen, and Stylus Studio. Online validators include the W3C’s Mark-Up Validation Service, Theano GmbH’s XML Validation, and Edinburgh Language Technology Group (LTG)’s XML well-formedness checker and validator.
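For a quick first pass before opening a full editor, a well-formedness check can also be scripted. The Python sketch below uses only the standard library; note that xml.etree.ElementTree checks well-formedness only and does not validate against the DAISY or NIMAS DTD, so it complements rather than replaces the validators listed above. The function name and sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

# Two tiny samples: the second has an unterminated </h2 end tag.
GOOD = '<level2 class="chapter"><h2>Chapter 1</h2></level2>'
BAD = '<level2 class="chapter"><h2>Chapter 1</h2</level2>'


def well_formedness_error(xml_text):
    """Return None if xml_text parses, else a message with the parser's line and column."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


for label, sample in (("GOOD", GOOD), ("BAD", BAD)):
    problem = well_formedness_error(sample)
    print(label, "is well-formed" if problem is None else f"is not well-formed ({problem})")
```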
Review the source file to identify any structural or content flaws for correction. Make sure that an actual image file is present for each <img> element in the XML, and that the filename path corresponds to the correct image. Review the source file for components such as one-time features or complex pieces. Check for missing or duplicated content.
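The image check described above can be sketched in Python as follows. The sketch assumes that each src value is a path relative to the folder holding the XML file; the function name missing_images and the temporary sample files are hypothetical and exist only to make the example self-contained.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_images(xml_text, base_dir):
    """List the src values of <img> elements whose files are absent under base_dir."""
    root = ET.fromstring(xml_text)
    missing = []
    for elem in root.iter():
        # Compare on the local name so namespaced source files also work.
        if elem.tag.rsplit("}", 1)[-1] != "img":
            continue
        src = elem.get("src")
        if not src or not (Path(base_dir) / src).is_file():
            missing.append(src)
    return missing


# Demonstration with a throwaway folder that contains only one of two referenced images.
SAMPLE = (
    '<imggroup>'
    '<img src="./images/img001.jpg" alt="present"/>'
    '<img src="./images/img002.jpg" alt="absent"/>'
    '</imggroup>'
)
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "images").mkdir()
    (Path(tmp) / "images" / "img001.jpg").write_bytes(b"")
    result = missing_images(SAMPLE, tmp)
print(result)
```

A check like this confirms only that a file exists at the stated path; confirming that it is the correct image still requires a visual review.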
Consistent, Correct Structure
Check the source file to be sure that structural mark-up is consistent throughout a work and accurately reflects its source. Examples of structural errors or mismatches to repair include:
- Content components of the same kind are marked up differently
  - Example: A Q&A sidebar that appears at the end of each chapter is marked up one way in one chapter, and another way in other chapters
  - Example: Recurring margin notes marked up differently each time they appear
- Content components are not marked up according to the source work
  - Example: Page breaks in incorrect locations
  - Example: Sidebar content marked up as a paragraph
- Excessive use of <sidebar>, <p>, or other elements that do not correspond to content type
It would not be possible to list all potential inconsistencies or errors that might be found in a source fileset, but these items concern the entire work as a whole and should go a long way to ensuring that a file is of high quality and accurately reflects its print source.
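Some inconsistencies of the first kind can be surfaced automatically. The Python sketch below is one possible aid, not a complete check: it assumes that recurring components are distinguished by their class attribute, tallies the class values used with each element, and reports elements that appear with more than one value so a reviewer can decide whether the variation is intentional. The function name and the simplified sample (which is not DTD-valid) are illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Simplified sample: the same Q&A sidebar is classed two different ways.
SAMPLE = """<bodymatter>
  <level1 class="chapter">
    <h1>Chapter 1</h1>
    <sidebar class="qa"><p>Q and A for Chapter 1</p></sidebar>
  </level1>
  <level1 class="chapter">
    <h1>Chapter 2</h1>
    <sidebar class="q-and-a"><p>Q and A for Chapter 2</p></sidebar>
  </level1>
</bodymatter>"""


def class_variants(xml_text):
    """Map each element name to its class-value counts, keeping only names with 2+ values."""
    root = ET.fromstring(xml_text)
    variants = defaultdict(Counter)
    for elem in root.iter():
        name = elem.tag.rsplit("}", 1)[-1]
        variants[name][elem.get("class", "(none)")] += 1
    return {name: counts for name, counts in variants.items() if len(counts) > 1}


flagged = class_variants(SAMPLE)
for name, counts in flagged.items():
    print(name, dict(counts))
```

A report like this only points to candidates for review; whether two class values describe the same kind of content is still a judgment made against the print source.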
Edit Source Content
Editing of the source files can be done with any one of a number of XML editors or even in a text editor such as Notepad. Using XML editing software has several advantages over hand-coding: most programs correct errors as they are typed or on demand, save time by auto-completing tags and attributes, and often include rather sophisticated search, find, and replace features.
XML in XMLSpy