OCW PUBLICATION FORMATS:
USER NEEDS AND FUTURE DIRECTIONS

September 2006

CONTENTS

I. Executive Summary

II. What are the optimal formats to meet user needs?

A. User needs

B. Suitability of PDF format

C. Alternative formats

1. Background on alternative formats reviewed

2. User preference for alternative formats

3. Follow-up research

4. Note on XML/MathML format

III. How should OCW respond in the short- and long-terms, and what are the implications?

A. Balancing OCW priorities

B. Publication format options considered

C. Analysis of publication level of effort

IV. Conclusions and proposed directions for OCW

APPENDICES

A. OCW Document Scrubbing Checklist

B. File Format Analysis

C. Follow-Up Responses from PDF-Averse Users

D. Insights from African Virtual University on Reuse of OER Materials

E. Notes on Level of Effort Analysis

For more information, please contact
Cecilia d’Oliveira
Technology Director
MIT OpenCourseWare
One Broadway, 8th Floor
Cambridge, MA 02142
Phone: 617-253-6124
Fax: 617-253-2115
Email:


I. Executive Summary

OCW has recently researched document-type/format options and directions for future publication of MIT course materials on OCW. Today, most materials come to OCW as PDF or MS Office documents. The OCW publication process then takes these materials through various editing, scrubbing, and reformatting steps and posts them to the OCW web site in PDF format or sometimes HTML and other formats.

While PDF is sufficient for many users who wish merely to read or reference OCW materials, other users find this format limiting. In particular, some users wish to reuse or remix OCW materials in other contexts or other documents, and would benefit from a more structured, editable format that would allow easier manipulation and combination with other materials. Others foresee the integration of OCW content with applications such as modeling and classroom demonstrations, wherein “computable” math formulas, chemical equations, musical notations, etc. found in OCW materials could be “performed.” The Hewlett Foundation, one of OCW’s major funders, has urged consideration of a more advanced publishing format—specifically XML—to accommodate these needs in the future.

Accordingly, during spring and summer 2006, OCW undertook an analysis to understand user needs, format options, and the effect these various options would have on the OCW publishing process and costs. We also surveyed automated tools for converting between formats. This paper describes our findings and our recommendations for publishing in the future.

Summary of findings

While most current users find PDF suitable—in many cases optimal—for their purposes, many would prefer more options, particularly MS Office (Word, PowerPoint) formats.

PDF offers limited support for users who wish to manipulate content, and it is also cumbersome (slow to download and slow to open on many platforms).

About half of educators, representing about 8% of OCW users but comprising a particularly important OCW constituency, seek to manipulate content.

XML is an emerging standard for Internet publishing and represents the state-of-the-art ideal because, compared with other publishing format options, it allows maximum flexibility for display, manipulation, and repurposing of content.

Creating rich XML documents requires familiarity with the course and subject matter to ensure proper coding and structuring of content, and can be accomplished neither by automated conversion tools alone nor by human transcribers who lack knowledge of the content.

Unless original course materials were submitted to OCW in XML or in a structure that could easily be transformed into XML, publishing in XML would add an untenable level of effort (double or more in some cases) to the production process for most types of courses.

Though not as powerful as XML, publishing in MS Office formats would support manipulation and reuse better than PDF, and could be accomplished with only modest additional effort and cost for materials that are originally submitted to OCW in these formats (the typical case).

As we considered how to address the needs and issues identified in this study, we were ever mindful of the challenge of finding the right balance among three sometimes-competing OCW priorities:

Optimizing usability and the user experience,

Maximizing the quantity of MIT content we publish or update in a given publication cycle, and

Keeping costs low and ensuring OCW sustainability.

Our direction on publication formats is further affected by the fact that MIT courses are heavily weighted toward engineering, math, science, and economics, all of which are characterized by complex content with many equations, charts, and graphs. This content is much more laborious to create, edit, and format than plain text-based lecture notes; indeed, OCW staff still work with faculty whose primary class handouts are fifteen-year-old handwritten notes. This content is also more technically difficult to convert with existing automated format conversion tools. Accordingly, our recommendations are necessarily measured, but we believe they reasonably reflect all these factors.

Recommendations

In the short to medium term (until MIT establishes and implements a broader strategy for developing, managing, and delivering teaching materials, which we estimate will take about 3–5 years), OCW should publish materials in the original source format when feasible (typically MS Office or sometimes HTML). Note: We do not currently have the capability to republish LaTeX documents.

In the longer term, OCW should publish XML documents, after tools and incentives are in place to encourage faculty to author materials in XML or to use XML-compatible templates.

In both the short- and long-term, OCW should continue to publish PDF documents alongside the alternative formats.

For existing OCW courses that have already been published, we should republish this content as part of the planned steady-state update cycle of about 150 courses per year.

II. What are the optimal formats to meet user needs?

A. User needs

OCW evaluation research[1] indicates that there are several use case scenarios that describe why and how different categories of users make use of OCW materials. The following table summarizes these scenarios by educational role of user and shows the proportion of primary use[2]:

Table 1. Summary of Usage by User Role

By their nature, some of these use case scenarios are fulfilled by reading or referring to OCW materials while others are accomplished by manipulating the materials—adapting, combining, or remixing OCW content, typically for use in teaching. In particular, the first two use case scenarios in the preceding table (“developing or planning a course” and “preparing to teach a specific class”) are more likely than the other scenarios to involve manipulation.

While the vast majority of current OCW visitors use the materials for one of the “read-only” purposes, a small—but vitally important—segment of users, primarily educators, engage in an activity that may involve manipulating or repurposing the materials. The following table shows that about half of educators currently reuse OCW materials and many more may do so in the future.

Table 2. Reuse/Adoption of OCW Materials for Teaching Purposes

Materials reuse / Current / Future
Yes / 46.2% / 62.8%
No / 53.8% / 5.7%
Not sure / n/a / 31.5%
Total / 100.0% / 100.0%

Note, however, that by the definitions used in the evaluation survey, not all reuse of materials involves manipulation. The following table identifies various modes of reuse. Respondents were asked to select all that apply, and many educators reuse the materials in multiple ways.

Table 3. Scenarios of OCW Reuse by Educators

Reuse Scenario / %
Recommended that students go to the site directly for additional subject information / 65.5%
Adapted syllabi or other content in developing the structure of a course / 40.5%
Incorporated OCW lecture notes, simulations or tools into preexisting course materials / 36.7%
Provided printed copies of unmodified site materials to students in class / 29.5%
Adapted MIT OCW assignments or exams / 25.8%
Provided electronic copies of unmodified materials to students via e-mail, file sharing, or LMS / 23.9%
Other / 1.9%

Finally, of the roughly half of educators who do reuse the materials, we found that 86% did indeed manipulate the materials to support localization. Localization refers to the process of adopting/adapting materials for local educational use, and may involve any one or more of the following manipulations:

Combining materials from more than one source, including adding OER materials to existing course materials.

Making academic/pedagogical changes. For example, an educator might adapt an advanced MIT course for a lower-level student audience by removing some material and stretching out the remainder over a longer teaching calendar.

Updating materials to reflect advancing knowledge in the field.

Making technical changes to accommodate local technology requirements.

Making cultural adjustments.

Educators who report modifying the materials made the types of modifications shown in the following table.

Table 4. Types of Educator Modifications to OCW Content

Modification Type / %
Incorporation of MIT OCW content with materials from other sources / 61.7%
Adjustments for differences in academic level / 41.3%
Adjustments to update content to reflect most current field developments / 29.9%
Adjustments for technical format differences / 25.8%
Adjustments for cultural differences such as language and appropriateness of examples / 23.1%
Other / 2.3%

We conclude from this research that about 6%–7% of current users manipulate OCW content, and that they do so in a variety of ways, primarily for teaching purposes. We acknowledge that our research may have a built-in bias insofar as a larger proportion of users might reuse OCW content were it presented in a format that made it easier to manipulate and repurpose. We also speculate that there could be ways of using OCW content, such as performing simulations based on mathematical models, “playing” musical notations, and other renderings we cannot yet imagine, if only the materials were presented in a way that supported such uses.
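The 6%–7% estimate can be reproduced with simple arithmetic. The sketch below assumes, following the summary of findings, that the roughly half of educators who reuse materials represent about 8% of all OCW users; that reading of the summary is an interpretation rather than a figure reported in the tables here, and the variable names are illustrative.

```python
# Worked check of the "about 6%-7%" estimate.
# ASSUMPTION: the reusing half of educators make up about 8% of all OCW users
# (our reading of the summary of findings); 86% is the survey's manipulation rate.
reusing_educator_share = 0.08   # assumed share of all OCW users
manipulation_rate = 0.86        # share of reusing educators who manipulate content

manipulating_share = reusing_educator_share * manipulation_rate
print(f"{manipulating_share:.1%}")  # 6.9%
```

Under a different reading (for example, if 8% were the share of all educators rather than of reusing educators), the result would be roughly half as large, so the estimate depends on that assumption.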

B. Suitability of PDF format

In the beginning, OCW chose the Adobe Portable Document Format (PDF) as the most logical for publishing most MIT educational content, based on the following considerations:

MIT faculty authors develop course content using a wide variety of tools, technologies, and formats. Faculty are under no obligation to follow any particular standard; OCW publication is not their first priority; and participation is strictly voluntary. Typical formats received by OCW include PDF, MS Word, PowerPoint, LaTeX, and sometimes HTML and handwritten notes. PDF was judged to be a reasonable “common denominator,” allowing easy conversion from these diverse formats.

PDF keeps production costs relatively low, particularly because conversion to PDF avoids the possibility of introducing transcription errors in converting from the original. This in turn reduces quality assurance efforts, including having to involve faculty in rereading materials.

We believed that PDF was a widely accepted standard that would be accessible to the broadest possible audience using ubiquitous (and free) PDF reader software.

Our focus was on meeting aggressive publishing targets and schedules, with 500 courses being published in the first full year of operation, and the simplicity of PDF helped to streamline this process.

PDF is primarily a display format as opposed to an editing or data manipulation tool. Given that the vast majority of current OCW visitors use OCW for reading and reference, as discussed earlier, it is not surprising that most users—in fact over 97%—find PDF suitable for their purposes. The following table shows these evaluation results by role.

Table 5. PDF Suitability Ratings by Educational Role

PDF suitability / Educator / Student / Self Learner / Other / All roles
Very suitable / 55.8% / 59.0% / 58.8% / 54.3% / 58.1%
Suitable / 42.2% / 38.9% / 38.6% / 41.2% / 39.4%
Unsuitable / 1.5% / 1.7% / 2.0% / 3.0% / 1.9%
Very unsuitable / 0.5% / 0.4% / 0.7% / 1.5% / 0.6%
Total / 100.0% / 100.0% / 100.0% / 100.0% / 100.0%

Because PDF is not primarily designed to support text manipulation, a population of particular concern is that of educators reusing OCW materials. Educator ratings of PDF format, however, do not appear to bear a strong relationship to educator reuse, or interest in reuse, of OCW materials. The table below shows that the suitability ratings for PDF among educators who have and have not reused OCW content are very similar, and both groups overwhelmingly consider the format to be very suitable or suitable. Educators who have not reused content are slightly less likely to rate PDF as very suitable (52.9% versus 59.8%). Among educators who were unsure whether they would reuse materials in the future, however, 48% rated PDF as very suitable, as opposed to 60% of those who did plan to reuse material and 63% of those who did not.

Table 6. PDF Educator Suitability Ratings by Materials Reuse

PDF suitability / Have reused: Yes / Have reused: No / Will reuse in future: Yes / Will reuse in future: No / Will reuse in future: Not sure
Very suitable / 59.8% / 52.9% / 60.3% / 62.9% / 47.7%
Suitable / 39.0% / 44.5% / 38.4% / 31.4% / 49.7%
Unsuitable / 0.8% / 2.3% / 1.0% / 5.7% / 1.6%
Very unsuitable / 0.4% / 0.3% / 0.3% / 0.0% / 1.0%
Total / 100.0% / 100.0% / 100.0% / 100.0% / 100.0%

Educator ratings of PDF suitability do not appear to be a predictor of likelihood of future materials reuse either. Again, among those who did plan to reuse material in the future, those who did not, and those who were unsure, all groups overwhelmingly consider PDF to be very suitable or suitable. We do note, however, that the unsuitable rating among those who did not plan to reuse material was 6%, which is higher than other groups, and this may indicate a reluctance to reuse the materials because of PDF-related difficulties in manipulating the content.

The only significant differences in visitor ratings of PDF suitability emerge when the data are examined by the visitor's geographical location. As has been observed anecdotally in previous evaluations, PDF is less widely used in East Asia than in other regions: 45.6% of East Asian visitors rate PDF as very suitable, about 12.6 percentage points fewer than all visitors (58.2%). This does not, however, translate into wide dissatisfaction with the format (see table below); only about 4% of visitors from the region rate PDF as unsuitable or very unsuitable. No statistically significant differences emerge when the ratings of only new visitors from each region are examined, suggesting that PDF format may slightly influence whether visitors from East Asia return, but is likely a less significant factor than language or connectivity costs.

Table 7. PDF Suitability Ratings by Region

Region / Very Suitable / Suitable / Unsuitable / Very Unsuitable / Total
North America / 63.6% / 34.2% / 1.5% / 0.7% / 100.0%
East Asia / 45.6% / 50.6% / 3.5% / 0.3% / 100.0%
Western Europe / 60.7% / 37.7% / 1.0% / 0.6% / 100.0%
Latin America / 71.2% / 28.1% / 0.5% / 0.2% / 100.0%
South Asia / 53.3% / 42.8% / 2.8% / 1.1% / 100.0%
Eastern Europe / 50.3% / 47.8% / 0.6% / 1.3% / 100.0%
Mid East/N Africa / 66.9% / 32.2% / 0.8% / 0.0% / 100.0%
Pacific / 61.2% / 37.3% / 0.0% / 1.5% / 100.0%
Sub-Saharan Africa / 55.9% / 44.1% / 0.0% / 0.0% / 100.0%
Central Asia / 46.4% / 42.9% / 7.1% / 3.6% / 100.0%
Caribbean / 71.4% / 14.3% / 14.3% / 0.0% / 100.0%
All regions / 58.2% / 39.3% / 1.9% / 0.6% / 100.0%

Data in gray represent statistically insignificant sample size.
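A phrase such as “12% fewer” can be read either as a percentage-point gap or as a relative drop. The short sketch below recomputes both from the Table 7 figures, along with the combined unsuitable share for East Asia. Only two rows of the table are transcribed, and the function names are illustrative, not part of any OCW tool.

```python
# Two rows transcribed from Table 7:
# (very suitable, suitable, unsuitable, very unsuitable), in percent.
TABLE_7 = {
    "East Asia": (45.6, 50.6, 3.5, 0.3),
    "All regions": (58.2, 39.3, 1.9, 0.6),
}


def very_suitable_gap(region: str, baseline: str = "All regions") -> float:
    """Percentage-point difference in 'very suitable' ratings versus the baseline."""
    return TABLE_7[baseline][0] - TABLE_7[region][0]


def unsuitable_share(region: str) -> float:
    """Combined 'unsuitable' + 'very unsuitable' share, in percent."""
    _, _, unsuitable, very_unsuitable = TABLE_7[region]
    return unsuitable + very_unsuitable


gap = very_suitable_gap("East Asia")
relative = gap / TABLE_7["All regions"][0]
print(f"gap: {gap:.1f} points ({relative:.0%} relative)")
print(f"East Asia unsuitable: {unsuitable_share('East Asia'):.1f}%")
```

The gap is about 12.6 percentage points, which corresponds to a relative shortfall of roughly 22% against the all-regions figure.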

C. Alternative formats

1. Background on alternative formats reviewed

Current-user satisfaction with PDF notwithstanding, we recognize that a “passive,” display-oriented format such as PDF is not optimal for all users or all purposes. Accordingly, we have researched alternative formats.

In 2003, OCW considered publishing in HTML those courses that are essentially all text with source files in MS Word; it is easy to produce both PDF and HTML versions of these courses. In fact, we do publish some materials in HTML under these circumstances or when the original materials are submitted in HTML. However, we determined at the time that it would be too costly and too disruptive to the publication process to convert to HTML routinely, particularly when:

Original source materials are submitted in other formats such as LaTeX and PowerPoint, or

Content is heavy in images or equations

as these conversions tend to require significant manual effort and raise additional quality assurance issues due to the possibility of introducing errors during the conversion process. Note that engineering and math documents created in LaTeX would typically result in hundreds of associated image files when converted to HTML (for example, one for each equation). We were concerned about the problems of managing and publishing all these extra small files. With PDF, on the other hand, all the images are contained neatly in one file.
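To give a sense of how quickly these files accumulate, the following sketch counts the equations in a LaTeX source, assuming (as described above) a converter that emits one image per equation. The helper name and the set of math delimiters it recognizes are illustrative assumptions; real converters differ in which constructs they rasterize, and this regex-based count ignores edge cases such as escaped dollar signs or math hidden inside custom macros.

```python
import re

# Hypothetical helper for illustration only. It ASSUMES a converter that renders
# each equation as its own image file, and it recognizes only a few common
# LaTeX math delimiters (escaped dollar signs and custom macros are ignored).
MATH_PATTERNS = [
    r"\$\$.+?\$\$",                                               # display $$...$$
    r"\\\[.+?\\\]",                                               # display \[...\]
    r"\\begin\{(equation|align|eqnarray)\*?\}.+?\\end\{\1\*?\}",  # equation environments
    r"(?<!\$)\$(?!\$).+?(?<!\$)\$(?!\$)",                         # inline $...$
    r"\\\(.+?\\\)",                                               # inline \(...\)
]


def estimate_equation_images(latex_source: str) -> int:
    """Count math fragments that would each become one image file."""
    remaining = latex_source
    count = 0
    for pattern in MATH_PATTERNS:
        # Blank out each match after counting it so later patterns cannot re-count it.
        remaining, n = re.subn(pattern, " ", remaining, flags=re.DOTALL)
        count += n
    return count


sample = r"""
The energy is $E = mc^2$ and the force is \(F = ma\).
$$\int_0^1 x\,dx = \tfrac{1}{2}$$
\begin{equation}
  a^2 + b^2 = c^2
\end{equation}
\[ \nabla \cdot \mathbf{E} = \rho / \varepsilon_0 \]
"""

print(estimate_equation_images(sample))  # 5
```

Even this five-equation fragment would yield five separate image files under the one-image-per-equation assumption; a full set of lecture notes can easily contain hundreds.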

For the current study, we considered and compared a wider set of options for publishing OCW materials:

a. PDF: no change from current practice

b. No conversion: publish the original source format (typically MS Office, sometimes HTML). Note certain limitations, however: faculty sometimes submit materials in LaTeX or other less common formats, and we do not have the means to scrub these documents in their source format. In other cases, faculty submit documents already converted to PDF because they have found PDF to be the best format for publishing lecture notes to their students; in these cases we do not have the original source documents to work with.

c. HTML

d. XML/MathML, and possibly other discipline-specific languages

The following summarizes these formats along with our findings regarding the major advantages and disadvantages of each. Appendix B provides our more detailed analysis of the features, drawbacks, and usage of these formats.

a. PDF: no change from current practice

Description[3]

Portable Document Format (PDF) is a file format proprietary to Adobe Systems for representing documents in a device-independent, fixed-layout form. Each PDF file encapsulates a complete description of a document, including the text, fonts, images, and graphics that compose it. PDF files do not encode information that is specific to the application software, hardware, or operating system used to create or view the document. This feature ensures that a valid PDF will render exactly the same regardless of its origin or destination (subject to font availability). PDF can be used for documents of any size and can accommodate multiple fonts, graphics, colors, and images. PDF readers are freely available for most widely used hardware platforms, and Adobe considers PDF to be an open standard.