WLAP: The Web Lecture Archive Project
The Development of a Web-Based Archive of Lectures, Tutorials, Meetings and Events at CERN[1] and at the University of Michigan
Nora Bousdira
CERN – Ecole Nouvelle d’Ingénieurs en Communication, Lille
E-mail:
Steven Goldfarb
University of Michigan, Ann Arbor
E-mail:
Eric Myers
University of Michigan, Ann Arbor
E-mail:
Homer A. Neal
University of Michigan, Ann Arbor
E-mail:
Charles Severance
University of Michigan, Ann Arbor
E-mail:
Mick Storr
CERN
E-mail:
Giosue Vitaglione
CERN – University of Naples, Italy – University of Michigan, Ann Arbor
E-mail:
Abstract
This paper summarizes the results of a project to develop an electronic repository of “content-rich” lectures, talks, and training activities on the World-Wide Web. The work was carried out from July 1999 to July 2001 by a collaboration consisting of the University of Michigan ATLAS Collaboratory Project, the University of Michigan Media Union, and the CERN HR Division, supported by the CERN IT and ETT Divisions and the CERN Academic and Summer Student Programs. In this document, we describe the software application chosen to synchronize the slide presentations to the video recordings, provide technical solutions to the various recording and archival challenges encountered during the project, and propose a set of research and development issues that we feel merit further investigation. We also present the concept of a “Lecture Object” and suggest the adoption of standards so that lectures at multiple institutes can be seamlessly shared and incorporated into federated databases world-wide.
Contents
1 Introduction
2 Project Motivation
2.1 Communication in modern high-energy physics experiments
2.2 Enhancing learning capability and dissemination of education and training
3 Project Implementation
3.1 The pilot project
3.2 The archive application
3.3 The archive process
3.4 The WLAP archive
3.4.1 The CERN WLAP archive
3.4.2 The ATLAS GEANT4 workshop
4 Details of the Implementation
4.1 Audio and video capture
4.2 Scenarios for handling the visual support material
5 The Lecture Object
5.1 Lecture Object Architecture
5.2 Draft Specification
5.3 Distributed Architecture
5.4 Prototype developed
5.5 Advantages of standardization
6 Other Web Lecture Archives and Technologies
6.1 Other archives
6.2 Other technologies
7 Planned Future Applications and R/D
7.1 ATLAS
7.2 CERN Particle Physics Distance Education Program
7.3 Web accessible Basic Safety Training
7.4 Planned Future Technology Development
8 Conclusions
9 Acknowledgements
10 References
Bibliography
Trademark Notice
All trademarks appearing in this document are acknowledged as such.
1 Introduction
The primary motivation for the creation of the World-Wide Web was the facilitation of collaboration between scientists [1]. There was a need for a better way for scientists to rapidly exchange large amounts of information, ranging from experimental data and results of analyses to organizational and strategic details related to ongoing experiments. The rapid proliferation of the web and web-related applications, as well as the ever-increasing size and international scope of scientific collaborations, has by now clearly demonstrated the value of the web as a common and necessary tool for research. In addition, it has enhanced the dissemination of scientific knowledge to the general public through the publication of online documents and other web-based media.
This document reports on an effort to explore the use of the web for archiving “content-rich” material, which we define to be lectures, seminars, or other events that include audio, video and visual support materials. The work targets a segment of the tasks that must be completed for scientists and others to optimally draw upon the web for transmitting information for training and archival purposes, as well as for keeping colleagues informed of strategic, technical and administrative decisions.
CERN was chosen as a focal point for this research because of its historical participation in web development, its continuing role as a center for scientific research and information exchange, its rich education and training programs, and the new challenges it faces during the current construction and future running of the next generation of experiments for the Large Hadron Collider (LHC). These experiments will be run by teams of sizes heretofore unseen in most sectors of the scientific community, with thousands of members literally spread around the globe.
The involvement of the University of Michigan in the CERN ATLAS experiment, as one of that experiment’s largest groups, is one of the reasons for its interest in this project. Augmenting this reason are the roles played by the University of Michigan in the inauguration of U.S. participation in the CERN Summer Student program, along with its affiliation with Internet2, and its work in bringing CERN into Internet2. These reasons, together with the presence of pioneers at the University in the development of multi-media educational tools, all provided a shared rationale and stimulus for the University of Michigan and CERN to examine the possible future role of web-based archiving in the general area of highly collaborative large-scale research involving universities and international laboratories.
With this background, a major focus of our efforts has been to investigate how to best facilitate the work of large, globally dispersed scientific collaborations. Another has been to study how to best reinforce CERN’s education and training programs and make them accessible to as wide a community as possible.
This paper seeks to examine the relevance of web-based archiving to this set of challenges. We approach the topic by examining how we have used web-based archiving to record a series of content-rich presentations at CERN over the past two years. The issues covered range from the technical details of how such recordings are made, to questions of how the technology can be improved, and how such material could be federated to address certain larger goals.
2 Project Motivation
In this section, we describe the principal motivations for our present study. Though we cite specific applications, the results presented herein have a clear relevance for a variety of scientific fields and educational venues.
2.1 Communication in modern high-energy physics experiments
A prime motivation of the WLAP project was the hope that web-based archiving technology could address some of the key challenges that face the high-energy physics community. To understand these challenges, it is instructive to consider the anatomy of a modern high-energy physics experiment. Once a set of physics goals has been established for an experiment, the achievement of these goals requires the massive generation and refinement of novel ideas for how to solve the myriad of attendant technical problems. A talented set of individuals must be assembled to assimilate these ideas, to design and build the detector components, and to integrate the components into the overall detector, a process that can span a decade and involve extensive communication among thousands of experts in numerous countries. The running detector must be maintained and monitored and the resulting data must be analyzed, an activity that may involve yet another decade. The required funding must be repeatedly applied for, disbursed, expended, accounted for and reported on. At every stage extensive communication is required among dispersed participants. In such scientific enterprises the communications aspect can rapidly become as daunting as the scientific and technical challenges themselves.
LHC experimental collaborations will have hundreds of participating university groups. Depending upon the precise responsibilities it accepts, a group may need to interact daily with colleagues at a half-dozen other universities. Many practical questions quickly arise. For example, how does one convene frequent real-time meetings with colleagues on modest budgets, when the participants are separated by time differences of as much as 12 hours and many of them have major responsibilities in addition to those that form the focus of the meetings? How does one offer a tutorial to two thousand colleagues on a paradigm one has developed, when initially there are only one or two true “expert peers” on the topic in the entire collaboration and it is one's job to train all of the others?
Since many of the large experiments may run for as long as twenty years, involving numerous generations of Ph.D. students, how is information recorded and passed on to subsequent generations? When major talks are given on results from a running experiment by the author of a particular analysis, how does that information get captured and made available to members of the collaboration who may only be able to access the talk hours (or years) later? How do findings and major strides emerging from these experiments get captured and interwoven into classroom materials for later presentation?
These are but a few of the questions that arise in the conduct of large high-energy physics experiments. Given the nature of the World-Wide Web and the original function seen for it, one would naturally be led to inquire if the Web itself might provide possible solutions to facilitate the communication requirements of the very large experiments it had helped make scientifically possible.
2.2 Enhancing learning capability and dissemination of education and training
There is another motivation for our work in the area of web lecture archiving: its potential use in education and training. Traditional lectures and seminars follow a sequential pattern in which the lecturer prepares a presentation and delivers it, often accompanied by visual support material. The delivery mechanism can vary in style, with the lecturer using different techniques for displaying the visual support material, for example, an overhead projector, a computer slide projection or a blackboard. Questions may be taken during the presentation, at the end, or not at all. In each case, students must rely on their notes and/or a copy of the support material to recall the key points of the lecture at a later date.
People who are unable to attend, or who miss a session, have to make do with a copy of the visual support material when it is available and, even in the best of cases, find it very difficult to reconstruct in detail what was presented verbally.
Having access to some form of audio/video reproduction of the original lecture, however, can greatly facilitate the learning process and allow many more people, in addition to those who physically attend, to benefit. Such a reproduction can exist in a variety of media, including audio or video recordings. Unfortunately, the dissemination of the material on audio/video tapes is cumbersome, thereby limiting access.
Again, recent technological developments based on the accessibility of the Internet and the widespread utilization of the World-Wide Web lead us to conclude that these difficulties can now be overcome.
3 Project Implementation
3.1 The pilot project
The Web Lecture Archiving Project (WLAP) activity started in 1999 as a pilot project [2] funded by the U.S. National Science Foundation and the University of Michigan (UM). The primary aim was to examine the feasibility of using a software tool, called Sync-O-Matic [3], to record and archive slide-based lectures in a variety of situations. Following the success of the pilot project [4], a collaboration was formed between the CERN HR Division Training and Development group [5], the UM ATLAS Group [6] and the UM Media Union [7], supported by CERN IT Division. The objective was to demonstrate the feasibility and usefulness of archiving lectures, seminars, tutorials, training sessions and plenary sessions of ATLAS experiment meetings by focusing first on the archiving of the prestigious CERN Summer Student Lectures.
3.2 The archive application
The Sync-O-Matic application, successfully tested at CERN during the pilot project described above, was adopted for the implementation of the joint project. Sync-O-Matic was written by Charles Severance, then Associate Director of the University of Michigan Media Union. Documentation describing the software can be found on its web site [3]. It is freely available and there is a mailing list of users and developers who can be contacted for help and support.
Sync-O-Matic produces slide-based web-lectures for viewing with a standard web browser and the freely available RealPlayer plug-in. Its output is a multi-media lecture that combines the audio and video playback of the lecturer with digital images of the visual support material, synchronized to the video, and displayed in a browser window. Figure 1 illustrates an archived lecture. Note that the video and slide indexes can be used to rapidly locate sections of the lecture or to review the slides in order to select specific topics.
Figure 1: A typical archived lecture as viewed from a web browser. The video image of the speaker appears in the upper left-hand corner of the page. The visual support material (in this case, scanned transparencies) appears in the large window on the right. The changing of the transparencies is synchronized to the timing of the video.
These important features distinguish Sync-O-Matic from the historically common approach to lecture archiving, in which a single video recording of the event combines views of the speaker and the visual support material. Such a video is necessarily a compromise, as a choice has to be made between focussing on the lecturer and focussing on the support material or, more precisely, on the part of this material that can be kept readable. The camera operator therefore chooses for the viewer what is of primary interest at any time. This is in marked contrast to a Sync-O-Matic lecture, in which the speaker and support material are always in view.
Another drawback of a standard video is that, although sufficient video resolution can indeed reproduce the slides in a readable format, it does so only at a significant cost in network bandwidth and archive size. For example, we find that with MPEG-1 a bandwidth of 1500 kb/s is necessary to ensure readability using the historical approach, compared with a typical Sync-O-Matic archive, which can be readable at 50 kb/s.
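The difference between the two bit rates quoted above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes decimal units (1 kb = 1000 bits, 1 Mbyte = 10^6 bytes) and counts the stream volume alone, ignoring slide images and any container overhead. The function name is our own and is not part of Sync-O-Matic.

```python
def stream_megabytes_per_hour(kilobits_per_second):
    """Data volume of a constant-rate stream over one hour, in Mbytes.

    Assumes decimal units: 1 kb = 1000 bits, 1 Mbyte = 1,000,000 bytes.
    """
    bits_per_hour = kilobits_per_second * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000


# Bit rates quoted in the text above.
VIDEO_ONLY_KBPS = 1500   # MPEG-1 video with readable slides
SYNCED_KBPS = 50         # typical Sync-O-Matic archive

video_only_mb = stream_megabytes_per_hour(VIDEO_ONLY_KBPS)
synced_mb = stream_megabytes_per_hour(SYNCED_KBPS)
bandwidth_ratio = VIDEO_ONLY_KBPS / SYNCED_KBPS

print(f"video-only stream: {video_only_mb:.1f} Mbytes per hour")
print(f"synchronized stream: {synced_mb:.1f} Mbytes per hour")
print(f"bandwidth ratio: {bandwidth_ratio:.0f}x")
```

Under these assumptions the video-only approach needs thirty times the bandwidth of the synchronized one, which is why readability of the slides dominates the cost of the historical approach.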
Regardless of advances in the technology, there will always be some inefficiency introduced by the transmission of a video stream rather than a fixed image. In addition, the video stream lacks the slide preview and rapid search/location functionality provided by the indexed Sync-O-Matic archive. As we will discuss below, such indexing could also be exploited for the development of web-based lecture databases and search engines.
3.3 The archive process
Sync-O-Matic was originally designed for use by an individual teacher, operating alone or in a staffed distance-learning studio, using Microsoft PowerPoint materials [12]. In this mode, Sync-O-Matic imports a PowerPoint file and converts the slides to GIF/JPEG images, as shown in Figure 2.
While the teacher gives the lecture, Sync-O-Matic records the audio and video from a microphone and camera using the RealProducer [8] ActiveX control. As the lecturer changes slides, Sync-O-Matic records those actions and the timing of each action in internal text files.
Figure 2: Sync-O-Matic standard operational procedure. The speaker uses Sync-O-Matic on a PC connected to a camera and a microphone. The speaker loads the PowerPoint file into Sync-O-Matic, starts recording the audio and video, and changes the slides as he/she talks. The capturing component of Sync-O-Matic creates internal files with the audio, video, slides, and timing information. Once the recording is over, the speaker can publish the web lecture in a format suitable for a CD-ROM or for the Web. The “Style files” can be edited to change the structure and the “look and feel” of the web lectures.
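The role of the recorded timing information can be illustrated with a small sketch. The actual format of the Sync-O-Matic internal text files is not described in this document, so the event list below is a hypothetical stand-in: each entry pairs a time offset into the video with the slide that became visible at that moment. Given such a list, finding the slide to display at any playback time, or the time to seek to for a given slide, is a simple lookup.

```python
from bisect import bisect_right

# Hypothetical timing log (an assumption, not the real Sync-O-Matic format):
# (seconds from the start of the video, slide number shown from that moment).
SLIDE_EVENTS = [(0.0, 1), (95.5, 2), (240.0, 3), (410.2, 4)]


def slide_at(events, seconds):
    """Return the slide number on screen at the given video time."""
    start_times = [start for start, _ in events]
    index = bisect_right(start_times, seconds) - 1
    if index < 0:
        raise ValueError("time precedes the first slide event")
    return events[index][1]


def start_of(events, slide):
    """Return the video time at which a slide first appears.

    This is the offset a slide index would seek to when a viewer
    selects that slide.
    """
    for start, number in events:
        if number == slide:
            return start
    raise KeyError(slide)


print(slide_at(SLIDE_EVENTS, 100.0))   # slide shown 100 s into the video
print(start_of(SLIDE_EVENTS, 3))       # seek offset for slide 3
```

Keeping the events sorted by time allows a binary search for playback, while the reverse lookup supports the kind of slide-index navigation described in section 3.2.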
At the end of the presentation, two archived lectures are produced, one suitable for viewing with a standard web browser from CD-ROM, and another suitable for viewing directly from a web server. The look and feel of the resulting lectures is controlled using Sync-O-Matic specific “style files”. These style files are written in HTML with some Sync-O-Matic specific markup included. The result of the publishing process is a directory with several HTML files, text files used to allow random navigation of the lecture, and the media files.
The media files make up the bulk of the disk usage. A lecture with average-quality video and high-quality audio (160x120 pixels, 24 kbps video + 16 kbps audio) requires about 36 Mbytes per hour of lecture. This relatively small amount of disk storage allows 15-20 hours of lectures to be stored on a single CD, and a large number on a web server without resorting to exotic storage technology. To watch a web lecture the user can use any popular web browser. The first time the user views a lecture, it may be necessary to download and install the RealPlayer software (the free version is sufficient), although this software is now commonly bundled with most web browsers. JavaScript is not required but some advanced navigation features are available if it is enabled.
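As a rough check of the storage figures, the following sketch estimates how many hours of lectures fit on a given medium. It takes the figure of 36 Mbytes per hour quoted above as given rather than deriving it from the bit rates. The CD capacities of 650 and 700 Mbytes and the 20 Gbyte server disk are our own assumptions, chosen only for illustration.

```python
MEGABYTES_PER_LECTURE_HOUR = 36.0   # figure quoted in the text


def hours_of_lectures(capacity_megabytes,
                      megabytes_per_hour=MEGABYTES_PER_LECTURE_HOUR):
    """Estimate how many hours of lectures fit in the given capacity."""
    return capacity_megabytes / megabytes_per_hour


# Assumed capacities (not taken from the text).
for label, capacity in [("650 Mbyte CD", 650),
                        ("700 Mbyte CD", 700),
                        ("20 Gbyte server disk", 20_000)]:
    print(f"{label}: about {hours_of_lectures(capacity):.0f} hours")
```

Both assumed CD sizes give a result between 18 and 20 hours, consistent with the range of 15-20 hours stated above once some space is reserved for the HTML, slide images and navigation files.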
Although Sync-O-Matic was adequate to meet the requirements of a lecturer in a teaching environment as described above, one of the major challenges of the WLAP project was to extend its functionality beyond the original design in order to handle the demands of live lectures and to cope with a number of challenging recording scenarios, such as hand-written transparencies and blackboard material. The various scenarios encountered and the solutions developed are detailed in section 4 below.
3.4 The WLAP archive
During the pilot project, a significant portion of the 1999 CERN Summer Student Lecture Series was recorded and made available to the participants and to the general public. As a result, significant demand was generated at CERN and throughout the physics community to continue the recordings on a regular basis.
The CERN HR Division Technical Training group, supported by the CERN Amphitheatre technical support team, took up the challenge of demonstrating the feasibility of recording a wide variety of lectures, while simultaneously investigating and applying technical advancements to simplify the process and to reduce the manpower needs. Over the next 15 months many important CERN colloquia, Technical Training seminars, Academic and Summer Student Program lectures, and software training tutorials were recorded, either by request, or for the purposes of testing the technology under a variety of recording conditions.