CRISs: EMERGING INTO THE LIGHT?

Keith G Jeffery

Head Systems Engineering Division

Rutherford Appleton Laboratory

Chilton, Didcot, OXON OX11 0QX UK

Tel: +44 1235 44 6103 Fax: +44 1235 44 5831

Email:

WWW: http://www.cis.rl.ac.uk/

1.0 INTRODUCTION

There has been Europe-wide interest in CRISs (Current Research Information Systems) since the eighties. Before that time individual organisations had built systems which were (and are) incompatible in data content and structure, and in processing intent or purpose. Since the start of interest in a Europe-wide CRIS facility, there has been disappointing progress. The publication of the CERIF standard caused some new systems to be built to that standard, and some existing systems to provide an output facility that is CERIF-compatible. However, for the R&D manager of government funds, or for the individual researcher trying to find a possible co-operating partner, there is no European facility.

As a community we have failed our users / customers. Either what we are trying to do is insufficiently important to the EU and its member countries (i.e. there is no real problem to be solved), or we are failing to provide credible solutions (the problem is not solved by our proposals). We are in the dark.

This paper proposes that we:

(a) analyse the lack of success to date;

(b) put in place a plan for a ‘one last chance’ pan-European project;

(c) make it work.

In this way heterogeneous CRISs linked together to form a pan-European view may emerge into the light.

2.0 CRIS93 CONCLUSION

The paper ‘Campus Wide Information Systems: Impact on Current Research Information’ from CRIS93 (Je93) concluded:

(a) CWISs will impact CRISs strongly in several ways:

(1) by making information available;

(2) by stimulating user demand for accuracy and timeliness;

(3) by encouraging information provision as well as retrieval;

(b) The exposure of CRISs in this way will create a demand for integrated information;

(c) The CERIF standard will provide syntactic (and limited semantic) integration of data for information at the project summary level. There is a need for information at other levels. The effectiveness and efficiency of CERIF needs to be tested;

(d) General syntactic and semantic integration requires the use of a KBS assist with user, task and domain models of knowledge;

(e) The exposure of CRISs will also create a demand for integration into the working environment of the end-user:

(1) by providing workflow and active document facilities;

(2) by providing manipulation facilities for modelling and graphics;

(3) by facilitating integration of the information into the office environment including documents, electronic mail, fax, telephone, diary etc.

There is a requirement, and a market, for CRISs out there. The problem is to shed the rather old-fashioned, monolithic, information retrieval technology image with teletype-style user interface and become part of the 'desktop with connection' culture, including the notebook computer with cellular telephone connections for the European executive on the move.

It is time to analyse the progress we have made against these objectives. The analysis is not encouraging.

3.0 ANALYSIS OF PROGRESS

3.1 CRISs

3.1.1 Individual Systems

In many countries, probably most countries, government agencies for R&D funding have built systems to administer the funding. As a by-product they have produced a CRIS which is more-or-less compatible with CERIF.

In some countries there exist government funded or supported information services, with responsibility for public dissemination of information. Usually these are related to bibliographic information but may well have available a CRIS. In the UK, for example, CRIB (Hu93) was maintained by the British Library until its commercialisation.

Commercial information services exist in some countries. In the UK CRIB has been taken over from the British Library and, together with BEST (UK), forms the basis of the commercial CRIS provided by Cartermill International (previously Longman Cartermill).

3.1.2 Collectively Standardised Systems

EXIRPTS (NaJeBoLaVa92) following the IDEAS project (JeLaMiZaNaVa89) was perhaps the first attempt to provide a collectively standardised end-user system over heterogeneous CRISs. The project has been described fully elsewhere, but consisted essentially of a common catalogue to tame heterogeneity - in information content, structure, language - and provide first-level retrieval to limit the cost and time delay in retrieval of full information.
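The common-catalogue idea can be illustrated with a minimal Python sketch. It is not the EXIRPTS design: the catalogue layout, the source names (cris_a, cris_b), the field names and the data are all assumptions made for illustration. A small shared catalogue answers the first-level query cheaply, and full records are then fetched only for the catalogue hits.

```python
# Hypothetical sketch of two-level retrieval over a common catalogue.
# All names and data are illustrative; this is not the EXIRPTS design.

# First level: a small shared catalogue of (source, project id, keywords).
CATALOGUE = [
    {"source": "cris_a", "id": "A1", "keywords": {"database", "hypermedia"}},
    {"source": "cris_a", "id": "A2", "keywords": {"geology"}},
    {"source": "cris_b", "id": "B7", "keywords": {"database", "workflow"}},
]

# Second level: full records held by each source system, each in its
# own structure and language.
FULL_RECORDS = {
    "cris_a": {"A1": {"title": "Hypermedia stores"},
               "A2": {"title": "Rock strata"}},
    "cris_b": {"B7": {"Titel": "Workflow-Datenbanken"}},
}


def first_level(keyword):
    """Search only the catalogue and return (source, id) pairs that match."""
    return [(e["source"], e["id"]) for e in CATALOGUE if keyword in e["keywords"]]


def second_level(hits):
    """Fetch full records, but only for the catalogue hits."""
    return [FULL_RECORDS[source][pid] for source, pid in hits]


hits = first_level("database")
records = second_level(hits)
```

In this sketch the expensive step (second_level) is never run for projects that the catalogue has already ruled out, which is the cost and delay saving the catalogue is meant to provide.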

DERPI (NaSv93) drew upon some of the ideas of EXIRPTS and also the CERIF standard. Utilising CD-ROM rather than on-line technology, it has achieved an important data collection at low cost with easy-to-use retrieval facilities based on a pre-existing system at CNR-SIAM.

Combiformat is a meta-level data standard supported and used by VSNU, NBOI, and several Universities in the Netherlands. It is based firmly upon a classical Entity-Relationship model at the conceptual level, and is combined with a strategy for homogeneous systems at universities, accessed nationally for comprehensive information.

BEST is a proprietary system including a standard format and a well-known commercial information retrieval system. The information is also published through other media (CD-ROM). BEST (UK) covers UK researchers and projects where the information has been submitted. BEST (Europe) (St93) is based on agreements with organisations in various European countries and is of the same form as BEST (UK).

CORDIS (Vo93) is the information service of the EU. It contains many useful databases, and among them a CRIS for EU-funded projects. In this way it is like a national individual system, but the pan-European nature of the projects described means that it is best classified as a collectively standardised system. It is both on-line and on CD-ROM.

3.1.3 Other Initiatives

ISPRA proposed, and put out a tender for implementation of, a system to provide uniform user access to heterogeneous CRISs throughout Europe. The Q2 project (Ma93) is now underway, and the specification required a challenging use of modern technologies (including knowledge-based assists) to tame the heterogeneity.

3.1.4 CRIS Progress - Conclusion

It is clear from the above that:

(a) individual CRISs are still being developed for specific purposes;

(b) some of them are collectively standardised;

(c) most of them have at least CERIF export / import capability;

(d) some are just starting to utilise more advanced technologies such as knowledge-based systems;

(e) most are idiosyncratic in their inter-relationship with other information sources and supporting technologies such as an office environment.

It is disappointing that, with the exception of Q2, there is no real pan-European initiative, nor use of advanced technologies. In order perhaps to encourage thinking about the technologies to be employed, the next section reviews developments relevant to CRISs.

3.2 Relevant Technologies

3.2.1 WWW

WWW (World-Wide-Web) is perhaps the most important phenomenon to hit information technology. It was identified in (Je93) as an important technology. The phenomenal growth rate (300,000% per year) is breathtaking. It utilises a simple protocol (http) for information transfer over the internet, through which hyperlinked multimedia documents (using html as the markup language) are accessed. It is a ‘walk-up-and-use’ system, and the software is free (although commercial versions give better facilities and performance). In one leap it has revolutionised information systems, document systems, database access, document authoring, electronic publishing, electronic forms and (limited) workflow.

Of course it has disadvantages: performance is variable, information retrieval is extremely crude and inefficient, and html offers only a severely restricted subset of the facilities expected in rich authoring environments. Interconnection with databases, statistics, graphics and office systems is poor. However, there are many R&D projects ongoing to rectify these deficiencies.

It should be noted that CORDIS is available via WWW (http://www.echo.lu/). However, more significantly, most internet sites have a WWW server and on it are to be found sets of information on projects, people, publications and products. Perhaps the pan-World (not just pan-European) CRIS problem is already solved - and it would be but for the problems mentioned above. However, WWW may well be showing us the way forward.

3.2.2 Libraries

Some libraries are grasping the nettle and avoiding extinction by converting to digital, on-line libraries. A large NSF (National Science Foundation)-funded project in the USA involving 6 major centres is co-ordinated to investigate different aspects of this library revolution. Clearly there are issues of copyright, charging, security, privacy and others. Within the UK there are small signs; in the DRA (Defence Research Agency) the library now comes under the control of the Information Technology Director. Since technical reports (grey literature), paper pre-prints and even software products are now published freely using WWW, conventional publishing and conventional libraries have to evolve to meet the new requirements placed upon them.

3.2.3 Heterogeneous Distributed Information Systems

There is still a great deal of R&D interest in this area. Groups are experimenting with:

(a) meta-translation techniques using reverse engineering of logical schemas to conceptual level and then schema matching followed by forward re-engineering to achieve a logical compatibility (FoGr92);

(b) object oriented techniques for interoperability (Br92);

(c) use of knowledge-based systems as assists to provide heterogeneous schema reconciliation using general and domain knowledge (Je94);

(d) use of intelligent agents to mediate between heterogeneous information systems (BaWi93);

(e) use of data exchange mechanisms, ranging from simple schema plus file (SuJeGi77) through international standard schema plus file (Th90) to advanced intelligent hypermedia exchange (KoJe94).
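As a very simple illustration of what schema reconciliation has to achieve, the following Python sketch maps records from two differently structured sources onto one common form. It does not reproduce any of the cited techniques: the source names, field names and the hand-written mapping tables are assumptions, standing in for the knowledge that the approaches above would derive or apply automatically.

```python
# Hypothetical mediator: hand-written field mappings onto one common schema.
# Source names, field names and data are illustrative assumptions.

FIELD_MAPS = {
    "source_a": {"proj_title": "title", "leader": "contact", "start": "start_year"},
    "source_b": {"Titel": "title", "Leiter": "contact", "Beginn": "start_year"},
}


def to_common(source, record):
    """Rename a source record's fields to the common schema.

    Fields with no entry in the mapping are dropped.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def mediate(results):
    """Turn (source, record) pairs from several systems into one uniform list."""
    return [to_common(source, record) for source, record in results]


merged = mediate([
    ("source_a", {"proj_title": "Data exchange", "leader": "Smith",
                  "start": 1993, "internal_code": "X9"}),
    ("source_b", {"Titel": "Datenaustausch", "Leiter": "Meier", "Beginn": 1994}),
])
```

After mediation both records carry the same field names (title, contact, start_year), so a single user interface can present them together. Note that only the structure is reconciled here; the content of the second record is still in its original language.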

3.2.4 Hyperlinked Multimedia

Starting from the ideas of Ted Nelson and his Xanadu System (themselves traceable to Vannevar Bush (Bu45)) hypertext developed (Co87). The idea of hyperlinking blocks of related text is intuitively appealing. If the blocks of text are replaced by tables of results, graphics, images, video or sound, then the hypertext becomes hypermedia. It is this highly attractive presentation format - with embedded content and structuring - that has secured the dominance of WWW in the on-line world and various publications on CD-ROM in the off-line world. Present technology is limited in structuring capability and link semantics. The Hypermedata Project (KoJe94) utilises graph theory to allow the use of intelligence (first order semantics, second order syntax logic) on the arcs or links together with encapsulated objects at the nodes such that both a dynamic presentation and self-adaptive interchange capability is provided.
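The general notion of links that carry meaning, rather than being bare pointers, can be sketched in Python as a small graph. This is only an illustration under stated assumptions and is not the Hypermedata model: the node names, the relation names, the "bandwidth" context key and the use of a simple rule function on each arc are all hypothetical.

```python
# Hypothetical hypermedia graph: nodes hold content of different media kinds,
# arcs carry a relation name (link semantics) and a rule that decides whether
# the link applies in a given context. Not the Hypermedata model.

NODES = {
    "project": {"kind": "text", "content": "Project summary"},
    "results": {"kind": "table", "content": [[1, 2], [3, 4]]},
    "video": {"kind": "video", "content": "demo.mpg"},
}

# Each arc: (source node, relation, target node, rule on the context).
LINKS = [
    ("project", "has_result", "results", lambda ctx: True),
    ("project", "illustrated_by", "video",
     lambda ctx: ctx.get("bandwidth") == "high"),
]


def follow(node, ctx):
    """Return (relation, target) for every arc from node whose rule holds."""
    return [(rel, dst) for src, rel, dst, rule in LINKS
            if src == node and rule(ctx)]


slow = follow("project", {"bandwidth": "low"})
fast = follow("project", {"bandwidth": "high"})
```

With a low-bandwidth context only the table of results is offered; with a high-bandwidth context the video link appears as well, so the presentation adapts to the reader's situation.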

3.2.5 Workflow and active documents

In many organisations information about an entity is built up over time. This is true of a research project, for example. Workflow describes the process of a succession of states of a document and transitions between the states when something is done to the document (e.g. an authorising signature, or addition of more data) in a process commonly triggered by an event.

The history of the document to date - and perhaps also a list of states yet to be passed - could be stored in the document. This makes it an active or eager document in the system. The concept of a self-adaptive active document allows interoperability between systems.
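A document that carries its own history and its list of states still to be passed can be sketched as a small state machine in Python. The state names, the triggering events and the class name are assumptions chosen for illustration; a real workflow system would be considerably richer.

```python
# Hypothetical active document: it stores the states already passed and the
# states still to come, and advances only when the expected event arrives.

# Event that triggers the transition into each state (illustrative names).
TRIGGERS = {
    "authorised": "signature",
    "funded": "grant_letter",
    "completed": "final_report",
}


class ActiveDocument:
    """A document carrying its own history and its remaining states."""

    def __init__(self, title, states):
        self.title = title
        self.history = [states[0]]        # states already passed, in order
        self.pending = list(states[1:])   # states yet to be passed

    @property
    def state(self):
        """The current state is the last one recorded in the history."""
        return self.history[-1]

    def on_event(self, event):
        """Move to the next pending state if this event is its trigger."""
        if self.pending and event == TRIGGERS[self.pending[0]]:
            self.history.append(self.pending.pop(0))
            return True
        return False


doc = ActiveDocument("Project proposal",
                     ["submitted", "authorised", "funded", "completed"])
doc.on_event("signature")      # accepted: moves to "authorised"
doc.on_event("final_report")   # ignored: the document is not yet funded
```

Because the history and the pending states travel with the document, any system receiving it can tell where it is in the process without consulting the system that created it.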

3.2.6 CSCW

Computer-supported Co-operative Work is not a new concept. At RAL a system for co-design was demonstrated in the early eighties. However, the ability to use telephone (and speak), videophone, fax, and email with one or more correspondents allows for true conferencing. If ready access to heterogeneous information sources, co-display of the search results, co-editing of a ‘shared blackboard’ and co-operative publication are added then fully supported co-operative work is possible.

3.2.7 Relevant Technology: Conclusion

It may be that users are not utilising CRISs because the effort to access them is too great, the information available is insufficiently accurate or up-to-date and the presentation capabilities for the information retrieved are too restricted. Adoption of the technologies outlined above would at least make CRISs more attractive to our users / customers and, with integration into the office environment, should encourage use.

4.0 WHAT IS NEEDED NOW

The proposition is that CRISs are necessary and required, but that the technology we have supplied so far has failed to overcome the inertia and learning curve barriers for the majority of our potential users / customers. Put simply, we have to make access to and utilisation of CRISs more attractive to our end-users.

4.1 Information Retrieval

Clearly the first important dimension in a pan-European CRIS is the linking of existing CRISs through heterogeneous distributed systems technology. Once all available CRISs are viewable through the same user interface, the inertia / learning curve barrier is lowered dramatically.

4.2 Information Provision

One of the greatest barriers to use of CRISs is the belief that they may contain information which is either of little relevance or is in an unusable format (or language). The provision of accurate and timely information is of critical importance for the credibility of any CRIS. Anything that reduces the cost of information provision and raises its quality automatically lowers the barrier to use. The solution is the population and update of existing CRISs through active documents / workflow.

4.3 Information Analysis and Presentation

The end-user wants to be able to utilise the retrieved information in whatever he / she is doing - producing a statistical compilation with tables and graphs, generating a report with quotable extracts from the information or simply scanning it for relevant information. The end-user wishes, therefore, to have the information delivered into the set of tools he / she chooses to use for all office environment work. Any alternative is clearly a barrier to use. The solution is integration into the office working environment including CSCW, and the use of object linking and embedding technology is likely to be of the greatest importance.

5.0 CLOSING REMARKS

So, what do we do? A multi-country project team should request funding from the EU to set up a heterogeneous distributed system to incorporate existing and potential future European CRISs and provide information in a standard form into the office environment of end-users, encouraging the use of hyperlinked multimedia active documents, workflow and CSCW as appropriate in data collection.

References

(BaWi93) Barsalou,T;Wiederhold,G:'Knowledge-directed mediation between Application Objects and Base data' in Data and Knowledge Base Integration, Ed S.M Deen 1993

(Br92) Brodie,M.L:'The Promise of Distributed Computing and the Challenges of Legacy Systems' Proceedings BNCOD10, Aberdeen. LNCS 618 Springer-Verlag 1992

(Bu45) Bush,V 'As We May Think' Atlantic Monthly 176 1 July 1945 pp 101-108

(Co87) Conklin,E.J 'Hypertext: An Introduction and Survey' IEEE Computer 20 9 Sept 1987 pp 17-41