APPLICATION FOR RENEWAL OF CODATA TASK GROUP FOR PRESENTATION TO THE 28th CODATA GENERAL ASSEMBLY,
Taipei, 1-2 November 2012
Task Groups should endeavour to support CODATA’s strategic objectives and activities where appropriate, as articulated in the CODATA Strategic Plan, 2006-12.
The Strategic Plan is currently being updated for the 2013-18 period.
Task Groups may also wish to explore synergies with ICSU strategic priorities.
1. Name of Task Group
CODATA-ICSTI Task Group on Data Citation Standards and Practices
2. Objective(s) of Task Group
The following were the objectives presented in the first Task Group proposal in 2010:
The need for robust data citation capabilities
The growth of electronic publishing of literature has created new challenges, such as the need for mechanisms for citing online references in ways that assure their discoverability and retrieval for many years into the future. The growth in online datasets presents related, yet more complex, challenges. Data citation standards and good practices can also form the basis for increased incentives, recognition, and rewards for scientific data activities, which in many cases are currently lacking in all fields of research. The rapidly expanding universe of online digital data holds the promise of allowing peer examination and review of conclusions or analyses based on experimental or observational data, the integration of data into new forms of scholarly publishing, and the ability of subsequent users to make new and unforeseen uses and analyses of the same data, either in isolation or in combination with other datasets.
This promise, however, depends upon the ability to reliably identify, locate, access, and interpret digital datasets, and to verify their version, integrity, and provenance. The problem of citing online data is complicated by the lack of established practices for referring to portions or subsets of data. As funding sources for scientific research have begun to require data management plans as part of their selection and approval processes, it is important that the necessary standards, incentives, and conventions to support data citation, preservation, and accessibility be put into place. A number of initiatives are in fact already underway in different organizations, countries, and disciplines. An important set of technical and policy approaches regarding persistent identifiers and online linking, including the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) and InfoURI, has already been launched by the Internet Engineering Task Force (IETF), the U.S. National Information Standards Organization (NISO), and other standards bodies. Another important group is DataCite. The World Data System is also focusing on these issues. Other initiatives remain ad hoc and uncoordinated.
The proposed CODATA Task Group, organized jointly by several CODATA committees and the International Council for Scientific and Technical Information (ICSTI), together with representatives from several other organizations, would examine a number of key issues related to data identification, attribution, citation, and linking, help coordinate activities in this area internationally, and promote common practices and standards in the scientific community.
Issue Areas in the Development and Implementation of Scientific Data Citation
There are many issues that would need to be addressed in establishing data citation standards and good practices. Below is a description of some of the topics that the Task Group would address.
Technical
1. Interoperability and Facilitation of Re-use. There is already considerable diversity in database formats, such as various flat-file, hierarchical, relational, object-oriented, and XML-based databases. There is every reason to expect that new modalities and formats for storing and manipulating digital data will continue to emerge.
2. Citation Formats. What data citation conventions have been developed already? How are they similar and how do they differ? Can they be standardized and if so, how? It should be noted, however, that citation formats are not major considerations compared to the difficulty of determining the unit of data or the identity of that which is to be linked (Cole 2008).
3. Metadata. How do metadata conventions or standards affect attribution and citation of data?
4. Database Versioning. Datasets are more dynamic than documents, and this creates additional challenges for citation practice. When should the dataset as a whole be cited? How can a specific, time-fixed version be cited? What changes to the data constitute a new contribution or added value? How should this be acknowledged? How are database versions controlled and labelled? How should data compiled from a network of integrated databases be cited and credited?
A crucial dimension in this regard is provenance and how it relates to the need for attribution and citation. What attributions are needed, given the complex provenance that is common for many types of data? How does one cite data that has been through many stages of transformation, some adding significant value and some trivial? How can citation be enabled and data sources acknowledged without hampering interoperable data systems?
Scientific
The creators and users of online scientific datasets may have diverse needs that should be considered in the development, management, and use of scientific data in different disciplinary or research contexts. They also may have different needs regarding persistent identifier standards and models. For example, different disciplines may have disparate needs for the granularity at which digital “objects” are identified. Some need geospatial metadata while others do not. What are the differences among disciplines that need to be addressed distinctly?
Institutional Roles
Successfully developing and implementing data citation practices and standards requires the participation of all major groups within the research community. What are the roles in this regard of the respective stakeholders in the system (the data managers, researcher umbrella groups, universities, libraries, publishers, and research funders)? What are the implications for these stakeholders? Do they vary by major field of science or type of research?
Intellectual Property Rights and Licensing
Any registry system must accommodate traditional intellectual property rights (IPRs), such as those established through copyright, as well as emerging mechanisms of “some rights reserved”, such as Creative Commons and Science Commons licensing.
Various important issues arise from data ownership, control, and IPRs. These are key drivers behind the different attitudes and practices toward data attribution and citation in different fields and countries, but relatively little work has focused on sorting out these issues. Principles and practices that have been tried in different contexts need to be identified, and approaches that are more appropriate in the digital age should be explored. A recent OECD study (OECD 2008) addressed publicly funded data in this way, but both public and private data need to be considered. This is important because the willingness of individuals and institutions to accept and use different attribution, citation, and reuse frameworks will depend to a large extent on the real and perceived ownership, control, and IPRs associated with various databases.
Socio-cultural and Community Norms
A major reason for promoting the adoption of standard data citation practices is to develop a common basis and community of practice for recognizing and rewarding data work and for incentivizing disclosure of data in interoperable and quality-controlled ways. What factors need to be considered in this area? Of particular interest is how such data management activities might affect the personal performance evaluations of scientists and the reward and promotion structures in science. Another potential area of inquiry is how citations of databases could be used as science indicators.
Attribution is not quite the same as citation, although citation is one way of giving attribution. Licences akin to Creative Commons (CC) licences may require attribution, but this can result in “attribution stacking”, where the work of hundreds or even thousands of contributors may need to be acknowledged. One route through this may be to establish community norms for acceptable levels of attribution for datasets. Creative Commons and Science Commons recently added cut-and-paste citation support to the new version of the CC0 deed and to their norms documents (the versions that include metadata show examples).
Persistent Digital Identifiers and Institutional Sustainability
In a field that requires a lot of granularity in data use, even nominal registration fees per object can quickly become cost-prohibitive. In order for a data citation system to be useful, it must be accessible and its costs affordable by all necessary user communities.
It is important to consider data citations in the context of the semantic web. Online, the reference becomes “actionable”—the user wants to link directly to the item being cited. Distributed, linked technologies actually take us back to the original intention of citations, which is to enable the reader to discover, retrieve, and verify the identity of the referenced item. Bibliographic references presume that the desired object exists in multiple printed copies, and that any copy will do. In a digital world, only one “copy” exists. It is that copy that must be discoverable and retrievable. That sought item needs a persistent identifier. Normally, that persistent identifier is a URI.
The semantic web is predicated on the linking of persistent identifiers. That model would be more inclusive and forward-looking than the present framing of data citation. Within the semantic web framework, progress is being made on modelling relationships between scholarly objects. The technical standard now in place for this is the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) protocol (Pepe, Mayernik, Borgman & Van de Sompel, 2010).
As noted above, there is a need for registration and persistent identification of online digital datasets. Some registry and resolution models for this function have already emerged, but the various models (for-profit vs. not-for-profit, public vs. private, etc.) must be examined to assure that they are sustainable in the long term. Moreover, just as the persistence of the connection from print citations to the correct physical copies depends on libraries or publishers keeping those copies, the persistence of the connection between a data citation and the actual data must ultimately depend on some form of commitment by durable institutions to preserve the data that is cited. Although a top-down, centralized archive that keeps and organizes all data is an obviously attractive concept and works in some fields, creating such a trustworthy structure is probably not feasible universally, especially given the huge increases in the amount and types of data being generated or used by the scientific community. Distributed approaches to preservation, such as institutional repositories, the Data Preservation Alliance for the Social Sciences, and LOCKSS, are emerging alternatives to the centralized archiving model.
Other Issues
There are certain to be other important elements to the proper development and implementation of data citation standards and good practices, especially discipline-specific ones that may be identified by the Task Group as it undertakes its activities.
3. Current Membership (Please give institution, area of expertise, telephone, and e-mail of each member)
Co-Chairs
Co-Chair, Bonnie Carroll (U.S. CODATA and CENDI)
President, Information International Associates
104 Union Valley Road
P.O. Box 4219
Oak Ridge, TN 37831-4219
USA
Tel.: +1 865 298-1220
e-mail:
Co-Chair, Jan Brase (Director, DataCite, and ICSTI representative)
Technische Informations Bibliothek (TIB)
German National Library of Science and Technology
Welfengarten 1b
30167 Hannover
GERMANY
Tel.: +49 511 762 19869
e-mail:
Co-Chair, Sarah Callaghan (U.K. CODATA)
The NCAS British Atmospheric Data Centre
STFC Rutherford Appleton Laboratory
RAL Space
R25 - Room 2.05
Harwell Oxford
Didcot
OX11 0QX
England, UK.
Tel.: +44 1235 44 57 70
Membership List
(in alphabetical order)
Micah Altman
Senior Research Scientist
Institute for Quantitative Social Science
Harvard University
1727 Cambridge St, K325
Cambridge, MA 02138
USA
Tel. +1
e-mail:
Elizabeth Arnaud
Project Coordinator
Understanding and Managing Biodiversity Programme
Bioversity International
Via dei Tre Dinari, 472/a
00057 Maccarese
Rome
ITALY
Tel. +39 066118323
Email:
Christine Borgman, Professor and Presidential Chair,
Department of Information Studies, University of California, Los Angeles
Box 951520, UCLA
Los Angeles, CA 90095-1520
USA
Tel.: +1 310-825-6164
Email:
Todd Carpenter
Managing Director
National Information Standards Organization
One North Charles Street
Suite 1905
Baltimore, MD 21201
USA
Tel: +1 301-654-2512
Fax: +1 410-685-5278
Email:
Dora Ann Lange Canhos
Director, CRIA
Av. Romeu Tórtima 388, Barão Geraldo
13084-791 Campinas, SP
BRAZIL
Tel.: +55 19 3288 0466
Email:
Vishwas Chavan
Senior Program Officer for DIGIT
Global Biodiversity Information Facility
Universitetsparken 15
DK-2100 Copenhagen
DENMARK
Tel. +45 35 32 14 75
Email:
Nathan Cunningham
British Antarctic Survey
Madingley Road, High Cross
Cambridge
Cambridgeshire CB3 0ET
UNITED KINGDOM
Tel.: +44 1223 221400
Email:
Michael Diepenbroek
WDC-MARE / PANGAEA
MARUM - Center for Marine Environmental Sciences
University of Bremen
Leobener Strasse
PO Box 330 440
28359 Bremen
GERMANY
Tel.: +49 421 218-65590
Email:
John Helly
Senior Staff Scientist
San Diego Supercomputer Center
Scripps Institution of Oceanography, Climate, Atmospheric Science, and Physical Oceanography
University of California, San Diego
USA
Tel.: +1 760 840 8660 or +1 858 534 5060
Email:
Jianhui LI
Director, Scientific Data Center
Computer Network Information Center
Chinese Academy of Sciences
4th South Street, Zhong Guan Cun, Haidian District
Beijing, 100190
CHINA
Email:
Brian McMahon
Research and Development Officer
International Union of Crystallography
5 Abbey Square, Chester CH1 2HU
UNITED KINGDOM
Tel: +44 1244 342878
Fax: +44 1244 314888
Email:
Karen Morgenroth
National Research Council Canada
Canada Institute for Scientific and Technical Information
1200 Montreal Road, M-55
Ottawa, ON K1A 0R6
CANADA
Tel.: +1 613 998 8396
Email:
Yasuhiro Murayama
Director, Integrated Science Data System Research Laboratory
National Institute of Information and Communications Technology
4-2-1 Nukui-kita, Koganei
Tokyo 184-8795
JAPAN
Tel: +81-423-27-6685
Fax: +81-423-27-6678
Email:
Soren Roug
EEA Coordinator, GMES Bureau
European Environment Agency
Avenue D'Auderghem 45 - BREY 9/211
B - 1040 Brussels
BELGIUM
Tel.:
Email:
Helge Sagen
Head of the Norwegian Marine Data Centre
Institute of Marine Research
PO Box 1870 Nordnes
5817 Bergen
NORWAY
Att: Helge Sagen
Tel. +47 55 23 84 47
Email:
Eefke Smit
International Association of STM Publishers,
Director, Standards and Technology
Prins Willem-Alexanderhof 5
2595 BE The Hague
THE NETHERLANDS
Tel. +31 654 321 371
Email:
Martie J. van Deventer
Portfolio Manager
CSIR South Africa
Information Services
PO Box 395
Pretoria 0001
SOUTH AFRICA
Tel: +27 12 841-3278
Email:
John Wilbanks
Vice President, Creative Commons
Director, Science Commons
171 Second Street
Suite 300
San Francisco, CA 94105
USA
Tel: +1 617 838 6333
Email:
Koji Zettsu
Director, Information Services Platform Laboratory
National Institute of Information and Communications Technology
3-5 Hikaridai, Seika-cho, Soraku-gun
Kyoto 619-0289
JAPAN
Tel: +81-774-98-6921
Fax: +81-774-98-6960
Email:
Consultants
Daniel Cohen
Program Officer, NRC Board on Research Data and Information, and
U.S. National Committee for CODATA
[on detail from the Library of Congress]
National Academy of Sciences, Keck-
500 Fifth Street NW
Washington, DC 20001
USA
Tel.: +1 202 334 1253
Email:
Franciel Linares
Technical Director, Information Management & Technology Program
Information International Associates
104 Union Valley Road
Oak Ridge, TN 37831-4219
USA
Tel.: +1 865 298-1226 (office), +1 865 363-8632 (cell)
Email:
Yvonne Socha, MLIS candidate
University of Tennessee
PO Box 16635
Knoxville, TN 37996
USA
Tel.: +1 865 742-3478 (cell)
Email:
Paul Uhlir
Director, NRC Board on Research Data and Information, and
U.S. National Committee for CODATA
National Academy of Sciences, Keck-511
500 Fifth Street NW
Washington, DC 20001
USA
Tel.: +1 202 334 1531
Fax: +1 202 334 2231
e-mail:
4. Planned Changes in Membership (Please give institution, area of expertise, telephone, and e-mail of each member; indicate if the individual has been invited to participate and agreed to serve, or if the individual has not yet been contacted. Note also that participation by scientists from around the world as members of the Task Group and in Task Group activities is strongly encouraged. Particular attention should be given to gender balance and to representation from developing countries.)
We have already added several members, consultants, and young scientists and do not plan to add any more.
5. Please indicate whether young scientists are going to be involved in this Task Group. If so please provide details.
Participants in Task Group activities include the following young scientists: Sarah Callaghan; Franciel Azpurua Linares, Information International Associates; Matthew Mayernik, National Center for Atmospheric Research; Laura Wynholds, UCLA; Jillian Wallis, UCLA; and Yvonne M. Socha, University of Tennessee. Sarah and Franciel are under 35 years old, and the others are under 30.
Sarah Callaghan is one of the three co-chairs of the Task Group.
Franciel Azpurua Linares provides ongoing support to the Task Group through coordination of activities, and research.
Matthew Mayernik (National Center for Atmospheric Research), Laura Wynholds (UCLA), and Jillian Wallis (UCLA) served as rapporteurs for breakout sessions of the workshop “For Attribution: Developing Data Attribution and Citation Practices and Standards”, held in Berkeley, CA, in August 2011.
Yvonne M. Socha is analyzing the Task Group’s bibliography of the literature relating to data citation and attribution practices.
6. Summary of activities since the 2010 General Assembly.
Since March 2011, we have held approximately monthly teleconferences of the Task Group co-chairs and quarterly teleconferences of the full Task Group.
We have formed working groups for specific activities (Bibliography, Stakeholder Survey, Website/Intranet, Standards and Best Practices). The leads and members of these working groups have conferred regularly between teleconferences of the full Task Group and have reported on progress in these activities. They have shared drafts of the survey (interview) questions and outlines of white paper topics, and have developed lists of survey respondents.