25 November 2010
Questions and Answers
This page aims to explain basic concepts important to the project:
- Q. What are digital repositories?
- Q. What are institutional repositories?
- Q. What is a data library?
- Q. What is secondary analysis?
- Q. What does the project mean by 'data'?
- Q. What is a dataset / data set?
- Q. What is open access?
- Q. What is open data?
- Q. What is metadata for?
- Q. What is the value of data repositories?
- According to the Digital Repositories Review (Heery and Anderson, 2005),
a repository is differentiated from other digital collections
by the following characteristics:
- content is deposited in a repository, whether by the content creator, owner or third party
- the repository architecture manages content as well as metadata
- the repository offers a minimum set of basic services e.g. put, get, search, access control
- the repository must be sustainable and trusted, well-supported and well-managed.
- Institutional repositories are those run by institutions, such as
universities, for purposes that include showcasing their intellectual
assets, widening access to their published outputs, and managing their
information assets over time. They differ from subject-specific or
domain-specific repositories, such as arXiv (for physics papers) and
Jorum (for learning objects).
- A data library refers to both the content and the services that
foster use of collections of numeric and/or geospatial data
sets for secondary use in research (Wikipedia, 2007, http://en.wikipedia.org/wiki/Data_libraries).
- Secondary analysis is the analysis of a data source that was collected by someone other than the researcher involved. It is common in the social sciences, and it saves time and expense as long as the data used are reliable and fit for purpose. Large-scale datasets such as surveys are frequently not 'exhausted' by the original data collector and are therefore a rich source for secondary analysis.
- By data, we do not mean a synonym for information. We mean research data: that which is collected, observed or created for the purpose of analysis to produce original research results. This differs from what are commonly called research outputs, which are the peer-reviewed, published papers, articles, books and presentations produced as a result of data analysis.
Research data may be created in tabular, statistical, numeric, geospatial, image, multimedia or other formats.
- A dataset (or data set) is a group of data files, usually numeric or encoded, along
with the documentation files (such as a codebook, technical
or methodology report, or data dictionary) that explain their production or use.
Generally a dataset is unusable for sound analysis by a second party unless it is well documented.
- Open Access means access to material via the Internet in such a way that the material is free for all users to read and use. (Wikipedia, 2009, http://en.wikipedia.org/wiki/Open_access) The open access publishing movement was started by the Budapest Open Access Initiative and its signatories in February 2002.
- Open Data is a philosophy and practice requiring that certain data
be freely available to everyone, without restrictions from copyright,
patents or other mechanisms of control. It has a similar ethos to a
number of other "Open" movements and communities such as open source, open access and open content. (Wikipedia, 2009, http://en.wikipedia.org/wiki/Open_data) The OECD and other international bodies have endorsed the concept that publicly funded research data should be made freely available to the public. "Mashups" or web services are often built by combining various sources of open data; they would not be possible to create if restrictions on the use of those data were in place.
- Metadata means information about a data item in the repository, including descriptive metadata such as title, and administrative metadata such as date of submission. Metadata in the repository can be searched by Google and specialist search engines and is important for tracking provenance of the dataset.
- Research has shown that researchers consider access to a publication’s primary source data a significant advantage to their own research. Digital repositories also facilitate the trend towards global research collaboration. By sharing research data, researchers enable both secondary analysis and the exploration of topics not envisioned by the initial investigator.
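As an illustration of the answers above on repository services (put, get, search, access control) and on metadata, here is a minimal in-memory sketch in Python. It is not how any particular repository platform works: the class, method and field names are all assumptions made for this example, and access control is reduced to a simple list of permitted depositors.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    """One deposited item: the content plus its metadata (field names are illustrative)."""
    content: bytes
    descriptive: dict      # descriptive metadata, e.g. {"title": ...}
    administrative: dict   # administrative metadata, e.g. {"date_submitted": ...}


class Repository:
    """Hypothetical repository that manages content as well as metadata."""

    def __init__(self, depositors):
        self._items = {}
        self._depositors = set(depositors)  # very simple access control
        self._next_id = 1

    def put(self, user, content, title):
        """Deposit content; only permitted depositors may do so."""
        if user not in self._depositors:
            raise PermissionError(f"{user} may not deposit")
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = Item(
            content=content,
            descriptive={"title": title},
            administrative={
                "depositor": user,
                "date_submitted": date.today().isoformat(),
            },
        )
        return item_id

    def get(self, item_id):
        """Retrieve a deposited item by its identifier."""
        return self._items[item_id]

    def search(self, term):
        """Return identifiers of items whose title contains the term (case-insensitive)."""
        term = term.lower()
        return [
            item_id
            for item_id, item in self._items.items()
            if term in item.descriptive["title"].lower()
        ]
```

A depositor could then call `repo = Repository(depositors={"alice"})`, deposit a file with `repo.put("alice", data, title="Household Survey 2009")`, and find it again with `repo.search("survey")`. The administrative metadata recorded at deposit (who submitted the item and when) is the kind of information that supports tracking provenance.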
For further references see our References and Publications page.
For more information about the DISC-UK DataShare project, contact the Project Manager:
Robin.Rice AT ed.ac.uk.