Open data is simply data that is freely available for anyone to access, use and share. This section provides a brief overview of the current issues relating to open data found on the web, rather than licensed datasets which might be free at the point of use in a university library. The sources of open data on the web are almost inexhaustible and defy precise classification, but to be considered truly and technically open, the dataset has to be clearly licensed as such and published in an easy-to-use and machine-readable format.
Awareness and availability of open data have grown in parallel with other open initiatives, notably open access publishing. However, open data is not currently subject to many of the same well-defined procedures as publishing, nor is access to datasets typically controlled by commercial vendors. Most open data falls into two broad categories: datasets created by an agency for a specific purpose (e.g. census data) and data produced as a by-product of academic research. The former usually has a unified format and supporting documentation; the latter can be seen as a minestrone of data with no effective standardised format or requirement to provide adequate supporting information. Initiatives are underway to remedy the lack of standardisation, principally from research funders and the FAIR data movement, each with the goal of making data Findable, Accessible, Interoperable and Reusable (FAIR). Strathclyde University has similar ambitions, seeking to improve research quality and impact by “progressing towards fully open access publishing and FAIR open data policies” (University Strategic Plan 2020-25).
Open access publications are subject to well-defined and well-understood procedures (peer review, acknowledgements, references, bibliography) that are not currently applied to a large proportion of open research data. The emergence of Data Access Statements within publications is strengthening the requirement for, and the visibility of, open linked data. However, until data is treated as an integral part of the research process (recognised and rewarded in research assessments, for example), open data will remain difficult to (re)use.
The main issue with open data, therefore, is not the quantity of readily available data but whether the data is in a usable format, with sufficient documentation, so that it can be repurposed for your specific research requirements.
As previously noted, the web can be regarded as an endless source of open data, some of it structured and documented, but most of it unregulated and uploaded to various platforms with little thought as to how it might be usefully reused by the global research community. Finding open data is easy; finding open data that suits your research needs, or that allows you to reproduce and verify the findings of a research publication, will probably require additional input on your part.
Given the daunting amount of data resources available, the best place to start looking for suitable data is among your peers, your subject area community and social media hubs. In many cases, a simple Google search will identify many of the open data resources you are looking for. Reliable sources of data tend to be widely reported and shared. Trustworthy data will usually be hosted by a reputable institution, have sufficient documentation to understand the data, have a licence attached, and provide a Digital Object Identifier (DOI) for citation and attribution.
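If you are comparing several candidate datasets, the indicators above can be turned into a simple checklist. The following is a minimal Python sketch, not a definitive test of trustworthiness. The field names (`publisher`, `description`, `licence`, `doi`) and the example record are hypothetical, since each repository uses its own metadata schema. The code can only confirm that a host is named, not that it is reputable, and the DOI check only confirms that the identifier is shaped like a DOI, not that it resolves.

```python
import re

# Loose pattern for the general shape of a DOI (prefix "10.", registrant
# code, slash, suffix). It does not check that the DOI actually resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def trust_checklist(record: dict) -> dict:
    """Report which trust indicators are present in a metadata record.

    The keys read from `record` are hypothetical field names chosen for
    this illustration; adapt them to the repository you are using.
    """
    doi = str(record.get("doi", "")).strip()
    return {
        # A host is named (whether it is reputable is your judgement).
        "has_host": bool(str(record.get("publisher", "")).strip()),
        # Some documentation accompanies the data.
        "has_documentation": bool(str(record.get("description", "")).strip()),
        # A licence is attached.
        "has_licence": bool(str(record.get("licence", "")).strip()),
        # An identifier shaped like a DOI is provided.
        "has_doi": bool(DOI_PATTERN.match(doi)),
    }


def missing_indicators(record: dict) -> list:
    """List the indicators that the record does not satisfy."""
    return [name for name, ok in trust_checklist(record).items() if not ok]


# Made-up record, for illustration only.
example = {
    "title": "Example survey responses",
    "publisher": "Example University",
    "description": "Column definitions and collection method.",
    "licence": "CC-BY-4.0",
    "doi": "10.1234/example.5678",
}

print(missing_indicators(example))
```

A record that fails one or more checks is not necessarily unusable, but it signals where you will need to do extra work, for example by contacting the data creator for documentation or licence details.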
You may find the following sources helpful as a starting point:
Some research funding bodies require data to be deposited in specific discipline-based repositories. Examples include:
It is highly unlikely that an open dataset exists which exactly matches your research requirements. In practice, open data generally serves two main functions, and both require additional input before the data is usable:
Trust is mainly based on the provenance of data, so the quality of the documentation or metadata that accompanies open data is of paramount importance. Initiatives such as FAIR data and the CoreTrustSeal certification of repositories, together with some publishers, are gradually building a framework in which open data can be trusted, verified and reused in academic research.