Although the word “open” might imply access, it often does not imply permission to transmit, reproduce, or re-use material, as seen currently with most government open data and as recently discussed at the GovDatax event.[1] Recent laws require the federal government to make its public data available and encourage agencies to share information with one another. Still, in practice, a competing group of laws restricts access to these very same data: copyright restrictions, publicity and privacy rights that might be applicable, and contract limitations that fill the gap when no other law applies.

Since the 1960s, the U.S. government has been taking steps towards more open data. The Freedom of Information Act (FOIA),[2] enacted in 1966 and in effect since 1967, gave the public the right to request access to records from any federal agency; the Chief Financial Officers (CFO)[3] Act of 1990 required agencies to submit detailed accounting and financial data to the Treasury; the Federal Funding Accountability and Transparency Act (FFATA)[4] of 2006 required full public disclosure of all entities or organizations receiving federal funds; and, most recently, the Digital Accountability and Transparency Act of 2014 (DATA Act) standardized and published federal spending data and is considered the nation’s first open data law.[5]

But the real breakthrough for openness in data sharing came this year with the Foundations for Evidence-Based Policymaking Act (Evidence Act) and the OPEN Government Data Act, both signed into law in 2019. The first emphasizes collaboration and coordination to advance data and evidence-building functions in the Federal Government by statutorily mandating Federal evidence-building activities, open government data, and confidential information protection and statistical efficiency;[6] the second requires federal agencies to publish their information online as open data, using standardized, machine-readable formats, with the metadata included in the Data.gov catalog.[7]

As a counterweight, another group of data is not open or available to the public, for reasons of privacy or national security. Limitations on public data are found in the Health Insurance Portability and Accountability Act (HIPAA), in rules protecting Personally Identifiable Information (PII), and in the Family Educational Rights and Privacy Act (FERPA), as well as in rules covering information related to national security, especially military or intelligence-related data.[8]

The real problem comes when excessive copyright or contractual restrictions apply to data that is available to the public. Although works of the United States federal government generally do not have statutory copyright protection,[9] these works might be subject to protection if publicity or privacy rights are involved, or if works prepared for the government by independent contractors are copyright protected.[10] Nor does this copyright exception extend to works produced by subnational governments, such as states, cities, and other municipalities.[11]

The Public Access Policy of agencies such as the National Institutes of Health (NIH) explicitly asserts copyright restrictions over most of their data.[12] As stated on the website of its National Library of Medicine, publishers or authors provide all of the material available from the PubMed Central (PMC) site, and almost all of it is protected by U.S. and/or foreign copyright laws, even though PMC offers free access to it. Some materials are in the public domain. However, they may still contain photographs or illustrations copyrighted by commercial organizations or individuals, which may not be used without prior approval from the copyright holder.

Also, users have no explicit right or implied license to use this open data.[13] NIH content may be accessed, downloaded, and read. Still, transmission, reproduction, or re-use of protected material, beyond that allowed by the fair use section[14] of the copyright law, requires the written permission of the copyright holders. So, if third parties such as universities, think tanks, research institutions, libraries, or museums want to use these copyrighted materials, they would need to review the materials in light of recent rulings of the Supreme Court and the federal courts to determine whether a fair use defense or exception might apply.

NIH's PubMed Central also forbids the use of crawlers[15] or systematic downloading of articles available in its repositories, which limits most text and data mining (TDM)[16] research activities.[17] Crawlers and other automated processes may not be used to systematically retrieve batches of articles from the PMC website. Bulk downloading of materials from the main PMC website, in any way, is prohibited because of copyright restrictions. However, PMC has two auxiliary services that may be used for automated retrieval and downloading from the PMC archive, even though they apply only to a special subset of articles. These two services, the PMC OAI service and the PMC FTP service, are the only services available for automated downloading of articles in PMC.

In short, even though the U.S. government has moved towards a more open data sharing policy, practice still shows noticeable limitations on the re-use of this data by parties outside the government. Third parties, such as university researchers, think tanks, libraries, archives, or museums, are allowed to read and download the content but are highly limited in their ability to reproduce, distribute, or modify it for new purposes. While re-using materials under the copyright fair use provision could be an option, the question remains how likely a fair use defense is to succeed against a copyright claim in court. Also, if the intention is to engage in a text and data mining research project, the sheer number of copyright, contract, and technical restrictions would make it too complicated to apply this methodology while complying with the law. If the path of open data sharing policy continues, legislators will eventually need to find a balance between the interests of copyright holders and access to and re-use of the data produced by or on behalf of the government.



