Publishing

Guide to best practices for publishing and resources for publishing different types of research outputs.

Publishing datasets

When researchers talk about "sharing" datasets, they generally mean publishing the data so that anyone can access it, not just granting access when asked. There are many reasons to make a dataset publicly available, such as:

  • Making data available for secondary analysis
  • Sharing data that may not lead to a publication, but could still be useful
  • Encouraging peer review of data analysis
  • Complying with data sharing mandates from a funder or publisher

Important factors to consider when choosing a repository are security (will my data become corrupted or be manipulated?), length of access (will my data suddenly disappear?), and discoverability (will researchers be able to find my dataset?). To help with this choice, the Eunice Kennedy Shriver National Institute of Child Health & Human Development (NICHD), Federally Funded Research & Development Centers, and MIT Lincoln Laboratory have created a Data Repository Finder. The NIH also has further guidelines on selecting repositories for data.

Data repositories

  • re3data : Registry of Research Data Repositories. Offers a searchable interface; repositories can also be browsed by subject, content type, and country.
  • FAIRsharing.org : The collections link can be used to find both domain and generalist repositories as well as those recommended by specific journals, funders, and organizations.
  • Repository Finder : This resource can limit results to the repositories in re3data that meet the criteria of the Enabling FAIR Data Project and the FAIRsFAIR Project.
  • Data Portals : A browsable list of open data portals from around the world (not searchable).
  • DRYAD : Nonprofit data repository. Dryad has a team of curators who check every submission to ensure the validity of files and metadata. A data publishing charge of $120 may apply (additional fees may apply to submissions in excess of 50GB). There is a limit of 300GB per data publication uploaded through the web interface (larger submissions are accepted but require technical assistance).
  • figshare : A free account allows uploads of files of up to 5GB and includes 20GB of free private space. Blinded links can be created for peer review.
  • Harvard Dataverse : Researchers from any discipline, both inside and outside the Harvard community, can deposit files of up to 2.5GB and store up to 1TB of data.
  • Mendeley Data : Posted datasets are currently moderated to ensure the content constitutes research data, is scientific in nature, and doesn’t solely contain a previously published research article. Personal accounts have a maximum limit of 10 GB per dataset.
  • Zenodo : Developed by CERN under the EU FP7 project OpenAIREplus. Currently accepts up to 50GB per dataset. Users may deposit restricted files and share access with others if certain requirements are met.
  • Open NIH-supported domain-specific data repositories : Repositories in this list have current NIH funding, sustained support, open data submission and access, and an open time frame for data deposit.
  • Other NIH-supported domain-specific resources : This list includes repositories that restrict data submission to a specific set of researchers, as well as those that limit who may access the data.