Written by Mikala Narlock (she/her), Director of the Data Curation Network based at the University of Minnesota Libraries
In the ever-expanding digital landscape, the sheer volume of data generated daily is staggering. Yet having vast amounts of data doesn’t equate to having valuable information. This is one reason the FAIR Data Principles begin with F for Findability, and not just because it made for a catchy slogan: if data are not findable, the rest of the principles are moot. If we can’t discover data, we can’t access, use, reuse, or maximize the impact of datasets.
As data curators, there are a few things we do to ensure shared data are as discoverable as possible. First and foremost, we check that the repository selected for sharing the dataset is a good fit. Many factors influence where a dataset is shared, and of course many repositories accept datasets, but it is important that data are shared in venues where other researchers will discover them. While there are indices that help highlight datasets, getting the data in the right spot to begin with can go a long way toward ensuring the designated community can find them.
Additionally, data curators review the submitted dataset not only to ensure it is well-documented internally (e.g., through a README file), but also to ensure it is robustly described through the external metadata. In particular, when we create persistent identifiers, such as digital object identifiers (DOIs), we distribute that information to a registry, such as DataCite. In these registries, we reserve that globally unique identifier and also transfer some of the metadata into the registry. By adding not just descriptive information about a dataset (e.g., title, abstract, keywords), but also unique identifiers for individuals (ORCID iDs) and research organizations (Research Organization Registry (ROR) IDs), we can further maximize the discoverability of datasets.
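To make the paragraph above a little more concrete, here is a minimal sketch of what such a metadata record might look like before it is sent to a registry. The field names are loosely modelled on DataCite-style JSON and should be treated as an approximation rather than the exact schema; the helper functions `build_dataset_record` and `missing_identifiers` are hypothetical, and every DOI, ORCID iD, and ROR ID shown is a made-up placeholder.

```python
import json


def build_dataset_record(doi, title, abstract, keywords, creators, publisher, year):
    """Assemble a DataCite-style metadata dictionary (illustrative field names)."""
    return {
        "doi": doi,
        "titles": [{"title": title}],
        "descriptions": [{"description": abstract, "descriptionType": "Abstract"}],
        "subjects": [{"subject": kw} for kw in keywords],
        "creators": [
            {
                "name": c["name"],
                # Person identifier: only included when an ORCID iD was supplied.
                "nameIdentifiers": (
                    [{"nameIdentifier": c["orcid"], "nameIdentifierScheme": "ORCID"}]
                    if c.get("orcid") else []
                ),
                # Organization identifier: only included when a ROR ID was supplied.
                "affiliation": (
                    [{
                        "name": c.get("affiliation", ""),
                        "affiliationIdentifier": c["ror"],
                        "affiliationIdentifierScheme": "ROR",
                    }]
                    if c.get("ror") else []
                ),
            }
            for c in creators
        ],
        "publisher": publisher,
        "publicationYear": year,
        "types": {"resourceTypeGeneral": "Dataset"},
    }


def missing_identifiers(record):
    """List creators lacking an ORCID iD or a ROR ID, so a curator can follow up."""
    gaps = []
    for creator in record["creators"]:
        if not creator["nameIdentifiers"]:
            gaps.append((creator["name"], "ORCID"))
        if not creator["affiliation"]:
            gaps.append((creator["name"], "ROR"))
    return gaps


# Placeholder values only: none of these identifiers are real.
record = build_dataset_record(
    doi="10.1234/example-doi",
    title="Example survey dataset",
    abstract="Placeholder abstract describing what the dataset contains.",
    keywords=["survey", "example", "placeholder"],
    creators=[
        {
            "name": "Researcher, Example",
            "orcid": "https://orcid.org/0000-0000-0000-0000",
            "affiliation": "Example University",
            "ror": "https://ror.org/00example",
        },
        {"name": "Colleague, Another"},  # no identifiers supplied yet
    ],
    publisher="Example Data Repository",
    year=2024,
)

print(json.dumps(record, indent=2))
print(missing_identifiers(record))
```

A check like `missing_identifiers` is one way a curator might spot creators who still need an ORCID iD or a ROR ID before a DOI is registered; an actual repository workflow will differ.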
The challenge is that, at the moment, many repositories are still building out the technical infrastructure to support this robust linked data that connects datasets, researchers, organizations, grants and funding agencies, etc. In many ways, we are building the track as the train is coming. On the technical side, we are expanding metadata and technologies to enable a more connected and Findable research ecosystem. But we are also building and reinforcing good practices: adding the correct information now (about DOIs, RORs, etc.), for example in README files even when the repository itself cannot yet record it, makes it easier to surface this hidden information in the future.
There are also ways researchers can help us reach this Findability goal. Of course, actions like adding keywords to datasets (especially datasets that are not text-based!) can help maximize the findability of your datasets. Other ways you can help:
- Cite your dataset! When you are publishing data for a paper, be sure that it is included in both the data availability statement AND in your references list.
- Share your data in your professional circles – post on your LinkedIn or other social platform to amplify this in your community.
- Advocate for and recognize the value of data sharing: celebrate your colleagues’ efforts by sharing their data, voice your support in promotion and tenure cases, etc.
The journey to making data truly findable is a collective endeavor that requires commitment from data curators, researchers, and institutions alike. As we continue to build and refine the infrastructure of scientific data sharing, we are not just improving technical systems, but fundamentally transforming how knowledge is discovered, shared, and utilized. Every DOI registered, every carefully crafted metadata description, and every intentional act of data sharing brings us closer to a more connected and accessible research ecosystem. By prioritizing findability, we ensure that the valuable insights hidden within our datasets can illuminate new paths of understanding, spark innovative research, and ultimately accelerate scientific progress for the global community.