Exploring the relevance of ORCID as a source of study of data sharing activities at the individual-level: a methodological discussion
ORCID is a scientific infrastructure created to solve the problem of author name ambiguity. Over the years, ORCID has also become a useful source for studying academic activities reported by researchers. Our objective in this research was to use ORCID to analyze one of these research activities: the publication of datasets. We illustrate how identifying the datasets shared in researchers' ORCID profiles enables the study of the characteristics of the researchers who produced them. To explore the relevance of ORCID for studying data sharing practices, we obtained all ORCID profiles reporting at least one dataset in their "works" list, together with information about the individual researchers producing the datasets. The retrieved data was organized and analyzed in a SQL database hosted at CWTS. Our results indicate that DataCite is by far the most important source of information about datasets recorded in ORCID. There is also a substantial overlap between DataCite records and those of other repositories (Figshare, Dryad, and Zenodo). The analysis of the distribution of researchers producing datasets shows that the six countries with the most data producers also account for a relatively higher share of all researchers with datasets than of researchers in ORCID as a whole. By discipline, researchers in Natural Sciences and in Medicine and Life Sciences report the largest number of datasets. Finally, we observed that researchers who started their PhD around 2015 published their first dataset earlier than those who started their PhD before then. The work concludes with some reflections on the potential of ORCID as a relevant source for research on data sharing practices.