Author name disambiguation for researcher profile matching: making Teseo and Scopus compatible

Despite significant efforts to improve their representation, women still face disparities within the research sector. A key challenge in understanding the trajectories of young female researchers lies in accurately identifying them within researcher datasets. Teseo, a comprehensive database of Spanish university theses and the largest virtual library of its kind in Spain, serves as a pivotal starting point, allowing us to pinpoint female researchers who are beginning their careers in Spain. This master’s thesis examines the Spanish academic community, anchored by data from Teseo. We augment each Teseo record with additional details obtained from other public research databases such as Scopus (an abstract and citation database), ORCID (Open Researcher and Contributor ID), and DBLP (Digital Bibliography & Library Project). Our objective is to track and assess whether each woman has continued her academic career after the Ph.D. and remains an influential contributor to her research domain. Scopus in particular serves as a tool for tracing the careers of these researchers, shedding light on the challenges they encounter. This master’s thesis extends a study previously conducted by my advisor, in which the gender of each doctoral graduate was inferred from their name. First, the thesis evaluates the viability of each public research database for enriching the Teseo data. It then identifies the attributes that are instrumental to our investigation and that facilitate meaningful correlations with Teseo data. The end goal is a website where users can search for additional information on doctoral graduates, enabling an assessment of their continued academic pursuits.
