Cold/Cozy Mice - Finding the needles in the haystack of biomedical literature
Dataset posted on 19.07.2019 by Helena Deus
Problem Statement: the task facing biomedical scientists hoping to find publications that corroborate or debunk a hypothesis is akin to finding a needle in a haystack that keeps growing. Strategies that mine or summarize the scientific literature exist, but they have largely focused on recovering named entities (e.g. proteins, cells), on more sophisticated methods that use ontologies to also recover related terms and, more recently, on machine learning methods when sufficient training data is available.

Our Approach: we describe a use case faced by a biomedical scientist who needs to compare tumor volume/weight results across papers describing mouse experiments in which the mice were exposed to the same or similar compounds but housed at different temperatures. In our approach, we extracted annotations of units and measures (U&M) from the scientific literature and then used them in combination with contextual information (e.g. the section of the paper) and regular expressions to identify the specific entity being measured (e.g. Housing Temperature).

Results and Discussion: from a corpus of ~1.1M open access publications we found 299 relevant papers using the U&M approach combined with the surrounding contextual information. This large drop in the number of papers can be explained by our restrictive search criteria, which included looking for keywords, patterns and temperature annotations in specific sections of the paper. We found a clear prevalence of papers mentioning housing conditions in the range of 20-25°C, which is the approximate temperature range suggested by NIH guidelines. We also found a small increase in the number of papers describing thermo-neutral housing conditions for mice in the period after the observation that this variable has an impact on tumor growth in mice (2014-2016). This dataset contains those results.
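The combination of temperature annotations, section context and regular expressions described above can be illustrated with a minimal sketch. It is not the pipeline used to produce this dataset: the patterns, the section names and the function `find_housing_temperatures` are all hypothetical, chosen only to show how a Celsius measurement might be tied to a housing-related sentence in a methods section.

```python
import re

# Hypothetical pattern: a number or a numeric range followed by a Celsius unit,
# e.g. "22 °C", "20-25°C", "30 to 31 °C".
TEMP_RE = re.compile(
    r"(?P<low>\d+(?:\.\d+)?)\s*"
    r"(?:(?:-|–|to)\s*(?P<high>\d+(?:\.\d+)?))?\s*°\s*C"
)

# Hypothetical context keywords suggesting the sentence is about animal housing.
HOUSING_RE = re.compile(r"\b(?:hous(?:ed|ing)|maintained|kept)\b", re.IGNORECASE)

# Hypothetical section names in which housing conditions are expected.
TARGET_SECTIONS = {"methods", "materials and methods"}


def find_housing_temperatures(section, text):
    """Return (low, high) Celsius pairs found in housing-related sentences.

    Text from sections outside TARGET_SECTIONS is ignored, which mirrors the
    idea of restricting the search to specific parts of a paper.
    """
    if section.strip().lower() not in TARGET_SECTIONS:
        return []

    results = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not HOUSING_RE.search(sentence):
            continue
        for match in TEMP_RE.finditer(sentence):
            low = float(match.group("low"))
            high = float(match.group("high")) if match.group("high") else low
            results.append((low, high))
    return results


sample = "Mice were housed at 20-25°C with a 12 h light cycle. Tumors were measured weekly."
print(find_housing_temperatures("Methods", sample))   # [(20.0, 25.0)]
print(find_housing_temperatures("Results", sample))   # []
```

Requiring a matching section, a housing keyword and a temperature pattern at the same time is deliberately restrictive, which is consistent with the large drop from the full corpus to the relevant papers reported above.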