New tool looks to speed up research process for UW biologists

Database includes 30 million abstracts and papers, which would be virtually impossible for researchers to go through individually

April 25, 2017

A new University of Wisconsin-developed algorithm looks to help smooth and shorten the research process for biologists and, potentially, researchers across the board.

Called “KinderMiner,” the algorithm can scan online archives holding millions of pieces of research within hours. While it currently caters to biologists, it could be adapted to other fields in the future.

Ron Stewart, project supervisor and UW associate director of bioinformatics, came up with the idea and wrote the original algorithm. He said he is devoted to the project and hopes for a positive outcome.

Then his colleagues improved the code and statistical calculations, making it more than 10 times faster and much more robust. They also compared “KinderMining” to a drug repurposing algorithm Stewart and his colleagues had written, he said.

Stewart hopes KinderMiner will spare biologists unwieldy Google searches by narrowing the content they have to go through. The content in the archives comes from a database called EuropePMC, which comprises roughly 30 million documents, ranging from biomedical papers to abstracts.

KinderMiner can be compared to the PubMed biomedical literature database, which is also found through EuropePMC, Stewart said.

Stewart said KinderMiner counts the co-occurrence of a key phrase that the biologist or user is interested in, such as “cardiomyocyte,” and some target term, which is usually the name of a gene, drug or compound. It “reads” through all 30 million abstracts and papers for each combination of the key phrase and each target term in the target term list, which amounts to about 600 billion document reads in total.
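
The counting step Stewart describes can be pictured with a short Python sketch. This is not KinderMiner's actual code: the function name `count_cooccurrence`, the plain substring matching and the four toy sentences are all assumptions made for illustration.

```python
from collections import Counter

def count_cooccurrence(documents, key_phrase, target_terms):
    """Count documents that mention the key phrase, each target term, and both.

    Hypothetical helper: matching is a case-insensitive substring test,
    which is only an assumption about how terms might be detected.
    """
    key = key_phrase.lower()
    key_count = 0
    term_counts = Counter()   # documents mentioning each target term
    both_counts = Counter()   # documents mentioning the term AND the key phrase
    for text in documents:
        lowered = text.lower()
        has_key = key in lowered
        if has_key:
            key_count += 1
        for term in target_terms:
            if term.lower() in lowered:
                term_counts[term] += 1
                if has_key:
                    both_counts[term] += 1
    return key_count, term_counts, both_counts

# Toy stand-ins for abstracts; invented for this example.
docs = [
    "GATA4 drives cardiomyocyte differentiation.",
    "TBX5 and GATA4 reprogram fibroblasts into cardiomyocyte-like cells.",
    "GATA4 expression in liver tissue.",
    "SOX2 maintains pluripotency.",
]
key_count, term_counts, both_counts = count_cooccurrence(
    docs, "cardiomyocyte", ["GATA4", "TBX5", "SOX2"]
)
```

Each document is checked once per target term, which is why the amount of work grows with the number of documents multiplied by the length of the target term list.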

With a new article added to EuropePMC every minute, it is virtually impossible to sort through all the documents by hand. This is similar to the problem of overwhelming and irrelevant results that scientists were running into with Google. Stewart said an algorithm like KinderMiner is the only realistic option.

Stewart said he viewed the algorithm as a relatively simple concept, hence the name “KinderMiner.” It basically counts how often a key phrase appears alongside each target term and uses those statistics to rank the drugs, compounds and genes being searched.
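
To give a feel for how counts could be turned into a ranking, here is a minimal Python sketch. The article does not say which statistic KinderMiner uses, so the one-sided Fisher's exact test below is an assumption, and the names `fisher_right_tail` and `rank_targets` and the gene counts are hypothetical.

```python
from math import comb

def fisher_right_tail(both, term_only, key_only, neither):
    """One-sided Fisher's exact test on a 2x2 table of document counts.

    Returns the probability of seeing at least `both` co-occurrences
    if the target term and the key phrase were unrelated.
    """
    term_total = both + term_only            # documents mentioning the target term
    key_total = both + key_only              # documents mentioning the key phrase
    total = both + term_only + key_only + neither
    upper = min(term_total, key_total)
    favorable = sum(
        comb(term_total, k) * comb(total - term_total, key_total - k)
        for k in range(both, upper + 1)
    )
    return favorable / comb(total, key_total)

def rank_targets(counts, total_docs):
    """Order target terms from most to least significant co-occurrence.

    `counts` maps term -> (docs_with_term, docs_with_key, docs_with_both).
    """
    scored = []
    for term, (n_term, n_key, n_both) in counts.items():
        p = fisher_right_tail(
            n_both,
            n_term - n_both,
            n_key - n_both,
            total_docs - n_term - n_key + n_both,
        )
        scored.append((p, term))
    return [term for p, term in sorted(scored)]

# Invented counts: 1,000 documents, 50 of which mention the key phrase.
example_counts = {
    "GENE_A": (40, 50, 20),   # co-occurs far more often than chance
    "GENE_B": (40, 50, 2),    # co-occurs about as often as chance
    "GENE_C": (200, 50, 10),  # frequent term, chance-level overlap
}
ranking = rank_targets(example_counts, total_docs=1000)
```

A test of this kind rewards terms that appear with the key phrase more often than their overall frequency would predict, so a very common gene name does not rise to the top just by being mentioned everywhere.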

“We have shown for some historical examples that KinderMiner ranks genes near the top of the list that were later shown to be critical for certain processes such as reprogramming cells to become cardiomyocytes [or heart muscle cells] or to become induced pluripotent stem cells,” Stewart said.

Stewart said the algorithm targets the very specific topics that biologists or users are curious about. He highlighted that KinderMiner speeds up research for scientists and can point them to findings that might not otherwise be published for another five years.

Stewart said the algorithm was developed recently and can assist researchers and biologists who are looking for specific information. He said he looks forward to seeing what KinderMiner brings to the biology community once it takes effect.

“There is insight that leads to the positive effects KinderMiner could have within the biology community,” Stewart said.