Software development for scientific data processing has become so complex that it can only be mastered through transparent teamwork among experts across borders and institutions. In this interview, Johannes Rainer, a bioinformatician at the Institute for Biomedicine, shares his thoughts on the relevance of Open Science. His efforts in the field earned him the Open Research Award 2021.
As a bioinformatician at Eurac Research, you head the Computational Metabolomics research group, working with software that analyzes metabolic data. What exactly is that all about?
Johannes Rainer: In principle, everything we take in as food is metabolized in our cells: broken down, converted and built up into new products. These metabolic products, such as glucose, fructose, creatinine and amino acids, can be detected in blood serum samples. Why is this exciting? Because metabolic products allow conclusions to be drawn about a person's state of health. A high glucose value, for example, can indicate diabetes, a high creatinine value can indicate kidney disease, and so on. But we don't measure just these few metabolites; we measure thousands of them at the same time.
So, this is where your software comes into play.
Rainer: Right. The amount of data in metabolomics, as in other sciences, keeps growing, and processing it efficiently requires appropriate software. A commonly used program for metabolomics analyses is available as open-source software. The only problem was that I was not able to analyze our large data sets with the version I had at the time. So, without further ado, I contacted the developers and asked if I could customize the software. The great thing was that, although we had never met, they invited me to work with them on developing the software further.
And the outcome?
Rainer: In recent years, we have rewritten the software so that the data can be processed in chunks, with only the data needed at a given moment loaded into working memory. Very large data sets can now be processed on conventional computers. This enables us to analyze the metabolomics data sets from the CHRIS study, with its 7,000 participants. That was a breakthrough compared with what the software could do before.
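To make the idea of chunk-wise processing concrete, here is a minimal Python sketch under stated assumptions. It is not the code of the software Rainer describes; the functions `chunked` and `process_in_chunks`, the `load` and `summarize` callables and the simulated "disk" are hypothetical stand-ins chosen only to show the principle of loading one chunk of samples at a time and keeping just the small per-sample results.

```python
from typing import Callable, Dict, Iterator, List, Sequence, TypeVar

V = TypeVar("V")  # raw data of one sample (hypothetical type)
S = TypeVar("S")  # small per-sample result, e.g. a summary value


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_chunks(
    sample_ids: Sequence[str],
    load: Callable[[str], V],
    summarize: Callable[[V], S],
    chunk_size: int,
) -> Dict[str, S]:
    """Process samples chunk by chunk, holding only one chunk's raw data in memory.

    `load` is a hypothetical stand-in for reading one sample from disk;
    only the per-sample results returned by `summarize` are kept across chunks.
    """
    results: Dict[str, S] = {}
    for chunk in chunked(sample_ids, chunk_size):
        # Load the raw data for the current chunk only.
        raw = {sid: load(sid) for sid in chunk}
        for sid, values in raw.items():
            results[sid] = summarize(values)
        # Release the raw data before the next chunk is loaded.
        del raw
    return results


if __name__ == "__main__":
    # Simulated "files": each sample holds a short list of intensity values.
    fake_disk = {f"sample{i}": [float(i), float(i) * 2] for i in range(7)}
    totals = process_in_chunks(list(fake_disk), fake_disk.__getitem__, sum, chunk_size=3)
    print(totals["sample3"])  # 9.0
```

In this sketch the raw data held at any one time is bounded by `chunk_size` rather than by the total number of samples; only the compact per-sample results accumulate.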
"Open Science also includes Open Data, which is extremely important. Only when I also have access to the data does science become comprehensible and thus transparent."
Is the software also used by other research institutions?
Rainer: Yes. The software is part of the Bioconductor project, which provides open-source software for the analysis of biological data and is used in many research institutes worldwide. This has also brought us into contact with other scientists working on the analysis of mass spectrometry data and has enabled us to initiate some important collaborations with the Helmholtz Center Munich and the University of California San Diego, among others. We have also made an international name for ourselves thanks to the "R for Mass Spectrometry" initiative, which I co-founded. Within it, we develop various software packages that give researchers tools to process huge data sets more easily and efficiently.
Software development costs a lot of time and money. Why the effort? Couldn't you just use commercial tools instead of reinventing the wheel?
Rainer: There are enough companies offering commercial software. But it is mostly a black box: I put data in, get something out, but have no idea how the data is processed in there, and I have no chance to customize the software or the algorithms to my needs and my data. The good thing about open-source software is that experts with different backgrounds work together. We can also exchange ideas with researchers from other disciplines, for example genomics and transcriptomics, which also generate enormous amounts of data when sequencing genetic material, in order to find ways of dealing with such data together. And the whole point of open-source software is precisely that you don't have to reinvent the wheel every time. I can look at the source code of other developers and, if it fits, reuse it, or even better, change it and adapt it to my needs without having to write everything from scratch.
Do you see any dangers in Open Science or Open Software, or should they always be welcomed in principle?
Rainer: At this point, I don't see any real danger of our open software being misused. If malicious code were inserted, it would be identified quickly. In principle, Open Science includes not only Open Software but also Open Data, which is also extremely important: only when I also have access to the data does science become comprehensible and thus transparent. In addition, I can use Open Data as test data to check whether my software really does what I hope it will. So yes, I believe in Open Science. It's nice to meet like-minded people in the Open Science Metabolomics community who prefer collaborative thinking to competitive thinking. Our discipline is a very young one, just 20 years old. It remains to be seen what Open Science is capable of. I am an optimist.
The Winners of the Eurac Research Open Research Award 2021
The two main Open Research Awards go to:
The group “Language Technologies (LT)” at the Institute of Applied Linguistics, whose work stretches across disciplines, languages and communities and is reflected in its active participation in, and coordination of, initiatives designed to bring people together, invite them to join in the research and shape best practice.
Johannes Rainer, leader of the team “Computational Metabolomics” at the Institute for Biomedicine, who has established successful tools and practices for open, collaborative and reproducible research, and whose engagement in a community approach to problem solving is influencing the general attitude of data scientists at the Institute and beyond, in the vast R and Bioconductor communities.
The two Early Career Awards go to:
Alberto Scotti, Institute for Alpine Environment, whose research on aquatic insects as sentinels of environmental change has been carried out in the spirit of open research culture, with the aim of sharing every research output.
Giulio Genova, Institute for Alpine Environment, and Mattia Rossi, Institute for Earth Observation, who have collaboratively developed open-source tools that enable not only researchers but also users with minimal programming skills to access and analyze meteorological and environmental data easily and efficiently.