Health: Learning from big data how life works
20 September 2018

Interview with ERC grantee Natasa Przulj, Professor of Biomedical Data Science at UCL.
The European Conference on Computational Biology took place from September 9 to 12 in Athens. The meeting gathered more than a thousand participants, allowing researchers from a variety of backgrounds - geneticists, molecular biologists, biochemists, computer scientists, statisticians - to come together at one of the most interdisciplinary gatherings in the Life Sciences.

The ERC was there, with Scientific Council member Prof. Janet Thornton, Director Emeritus of EMBL-EBI, one of the largest computational biology institutions in the world. Amongst other things, we took this opportunity, under the Greek sun, to talk to ERC grantee Natasa Przulj.

What is your research about?

We are living in the age of big data. We can gather enormous quantities of information in all domains, thanks to advances in biotechnology and in computer sciences. One of these domains is human health and biology. These big data cannot just be read and made sense of. We need to model them and compute on them to extract information to find the knowledge hidden in the complexity of this volume of information. After all, data isn't useful if it doesn't tell us something. And it's not only the size of the information, it's also the complexity of the computational problems. This goes back to the theory of computer science from the late 60s and 70s. We can mathematically prove that some problems are solvable; others are not - even given all the computer power in the world and all the time in the universe. This is why the only way to address them is to approximately solve them. What I do is design algorithms to mine data, with the ultimate goal of understanding how life works.

So what type of data do you focus on?

We focus on all types of data, that's the beauty of it. Every data type, whether it's genomic, proteomic, transcriptomic, phenomic or any other kind of omics, has its own specificity and complexity. Scientists have built methods to understand different datasets, and each one is like a different pair of glasses through which we look at the same phenomenon. Some will give you the same information, some will give you a different story because they are measuring different things. But there is more. Understanding what each data type is telling us is only one part. Another big issue is to fuse these datasets together to extract additional information and help us get the full picture.

And is this what you are doing now?

Exactly, we try to understand how to extract information at different scales - from DNA to protein, from organelles to cells, all the way up to the whole organism. With my first ERC grant I focused on showing that the new knowledge we can extract from one type of data (protein-protein interaction networks) complements the information we can extract from another type of biological data (genetic sequence data). For that, my group designed algorithms based on something I developed called a graphlet. Graphlets are building blocks that you can use to investigate network datasets. Now they are used in anything from biology to social science, economics and image analysis. With my second grant, on the other hand, we mine the complex interconnectedness of diverse types of data. We extract information from the hierarchical structure of life, from multiscale data, to fuse it all together in a mathematically principled way and apply it to biological and medical phenomena.
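
To give a rough idea of what counting graphlets involves, here is a minimal Python sketch. It is an illustration only, not the algorithms developed by Prof. Przulj's group. It assumes the usual definition of graphlets as small connected induced subgraphs of a network, and it handles only the simplest case, the two graphlets on three nodes (an open path and a triangle). The function name and the toy network, with its made-up protein labels P1 to P4, are hypothetical.

```python
from itertools import combinations


def count_three_node_graphlets(edges):
    """Count the two 3-node graphlets (open paths and triangles)
    in an undirected network given as a list of (node, node) pairs."""
    # Build an adjacency map, ignoring self-loops and duplicate edges.
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    open_paths = 0
    triangle_corners = 0
    # Treat every node as the possible centre of a 3-node pattern.
    for neighbours in adjacency.values():
        for a, b in combinations(neighbours, 2):
            if b in adjacency[a]:
                # a and b are also linked: the three nodes form a triangle.
                triangle_corners += 1
            else:
                # a and b are not linked: an induced path a - centre - b.
                open_paths += 1

    # Each triangle is seen once from each of its three corners.
    return {"path": open_paths, "triangle": triangle_corners // 3}


# Hypothetical toy interaction network: P1, P2 and P3 all interact with
# each other, and P4 interacts only with P3.
toy_network = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
print(count_three_node_graphlets(toy_network))
```

Comparing such counts between networks, or between the neighbourhoods of individual nodes, is one way of using small building blocks to describe the structure of a large network. Graphlets with four or five nodes come in many more shapes, which is where dedicated algorithms are needed.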

How important is interdisciplinarity in your field?

From different biotechnologies you get different knowledge that complements the knowledge you previously had. For example, you gain different biomedical insight from new systems-level protein-protein interaction network data than from older data such as genetic sequences. And that was not obvious once upon a time. I remember when I first started working in the field as a PhD student, this was maybe 2003 or 2004, I would do an analysis of protein interactions, I would submit that and I would get two types of comments. The first was "you're getting something that we know from sequencing, so your analysis is useless" and the second was "you're getting something that we do not get from sequencing, so your analysis is wrong". Look, this is neither useless nor wrong, this is different and it complements what we already know. There was a big gap there and someone needed to show that different data types are complementary in telling you about life. It’s a really non-trivial problem: you need a solidly trained computer scientist and a good data analyst who can apply the methods to different datasets. The challenge is that you need really interdisciplinary people, and these are hard to train and to get your hands on. The relationship with biologists has changed; now everyone understands that you need the whole community and you need computer sciences. This is why I really like coming to conferences such as ECCB. I think they do a good job of combining biology with modelling, mathematics and computer sciences.

Have you learnt some biology?

Yes, I had to! When I was in school I was very interested in biology. I liked mathematics and I always had the first commercial computers, but my passion was actually biology: I entered competitions and even wondered whether I should study biology once I got to university in Canada. Instead, I double-majored in mathematics and computer sciences. I got my master's in theoretical computer sciences, with a focus on graph theory. I was lucky to be at the University of Toronto, which is one of the best places to study computer sciences, but also where the first large-scale omics network data were generated. I started auditing biology classes, where I met new people, and finally all the pieces that I love came together. It was the right historical moment for that to be possible.

Then what happened?

I moved back to Europe in 2009 after leaving in 1993. And I am still in the UK, currently at UCL. The ERC had a big impact on my decision to stay. Without it I probably would have gone back. When I first arrived, I had kept my previous position; of course, you never shut a door without knowing if things are going to work out. But once I got my ERC grant, I knew I was on solid ground in Europe, and it enabled me to develop into one of the leaders of my field. I think the ERC is an amazing programme; without it Europe wouldn't be scientifically what it is right now. In Europe there are not enough grants for single PIs, so something like the ERC is necessary if we want to compete with the US and Canada, where a lot of career-oriented grants exist. Plus, the ERC really understands that this is not a business operation: you cannot streamline science. Allowing this freedom is the only way; you cannot do science any other way.

For more information on Prof. Przulj's work, read our ERC Story "Mining Big Data For Precious Medical Insight".

Project information

ICON-BIO
Integrated Connectedness for a New Representation of Biology
Researcher: Nataša Pržulj
Host institution: Barcelona Supercomputing Center, Spain
Call details: ERC-2017-COG, PE6
ERC funding: 2 000 000 €