Research topics

We carry out research on the theory, methods and applications of modern statistical data analysis. Our focus is on data-rich application fields including microbial ecology, functional genomics, and computational history. There is a great demand for targeted algorithmic methods for extracting information from vast data collections with minimal human intervention. By combining information across multiple, complementary sources it is possible to overcome some of the limitations and statistical uncertainties associated with individual data sets. Open source implementations turn the new models into accessible tools to guide modeling and experimentation.

Modern statistical data analysis

Our research blends elements from multiple theoretical and methodological fields, including statistical machine learning/AI, probabilistic programming, numerical ecology, and data science.

Human microbiomics and statistical ecology

The human gut microbiota is one of the most densely populated ecosystems on our planet. Its known pan-genome carries two to three orders of magnitude more unique genes than our own genome. The gut microbiota is highly dynamic in space and time, is a key contributor to immune function and food digestion, and is associated with various health complications. Characterizing the overall structure, variability, and health associations of this virtual metabolic organ is a major challenge for contemporary human biology. We have contributed to some of the largest existing population studies of this microbial ecosystem, mapping large-scale associations with diet, lifestyle, physiology, and well-being; developed the tipping elements concept; and investigated the mechanisms and dynamics of microbial community assembly.

Functional genomics

The mapping of the three-billion-base-pair human genome sequence in 2001 was a first step towards uncovering the dynamic and contextual functional properties of the genome. Understanding the functional organization of genetic information and its regulation through transcriptional, epigenetic, and other mechanisms remains a key challenge for human biology. Our work in this area includes cancer studies, analysis of large gene expression databases, and multi-omics data integration.

Computational humanities

We develop open algorithms and research tools for digital humanities as part of the Helsinki Computational History Group (COMHIS) and through rOpenGov, which we founded. Our current focus is on modeling the history of public discourse across the centuries following the emergence of the printing press in the 15th century. We also publish algorithms for open government data, such as the eurostat R package, and manage the rOpenGov developer network, which maintains over 20 R packages with tens of thousands of annual downloads.

Open science

We have actively supported the Open Science work group of Open Knowledge Finland, which received the Open Science and Research award of the Ministry of Education and Culture in 2017. Highlights include a report on the openness of academic publishers, commissioned by the Finnish Ministry of Education and Culture; the opening of several agreements with academic publishers in 2017, and of scientific journal subscription costs in Finland for 2010-2016, following our freedom of information (FOI) requests; the Election Data Analytics project funded by Sitra (2012-2013); and the rOpenGov network for open government data analytics (awards from Helsinki Region Infoshare, Sitra, and Apps4Finland).

We are grateful for support from our collaborators and research funders.