Tapio Salakoski, University of Turku

Machine learning and language technology for the bio-health domain

Machine learning algorithms enable the automatic construction of methods that would be difficult or laborious for human experts to program by hand. Despite recent successes, deploying these techniques in applications remains a major challenge because of their heavy computational and memory requirements. Novel algorithms are needed both for training the learning methods and for evaluating the validity of the results. We have developed computationally efficient algorithms for ranking, preference learning, performance evaluation, and feature selection. These algorithms have been used in practical applications in bioinformatics and natural language processing.

BioNLP is a research field where natural language processing methods are developed for the bio-medical domain. We have developed a system for the automated extraction of complex, recursively nested molecular biology events from scientific publications. The system uses modern machine learning algorithms to cope with the large amount of training data and the high-dimensional feature space. We have applied the system to 20 million sentences of text, yielding 19 million events that form a massive event network capturing much of the biomolecular knowledge expressed in that text.

We also study automated syntactic and semantic parsing of Finnish, in particular the applicability of efficient, linear-time parsing algorithms that rely entirely on machine learning techniques. In close collaboration with public and private partners along the health care value chain, we develop language technology for better understanding and communication of health information, such as that contained in electronic patient records.

Tapio Salakoski is a Professor of Computer Science and Head of the Department of Information Technology at the University of Turku, and Vice Director of the Turku Centre for Computer Science. His research interests are in machine learning and natural language processing for bio-medical and health informatics.