Machine Learning in Bioinformatics

2025/2026

Type: Mago-Lego

Delivered by: Big Data and Information Retrieval School

Where: Faculty of Computer Science

When: modules 1 and 2

Online hours: 14

Open to: students of one campus

Instructors: Maria Poptsova

Language: English

Contact hours: 56

Abstract

The course "Introduction to Machine Learning for Bioinformatics" introduces students to the theory and practice of applying machine learning algorithms to problems in bioinformatics. Its main goal is to give students a thorough understanding of modern data analysis methods and of how predictive models are built. Students will work through the key stages of a data project, from preprocessing and dimensionality reduction to building, tuning, and validating models. The program covers a wide range of algorithms, including linear regression with regularization (ridge regression, lasso, elastic net), support vector machines (SVM), neural networks, the k-nearest neighbors (k-NN) method, classification and regression trees, and ensemble methods such as random forest and gradient boosting. Special attention is paid to practical work: seminars develop skills in using specialized software tools and libraries for predictive modeling. Classes cover a variety of real-world cases and applied problems based on bioinformatics datasets.
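For orientation, the three regularized regressions named in the abstract (ridge, lasso, and elastic net) can be written as penalized least-squares problems. This is one common textbook form, assuming n training pairs (x_i, y_i) with p predictors, an unpenalized intercept β_0, and non-negative tuning parameters λ, λ_1, λ_2; it is not necessarily the notation used in the course. Scaling conventions differ between sources (scikit-learn's Lasso and ElasticNet, for example, divide the squared-error term by 2n and parameterize the elastic net with `alpha` and `l1_ratio`).

```latex
\begin{aligned}
\mathrm{RSS}(\beta_0,\beta) &= \sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 \\
\text{Ridge:}\quad &\min_{\beta_0,\beta}\; \mathrm{RSS}(\beta_0,\beta) + \lambda \sum_{j=1}^{p}\beta_j^2 \\
\text{Lasso:}\quad &\min_{\beta_0,\beta}\; \mathrm{RSS}(\beta_0,\beta) + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert \\
\text{Elastic net:}\quad &\min_{\beta_0,\beta}\; \mathrm{RSS}(\beta_0,\beta) + \lambda_1 \sum_{j=1}^{p}\lvert\beta_j\rvert + \lambda_2 \sum_{j=1}^{p}\beta_j^2
\end{aligned}
```

The L1 penalty can drive some coefficients exactly to zero, so lasso also acts as a feature selector, which is useful when predictors (for example, genes) far outnumber samples. The L2 penalty shrinks coefficients toward zero without eliminating them, and the elastic net mixes both behaviors.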

Learning Objectives

Master the theory, process, and components of machine learning implementation.
Learn to distinguish between types of predictive models and understand the key stages of building them: data preprocessing, model construction, and performance evaluation.
Try out practical applications of predictive modeling with machine learning algorithms on molecular biology datasets and databases.
Learn how to use functions from various Python libraries to apply different types of models: linear and nonlinear regression and classification models, decision trees, and rule-based models.
Perform input data preprocessing using Python: calculate statistics, evaluate skewness, apply appropriate transformations, perform principal component analysis (PCA), find correlations between predictors, and create dummy variables.
Apply Python functions to measure the importance of predictors and model performance, use feature filtering methods, and estimate prediction error.
Comprehensively apply the acquired knowledge and predictive analytics tools to solve applied problems in the field of bioinformatics.
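The preprocessing objective above (statistics, skewness, transformations, correlations between predictors, dummy variables) can be illustrated with a minimal standard-library sketch. The gene values and tissue labels are made up for illustration, and the helper names (`skewness`, `pearson`, `dummy_encode`) are hypothetical. In the course itself these steps would more likely use library functions such as `scipy.stats.skew`, `pandas.DataFrame.corr`, and `pandas.get_dummies`; PCA is left out here and would typically be done with `sklearn.decomposition.PCA`.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (biased estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def dummy_encode(labels):
    """Full one-hot encoding: one 0/1 column per category, no reference level dropped."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical toy data: right-skewed expression values for two genes
# and a categorical tissue label for each of six samples.
gene_a = [1.0, 2.0, 2.5, 3.0, 4.0, 50.0]
gene_b = [2.1, 3.9, 5.2, 6.1, 8.0, 99.0]
tissue = ["liver", "lung", "liver", "brain", "lung", "liver"]

# Summary statistics: mean far above median hints at right skew.
print("mean, median:", statistics.fmean(gene_a), statistics.median(gene_a))
print("skewness before log:", round(skewness(gene_a), 3))

# log(1 + x) compresses large values and tolerates zero counts.
log_a = [math.log1p(v) for v in gene_a]
print("skewness after log:", round(skewness(log_a), 3))

# Highly correlated predictors are candidates for removal or for PCA.
print("corr(gene_a, gene_b):", round(pearson(gene_a, gene_b), 3))

cats, dummies = dummy_encode(tissue)
print(cats)
print(dummies)
```

For linear models with an intercept, one dummy column is usually dropped to avoid perfect collinearity (`pandas.get_dummies(..., drop_first=True)`); the sketch keeps all columns for readability.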

Expected Learning Outcomes

Know the basic concepts and paradigms of machine learning and how ML differs from traditional programming.
Possess data preprocessing skills: handling missing values, normalization, and outlier detection.
Know the basic algorithms of regression, classification, clustering and methods of their evaluation.
Be able to interpret key model quality metrics (MSE, R², precision, recall, F1, ROC AUC).
Master the methods of combating overfitting: regularization, cross-validation, pruning and ensembling.
Know the basics of decision trees, ensembles (Random Forest, Gradient Boosting), and support vector machines (SVM).
Be able to apply dimensionality reduction techniques (PCA, t-SNE, UMAP) to visualize biomedical data.
Be able to critically analyze the results of ML models in medicine and understand their limitations and risks.
Know the basics of interpretable (explainable) AI, the principles of data privacy, and federated learning.
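The quality metrics listed above can be computed by hand to see what each one measures. The following is a minimal standard-library sketch with made-up predictions; the function names are hypothetical, and the 0.5 decision threshold used to turn scores into class labels is an assumption. In practice the equivalents in `sklearn.metrics` (`mean_squared_error`, `r2_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`) would be used.

```python
def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def precision_recall_f1(y_true, y_pred):
    """Binary classification metrics with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical regression example.
y_reg = [3.0, 5.0, 7.0, 9.0]
y_reg_pred = [2.5, 5.0, 7.5, 9.0]
print("MSE:", mse(y_reg, y_reg_pred))
print("R2:", r2(y_reg, y_reg_pred))

# Hypothetical classifier scores for six samples (1 = disease, 0 = healthy).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print("precision, recall, F1:", precision_recall_f1(labels, preds))
print("ROC AUC:", roc_auc(labels, scores))
```

Precision, recall, and F1 depend on the chosen threshold, while ROC AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is why the two kinds of metrics can disagree.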

Course Contents

Thinking in the ML paradigm and the anatomy of an ML project
Data is the raw material for intelligence
Regression: predicting continuous quantities
Classification I: basic methods
Evaluating models and combating overfitting
Decision Trees: the path to interpretability
Ensembles of models: the wisdom of the crowd
Support Vector Machine (SVM): margin maximization
Dimensionality reduction: visualization of the invisible
Unsupervised learning: clustering for discovery
Introduction to Neural Networks and Deep Learning
Trust and privacy in biomedical AI
Synthesis of knowledge: analysis of a landmark article
Course review and exam preparation

Assessment Elements

Homework
The work is a set of tasks in Jupyter Notebook format aimed at developing a sound approach to analyzing and preparing data (biomedical data in particular) and to training and interpreting machine learning models. The assignments include open-ended answers to theoretical questions and programming parts of an ML pipeline in Python and related libraries.
Advanced homework tasks/Answers to questions during lectures
Each homework assignment will include a pool of advanced, more challenging tasks. For each lecture, a pool of questions based on that lecture's material will be prepared. Additional questions may arise during discussion, and answers to them may also be graded by the lecturer.
Written exam

Interim Assessment

2025/2026 2nd module
0.2 * Advanced homework tasks/Answers to questions during lectures + 0.5 * Homework + 0.3 * Written exam
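As a worked example of the weighted formula above, here is a minimal sketch that assumes all three components are graded on the same 0 to 10 scale. The scores are hypothetical, the dictionary keys are illustrative names rather than official ones, and the syllabus does not state a rounding rule, so the result is left unrounded apart from display.

```python
import math

# Weights from the interim assessment formula (keys are illustrative names).
WEIGHTS = {"lecture_component": 0.2, "homework": 0.5, "exam": 0.3}


def interim_grade(scores):
    """Weighted sum of component scores; assumes one common grading scale."""
    if not math.isclose(sum(WEIGHTS.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical student: 8 for lecture questions, 7 for homework, 9 on the exam.
# 0.2 * 8 + 0.5 * 7 + 0.3 * 9 = 1.6 + 3.5 + 2.7
grade = interim_grade({"lecture_component": 8, "homework": 7, "exam": 9})
print(round(grade, 2))  # 7.8
```

Because homework carries half of the weight, a one-point change in the homework score moves the final grade by 0.5 points, compared with 0.3 for the exam and 0.2 for the lecture component.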

Bibliography

Recommended Core Bibliography

Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.). O'Reilly Media.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.

Recommended Additional Bibliography

Trinity, L. (2019). Machine Learning: Beginner's Guide to Machine Learning, Data Mining, Big Data, Artificial Intelligence and Neural Networks.

Authors

Emasheva Valeriia Anatolevna