2025/2026





Machine Learning in Bioinformatics
Type:
Mago-Lego
Delivered by:
Big Data and Information Retrieval School
Where:
Faculty of Computer Science
When:
Modules 1 and 2
Online hours:
14
Open to:
students of one campus
Instructors:
Maria Poptsova
Language:
English
Contact hours:
56
Course Syllabus
Abstract
The course "Introduction to Machine Learning for Bioinformatics" introduces students to the theory and practice of applying machine learning algorithms to problems in bioinformatics. The main goal is to give students a comprehensive understanding of modern methods of data analysis and predictive model construction. During the course, students will learn the key stages of working with data: from preprocessing and dimensionality reduction to building, optimizing, and validating models. The program covers a wide range of algorithms, including linear regression with regularization (ridge regression, lasso, elastic net), support vector machines (SVM), neural networks, k-nearest neighbors (k-NN), classification and regression trees, and ensemble methods such as random forest and gradient boosting. Special attention is paid to practical work: seminars develop skills in using specialized software tools and libraries for predictive modeling. Classes cover a variety of real-world cases and applied problems based on bioinformatics datasets.
Learning Objectives
- Master the theory, process, and components of machine learning implementation.
- Learn to distinguish between types of predictive models and to know the key stages of their creation, such as data preprocessing, model construction and performance evaluation.
- Test various practical applications of predictive modeling using machine learning algorithms on datasets from the field of molecular biology.
- Learn how to use functions from various Python libraries to apply different types of models: linear and nonlinear regression and classification models, decision trees, and rule-based models.
- Perform input data preprocessing using Python: calculate statistics, evaluate skewness, apply appropriate transformations, perform principal component analysis (PCA), find correlations between predictors, and create dummy variables.
- Apply Python functions to measure the importance of predictors and model performance, use feature filtering methods, and estimate prediction error.
- Comprehensively apply the acquired knowledge and predictive analytics tools to solve applied problems in the field of bioinformatics.
Expected Learning Outcomes
- Know the basic concepts and paradigms of machine learning, the differences between ML and traditional programming.
- Possess data preprocessing skills: missing-value handling, normalization, and outlier detection.
- Know the basic algorithms of regression, classification, clustering and methods of their evaluation.
- Be able to interpret key metrics of model quality (MSE, R², precision, recall, F1, ROC AUC).
- Master the methods of combating overfitting: regularization, cross-validation, pruning and ensembling.
- Know the basics of decision trees, ensembles (Random Forest, Gradient Boosting) and support vector machines.
- Be able to apply dimensionality reduction techniques (PCA, t-SNE, UMAP) to visualize biomedical data.
- Possess the skills of critical analysis of the results of ML models in medicine, understand the limitations and risks.
- Know the basics of interpretable AI, the principles of data privacy and federated learning.
Course Contents
- Thinking in the ML paradigm and the anatomy of an ML project
- Data is the raw material for intelligence
- Regression: forecasting continuous quantities
- Classification I: basic methods
- Evaluating models and combating overfitting
- Decision Trees: the path to interpretability
- Ensembles of models: the wisdom of the crowd
- Support Vector Machine (SVM): margin maximization
- Dimensionality reduction: visualization of the invisible
- Unsupervised Learning: Clustering for Discoveries
- Introduction to Neural Networks and Deep Learning
- Trust and privacy in biomedical AI
- Synthesis of knowledge: analysis of a landmark article
- Course review and exam preparation
Assessment Elements
- Homework. The homework is a set of tasks in Jupyter Notebook format aimed at developing a sound approach to data analysis and preparation (in particular, of biomedical data), as well as to training and interpreting machine learning algorithms. The assignments include open-ended answers to theoretical questions and elements of ML pipeline programming using Python and related libraries.
- Complicated HT/Answers to questions during lectures. Each homework assignment will include a pool of more complicated tasks. For each lecture, a pool of questions based on that lecture's material will be prepared. Additional questions may arise during the discussion, and answers to them can also be graded by the lecturer.
- Written exam
Interim Assessment
- 2025/2026 2nd module: 0.2 * Complicated HT/Answers to questions during lectures + 0.5 * Homework + 0.3 * Written exam
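As a worked illustration of the weighted formula above, the following minimal Python sketch computes an interim grade from three component scores. The weights come from the formula; the function name, the dictionary keys, the 10-point scale, and the example scores are assumptions made for illustration only, and the syllabus does not state a rounding rule, so none is applied to the result itself.

```python
# Weights taken from the Interim Assessment formula.
WEIGHTS = {
    "lecture_activity": 0.2,  # Complicated HT / answers to questions during lectures
    "homework": 0.5,
    "exam": 0.3,
}


def interim_grade(scores):
    """Return the weighted sum of the component scores (unrounded)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example = {"lecture_activity": 6, "homework": 8, "exam": 7}

# 0.2 * 6 + 0.5 * 8 + 0.3 * 7 = 1.2 + 4.0 + 2.1 = 7.3
print(round(interim_grade(example), 2))
```

Because homework carries half of the total weight, a one-point change in the homework score shifts the result by 0.5, compared with 0.3 for the exam and 0.2 for lecture activity.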
Bibliography
Recommended Core Bibliography
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.). O’Reilly Media.
- Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
Recommended Additional Bibliography
- Trinity, L. (2019). Machine Learning: Beginner's Guide to Machine Learning, Data Mining, Big Data, Artificial Intelligence and Neural Networks.