Master's programme 2025/2026

Unstructured Data Analysis

Delivered: 2nd year, modules 1 and 2
Audience: own campus only
Language: English

Course Syllabus

Abstract

This course focuses on applied methods and existing tools for information retrieval (IR): web scraping, data preprocessing, and natural language processing (NLP). All methods considered in this course require basic knowledge of discrete mathematics and probability theory. For instance, most NLP and IR methods use conditional probability. In this course, we show how contemporary approaches are implemented in existing software packages (preferably Python frameworks) and demonstrate how these methods can be used to solve real-world problems.
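
As an illustration of the conditional probability mentioned above, the following minimal sketch estimates P(next word | current word) from raw counts, which is the idea behind a simple bigram language model. The function name `bigram_conditional_probs` and the toy sentence are assumptions made for this example; they are not taken from the course materials.

```python
from collections import Counter


def bigram_conditional_probs(tokens):
    """Estimate P(next | current) by maximum likelihood from a token list."""
    # How often each token appears as the first element of a bigram.
    context_counts = Counter(tokens[:-1])
    # How often each (current, next) pair appears.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (current, nxt): count / context_counts[current]
        for (current, nxt), count in bigram_counts.items()
    }


# Toy corpus (hypothetical): "the" is followed once by "cat" and once by "mat".
tokens = "the cat sat on the mat".split()
probs = bigram_conditional_probs(tokens)
print(probs[("the", "cat")])  # 0.5
```

Real corpora would also need smoothing for unseen word pairs, which this sketch leaves out.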

Learning Objectives

  • to show how contemporary approaches are implemented in existing software packages (preferably Python frameworks) and to demonstrate how these methods can be used to solve real-world problems

Expected Learning Outcomes

  • be able to critique applied NLP tasks constructively and identify their existing issues
  • be able to obtain the data needed for research and applied projects
  • be able to perform basic ETL operations on datasets and unstructured data
  • have an understanding of the basic principles of information retrieval
  • have the skill to develop an appropriate data analysis pipeline
  • have the skill to work with unstructured text data
  • know the advantages of existing natural language processing packages
  • know the basic principles behind existing deep learning approaches

Course Contents

  • IR tasks overview, Python dive in
  • Web information extraction
  • Text normalisation
  • Syntax parsing, fact extraction
  • Language modelling, text classification and clustering
  • Sentiment detection
  • Large Language Models
  • Machine translation, question answering
  • Summarization and domain adaptation
  • Vector Databases. Semantic search and indexing
  • Additional topics and course project defense

Assessment Elements

  • non-blocking Homework 1. Data loading, cleaning, and preprocessing
  • non-blocking Homework 2. Classification and clustering
  • non-blocking Project
  • non-blocking Project presentation
  • non-blocking Talk
  • non-blocking Final test

Interim Assessment

  • 2025/2026 2nd module
    0.1 * Talk + 0.2 * Homework 1. Data loading, cleaning, and preprocessing + 0.2 * Homework 2. Classification and clustering + 0.1 * Project presentation + 0.2 * Project + 0.2 * Final test

Bibliography

Recommended Core Bibliography

  • Cohen, S. (2019). Bayesian Analysis in Natural Language Processing: Second Edition. San Rafael: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2102157

Recommended Additional Bibliography

  • Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399

Authors

  • Pastukhova Anna Vladimirovna
  • Parinov Andrey Andreevich
  • Pavlova Irina Anatolevna