Bachelor's programme
2024/2025





Data Analysis in Python
Status:
Compulsory course (International Programme "International Relations and Global Studies")
Field of study:
41.03.05. International Relations
Delivered by:
Faculty of World Economy and International Affairs
Where:
Faculty of World Economy and International Affairs
When:
3rd year, module 4
Mode of study:
with an online course
Online hours:
20
Open to:
students of the home campus
Language:
English
Credits:
3
Course Syllabus
Abstract
The course introduces data analysis using Python. The first part covers the basics of the Python programming language. The second part introduces work with real-life data from the social sciences and international relations. The course is designed specifically for students with no prior programming experience.
Learning Objectives
- To provide a hands-on introduction to Python and its basic applications in the field of data science.
Expected Learning Outcomes
- Basic knowledge about the field of data science.
- Skill of selecting a research question for the group project.
- Skill of applying hypothesis testing and statistical inference.
- Skill of applying principles of tidy data.
- Skill of computing descriptive statistics.
- Skill of using NumPy, SciPy, Jupyter notebooks.
- Skill of visualizing data in Python.
Course Contents
- Introduction to the field of data science. Examples of data science approaches applied in economics and political science. Course information on grading, prerequisites and expectations.
- Introduction to Python. Review of the environment setup process. Anaconda distribution. NumPy, SciPy, Jupyter notebooks.
- Importing data into Python. Various data sources: text files, web, APIs. Raw and processed data. Working with dates. Pandas library. Merging DataFrames. The principles of tidy data. Data quality: inaccurate data; sparse data; missing data; insufficient data; imbalanced data.
- Descriptive statistics. Measures of location: mean, median, mode. Measures of spread: standard deviation, interquartile range, range. Percentiles. Robust statistics. Data transformations.
- Visualizing data in Python. Matplotlib library. Scatter plot. Line chart. Histogram. Bar chart. Categorical, time series and statistical data graphics.
- Interactive plots in Python. Introduction to plotly. Finding a suitable representation of the data.
- Basics of probability theory. Distributions, sampling, t-tests. Introductory hypothesis testing and statistical inference.
- Introduction to linear regression. Estimation techniques. Evaluating the quality of the regression model. Model interpretation.
- Drawbacks of the linear regression approach. Stability of the coefficients across different parts of the dataset. Rolling estimations.
- Overfitting. Occam’s Razor principle. In-sample and out-of-sample model evaluation. Measuring predictive accuracy of the model.
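The contrast between in-sample and out-of-sample evaluation named in the last topic can be illustrated with a minimal sketch. It is not taken from the course materials: the data are synthetic, the 40/20 split and the polynomial degrees are arbitrary choices, and the helper names `mse` and `fit_and_score` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Synthetic data (invented for illustration): a linear signal plus noise.
x = rng.uniform(-1.0, 1.0, size=60)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=60)

# Hold out the last 20 observations for out-of-sample evaluation.
x_train, y_train = x[:40], y[:40]
x_test, y_test = x[40:], y[40:]


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


def fit_and_score(degree):
    """Fit a polynomial by least squares; return (in-sample, out-of-sample) MSE."""
    coefs = np.polyfit(x_train, y_train, deg=degree)
    return (
        mse(y_train, np.polyval(coefs, x_train)),
        mse(y_test, np.polyval(coefs, x_test)),
    )


for degree in (1, 8):
    in_mse, out_mse = fit_and_score(degree)
    print(f"degree {degree}: in-sample MSE={in_mse:.3f}, out-of-sample MSE={out_mse:.3f}")
```

The more flexible model cannot do worse in-sample, because the straight line is a special case of the degree-8 polynomial. Whether it also does worse out-of-sample depends on the data, which is why the held-out error is the one to compare.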
Assessment Elements
- Quizzes: Short in-class quizzes will be held throughout the course in Smart LMS using Safe Exam Browser. Each quiz will take 5-10 minutes and cover the material of the previous week(s). Questions may be multiple-choice or short-answer.
- In-class Assignments: There will be in-class assignments with practical data analysis tasks in Python. Solutions will be submitted via the Smart LMS platform using Safe Exam Browser and graded automatically.
- Seminar Participation: To receive full marks for participation, a student needs to take an active part in class discussions, demonstrate familiarity with the assigned readings and lecture material, and comment on home assignments, including being prepared to answer questions posed by the instructor.
- Midterm Test: The midterm test will be conducted on the Smart LMS platform using Safe Exam Browser. The test will consist of multiple-choice questions and practical data analysis problems. A mock test will be provided in advance. The test is graded on a scale from 0 to 10.
- Exam: The exam is an oral examination. There is no preparation time; the student answers the questions immediately. Topics from the Independent Exam on Data Analysis will be included in the oral exam. Data analysis: data processing (aggregation, filtering, creating new variables, working with summary tables); types of variables; measures of central tendency and dispersion; detecting and dealing with outliers; visualization; detecting missing values; handling missing values (removal and replacement strategies). Statistics: data collection; sample and population; sample representativeness; frequency tables and distributions; continuous distributions; density function; Pearson correlation; hypothesis testing; statistical significance; p-value; Type I and Type II errors; confidence intervals based on Z- and t-distributions for the mean and the proportion; Z-test and t-test for one sample and for two independent samples with equal variances; Pearson's chi-squared test. Machine learning: definition of a machine learning problem; types of machine learning tasks; k-nearest neighbors algorithm; linear regression; interpreting linear regression coefficients; evaluation metrics (MSE, MAE, R²); logistic regression; ROC curve.
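As an illustration of one exam topic, the t-test for two independent samples with equal variances, here is a minimal sketch using only the Python standard library. The two groups of scores are invented, and the function name `pooled_t_statistic` is hypothetical; it is not part of the course materials.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t statistic, degrees of freedom), assuming equal population variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # variance() uses the n - 1 denominator (sample variance).
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, n_a + n_b - 2


# Invented scores for two independent groups.
group_a = [6, 7, 8, 9, 10]
group_b = [4, 5, 6, 7, 8]
t_stat, dof = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

For these numbers the statistic is 2.0 with 8 degrees of freedom, which is below the two-sided 5% critical value of about 2.306, so the difference in means would not be declared significant at that level. In practice, `scipy.stats.ttest_ind` with `equal_var=True` computes the same statistic together with a p-value.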
Interim Assessment
- 2024/2025 4th module: min(0.26 * Seminar Participation + 0.12 * In-class Assignments + 0.12 * Quizzes + 0.2 * Midterm Test + 0.3 * Exam, 8). Remark: In accordance with the Regulations for Interim and Ongoing Assessments of Students at HSE University, grades awarded on the basis of interim assessment for disciplines that are prerequisites for the independent exam on digital competency may not exceed 8 points.
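The formula above can be worked through with a short sketch. The weights and the cap of 8 come from the formula; the student scores are invented, and the sketch assumes every component is graded on the same 0 to 10 scale as the midterm test. The function name `interim_grade` is hypothetical, and no rounding rule is applied because the formula does not state one.

```python
# Weights copied from the formula above; the component scores below are invented.
WEIGHTS = {
    "Seminar Participation": 0.26,
    "In-class Assignments": 0.12,
    "Quizzes": 0.12,
    "Midterm Test": 0.20,
    "Exam": 0.30,
}
GRADE_CAP = 8  # upper limit stated in the remark


def interim_grade(scores):
    """Return the weighted sum of component scores, capped at GRADE_CAP."""
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return min(weighted, GRADE_CAP)


# Hypothetical student; assumes each component is scored from 0 to 10.
example = {
    "Seminar Participation": 10,
    "In-class Assignments": 9,
    "Quizzes": 8,
    "Midterm Test": 9,
    "Exam": 10,
}
print(interim_grade(example))
```

Here the weighted sum is 2.6 + 1.08 + 0.96 + 1.8 + 3.0 = 9.44, so the cap applies and the resulting grade is 8.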
Bibliography
Recommended Core Bibliography
- An introduction to data analysis with R; Introduction à l’analyse de données avec le logiciel R. (2019). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.BE2A1501
Recommended Additional Bibliography
- Luke, D. A. (2015). A user’s guide to network analysis in R. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edswao&AN=edswao.454121474
- Text analysis in R. (2017). Communication Methods and Measures, 11(4), 245–265. https://doi.org/10.1080/19312458.2017.1387238