Course Leader: Dr Vitalii Naumov
Home Institution: Cracow University of Technology, Poland
Course pre-requisites: Basics of calculus and probability theory. Basic programming skills are desirable but not mandatory.
In the information age, an understanding of data analysis techniques is a must-have skill in almost every field, from medicine and fine arts to transport and finance. The course is intended for those who want to acquire essential skills in data analysis and statistics and, in this way, become in-demand professionals.
During the course, students will become acquainted with statistical analysis, the theoretical basis of data science. The course starts with the description of a random variable, its distribution functions, and its numeric characteristics. It then covers the basics of distribution fitting and two more advanced techniques of mathematical statistics: correlation analysis and regression analysis.
All the presented methods and techniques will be supported by the corresponding tools in the Python programming language. Students will learn the basics of Python and get acquainted with the most popular libraries for data analysis: pandas, NumPy, matplotlib, and scikit-learn.
By the end of the course, students will be able to use Python and its libraries to perform basic data processing operations. They will be proficient in statistical inference, including distribution fitting, correlation analysis, and regression analysis. Students will also have basic skills in data visualization using Python libraries.
1. Basics of Python: data types, conditions, loops, and functions
2. Random variable: distribution functions, numeric characteristics
3. NumPy library: the most important functions for data processing and analysis
4. Basic distributions of random variables: discrete and continuous distributions
5. Data visualization in Python: basic tools of the matplotlib library
6. The Central Limit Theorem: definition and examples using Python
7. Confidence intervals: construction of intervals by using simulations in Python
8. Using Python for distribution fitting: Pearson’s chi-squared test and Kolmogorov-Smirnov test
9. Correlation analysis: Pearson’s product-moment coefficient, rank correlation coefficients, and correlation matrices
10. The multiple regression model: estimation of the regression coefficients and significance tests
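As an illustration of the kind of simulation exercise listed under topics 6 and 7, the following is a minimal sketch and is not taken from the course materials. The choice of an exponential distribution, the sample sizes, the seed, and all variable names are assumptions made for this example only.

```python
import numpy as np

# Fixed seed so the run is reproducible (the value itself is arbitrary).
rng = np.random.default_rng(seed=42)

# Assumed setup: 10,000 samples of size 50 from an exponential
# distribution with mean 1 and standard deviation 1.
sample_size = 50
n_samples = 10_000
samples = rng.exponential(scale=1.0, size=(n_samples, sample_size))

# One mean per sample. By the Central Limit Theorem these means are
# approximately normal with mean 1 and std 1 / sqrt(sample_size).
sample_means = samples.mean(axis=1)
predicted_std = 1.0 / np.sqrt(sample_size)

print("mean of sample means:", sample_means.mean())
print("std of sample means: ", sample_means.std(ddof=1))
print("CLT prediction:      ", predicted_std)
```

Plotting a histogram of `sample_means` with matplotlib would show the bell shape directly, even though the underlying exponential distribution is strongly skewed.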
Lectures, class discussions, case examples, videos
Required Course Materials
All the required materials will be provided by the instructor during the course.
Recommended additional reading:
Downey, A.B. Think Python: How to Think Like a Computer Scientist, O'Reilly, 2015
Madsen, B.S. Statistics for Non-Statisticians, Springer, 2016
Navidi, W. Statistics for Engineers and Scientists, McGraw-Hill, 2004
The final grade will be based on two tests (midterm and final) and a project developed during the course. The tests will contribute 80% of the final grade, and the project will contribute the remaining 20%.