Course Leader: Vitalii Naumov
Home Institution: Cracow University of Technology
Course pre-requisites: basics of calculus and probability theory; basic programming skills are desirable but not mandatory
Course Overview
In the article “Data Scientist: The Sexiest Job of the 21st Century,” published by Harvard Business Review in October 2012, T.H. Davenport and D.J. Patil predicted that data scientists would become the most sought-after specialists in every market due to the development of information and communication technologies. That trend continues today: despite the numerous courses and specializations launched at universities and online, data science professionals remain among the most sought-after specialists in every field.
The course is intended for those who want to gain essential data analysis skills and, in doing so, catch the wave and become in-demand professionals.
During the course, students will become acquainted with the theoretical basis of data science: statistical analysis. We will begin with the description of a random variable, its distribution functions, and its numerical characteristics. We will then cover the basics of distribution fitting and more advanced techniques of mathematical statistics, namely correlation and regression analysis.
All the methods and techniques presented will be accompanied by the corresponding tools in the Python programming language. Students will learn the basics of Python and will also get acquainted with the most popular libraries for data analysis: pandas, NumPy, matplotlib, and scikit-learn.
In the last part of the course, I will present essential machine learning tools – simple classifiers and neural networks. Implementation of these tools and their features will be explained with the help of examples in Python.
Learning Outcomes
By the end of the course, students will be able to use the Python language and its libraries to perform basic data processing operations. They will be proficient in statistical inference, including distribution fitting, correlation analysis, and regression analysis. Students will also have basic skills in data visualization using Python libraries.
Course Content
Instructional Method
Course time will be divided equally between lectures and individual projects.
Required Course Materials
All the required materials will be provided by the instructor during the course.
Recommended additional reading:
Madsen, B.S. Statistics for Non-Statisticians, Springer, 2016
Downey, A.B. Think Python: How to Think Like a Computer Scientist, O'Reilly, 2015
Raschka, S., Mirjalili, V. Python Machine Learning, Packt, 2017
Assessment
The final grade will be based on two tests (midterm and final) and the project developed during the course. The tests will contribute 80% of the final grade, and the project will contribute the remaining 20%.