R Environment in the Analysis of Statistical Data

Course Leader: Dr Tomasz Szubert

Home Institution: WSB University

Course pre-requisite: Basics of Mathematics

Course Overview
The course has two main goals. The first is to show how to work in R, an advanced but completely free environment for mathematical and statistical computing. The second is to present the most important analytical techniques: describing the distribution of a single variable, examining relationships between two or more variables, modeling cause-and-effect relationships, building dynamic forecasts, and selected methods of multivariate analysis, e.g. cluster analysis or classification trees. After completing the course, the student will be able to perform professional statistical analyses on various types of data, from simple one-variable series to multidimensional tables and even spatial data.

Learning Outcomes
After completing the course the student will be able to:

- import data in various formats (XLS, CSV, etc.) into R, transform variables (aggregate and recode them), define selection conditions, write their own functions, present data in charts, and export results for use in other programs

- make a synthetic description of the analyzed variables (using measures of central tendency and variability, measures of asymmetry and concentration, elements of statistical inference)

- model causal relationships using linear and non-linear regression models and make predictions for a dependent variable based on them

- model time series by decomposing them into trend, cyclical, seasonal and random components, and forecast future values of the analyzed variable on that basis

- classify data, e.g. using cluster analysis, classification trees or discriminant analysis

- present spatial data in the form of maps and cartograms

- create a dynamic R Shiny application that uses buttons, sliders, drop-down lists, etc. to present graphs in a professional manner, and publish it on a website

Course Content

Basics of R 

program installation, license, help system, program modes, basic commands, data import and saving work results, installation of packages, basic calculations, graphics, control instructions, types and structures of data, an overview of R-environment applications in business process modelling
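As an illustration of the kind of script this unit leads up to, the sketch below saves and re-imports a small table, recodes and aggregates a variable, defines a function, and uses a loop. It relies on base R only. The data frame, the column names and the helper `range_width` are invented for this example and are not part of the course materials.

```r
# Build a small data frame and save it to a temporary CSV file
df <- data.frame(group = c("a", "b", "a", "b"),
                 value = c(10, 20, 30, 40))
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Import the file back into R
dat <- read.csv(path)

# Recode a quantitative variable into two categories
dat$size <- ifelse(dat$value >= 25, "high", "low")

# Aggregate: mean of value within each group
means <- aggregate(value ~ group, data = dat, FUN = mean)

# A user-defined function (hypothetical helper): width of the range
range_width <- function(x) max(x) - min(x)

# Control instruction: loop over the groups and print the range width
for (g in unique(dat$group)) {
  cat(g, range_width(dat$value[dat$group == g]), "\n")
}
```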

Structure analysis of data 

measures of central tendency and variability, measures of asymmetry and concentration, sample statistics as estimators of population parameters, confidence intervals for means and proportions, testing of statistical hypotheses
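A minimal sketch of these measures in base R follows. It uses the built-in `mtcars` data set purely as a stand-in for course data. The skewness here is the simple moment-based coefficient, one of several possible asymmetry measures, and the hypothesised mean of 20 is an arbitrary value chosen for the example.

```r
# Built-in data set, used here only for illustration
x <- mtcars$mpg

# Measures of central tendency and variability
centre <- c(mean = mean(x), median = median(x))
spread <- c(sd = sd(x), iqr = IQR(x))

# Moment-based skewness: third central moment divided by
# the second central moment raised to the power 3/2
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
skewness <- m3 / m2^1.5

# One-sample t-test of H0: mu = 20, which also returns
# a 95% confidence interval for the population mean
tt <- t.test(x, mu = 20, conf.level = 0.95)
ci <- tt$conf.int

print(centre)
print(spread)
print(skewness)
print(ci)
```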

Regression and correlation

analysis of correlation and regression of two quantitative variables (Pearson correlation coefficient and linear regression), multi-variable linear correlation and regression, non-linear models
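The topics in this unit can be sketched with base R functions as shown below. The `mtcars` data set and the choice of variables are assumptions made for the example, and the quadratic term is only one simple way of fitting a non-linear relationship.

```r
# Pearson correlation between two quantitative variables
r <- cor(mtcars$wt, mtcars$mpg, method = "pearson")

# Simple linear regression: fuel economy explained by weight
fit1 <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression with two predictors
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# A non-linear (quadratic) relationship, still estimated with lm()
fit3 <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# Predictions of the dependent variable for new observations
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(fit1, newdata = new_cars)

print(r)
print(coef(fit1))
print(pred)
```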

Forecasting in time series

methods of isolating linear and non-linear trends from time series, analysis of seasonal fluctuations in time series, moving average method, exponential smoothing methods
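The sketch below shows how these methods might look in base R, using the built-in monthly `AirPassengers` series as example data. The 12-month window and the multiplicative seasonal form are choices made for this illustration, not prescriptions from the course.

```r
y <- AirPassengers  # built-in monthly series, used only as an example

# Linear trend estimated by regressing the series on time
t <- as.numeric(time(y))
trend <- lm(as.numeric(y) ~ t)

# Centred 12-month moving average (2x12 weights for an even window)
w <- c(0.5, rep(1, 11), 0.5) / 12
ma <- stats::filter(y, filter = w, sides = 2)

# Decomposition into trend, seasonal and random components
dec <- decompose(y, type = "multiplicative")

# Exponential smoothing (Holt-Winters) and a 12-month forecast
hw <- HoltWinters(y, seasonal = "multiplicative")
fc <- predict(hw, n.ahead = 12)

print(coef(trend))
print(fc)
```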

Spatial data analysis 

downloading maps and their attached data as SHP (map shapes) and DBF (database) files, presenting the intensity of individual variables across the studied areas, checking whether spatial correlation is present, and identifying spatial regimes (groups of objects with similar properties)

R-Shiny application 

creating graphs, tables and dynamic reports using form controls for data visualization (buttons, sliders, drop-down lists, etc.), publishing the created application on websites

Instructional Method
Learning will initially be based on running basic R code (such basic procedures can be found on many websites, including the official CRAN website). Students will then learn how to modify this code to suit their needs, and finally they will write procedures entirely from scratch, without relying on ready-made templates. All necessary materials are available online, but the lecturer will also provide his own.

Students participating in the classes will have to design an analysis path and carry it out on their own data, different from the data shown in the classroom. This is not a complicated challenge: an advantage of R is that analyses are performed by procedures written in scripts, so replacing the data set name, the variable names and the necessary parameters very quickly produces the expected results. The points awarded will depend on the level of complexity of the chosen method.
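The idea of reusing a script on different data can be sketched as follows. The function `run_analysis` and its arguments are hypothetical names invented for this example, and the two built-in data sets stand in for classroom data and a student's own data.

```r
# Hypothetical reusable procedure: only the data set and the
# variable names change between analyses
run_analysis <- function(data, response, predictor) {
  # Build the model formula from the variable names
  f <- reformulate(predictor, response = response)
  fit <- lm(f, data = data)
  list(
    description = summary(data[[response]]),
    slope = coef(fit)[[predictor]],
    r_squared = summary(fit)$r.squared
  )
}

# The same script applied to two different data sets
res_class <- run_analysis(mtcars, "mpg", "wt")
res_own <- run_analysis(iris, "Sepal.Length", "Petal.Length")

print(res_class$slope)
print(res_own$slope)
```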