Data Analytics for Information Systems (DAIS)

This course provides a hands-on introduction to the essentials of data analytics and machine learning using R.

The growing ubiquity of information systems in both organizational and private consumer contexts makes large data streams available in an increasing number of domains. As part of the digital transformation, knowing how to handle, analyze, and interpret these data sets is becoming an increasingly important skill set in companies, policymaking, and academic research.

The course builds on real-world data sets from information systems in the realm of consumer behavior, in particular in the context of resource consumption. Based on hands-on examples and practical challenges, we cover fundamental data analytics methods using the software environment R.

The course starts with basic concepts from descriptive and inferential statistics that are needed in the subsequent course units, followed by an introduction to the statistics software R and RStudio. Students will then be introduced to experimental design in order to distinguish between correlation and causation and to critically evaluate the validity and reliability of results. A large share of the course is then dedicated to regression analysis, clustering, and different classification techniques. Students will apply these methods to data sets from concrete real-world challenges. The course closes with a discussion of relevant privacy regulations, social concerns, and ethical aspects.

In the second half of the semester, students can earn bonus points in a self-study course project by applying the skills and methods covered in the lecture and exercise sessions to the analysis of a large real-world data set.

In this course, students will acquire:

an introduction (or refresher) to fundamental concepts in statistics needed for various quantitative methods in data analytics
skills to design and use information systems to collect behavioral data
skills to formulate hypotheses and to perform and explain the corresponding statistical tests
skills to formulate, solve, and interpret linear and logistic regression analyses
skills to conduct clustering analyses
skills to set up, train, and evaluate machine learning algorithms, including K-means, regression, and support vector machines
programming skills in the statistics software R that allow them to perform the related tasks efficiently
a solid understanding of the ethical issues that arise when dealing with personal data and of the privacy regulations that apply

Recommended prerequisites	The course includes an introductory part that covers essential concepts from statistics and an introduction to R. However, basic familiarity with at least one programming language prior to the course is strongly recommended.
Time and room	tbd
Method of examination	Written examination (90 minutes)
Grading procedure	Written examination (100 %) – Bonus points can be acquired in a project in the second half of the semester. Students who pass the exam may increase their exam grade by up to 0.7 with the project.
Module frequency	Winter term, irregularly
Workload	Lecture and exercise sessions: 50 h; self-study: 100 h
Module duration	1 semester
Teaching and examination language	English
(Recommended) reading	Will be announced in class
