Course Curriculum

It stretches your mind, helping you think better and create even better.

Description: Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. This first module introduces the field of Data Science and how it relates to neighbouring fields such as Artificial Intelligence, Machine Learning and Deep Learning.

Introduction to Data Science

High level view of Data Science, Artificial Intelligence & Machine Learning

Subtle differences between Data Science, Machine Learning & Artificial Intelligence

Approaches to Machine Learning

Terms & Terminologies of Data Science

Understanding an end-to-end Data Science pipeline and implementation cycle

Description: Mathematics is central to data science: mathematical concepts help in identifying patterns and in designing algorithms. An understanding of the key notions of Statistics and Probability Theory is essential for implementing such algorithms.

Linear Algebra

Matrices, Matrix Operations

Eigenvalues, Eigenvectors

Scalars, Vectors and Tensors

Prior and Posterior Probability

Conditional Probability


Differentiation, Gradient and Cost Functions

Graph Theory
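
As a small illustration of the prior, posterior and conditional probability topics listed above, the sketch below applies Bayes' rule to a made-up diagnostic-test scenario. The function name `posterior` and all the numbers (1% prevalence, 95% sensitivity, 10% false-positive rate) are assumptions chosen for illustration, not course material.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) using Bayes' rule.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # P(E) from the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.10)
print(f"P(condition | positive test) = {p:.3f}")
```

Even with a fairly accurate test, the posterior stays below 10% here because the prior is so small, which is the kind of intuition the probability topics aim to build.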

Description: This module focuses on understanding statistical concepts required for Data Science, Machine Learning and Deep Learning. In this module, you will be introduced to the estimation of various statistical measures of a data set, simulating random distributions, performing hypothesis testing, and building statistical models.

Descriptive Statistics

Types of Data (Discrete vs Continuous)

Types of Data (Nominal, Ordinal)

Measures of Central Tendency (Mean, Median, Mode)

Measures of Dispersion (Variance, Standard Deviation)

Range, Quartiles, Interquartile Range

Measures of Shape (Skewness and Kurtosis)

Tests for Association (Correlation and Regression)

Random Variables

Probability Distributions

Standard Normal Distribution

Probability Density Function

Probability Mass Function

Cumulative Distribution Function

Inferential Statistics

Statistical sampling & Inference

Hypothesis Testing

Null and Alternative Hypotheses

Margin of Error

Type I and Type II errors

One Sided Hypothesis Test, Two-Sided Hypothesis Test

Tests of Inference: Chi-Square, T-test, Analysis of Variance

t-value and p-value

Confidence Intervals
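
The statistical topics above (descriptive measures, simulated distributions, confidence intervals and hypothesis tests) can be sketched with Python's standard library alone. The sample parameters (mean 50, standard deviation 10, 200 observations) are arbitrary assumptions, and the interval and test use a normal approximation rather than a t-distribution.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(42)  # fixed seed so the simulated sample is reproducible

# Simulate a sample from a normal distribution (parameters are arbitrary).
sample = [random.gauss(50, 10) for _ in range(200)]

# Descriptive statistics.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)              # sample standard deviation (n - 1)
q1, q2, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1                                 # interquartile range

# 95% confidence interval for the mean (normal approximation).
z = NormalDist().inv_cdf(0.975)               # two-sided critical value
standard_error = stdev / math.sqrt(len(sample))
margin_of_error = z * standard_error
ci = (mean - margin_of_error, mean + margin_of_error)

# Two-sided z-test of the null hypothesis "population mean equals 50".
z_stat = (mean - 50) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"mean={mean:.2f} median={median:.2f} sd={stdev:.2f} IQR={iqr:.2f}")
print(f"95% CI=({ci[0]:.2f}, {ci[1]:.2f}) p-value={p_value:.3f}")
```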

Python for Data Science



Matplotlib & Seaborn

Jupyter Notebook


NumPy is a Python library for array-based scientific computing. Explore how to initialize and load data into arrays and learn about basic array manipulation operations using NumPy.

Loading data with NumPy

Comparing NumPy with Traditional Lists

NumPy Data Types

Indexing and Slicing

Copies and Views

Numerical Operations with NumPy

Matrix Operations on NumPy Arrays

Aggregation functions

Shape Manipulations


Statistical operations using NumPy

Resize, Reshape, Ravel

Image Processing with NumPy
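
A minimal sketch of several of the array topics above: initialization, indexing and slicing, copies versus views, aggregations and shape manipulation. The array contents are arbitrary example values.

```python
import numpy as np

# Initialize arrays from a Python list and from a range.
a = np.array([1, 2, 3, 4, 5, 6])
b = np.arange(12).reshape(3, 4)     # 3 rows x 4 columns, values 0..11

# Indexing and slicing.
first_row = b[0]                    # values 0, 1, 2, 3
last_column = b[:, -1]              # values 3, 7, 11

# A slice is a view that shares memory; .copy() makes an independent array.
view = a[:3]
copy = a[:3].copy()
view[0] = 100                       # also changes a[0]
copy[1] = -1                        # leaves a untouched

# Element-wise arithmetic and aggregation functions.
doubled = b * 2
column_sums = b.sum(axis=0)         # one sum per column
overall_mean = b.mean()

# ravel flattens to one dimension; reshape changes the shape.
flat = b.ravel()
as_columns = b.reshape(6, 2)
```

Modifying `view` changes `a`, while modifying `copy` does not. This is the behaviour the "Copies and Views" topic refers to.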


Pandas is a Python library that provides utilities to deal with structured data stored in the form of rows and columns. Discover how to work with series and tabular data, including initialization, population, and manipulation of Pandas Series and DataFrames.

Basics of Pandas

Loading data with Pandas


Operations on Series

DataFrames and Operations on DataFrames

Selection and Slicing of DataFrames

Descriptive statistics with Pandas

Map, Apply, Iterations on Pandas DataFrame

Working with text data

Multi Index in Pandas

GroupBy Functions

Merging, Joining and Concatenating DataFrames

Visualization using Pandas
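
The Series and DataFrame operations listed above can be sketched as follows. The tables (`orders`, `customers`) and the prices are invented example data.

```python
import pandas as pd

# A Series is a labelled one-dimensional array.
prices = pd.Series([10.0, 12.5, 9.0], index=["apple", "banana", "cherry"])

# A DataFrame stores tabular data as named columns.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy"],
    "fruit": ["apple", "banana", "cherry", "apple"],
    "quantity": [3, 1, 2, 5],
})

# Selection and slicing.
ann_orders = orders[orders["customer"] == "ann"]   # boolean mask
first_two = orders.iloc[:2]                        # positional slice

# Map a Series onto a column to look up unit prices, then derive a column.
orders["unit_price"] = orders["fruit"].map(prices)
orders["total"] = orders["quantity"] * orders["unit_price"]

# GroupBy aggregation.
spend_per_customer = orders.groupby("customer")["total"].sum()

# Merge with a second table on a shared key.
customers = pd.DataFrame({
    "customer": ["ann", "bob", "cy"],
    "city": ["Oslo", "Pune", "Lima"],
})
merged = orders.merge(customers, on="customer", how="left")

print(spend_per_customer)
print(merged.describe())   # descriptive statistics of the numeric columns
```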

Data Visualization using Matplotlib

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.

Anatomy of Matplotlib figure

Plotting Line plots with labels and colors

Adding markers to line plots

Histogram plots

Scatter plots

Size, Color and Shape selection in Scatter plots.

Applying Legend to Scatter plots

Displaying multiple plots using subplots

Boxplots, scatter_matrix and Pair plots
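
A short sketch of the figure anatomy and plot types above, using Matplotlib's object-oriented API. The data values, colours and titles are arbitrary, and the non-interactive `Agg` backend is an assumption made so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this line in a notebook
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# A Figure holds one or more Axes; subplots() creates both at once.
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot with a colour, markers, axis labels and a legend.
ax_line.plot(x, y, color="tab:blue", marker="o", label="y = x squared")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.set_title("Line plot with markers")
ax_line.legend()

# Scatter plot with per-point sizes, a colour and a marker shape.
ax_scatter.scatter(x, y, s=[20, 40, 60, 80, 100], c="tab:red",
                   marker="^", label="samples")
ax_scatter.set_title("Scatter plot")
ax_scatter.legend()

fig.tight_layout()
```

In a notebook the figure renders inline; in a script, `fig.savefig(...)` with a file name of your choice writes it to disk.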

Data Visualization using Seaborn

Seaborn is a data visualization library that provides a high-level interface for drawing graphs. These graphs are able to convey a lot of information, while also being visually appealing.

Basic Plotting using Seaborn

Violin Plots

Box Plots

Cat Plots

Facet Grid

Swarm Plot

Pair Plot

Bar Plot

LM Plot

Variations in LM plot using hue, markers, row and col

Exploratory Data Analysis (EDA) helps identify patterns in the data by using basic statistical methods together with visualization tools that display graphs and charts. With EDA we can assess the distribution of the data and decide which models are suitable candidates.

Pipeline ideas

Exploratory Data Analysis

Feature Creation

Evaluation Measures

Data Analytics Cycle ideas

Data Acquisition

Data Preparation

Data cleaning

Data Visualization


Model Planning & Model Building

Data Inputting

Reading and writing data to text files

Reading data from a csv

Reading data from JSON

Data preparation

Selection and Removal of Columns






One hot Encoding


Train, Test Splitting
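
The one-hot encoding and train/test splitting steps above can be sketched in plain Python to make the mechanics visible. The rows, the helper names `one_hot` and `split_train_test`, and the 25% test fraction are illustrative assumptions; in practice `pandas.get_dummies` and scikit-learn's `train_test_split` are the usual tools for these steps.

```python
import random

# Invented example records.
rows = [
    {"city": "Oslo", "age": 31, "bought": 1},
    {"city": "Pune", "age": 24, "bought": 0},
    {"city": "Lima", "age": 45, "bought": 1},
    {"city": "Oslo", "age": 38, "bought": 0},
    {"city": "Pune", "age": 29, "bought": 1},
    {"city": "Lima", "age": 52, "bought": 0},
    {"city": "Oslo", "age": 27, "bought": 1},
    {"city": "Pune", "age": 33, "bought": 0},
]


def one_hot(records, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({record[column] for record in records})
    encoded = []
    for record in records:
        new_record = {k: v for k, v in record.items() if k != column}
        for category in categories:
            new_record[f"{column}_{category}"] = int(record[column] == category)
        encoded.append(new_record)
    return encoded


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


encoded = one_hot(rows, "city")
train, test = split_train_test(encoded, test_fraction=0.25, seed=0)
print(len(train), "training rows,", len(test), "test rows")
```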

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. This module is a deep dive into supervised learning, unsupervised learning and Gaussian / Naive Bayes methods. You will also be exposed to different classification, clustering and regression methods.

Introduction to Machine Learning

Applications of Machine Learning

Supervised Machine Learning



Unsupervised Machine Learning

Reinforcement Learning

Latest advances in Machine Learning

Model Representation

Model Evaluation

Hyperparameter tuning of Machine Learning Models.

Evaluation of ML Models.

Estimation and Prediction with Machine Learning Models

Deployment strategy of ML Models.

Supervised learning is one of the most popular techniques in machine learning. In this module, you will learn about more complicated supervised learning models and how to use them to solve problems.

Classification methods & respective evaluation

K Nearest Neighbors

Decision Trees

Naive Bayes

Stochastic Gradient Descent



Non linear

Radial Basis Function

Random Forest

Gradient Boosting Machines


Logistic regression

Ensemble methods

Combining models




Choosing best classification method

Model Tuning

Train Test Splitting

K-fold cross validation

Bias-variance tradeoff

L1 and L2 norms

Overfitting and underfitting, with learning curves and bias-variance sensitivity shown using graphs

Hyperparameter Tuning using GridSearchCV

Respective Performance measures

Different Errors (MAE, MSE, RMSE)

Accuracy, Confusion Matrix, Precision, Recall
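
To connect one classification method above with its evaluation measures, here is a minimal from-scratch sketch of K Nearest Neighbors followed by confusion-matrix counts, accuracy, precision and recall. The data points, the helper names and `k=3` are assumptions for illustration; a library such as scikit-learn would normally be used instead.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, point, k=3):
    """Predict the majority label among the k nearest training points."""
    neighbours = sorted(
        (math.dist(x, point), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in neighbours[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


# Invented two-class data: class 0 near (1, 1), class 1 near (6, 6).
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = [0, 0, 0, 1, 1, 1]

test_X = [(1.5, 1.5), (2, 2), (6.5, 6.5), (5.5, 5.5), (4.5, 4.5)]
test_y = [0, 0, 1, 1, 0]   # the last point is deliberately hard

y_pred = [knn_predict(train_X, train_y, p, k=3) for p in test_X]
tp, fp, fn, tn = confusion_counts(test_y, y_pred)

accuracy = (tp + tn) / len(test_y)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

The last test point lies closer to the class 1 cluster, so it is counted as a false positive. That lowers precision while leaving recall unchanged.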

Regression is a predictive modelling technique that is widely used to model the relationship between a dependent variable and one or more independent variables. It is used mostly in forecasting, time series modelling and estimating cause-and-effect relationships between variables. This module discusses regression in detail: its types, usage and applicability.


Linear Regression

Variants of Regression



Multiple Linear Regression

Logistic Regression (effectively, classification only)

Regression Model Improvement

Polynomial Regression

Random Forest Regression

Support Vector Regression

Respective Performance measures

Different Errors (MAE, MSE, RMSE)

Mean Absolute Error

Mean Square Error

Root Mean Square Error
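
A minimal sketch of simple linear regression fitted by least squares, followed by the three error measures listed above. The five data points are invented, and `np.linalg.lstsq` is just one of several ways to obtain the fit.

```python
import numpy as np

# Invented data that roughly follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Design matrix with a column of ones for the intercept term.
X = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(X, y, rcond=None)

# Predictions and residuals on the training data.
y_pred = slope * x + intercept
residuals = y - y_pred

mae = np.mean(np.abs(residuals))    # Mean Absolute Error
mse = np.mean(residuals ** 2)       # Mean Square Error
rmse = np.sqrt(mse)                 # Root Mean Square Error

print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f}")
```

RMSE is in the same units as the target, and it is never smaller than MAE because squaring weights large residuals more heavily.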

Unsupervised learning can provide powerful insights on data without the need to annotate examples. In this module, you will learn several different techniques in unsupervised machine learning.


K means

Hierarchical Clustering


Association Rule Mining

Market Basket Analysis using Apriori Algorithm

Dimensionality reduction using Principal Component analysis (PCA)
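
As an illustration of the K means topic above, the sketch below clusters eight invented 2-D points without any labels. The function names are hypothetical, and the farthest-first initialization is a simple deterministic choice made for this example; random or k-means++ initialization is more common in practice.

```python
import numpy as np


def farthest_first_centers(X, k):
    """Pick the first point, then repeatedly the point farthest from all chosen centers."""
    centers = [X[0]]
    while len(centers) < k:
        chosen = np.array(centers)
        distances = np.linalg.norm(X[:, None, :] - chosen[None, :, :], axis=2)
        centers.append(X[distances.min(axis=1).argmax()])
    return np.array(centers, dtype=float)


def k_means(X, k, n_iterations=20):
    """Alternate between assigning points to centers and recomputing centers."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iterations):
        # Assignment step: each point takes the label of its nearest center.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers


# Two well-separated groups of invented points.
X = np.array([
    [0, 0], [0, 1], [1, 0], [1, 1],
    [10, 10], [10, 11], [11, 10], [11, 11],
], dtype=float)

labels, centers = k_means(X, k=2)
print(labels)
print(centers)
```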

Natural language is essential to human communication, which makes the ability to process it an important one for computers. In this module, you will be introduced to natural language processing and some of the basic tasks.

Text Analytics

Stemming, Lemmatization and Stop word removal.

POS tagging and Named Entity Recognition

Bigrams, N-grams and collocations

Term Document Matrix

Count Vectorizer

Term Frequency and TF-IDF
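
The text-processing steps above (stop word removal, a term-document count matrix and TF-IDF weighting) can be sketched in plain Python. The three documents and the stop word list are invented, and the weighting used here, term frequency times log(N / document frequency), is only one of several common TF-IDF variants.

```python
import math
from collections import Counter

documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
stop_words = {"the", "on", "and"}   # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace and drop stop words (no stemming)."""
    return [word for word in text.lower().split() if word not in stop_words]


tokenized = [tokenize(doc) for doc in documents]
vocabulary = sorted({word for doc in tokenized for word in doc})

# Term-document matrix of raw counts (what a count vectorizer produces).
counts = [[Counter(doc)[term] for term in vocabulary] for doc in tokenized]


def tf_idf(tokenized_docs, terms):
    """Weight each term by its in-document frequency and its rarity across documents."""
    n_docs = len(tokenized_docs)
    doc_freq = {t: sum(t in doc for doc in tokenized_docs) for t in terms}
    weights = []
    for doc in tokenized_docs:
        term_counts = Counter(doc)
        weights.append([
            (term_counts[t] / len(doc)) * math.log(n_docs / doc_freq[t])
            for t in terms
        ])
    return weights


weights = tf_idf(tokenized, vocabulary)
print(vocabulary)
print(counts[0])
print([round(w, 3) for w in weights[0]])
```

Because no stemming or lemmatization is applied, "cat" and "cats" remain separate vocabulary entries. That is exactly the problem those steps address.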

Advanced Analytics covers areas such as time series analysis, ARIMA models and recommender systems.

Time series

Time series Analysis.

ARIMA example

Recommender Systems

Content Based Recommendation

Collaborative Filtering

Reinforcement learning is an area of Machine Learning concerned with taking suitable actions to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation.

Basic concepts of Reinforcement Learning



Penalty Mechanism

Feedback loop

Deep Q Learning
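
The reward-driven feedback loop described above can be illustrated with tabular Q-learning, a simpler precursor of Deep Q Learning that stores values in a table instead of a neural network. The five-state corridor environment, the reward of 1 at the goal, and the hyperparameters are all assumptions invented for this sketch.

```python
import random

N_STATES = 5              # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)          # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (arbitrary)


def step(state, action):
    """Hypothetical environment: reward 1.0 on reaching the goal, else 0.0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]

for _ in range(500):                        # episodes
    state = 0
    for _ in range(200):                    # step cap per episode
        # Explore uniformly at random; Q-learning is off-policy, so it can
        # still learn the values of the greedy policy from this behaviour.
        action = rng.choice(ACTIONS)
        next_state, reward, done = step(state, action)
        # Feedback loop: move the estimate toward reward + discounted future value.
        target = reward + (0.0 if done else GAMMA * max(q[next_state]))
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state
        if done:
            break

# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

A penalty mechanism could be added by returning a small negative reward for each non-goal step, which would favour shorter paths even more strongly.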

Artificial intelligence (AI) is the ability of a computer program or a machine to think and learn. It is also a field of study which tries to make computers "smart".

Artificial Neural Networks

Neural Networks & terminologies

Non linearity problem, illustration

Perceptron learning

Feed Forward Network and Back propagation

Gradient Descent

Mathematics of Artificial Neural Networks


Partial derivatives

Linear algebra



Eigen vectors


Vector quantization

Overview of tools used in Neural Networks

TensorFlow
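
As a sketch of the perceptron learning topic above, the example below trains a single perceptron on the logical AND function using the classic error-driven update rule. The function names, the learning rate of 1 and the 25 epochs are illustrative assumptions.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 when the weighted sum is positive."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=25, learning_rate=1):
    """Perceptron rule: nudge the weights by (target - prediction) * input."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Truth table of logical AND, which is linearly separable.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_GATE)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_GATE]
print(predictions)
```

AND is linearly separable, so the rule converges. XOR is not, and no single perceptron can represent it, which is the non-linearity problem that motivates multi-layer networks and backpropagation.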


Deep learning is part of a broader family of machine learning methods based on the layers used in artificial neural networks. In this module, you'll dive deep into the concepts of Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Auto Encoders and many more.

Deep Learning

TensorFlow & Keras installation

More elaborate discussion on cost function

Measuring accuracy of hypothesis function

Role of gradient function in minimizing cost function

Explicit discussion of Bayes models

Hidden Markov Models (HMM)

Optimization basics

Sales Prediction of a Gaming company using Neural Networks

Build an Image similarity engine.
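
The cost function and gradient topics in this list can be sketched with plain gradient descent on a mean squared error cost for the linear hypothesis h(x) = w * x + b. The data, the learning rate and the number of steps are assumptions chosen so the example converges; deep learning frameworks compute the same kind of gradients automatically.

```python
def cost(w, b, xs, ys):
    """Mean squared error of the hypothesis h(x) = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, b, xs, ys):
    """Partial derivatives of the cost with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


# Invented data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0
learning_rate = 0.05
initial_cost = cost(w, b, xs, ys)

for _ in range(2000):
    dw, db = gradient(w, b, xs, ys)
    # Step against the gradient to reduce the cost.
    w -= learning_rate * dw
    b -= learning_rate * db

final_cost = cost(w, b, xs, ys)
print(f"w={w:.4f} b={b:.4f} cost {initial_cost:.2f} -> {final_cost:.2e}")
```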

Deep Learning with Convolutional Neural Nets

Architecture of CNN

Types of layers in CNN

Different Filters and Kernels

Building an Image classifier with and without CNN
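
To make the filter and kernel topics above concrete, the sketch below slides a 3x3 vertical-edge kernel over a tiny invented image. It computes a "valid" cross-correlation, the operation that convolutional layers typically apply. The function name `conv2d`, the image and the kernel values are assumptions for illustration.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# 4x5 image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

# Kernel that responds to dark-to-bright transitions from left to right.
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

feature_map = conv2d(image, vertical_edge)
print(feature_map)
```

The feature map is zero over the uniform dark region and positive where the window overlaps the edge. In a CNN the kernel values are learned rather than hand-written.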

Recurrent neural nets

Fundamental notions & ideas

Recurrent neurons

Handling variable length sequences

Training a sequence classifier

Training to predict Time series

Cloud computing is massively growing in importance in the IT sector as more and more companies are eschewing traditional IT and moving applications and business processes to the cloud. This section covers detailed information about how to deploy Data Science models on Cloud environments.


Introduction to Cloud Computing

Amazon Web Services Preliminaries - S3, EC2, RDS

Big data processing on AWS using Elastic MapReduce (EMR)

Machine Learning using Amazon SageMaker

Deep Learning on AWS Cloud

Natural Language processing using Amazon Lex

Analytics services on AWS Cloud

Data Warehousing on AWS Cloud

Creating Data Pipelines on AWS Cloud

DevOps plays a pivotal role in bridging the gap between development and operations teams. This section covers the key DevOps tools that a Data Scientist needs to be aware of for day-to-day data science work.


Introduction to DevOps for Data Science

Tasks in Data Science Development

Deploying Models in Production

Deploying Machine Learning Models as Services

Running Machine Learning Services in Containers

Scaling ML Services with Kubernetes
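
Deploying a model as a service, as listed above, can be sketched with Python's standard library alone. The "model" is a stand-in linear scorer with made-up weights, and the handler, the JSON field names and the port are hypothetical; production services usually rely on a web framework and a real serialized model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for a trained model: a fixed linear scorer with made-up weights.
WEIGHTS = [0.5, -0.25]
BIAS = 0.1


def predict(features):
    """Score one feature vector with the stand-in model."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


class PredictHandler(BaseHTTPRequestHandler):
    """Answer POST requests shaped like {"features": [...]} with a prediction."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(
            {"prediction": predict(payload.get("features", []))}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the service. In a container the start command would run this script, and an orchestrator such as Kubernetes could then run several replicas behind a load balancer.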