# Grow With Us

### Course Curriculum

It stretches your mind so you can think better and create even better.

Description: Data science is a multidisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. In this first module, we will introduce the field of Data Science and how it relates to other fields such as Artificial Intelligence, Machine Learning and Deep Learning.

Introduction to Data Science

High level view of Data Science, Artificial Intelligence & Machine Learning

Subtle differences between Data Science, Machine Learning & Artificial Intelligence

Approaches to Machine Learning

Terms and Terminology of Data Science

Understanding an end-to-end Data Science Pipeline and Implementation Cycle

Description: Mathematics is very important in data science because mathematical concepts help identify patterns and create algorithms. Understanding the various notions of Statistics and Probability Theory is key to implementing such algorithms in data science.

Linear Algebra

Matrices, Matrix Operations

Eigenvalues, Eigenvectors

Scalars, Vectors and Tensors

Prior and Posterior Probability

Conditional Probability

Calculus

Differentiation, Gradient and Cost Functions

Graph Theory
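The probability topics above come together in Bayes' theorem, which turns a prior probability into a posterior one using a conditional probability. The worked example below uses purely hypothetical numbers: D means "has the condition", + means "the test is positive", the condition affects 1% of a population, the test detects it 90% of the time, and it gives a false positive 5% of the time.

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}
\qquad \text{(posterior } P(A \mid B)\text{, prior } P(A)\text{)}

P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
            = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.05 \times 0.99}
            = \frac{0.009}{0.0585} \approx 0.154
```

Even with a fairly accurate test, the posterior stays low because the prior is small.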

Description: This module focuses on understanding statistical concepts required for Data Science, Machine Learning and Deep Learning. In this module, you will be introduced to the estimation of various statistical measures of a data set, simulating random distributions, performing hypothesis testing, and building statistical models.

**Descriptive Statistics**

Types of Data (Discrete vs Continuous)

Types of Data (Nominal, Ordinal)

Measures of Central Tendency (Mean, Median, Mode)

Measures of Dispersion (Variance, Standard Deviation)

Range, Quartiles, Interquartile Range

Measures of Shape (Skewness and Kurtosis)

Tests for Association (Correlation and Regression)

Random Variables

Probability Distributions

Standard Normal Distribution

Probability Density Function

Probability Mass Function

Cumulative Distribution Function
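A minimal sketch of the descriptive measures above, using only Python's built-in `statistics` module on a small hypothetical sample. Quartiles depend on the interpolation method: `statistics.quantiles` defaults to the "exclusive" method, and other tools (for example NumPy or pandas) default to a different one and may return slightly different values.

```python
import statistics

# Hypothetical sample, used only to illustrate the measures listed above
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion (population versions; use variance()/stdev() for samples)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Range, quartiles and interquartile range
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
iqr = q3 - q1

# Skewness from the third standardized moment (population form); > 0 means a longer right tail
skewness = sum((x - mean) ** 3 for x in data) / len(data) / std_dev ** 3
```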

**Inferential Statistics**

Statistical sampling & Inference

Hypothesis Testing

Null and Alternative Hypotheses

Margin of Error

Type I and Type II errors

One-Sided Hypothesis Test, Two-Sided Hypothesis Test

Tests of Inference: Chi-Square, T-test, Analysis of Variance

t-value and p-value

Confidence Intervals
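The sketch below illustrates a confidence interval, the margin of error and a two-sided hypothesis test using the standard library's `statistics.NormalDist`. It assumes a normal (z) approximation, which is reasonable for large samples; for a small sample like this hypothetical one, a t-test (for example `scipy.stats.ttest_1samp`) would be the more appropriate choice. The function names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    se = stdev(sample) / math.sqrt(len(sample))      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin_of_error = z * se
    m = mean(sample)
    return m - margin_of_error, m + margin_of_error

def two_sided_z_test(sample, mu0):
    """Test H0: population mean == mu0 against H1: population mean != mu0."""
    se = stdev(sample) / math.sqrt(len(sample))
    z = (mean(sample) - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # probability in both tails
    return z, p_value

# Hypothetical measurements
sample = [10, 12, 9, 11, 13, 10, 12, 11]
low, high = normal_confidence_interval(sample)
z, p = two_sided_z_test(sample, mu0=10)
reject_h0 = p < 0.05   # significance level alpha = 0.05, i.e. the accepted Type I error rate
```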

**Python for Data Science**

NumPy

Pandas

Matplotlib & Seaborn

Jupyter Notebook

**NumPy**

NumPy is a Python library for working with arrays in scientific computing. Explore how to initialize and load data into arrays and learn basic array manipulation operations using NumPy.

Loading data with NumPy

Comparing NumPy Arrays with Python Lists

NumPy Data Types

Indexing and Slicing

Copies and Views

Numerical Operations with NumPy

Matrix Operations on NumPy Arrays

Aggregation functions

Shape Manipulations

Broadcasting

Statistical operations using NumPy

Resize, Reshape, Ravel

Image Processing with NumPy
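A short sketch of several NumPy topics listed above: list versus array behaviour, broadcasting, aggregations, matrix operations, reshaping, and the difference between views and copies. The arrays are arbitrary examples.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])                    # 2-D array, shape (2, 3)
b = np.array([10, 20, 30])                   # 1-D array, shape (3,)

# Lists versus arrays: * repeats a list but multiplies an array element-wise
list_times_two = [1, 2, 3] * 2               # [1, 2, 3, 1, 2, 3]
array_times_two = np.array([1, 2, 3]) * 2    # array([2, 4, 6])

# Broadcasting: b is applied to every row of a
shifted = a + b                              # [[11, 22, 33], [14, 25, 36]]

# Aggregations along an axis
col_sums = a.sum(axis=0)                     # [5, 7, 9]
row_means = a.mean(axis=1)                   # [2.0, 5.0]

# Matrix product and shape manipulation
gram = a @ a.T                               # [[14, 32], [32, 77]]
reshaped = a.reshape(3, 2)                   # [[1, 2], [3, 4], [5, 6]]
flat = a.ravel()                             # [1, 2, 3, 4, 5, 6]

# Slices are views: writing through a slice changes the original array
buffer = np.zeros(4)
window = buffer[1:3]
window[:] = 7                                # buffer is now [0., 7., 7., 0.]

# .copy() gives an independent array
independent = buffer.copy()
independent[0] = -1                          # buffer[0] is still 0.0
```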

**Pandas**

Pandas is a Python library that provides utilities to deal with structured data stored in the form of rows and columns. Discover how to work with series and tabular data, including initialization, population, and manipulation of Pandas Series and DataFrames.

Basics of Pandas

Loading data with Pandas

Series

Operations on Series

DataFrames and DataFrame Operations

Selection and Slicing of DataFrames

Descriptive statistics with Pandas

Map, Apply and Iteration on Pandas DataFrames

Working with text data

MultiIndex in Pandas

GroupBy Functions

Merging, Joining and Concatenating DataFrames

Visualization using Pandas
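The pandas sketch below walks through selection, merging, `map`, the `.str` text accessor, `groupby` and `describe` on a tiny hypothetical sales table; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical sales records (illustrative only)
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South"],
    "product": ["A", "A", "B", "B", "B"],
    "units": [10, 5, 7, 3, 8],
})
prices = pd.DataFrame({"product": ["A", "B"], "price": [2.5, 4.0]})

# Selection with a boolean mask
north = sales[sales["region"] == "North"]

# SQL-style left join on the shared "product" column
merged = sales.merge(prices, on="product", how="left")

# Vectorized column arithmetic and element-wise mapping
merged["revenue"] = merged["units"] * merged["price"]
merged["volume"] = merged["units"].map(lambda u: "high" if u >= 7 else "low")

# Text data through the .str accessor
merged["region_lower"] = merged["region"].str.lower()

# Split-apply-combine: total revenue per region
revenue_by_region = merged.groupby("region")["revenue"].sum()

# Descriptive statistics for the numeric columns
summary = merged[["units", "revenue"]].describe()
```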

**Data Visualization using Matplotlib**

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits such as Tkinter, wxPython, Qt or GTK+.

Anatomy of a Matplotlib figure

Plotting line plots with labels and colors

Adding markers to line plots

Histogram plots

Scatter plots

Size, Color and Shape selection in Scatter plots.

Adding a Legend to Scatter plots

Displaying multiple plots using subplots

Boxplots, scatter_matrix and Pair plots

**Data Visualization using Seaborn**

Seaborn is a data visualization library that provides a high-level interface for drawing statistical graphics. These graphs can convey a lot of information while also being visually appealing.

Basic Plotting using Seaborn

Violin Plots

Box Plots

Cat Plots

Facet Grid

Swarm Plot

Pair Plot

Bar Plot

LM Plot

Variations in LM plot using hue, markers, row and col

Exploratory Data Analysis (EDA) helps identify patterns in the data using basic statistical methods as well as visualization tools that display graphs and charts. With EDA, we can assess the distribution of the data and decide which models to use.

**Pipeline ideas**

Exploratory Data Analysis

Feature Creation

Evaluation Measures

**Data Analytics Cycle ideas**

Data Acquisition

Data Preparation

Data cleaning

Data Visualization

Plotting

Model Planning & Model Building

**Data Inputting**

Reading and writing data to text files

Reading data from a CSV file

Reading data from JSON

**Data preparation**

Selection and Removal of Columns

Transform

Rescale

Standardize

Normalize

Binarize

One-Hot Encoding

Imputing

Train/Test Splitting
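The preparation steps above can be sketched in plain Python to show what each transform does to the numbers. The helper names are hypothetical; in practice scikit-learn provides `SimpleImputer`, `MinMaxScaler`, `StandardScaler`, `Normalizer`, `Binarizer`, `OneHotEncoder` and `train_test_split` for the same jobs. This sketch assumes "Normalize" means scaling each sample to unit length, as scikit-learn's `Normalizer` does.

```python
import math
import random
import statistics

def impute_mean(values):
    """Imputing: replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

def rescale(values):
    """Rescale (min-max) to the range [0, 1]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(row):
    """Normalize one sample (row) to unit Euclidean (L2) length."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]

def binarize(values, threshold):
    """Binarize: 1 if the value is above the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def one_hot(labels):
    """One-hot encode categorical labels; columns follow sorted category order."""
    categories = sorted(set(labels))
    return categories, [[1 if label == c else 0 for c in categories] for label in labels]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold out a fraction of the rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Rescaling and standardizing work per feature (column), whereas normalizing works per sample (row), which is why `normalize` takes a single row.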

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. This module on Machine Learning is a deep dive into supervised and unsupervised learning and Gaussian / Naive Bayes methods. You will also be exposed to different classification, clustering and regression methods.

Introduction to Machine Learning

Applications of Machine Learning

Supervised Machine Learning

Classification

Regression

Unsupervised Machine Learning

Reinforcement Learning

Latest advances in Machine Learning

Model Representation

Model Evaluation

Hyperparameter Tuning of Machine Learning Models

Evaluation of ML Models

Estimation and Prediction with Machine Learning Models

Deployment Strategy for ML Models

Supervised learning is one of the most popular techniques in machine learning. In this module, you will learn about more complicated supervised learning models and how to use them to solve problems.

**Classification methods & respective evaluation**

K Nearest Neighbors

Decision Trees

Naive Bayes

Stochastic Gradient Descent

SVM (Support Vector Machines):

Linear

Non-linear

Radial Basis Function

Random Forest

Gradient Boosting Machines

XGBoost

Logistic regression

**Ensemble methods**

Combining models

Bagging

Boosting

Voting

Choosing the best classification method

**Model Tuning**

Train Test Splitting

K-fold cross validation

Bias-Variance Tradeoff

L1 and L2 Norms

Overfitting and underfitting, with learning curves showing bias-variance sensitivity

Hyperparameter Tuning using Grid Search CV
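K-fold cross-validation can be sketched as follows. The split mirrors unshuffled k-fold splitting (as in scikit-learn's `KFold` without shuffling), where the first `n_samples % k` folds receive one extra sample; the baseline "model" that always predicts the training mean is a hypothetical stand-in for a real estimator. Grid Search CV repeats this loop for every hyperparameter combination and keeps the best-scoring one.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous, unshuffled folds.

    The first n_samples % k folds get one extra sample, so every sample is
    used for validation exactly once.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop

def cross_validated_mae(y, k):
    """Average validation MAE of a baseline model that predicts the training mean."""
    scores = []
    for train, validation in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)   # "fit" on the training folds
        fold_mae = sum(abs(y[i] - prediction) for i in validation) / len(validation)
        scores.append(fold_mae)
    return sum(scores) / len(scores)
```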

**Respective Performance measures**

Different Errors (MAE, MSE, RMSE)

Accuracy, Confusion Matrix, Precision, Recall
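A minimal sketch of the classification measures above for a binary problem, computed directly from confusion-matrix counts. The function names and labels are hypothetical; `sklearn.metrics` offers `confusion_matrix`, `accuracy_score`, `precision_score` and `recall_score` for real projects.

```python
def confusion_matrix_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1          # predicted positive, actually negative
        elif actual == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn

def safe_divide(numerator, denominator):
    return numerator / denominator if denominator else 0.0

def binary_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_matrix_counts(y_true, y_pred, positive)
    return {
        "accuracy": safe_divide(tp + tn, tp + fp + fn + tn),
        "precision": safe_divide(tp, tp + fp),   # of the predicted positives, how many are right
        "recall": safe_divide(tp, tp + fn),      # of the actual positives, how many were found
    }

# Hypothetical labels and predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
report = binary_metrics(y_true, y_pred)
```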

Regression is a predictive modelling technique that is heavily used to estimate the relationship between variables (the dependent and independent variables). It is used mostly in forecasting, time series modelling and finding cause-and-effect relationships between variables. This module discusses regression in detail, including the types of regression and their usage and applicability.

**Regression**

Linear Regression

Variants of Regression

Lasso

Ridge

Multiple Linear Regression

Logistic Regression (effectively, classification only)

Regression Model Improvement

Polynomial Regression

Random Forest Regression

Support Vector Regression

**Respective Performance measures**

Different Errors (MAE, MSE, RMSE)

Mean Absolute Error

Mean Square Error

Root Mean Square Error
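The sketch below fits a simple linear regression with ordinary least squares and then computes the three error measures listed above. Function names are illustrative and the data is hypothetical; scikit-learn's `LinearRegression`, `mean_absolute_error` and `mean_squared_error` cover the same ground. RMSE is expressed in the same units as the target, while MSE penalizes large errors more heavily than MAE.

```python
import math

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Hypothetical data lying exactly on y = 1 + 2x
intercept, slope = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])

# Error measures for some hypothetical predictions
y_true = [3, 5, 7, 9]
y_pred = [2.5, 5, 8, 9]
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
```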

Unsupervised learning can provide powerful insights into data without the need to annotate examples. In this module, you will learn several different techniques in unsupervised machine learning.

**Clustering**

K means

Hierarchical Clustering

DBSCAN
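A plain-Python sketch of K means (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. It assumes random initialization from the data points rather than the k-means++ scheme that `sklearn.cluster.KMeans` uses by default; the points are hypothetical.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # k distinct data points as starting centroids
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(coords) / len(cluster) for coords in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid if a cluster empties
        if new_centroids == centroids:
            break                                   # converged: centroids no longer move
        centroids = new_centroids
    return centroids, clusters

# Two hypothetical, well-separated groups of points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
```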

**Association Rule Mining**

Association Rule Mining.

Market Basket Analysis using Apriori Algorithm

Dimensionality reduction using Principal Component Analysis (PCA)

Natural language is essential to human communication, which makes the ability to process it an important one for computers. In this module, you will be introduced to natural language processing and some of the basic tasks.

Text Analytics

Stemming, Lemmatization and Stop word removal.

POS tagging and Named Entity Recognition

Bigrams, N-grams and Collocations

Term Document Matrix

Count Vectorizer

Term Frequency and TF-IDF
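The sketch below builds a term-document (count) matrix and TF-IDF weights for three hypothetical sentences, using tf = count / document length and idf = ln(N / df). Tokenization is deliberately naive (no stemming or stop-word removal). Note that scikit-learn's `CountVectorizer` and `TfidfVectorizer` use their own tokenizer, and `TfidfVectorizer` by default smooths the idf and L2-normalizes each row, so its numbers will differ.

```python
import math
from collections import Counter

def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace
    return text.lower().split()

def term_document_matrix(documents):
    """Count-vectorize documents: one row per document, one column per vocabulary term."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
vocabulary, matrix = term_document_matrix(docs)
weights = tf_idf(docs)
```

A term that appears in every document, such as "the", gets an idf of ln(1) = 0 and therefore no weight, while rarer terms such as "dog" are weighted up.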

Advanced Analytics covers areas such as time series analysis, ARIMA models and recommender systems.

**Time series**

Time series Analysis.

ARIMA example

**Recommender Systems**

Content Based Recommendation

Collaborative Filtering

Reinforcement learning is an area of Machine Learning concerned with taking suitable actions to maximize reward in a particular situation. It is employed by various software systems and machines to find the best possible behavior or path to take in a specific situation.

Basic concepts of Reinforcement Learning

Action

Reward

Penalty Mechanism

Feedback loop

Deep Q Learning
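The action, reward and feedback loop described above is commonly formalized with the Q-learning update, shown here in its standard textbook form (the course may use different notation). Here α is the learning rate and γ the discount factor; a penalty is simply a negative reward r. Deep Q-learning replaces the table with a neural network and, for a sampled transition, minimizes a squared error against a target computed from a periodically updated copy of the weights, θ⁻.

```latex
% Tabular Q-learning: after taking action a in state s, receiving reward r
% and landing in state s'
Q(s, a) \leftarrow Q(s, a) + \alpha \Big[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \Big]

% Deep Q-learning: squared error between the target and the network's estimate
L(\theta) = \Big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \Big)^{2}
```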

Artificial intelligence (AI) is the ability of a computer program or a machine to think and learn. It is also the field of study that tries to make computers "smart".

**Artificial Neural Networks**

Neural Networks & terminologies

The non-linearity problem, with illustration

Perceptron learning

Feedforward Networks and Backpropagation

Gradient Descent
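A minimal perceptron sketch with a step activation and the classic perceptron learning rule, trained on the AND and XOR truth tables. The function names, learning rate and epoch limit are arbitrary choices. AND is linearly separable, so training converges; XOR is not, so no single-layer perceptron can fit it, which is the non-linearity problem that motivates hidden layers and backpropagation.

```python
def predict(weights, bias, x):
    """Step activation: 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=100):
    """Classic perceptron learning rule; stops early once an epoch has no errors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(samples, labels):
            # Nudge the weights towards the target only when the prediction is wrong
            update = learning_rate * (target - predict(weights, bias, x))
            if update != 0:
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:
            break
    return weights, bias

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]   # linearly separable
xor_labels = [0, 1, 1, 0]   # not linearly separable

w_and, b_and = train_perceptron(inputs, and_labels)
w_xor, b_xor = train_perceptron(inputs, xor_labels)
and_predictions = [predict(w_and, b_and, x) for x in inputs]
xor_predictions = [predict(w_xor, b_xor, x) for x in inputs]
```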

**Mathematics of Artificial Neural Networks**

Gradients

Partial derivatives

Linear algebra

Linear Independence (LI)

Linear Dependence (LD)

Eigenvectors

Projections

Vector quantization

**Overview of tools used in Neural Networks**

TensorFlow

Keras

Deep learning is part of a broader family of machine learning methods based on artificial neural networks with multiple layers. In this module, you’ll take a deep dive into the concepts of Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Autoencoders and more.

**Deep Learning**

TensorFlow & Keras installation

A more detailed discussion of cost functions

Measuring the accuracy of the hypothesis function

Role of the gradient in minimizing the cost function

Explicit discussion of Bayes models

Hidden Markov Models (HMM)

Optimization basics

Sales Prediction of a Gaming company using Neural Networks

Building an image similarity engine
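To illustrate how a gradient is used to minimize a cost function, the sketch below runs batch gradient descent on the mean squared error of the hypothesis h(x) = w·x + b. The data, learning rate and step count are hypothetical choices; with a learning rate that is too large, the updates overshoot and diverge.

```python
def mse_cost(w, b, x, y):
    """Cost J(w, b): mean squared error of the hypothesis h(x) = w * x + b."""
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / len(x)

def gradient_descent(x, y, learning_rate=0.05, n_steps=2000):
    """Batch gradient descent on J(w, b), starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        # Partial derivatives of J with respect to w and b
        grad_w = (2 / n) * sum(e * xi for e, xi in zip(errors, x))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]            # hypothetical data on the line y = 2x + 1
w, b = gradient_descent(x, y)
```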

**Deep Learning with Convolutional Neural Nets**

Architecture of CNN

Types of layers in CNN

Different Filters and Kernels

Building an Image classifier with and without CNN
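A sketch of what a single convolutional filter computes: a 3x3 kernel slides over the image with stride 1 and no padding, producing a feature map. Like the convolution layers in common deep learning frameworks, it computes a cross-correlation (the kernel is not flipped). The image and the Prewitt-style vertical-edge kernel are illustrative; the strong responses mark where dark pixels meet bright ones.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: stride 1, no padding, kernel not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Sum of element-wise products between the kernel and the image patch
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 3x5 grayscale image: dark on the left, bright on the right
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
# A vertical-edge filter (Prewitt-style kernel)
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d_valid(image, vertical_edge)
```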

**Recurrent neural nets**

Fundamental notions & ideas

Recurrent neurons

Handling variable length sequences

Training a sequence classifier

Training to predict Time series

Cloud computing is growing rapidly in importance in the IT sector as more and more companies move away from traditional IT and migrate applications and business processes to the cloud. This section covers in detail how to deploy Data Science models in cloud environments.

**Topics**

Introduction to Cloud Computing

Amazon Web Services Preliminaries - S3, EC2, RDS

Big data processing on AWS using Elastic Map Reduce (EMR)

Machine Learning using Amazon SageMaker

Deep Learning on AWS Cloud

Natural Language Processing using Amazon Lex

Analytics services on AWS Cloud

Data Warehousing on AWS Cloud

Creating Data Pipelines on AWS Cloud

DevOps plays a pivotal role in bridging the gap between development and operations teams. This section covers the key DevOps tools a Data Scientist needs to be aware of for day-to-day data science work.

**Topics**

Introduction to DevOps for Data Science

Tasks in Data Science Development

Deploying Models in Production

Deploying Machine Learning Models as Services

Running Machine Learning Services in Containers

Scaling ML Services with Kubernetes