**In this tutorial,** we will discuss percentiles, **quantiles**, and **quartiles** in data.

**So**, how can we find where the **10th percentile** lies in the data, and likewise the **20th**, **30th**, **and so on**?

**In section 1,** we will discuss **percentiles.**

**In section 2,** we will discuss **quantiles** or **quartiles** in **depth**.

So, to **understand** this in depth, let us take a simple sample of **height** data:

**Height distribution: 18, 19, 20, 55, 48, 46, 77, 15, 17, 66.**

These are the **heights in the sample.** Now let us **visualize** the **quantiles** and **percentiles**.
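As a rough sketch of where these cut points fall, the snippet below computes a few percentiles and the quartiles of the height sample with Python's standard `statistics` module. The helper name `percentile` and the choice of linear interpolation (`method="inclusive"`) are assumptions made for illustration; other conventions give slightly different values.

```python
import statistics

# Height sample from the text above.
heights = [18, 19, 20, 55, 48, 46, 77, 15, 17, 66]


def percentile(data, p):
    """Return the p-th percentile (1 <= p <= 99) using linear interpolation."""
    # quantiles(n=100) returns the 99 cut points between 100 equal groups.
    cut_points = statistics.quantiles(data, n=100, method="inclusive")
    return cut_points[p - 1]


# Quartiles are the 25th, 50th and 75th percentiles.
quartiles = statistics.quantiles(heights, n=4, method="inclusive")

for p in (10, 20, 30):
    print(f"{p}th percentile: {percentile(heights, p):.2f}")
print("Quartiles (Q1, Q2, Q3):", quartiles)
```

The second quartile is the median, so `quartiles[1]` and `percentile(heights, 50)` should agree.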

**First of all**, we…

**Now**, **in this tutorial** we will learn about **distributions**, specifically the **distribution of continuous data.**

There are several **distribution** concepts that describe **continuous** data.

**1- Symmetric distribution**

**2- Skewed distribution**

**3- Center of distribution**

**4- Location of distribution**

**5- Spread of distribution**

**6- Normal distribution**

**7- Binomial distribution**

**Etc.**

**Knowing the distribution helps us understand continuous data in depth and analyze and describe it precisely.**
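To make a few of these ideas concrete, here is a minimal sketch that measures the center, spread, and skew of a small sample. The sample values are invented for illustration, and the moment-based skewness formula is one common choice among several.

```python
import statistics

# Hypothetical right-skewed sample (made up for illustration).
sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 20]

mean = statistics.mean(sample)        # center: arithmetic mean
median = statistics.median(sample)    # center: middle value
spread = statistics.pstdev(sample)    # spread: population standard deviation


def skewness(data):
    """Moment-based skewness: mean of cubed z-scores (0 for symmetric data)."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)


print(f"mean={mean}, median={median}, spread={spread:.2f}")
print(f"skewness={skewness(sample):.2f}")  # positive value: long right tail
```

In a symmetric distribution the mean and median coincide; here the single large value pulls the mean above the median, which is the usual sign of right skew.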

**In this tutorial we will learn about relationships: how to visualize the relationship/association between two variables using graphs.**

**Table of Contents**

**1- Analyzing the relationship between a categorical and a numeric variable.**

**2- Analyzing the relationship between two categorical variables.**

**3- Analyzing the relationship between two numeric variables.**

**Note:** Graphs are a great way to **visualize** and explore the **relationship** between variables: whether **two variables are associated, and the nature of the association.**

This holds whether the variables are **categorical** or **numeric**.

**1-** Exploring the relationship between a **categorical** and a **numeric** variable using a **box plot.**

We have many options to visualize the relationship. …
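A box plot draws, for each category, the five-number summary of the numeric variable. As a hedged sketch of what the plot encodes, the snippet below computes that summary per group in plain Python. The group names and values are hypothetical, and `five_number_summary` is an illustrative helper, not part of any library.

```python
import statistics
from collections import defaultdict

# Hypothetical (category, numeric value) pairs, made up for illustration.
records = [
    ("A", 12), ("A", 15), ("A", 11), ("A", 19), ("A", 14),
    ("B", 22), ("B", 25), ("B", 21), ("B", 30), ("B", 24),
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot displays."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Group the numeric values by their category.
groups = defaultdict(list)
for category, value in records:
    groups[category].append(value)

for category, values in sorted(groups.items()):
    print(category, five_number_summary(values))
```

If matplotlib is available, passing the grouped lists to `matplotlib.pyplot.boxplot` should draw the same summaries as side-by-side boxes.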

**Analyzing continuous data points using a box plot.**

**Suppose** we take the example of the **frequency of fruits.**

**The frequency distribution is**

**Section 1 -**

How to **summarize a categorical** or **qualitative** variable.

**Table of contents.**

**1- Frequency table.**

**2- Bar charts.**

**3- Pie chart.**

To understand the concept, suppose we take an example about **wine**. We have a sample of size **200** and have recorded the people who **never drank**, the people who **drank in the past**, and the people who **drink at present**. The best way to **summarize** **qualitative** or **categorical** data is by frequency; a relative frequency is expressed as a **proportion** or a **percentage**.

Let us do that and summarize a **categorical** variable.

1- **Frequency** table, sometimes called a **frequency distribution**. [Sample size = **200**]

**Drinking Status**
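A frequency table of this kind can be sketched as follows. The counts are hypothetical, chosen only so that they sum to the sample size of 200; they are not the values recorded in the survey.

```python
from collections import Counter

# Hypothetical responses: the counts are invented and only chosen to sum to 200.
responses = ["never"] * 90 + ["past"] * 50 + ["present"] * 60

counts = Counter(responses)
total = sum(counts.values())

# Build rows of (category, frequency, proportion, percentage).
table = [
    (status, freq, freq / total, 100 * freq / total)
    for status, freq in counts.items()
]

print(f"{'Drinking status':<16}{'Frequency':>10}{'Proportion':>12}{'Percent':>9}")
for status, freq, prop, pct in table:
    print(f"{status:<16}{freq:>10}{prop:>12.2f}{pct:>8.1f}%")
```

The proportions always sum to 1 and the percentages to 100, which is a quick sanity check on any frequency table.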

I was **working** on a **project** that needed more speakers and their voices. I searched the web many times and looked in many different places, but did not find a solution. After **many days** of research, I found a way to add more speakers and their voices in pyttsx3.

Like me, there are **many students, project managers, and developers working on this kind of project**, so it should be helpful for them.

We already know that only two default system speakers and their voices are **available** in **Microsoft Windows** or **pyttsx3**.

**Prerequisites:**

**1-Install pyttsx3 (pip…**
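Before adding voices, it helps to see which ones pyttsx3 currently finds on the machine. The sketch below only lists and selects voices that are already installed; it does not reproduce the method for adding new ones. The function names `list_voices` and `speak_with_voice` are illustrative, and the code assumes pyttsx3 and a working speech driver are present.

```python
def list_voices():
    """Return (id, name) pairs for every voice pyttsx3 can see on this system."""
    import pyttsx3  # third-party package, assumed to be installed

    engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    return [(voice.id, voice.name) for voice in voices]


def speak_with_voice(text, voice_id):
    """Speak `text` aloud using the voice whose id is `voice_id`."""
    import pyttsx3  # third-party package, assumed to be installed

    engine = pyttsx3.init()
    engine.setProperty("voice", voice_id)  # select the voice by its id
    engine.say(text)
    engine.runAndWait()  # block until speech has finished
```

Calling `list_voices()` and passing one of the returned ids to `speak_with_voice("Hello", voice_id)` is the intended usage.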

**Detecting and handling** **outliers** is one of the **biggest** and most **challenging** tasks in **machine learning**.

Outliers directly affect **model accuracy**.

First, let us understand: what is an **outlier** in a **dataset**?

An **outlier** is a data point that is **distant** from all other **observations**: a data point that **lies outside** the overall **distribution** of the dataset.

**Now**, let us understand this with the help of an example….

In an **organization**, the **salary range** of all **employees** is **between $10k** and **$50k**.

So, in the salary column, all **employees' salaries** fall within the given range.

**Suppose** we have 10 **employees** in an organization and their **salary distribution**.
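One common way to flag such a point is the 1.5 × IQR rule, sketched below. This is an illustrative technique, not necessarily the one used in the rest of the tutorial; the ten salaries are hypothetical (in thousands of dollars) and `iqr_outliers` is a made-up helper name.

```python
import statistics

# Hypothetical salaries in thousands of dollars; the last value is a planted outlier.
salaries = [12, 15, 18, 20, 22, 25, 30, 35, 45, 500]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


print(iqr_outliers(salaries))
```

Only the planted value of 500 falls outside the fences; every salary inside the 10k to 50k range is kept.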

**Indeed**, an **unbalanced** dataset is one of the biggest challenges in **machine learning**.

It is a common problem in machine learning, especially in **classification**.

It affects **model accuracy** and can lead to **overfitting**.
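A small sketch shows why accuracy alone can mislead on unbalanced data. The labels are hypothetical (95 negatives, 5 positives), and the "model" is a deliberately trivial baseline that always predicts the majority class.

```python
# Hypothetical labels: 95 negatives (0) and 5 positives (1), made up for illustration.
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the minority class: fraction of true positives that were found.
positives = [p for t, p in zip(y_true, y_pred) if t == 1]
minority_recall = sum(positives) / len(positives)

print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")
```

The baseline scores high accuracy while finding none of the minority class, so a per-class metric such as recall is worth checking alongside accuracy.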

**ARIMA**, **ARMA**, and **SARIMA** are used to **predict future data** (**forecasting**), such as sales, **stock prices**, **number of visitors**, supply data, **etc**.

There are many **models** for data **forecasting**, but in this tutorial our main focus is on these three models and how to forecast with them.

First, let us understand the **ARMA**, **ARIMA**, and **SARIMA** models.

Before moving on to **ARMA**, **ARIMA**, and **SARIMA**, let us understand two basic **forecasting** models.

**1- Autoregression.**

**AR(p)**

**The value of “p” is called the order.**

For example, an AR(1) would be a **“first-order autoregressive…**
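In an AR(1) model each value depends on the one before it: x_t = c + phi * x_{t-1} + noise. The sketch below simulates such a series, estimates phi by least squares, and makes a one-step forecast. It uses only the standard library; the function names, the value phi = 0.7, and the Gaussian noise are assumptions chosen for illustration.

```python
import random


def simulate_ar1(phi, n, c=0.0, sigma=1.0, seed=0):
    """Simulate x_t = c + phi * x_{t-1} + noise_t with Gaussian noise."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n - 1):
        series.append(c + phi * series[-1] + rng.gauss(0.0, sigma))
    return series


def estimate_phi(series):
    """Least-squares estimate of phi for a zero-intercept AR(1) model."""
    num = sum(prev * cur for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


def forecast_next(last_value, phi, c=0.0):
    """One-step-ahead forecast: the noise term has expected value zero."""
    return c + phi * last_value


series = simulate_ar1(phi=0.7, n=5000)
phi_hat = estimate_phi(series)
print(f"estimated phi: {phi_hat:.3f}")
print(f"next-step forecast: {forecast_next(series[-1], phi_hat):.3f}")
```

With a few thousand points the estimate should land close to the true coefficient; an AR(p) model extends the same idea to the previous p values.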

We have already covered **data analysis** and **visualization**.

**Now**, in this second part, we will see how to **perform feature engineering** on **data**.

So, **in this tutorial,** we will perform **feature engineering** on the **dataset from part 1.**

**What we will do in feature engineering:**

**1- Handle Null values.**

**2- Perform label encoding.**

**3- Perform feature scaling.**

**So let us start:**

**STEP 1-** Once again, we **check** how many columns (features) contain **null** values.

**Before that, let us first discuss techniques for handling null values.**

**There are various ways to handle NULL values.**

**1. Delete the rows that contain…**
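The three steps listed earlier (handling nulls, label encoding, feature scaling) can be sketched in plain Python as follows. The columns and values are hypothetical, not the part 1 dataset, and mean imputation and min-max scaling are illustrative choices among several possible techniques.

```python
import statistics

# Hypothetical columns (made up for illustration); None marks a missing value.
ages = [25, None, 35, 45, None]
cities = ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"]


def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def label_encode(labels):
    """Map each distinct label to an integer, in sorted label order."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


null_count = sum(v is None for v in ages)        # count the nulls in the column
ages_filled = fill_with_mean(ages)               # handle null values
city_codes, city_map = label_encode(cities)      # label encoding
ages_scaled = min_max_scale(ages_filled)         # feature scaling

print(null_count, ages_filled, city_codes, ages_scaled)
```

On a real dataset the same steps are usually done with pandas and scikit-learn; the plain-Python version here only shows what each step does to the values.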

Data Science and Machine Learning Enthusiast.