In this tutorial, we will discuss percentiles, quantiles, and quartiles in data.

Photo by Artem Beliaikin on Unsplash

So, how can we find where the 10th percentile lies in the data, or the 20th, the 30th, and so on?

In section 1, we will discuss percentiles.

In section 2, we will discuss quantiles, or quartiles, in depth.

To understand this in depth, let's take a simple sample of height data:

Height distribution: 18, 19, 20, 55, 48, 46, 77, 15, 17, 66.

These are the heights in the sample. Now let's visualize the quantiles and percentiles.

First of all, we…
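As a rough sketch of what this looks like in practice, the height sample above can be cut into quartiles and into 10th, 20th, ..., 90th percentiles with Python's standard `statistics` module. The `method="inclusive"` choice (linear interpolation that treats the sample as the whole population) is an assumption made here; other conventions give slightly different cut points.

```python
import statistics

heights = [18, 19, 20, 55, 48, 46, 77, 15, 17, 66]

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(heights, n=4, method="inclusive")

# Deciles: nine cut points, i.e. the 10th, 20th, ..., 90th percentiles.
deciles = statistics.quantiles(heights, n=10, method="inclusive")
percentiles = {10 * (i + 1): value for i, value in enumerate(deciles)}

print("sorted heights:", sorted(heights))
print("quartiles (Q1, Q2, Q3):", q1, q2, q3)
for p, value in percentiles.items():
    print(f"{p}th percentile: {value}")
```

Note that the second quartile, the 50th percentile, and the median are the same number.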


Photo by Luke Chesser on Unsplash

Now, in this tutorial we will learn about the distribution of continuous data.

Several concepts and distributions help describe continuous data.

1- Symmetric distribution

2- Skewed distribution

3- Center of distribution

4- Location of distribution

5- Spread of distribution

6- Normal distribution

7- Binomial distribution

Etc.

Knowing the distribution is useful for understanding continuous data in depth and for analyzing and describing it precisely.


In this tutorial we will learn about relationships: how to visualize the relationship (association) between two variables using graphs.

Photo by Isaac Smith on Unsplash

Table of Contents

  1. Analyzing the relationship between a categorical and a numeric variable.
  2. Analyzing the relationship between two categorical variables.
  3. Analyzing the relationship between two numeric variables.

Note: Graphs are a great way to visualize and explore whether two variables are associated and what the nature of that association is.

It does not matter whether the variables are categorical or numeric.

1- Exploring the relationship between a categorical and a numeric variable using a box plot.

We have many options to visualize the relationship. …


Photo by Morgan Housel on Unsplash

Analyzing continuous data points using a box plot.

Suppose we take the example of the frequency of fruits.

The frequency distribution is


Photo by Isaac Smith on Unsplash

Section 1 -

How to summarize a categorical (qualitative) variable.

Table of contents.

  1. Frequency table.
  2. Bar charts.
  3. Pie chart.

To understand the concept, suppose we take an example about wine. We have a sample of size 200 and have recorded who has never drunk, who drank in the past, and who drinks at present. The best way to summarize qualitative (categorical) data is with frequencies and relative frequencies. A relative frequency is also called a proportion, and it can be expressed as a percentage.

Let's do that and summarize a categorical variable.

1- Frequency table, sometimes called a frequency distribution. [Sample size = 200]

Drinking Status
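A minimal sketch of how such a frequency table could be built in Python is shown here. The counts (40 never, 60 past, 100 present) are hypothetical placeholders chosen only so that they add up to the sample size of 200; they are not the article's data.

```python
from collections import Counter

# Hypothetical responses: these counts are NOT from the original sample.
responses = ["never"] * 40 + ["past"] * 60 + ["present"] * 100

counts = Counter(responses)   # frequency of each category
n = len(responses)            # sample size

table = {}
print(f"{'Drinking Status':<16}{'Frequency':>10}{'Proportion':>12}{'Percent':>9}")
for status in ("never", "past", "present"):
    freq = counts[status]
    proportion = freq / n     # relative frequency
    table[status] = (freq, proportion, proportion * 100)
    print(f"{status:<16}{freq:>10}{proportion:>12.2f}{proportion * 100:>8.1f}%")
```

The proportions always sum to 1 and the percentages to 100, which is a quick check that no category was missed.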


Photo by CHUTTERSNAP on Unsplash

I was working on a project that needed more speakers and voices. I searched the web many times and looked in many different places, but did not find a solution. After many days of research, I found a way to add more speakers and their voices in pyttsx3.

Like me, many students, project managers, and developers work on this kind of project, so this should be helpful for them.

As we already know, only two default system speakers and their voices are available in Microsoft Windows or pyttsx3.

Prerequisites:

1-Install pyttsx3 (pip…


Photo by Will Myers on Unsplash

Detecting and handling outliers is one of the biggest challenges in machine learning.

Outliers directly affect model accuracy.

First, let's understand: what is an outlier in a dataset?

An outlier is a data point that is distant from all other observations, that is, a data point that lies outside the overall distribution of the dataset.

Now, let's understand this with the help of an example.

In an organization, the salaries of all employees range between $10k and $50k.

So, in the salary column, all employees' salaries fall within that range.

Suppose we have 10 employees in an organization and look at the distribution of their salaries.
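To make the idea concrete, here is a small sketch that flags outliers with the interquartile range (IQR) rule. Both the salary figures and the choice of the IQR rule with a 1.5 multiplier are assumptions made for illustration; the text above does not say which data or detection method is used.

```python
import statistics

# Hypothetical salaries in thousands of dollars: nine fall in the 10k-50k
# range described above, and one (200) is deliberately far outside it.
salaries = [12, 15, 18, 22, 25, 30, 35, 42, 48, 200]

q1, _, q3 = statistics.quantiles(salaries, n=4, method="inclusive")
iqr = q3 - q1

# IQR rule: anything beyond 1.5 * IQR from the quartiles is flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [s for s in salaries if s < lower_fence or s > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print("outliers:", outliers)
```

Only the 200k salary falls outside the fences, which matches the intuition that it lies far from the rest of the distribution.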


Photo by Elena Mozhvilo on Unsplash

Indeed, an imbalanced dataset is one of the biggest challenges in machine learning.
It is a common problem in machine learning, especially in classification.
It affects model accuracy and can lead to overfitting.
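One way to see why accuracy becomes unreliable on imbalanced data is the sketch below. The 95/5 class split and the "always predict the majority class" model are hypothetical, chosen only to illustrate the effect.

```python
# Hypothetical labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall for the minority class: share of true positives that were found.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy: {accuracy:.2f}")
print(f"minority-class recall: {recall:.2f}")
```

The accuracy looks high even though the minority class is never detected, so metrics such as recall are worth checking alongside it.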


Photo by Morning Brew on Unsplash

ARIMA, ARMA, and SARIMA are used to predict future data (forecasting), such as sales, stock prices, number of visitors, supply data, etc.

There are many models for forecasting, but in this tutorial our main focus is on these three models and how to forecast with them.

First, let's understand the ARMA, ARIMA, and SARIMA models.

Before moving on to ARMA, ARIMA, and SARIMA, let's understand two basic forecasting models.

1- Autoregression.

AR(p)

The value for “p” is called the order.

For example, an AR(1) would be a “first order auto regressive…
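As an illustration of what "order 1" means, the sketch below simulates an AR(1) series, where each value depends only on the previous value plus random noise, and then recovers the coefficient with a simple lag-1 least-squares fit. The function names, the coefficient 0.7, and the noise settings are all hypothetical choices for this example, not part of any forecasting library.

```python
import random

def simulate_ar1(phi, n, c=0.0, sigma=1.0, seed=0):
    """Generate n values of x[t] = c + phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def estimate_phi(series):
    """Least-squares slope of x[t] regressed on x[t-1]."""
    prev, curr = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    return cov / var

series = simulate_ar1(phi=0.7, n=2000)
phi_hat = estimate_phi(series)
print(f"true phi = 0.7, estimated phi = {phi_hat:.3f}")
```

An AR(2) model would add a second term that depends on the value two steps back, and so on for higher orders.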


Photo by Dan Meyers on Unsplash

We have already covered data analysis and visualization.

Now, in this second part, we will see how feature engineering is performed on data.

So, in this tutorial, we will perform feature engineering on the dataset from part 1.

What we will do in feature engineering:

1- Handle null values.

2- Perform label encoding.

3- Perform feature scaling.

So let's start:

STEP 1- Once again, we check how many columns (features) contain null values.

Before that, let's first discuss techniques for handling null values.

There are various ways to handle null values.

1. Delete the rows that contain…
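The first step and the first technique can be sketched in plain Python as follows. The rows below are hypothetical stand-ins, since the part 1 dataset is not shown here, and `None` is assumed to represent a null value.

```python
# Hypothetical rows standing in for the part 1 dataset.
rows = [
    {"age": 25, "city": "Delhi", "salary": 30000},
    {"age": None, "city": "Mumbai", "salary": 42000},
    {"age": 31, "city": None, "salary": None},
    {"age": 40, "city": "Pune", "salary": 55000},
]

columns = list(rows[0])

# Step 1: count the null (None) values in every column.
null_counts = {col: sum(row[col] is None for row in rows) for col in columns}

# Technique 1: delete every row that contains at least one null value.
complete_rows = [
    row for row in rows if all(row[col] is not None for col in columns)
]

print("nulls per column:", null_counts)
print("rows kept:", len(complete_rows), "of", len(rows))
```

Deleting rows is simple, but it discards half of this small sample, which is why it is usually only one of several options considered.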

Puneet Singh

Data Science and Machine Learning Enthusiast.
