
Normalization Methods

Normalization

Data transformation is one of the critical steps in data mining. Among the many data transformation methods, normalization is one of the most frequently used. For example, Z-score normalization can be used to reduce possible noise in sound frequency data.

We will introduce three common normalization methods: Max-Min Normalization, Z-Score Normalization, and Scale multiplication.

Max-Min Normalization
$x_{normal} = \frac{x - \min(x)}{\max(x) - \min(x)}$
It rescales all the data to the range from 0 to 1: the smallest value maps to 0 and the largest to 1.
Example:
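Chinese high schools use a 150-point scale, USA high schools use a 100-point scale, and Russian high schools use a 5-point scale. Max-min normalization maps all three onto the same 0-to-1 range so that scores can be compared.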
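To make the grade-scale example concrete, here is a minimal sketch of the formula above in plain Python. The helper name `min_max_normalize` and all score values are invented for illustration, and the minimum and maximum are taken from each sample rather than from the official bounds of each grading scale.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula divides by (max - min), so a constant list cannot be scaled.
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical scores, made up for this example only.
chinese_scores = [60, 90, 120, 150]  # 150-point scale
usa_scores = [40, 60, 80, 100]       # 100-point scale
russian_scores = [2, 3, 4, 5]        # 5-point scale

for name, scores in [("China", chinese_scores),
                     ("USA", usa_scores),
                     ("Russia", russian_scores)]:
    print(name, [round(v, 3) for v in min_max_normalize(scores)])
```

After normalization every list runs from 0 to 1, so a student's position within each sample can be compared directly even though the raw scales differ.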

Z-Score Normalization

$X_{z-normal} = \frac{X - mean}{sd}$
It expresses each value as the number of standard deviations it lies above or below the mean.
Example:
It is useful when comparing data sets with different units (cm and inch).
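The unit-independence claim can be checked with a short sketch. The function name `z_score_normalize` and the height values are hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed here because it matches the default behaviour of `np.std`.

```python
import statistics


def z_score_normalize(values):
    """Return (value - mean) / sd for each value, using the population SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mean) / sd for v in values]


# Hypothetical heights, made up for this example only.
heights_cm = [150, 160, 170, 180, 190]
heights_inch = [h / 2.54 for h in heights_cm]  # the same heights in inches

z_cm = z_score_normalize(heights_cm)
z_inch = z_score_normalize(heights_inch)

print([round(z, 3) for z in z_cm])
print([round(z, 3) for z in z_inch])
```

Both lists of z-scores agree up to floating-point rounding, because converting units rescales the mean and the standard deviation by the same factor.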

Scale multiplication

$X_{scaled} = X \times 10^{k}$ or $X_{scaled} = X / 10^{k}$
It rescales the data by a power of 10 (for example 10, 100, or 1000).
Example:
Some money transaction amounts are too large to read comfortably, so we divide them by 1000 to make them viewer friendly.
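A minimal sketch of the transaction example, assuming amounts are reported in thousands. The helper `scale_by_power_of_ten` and the amounts are invented for illustration.

```python
def scale_by_power_of_ten(values, exponent):
    """Divide every value by 10**exponent (a negative exponent multiplies)."""
    factor = 10 ** exponent
    return [v / factor for v in values]


# Hypothetical transaction amounts, made up for this example only.
transactions = [1250000, 830000, 47000]

in_thousands = scale_by_power_of_ten(transactions, 3)
print(in_thousands)
```

Unlike the two methods above, this changes only the unit of the numbers; their relative spacing and the shape of the distribution stay the same.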

Code

import random
import matplotlib.pyplot as plt
import numpy as np


#50 distinct random integers drawn from 0 to 149
x=random.sample(range(0,150),50)
x1=np.array(x)
xmin=min(x)
xmax=max(x)

#Max-Min normalization
mmnorm=(x1 - xmin)/(xmax-xmin)
#plot

fig,axs=plt.subplots(1,2,sharey=True)

#Original random number
axs[0].hist(x, bins=10)
axs[0].title.set_text("Random Data")


#Max-Min normalized histogram plot
axs[1].hist(mmnorm, bins=10,color="lightblue")
axs[1].title.set_text("Max-Min Normalized Data")
plt.show()

#Z-score Normalization

y2=random.sample(range(0,150),50)
x2=list(map(int,y2))
x21=np.array(x2)
mean=np.mean(x21)
#np.std returns the population standard deviation by default
sd=np.std(x21)


#Z-score normalization
znorm=(x21-mean)/sd

#plot

fig,axs=plt.subplots(1,2,sharey=True)

#Original random number
axs[0].hist(x2, bins=10, color="green")
axs[0].title.set_text("Random Data")


#Z-score normalized histogram plot
axs[1].hist(znorm, bins=10,color="lightgreen")
axs[1].title.set_text("Z-score Normalized Data")
plt.show()

#Scale multiplication

y3=random.sample(range(1000,10000),50)
x3=list(map(int,y3))
x31=np.array(x3)

#Scale normalization: divide by 1000
snorm=x31/1000

#plot

fig,axs=plt.subplots(1,2,sharey=True)

#Original random number
axs[0].hist(x3, bins=10, color="orange")
axs[0].title.set_text("Random Data")


#Scale normalized histogram plot
axs[1].hist(snorm, bins=10,color="yellow")
axs[1].title.set_text("Scale Normalized Data")
plt.show()