Normalization
Data transformation is one of the critical steps in data mining. Among the many data transformation methods, normalization is one of the most frequently used. For example, Z-score normalization can be used to reduce possible noise in sound frequency data.
We will introduce three common normalization methods: Max-Min Normalization, Z-Score Normalization, and Scale Multiplication.
Max-Min Normalization
$ Z_{max-min} = (X - X_{min}) / (X_{max} - X_{min}) $
It rescales all the data to the range between 0 and 1.
Example:
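Chinese high schools use a 150-point scale, US high schools use a 100-point scale, and Russian high schools use a 5-point scale. Max-Min normalization maps scores from all three onto the same 0 to 1 range so they can be compared directly.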
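The school example can be sketched in a few lines. This is only an illustration: the helper name `min_max_scale` and the sample scores are hypothetical, and each grading scale is assumed to start at 0, which the article does not state.

```python
def min_max_scale(value, lower, upper):
    """Map value from the range [lower, upper] onto [0, 1]."""
    if upper == lower:
        raise ValueError("upper and lower bounds must differ")
    return (value - lower) / (upper - lower)


# Hypothetical scores as (score, assumed minimum, maximum of the scale).
scores = {
    "China (150-point)": (120, 0, 150),
    "USA (100-point)": (80, 0, 100),
    "Russia (5-point)": (4, 0, 5),
}

normalized = {
    name: min_max_scale(score, lower, upper)
    for name, (score, lower, upper) in scores.items()
}

for name, value in normalized.items():
    print(f"{name}: {value:.2f}")
```

With these made-up numbers, all three students land on the same normalized value, which is what makes the scores comparable across systems.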
Z-Score Normalization
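$ Z_{z-normal} = (X - \mu) / \sigma $
It expresses each value as the number of standard deviations it lies from the mean, where $\mu$ is the mean and $\sigma$ is the standard deviation of the data.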
Example:
Z-scores are unitless, so they are useful when comparing data sets measured in different units (for example, centimeters and inches).
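To make the unit comparison concrete, here is a small sketch that computes z-scores for the same heights recorded in centimeters and in inches. The function name `z_score` and the height values are hypothetical. It uses the standard library's `statistics.pstdev` (population standard deviation), which matches the default behavior of `np.std`.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Return (x - mean) / standard deviation for every x in values."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical heights, first in centimeters, then converted to inches.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
heights_in = [h / 2.54 for h in heights_cm]

z_cm = z_score(heights_cm)
z_in = z_score(heights_in)

print([round(z, 3) for z in z_cm])
print([round(z, 3) for z in z_in])
```

Both lists should agree up to floating-point rounding, because converting units rescales the mean and the standard deviation by the same factor.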
Scale multiplication
$ Z_{scale} = X \times 10 $ or $ Z_{scale} = X / 10 $
It rescales the data by a factor that is a multiple of 10 (for example 10, 100, or 1000).
Example:
When monetary transaction amounts are very large, we can divide them by 1000 to make them easier to read.
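A minimal sketch of the transaction example follows. The helper `scale_by` and the amounts are hypothetical, chosen only to show the division by 1000.

```python
def scale_by(values, factor):
    """Divide every value by factor (for example 1000 to show thousands)."""
    if factor == 0:
        raise ValueError("factor must be non-zero")
    return [v / factor for v in values]


# Hypothetical transaction amounts.
transactions = [125000, 89000, 1500000]
in_thousands = scale_by(transactions, 1000)

for raw, scaled in zip(transactions, in_thousands):
    print(f"{raw:>9,} -> {scaled:,.1f}K")
```

Unlike the two methods above, this changes only the unit of display; the shape and relative spacing of the data stay the same.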
Code
import random
import matplotlib.pyplot as plt
import numpy as np
y=random.sample(range(0,150),50)
x=list(map(int,y))
x1=np.array(x)
xmin=min(x)
xmax=max(x)
#Max-Min normalization
mmnorm=(x1 - xmin)/(xmax-xmin)
#plot
fig,axs=plt.subplots(1,2,sharey=True)
#Original random number
axs[0].hist(x, bins=10)
axs[0].title.set_text("Random Data")
#Max-Min normalized histogram plot
axs[1].hist(mmnorm, bins=10,color="lightblue")
axs[1].title.set_text("Max-Min Normalized Data")
plt.show()
#Z-score Normalization
y2=random.sample(range(0,150),50)
x2=list(map(int,y2))
x21=np.array(x2)
mean=np.mean(x21)
sd=np.std(x21)
#standardize: subtract the mean, then divide by the standard deviation
znorm=(x21-mean)/sd
#plot
fig,axs=plt.subplots(1,2,sharey=True)
#Original random number
axs[0].hist(x2, bins=10, color="green")
axs[0].title.set_text("Random Data")
#Z-score normalized histogram plot
axs[1].hist(znorm, bins=10,color="lightgreen")
axs[1].title.set_text("Z-score Normalized Data")
plt.show()
#Scale multiplication
y3=random.sample(range(1000,10000),50)
x3=list(map(int,y3))
x31=np.array(x3)
#scale normalization
snorm=x31/1000
#plot
fig,axs=plt.subplots(1,2,sharey=True)
#Original random number
axs[0].hist(x3, bins=10, color="orange")
axs[0].title.set_text("Random Data")
#scale normalized histogram plot
axs[1].hist(snorm, bins=10,color="yellow")
axs[1].title.set_text("Scale Normalized Data")
plt.show()