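Concept:
For each attribute (that is, column by column), subtract the attribute's mean from its values and divide by the attribute's standard deviation. As a result, every attribute (column) has a mean of 0 and a variance of 1, so its values are centered around 0.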
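The definition above can be sketched with plain NumPy. This is a minimal illustration, not the author's code: the helper name `zscore_columns` and the toy array are made up for the example, and it assumes no column is constant (a standard deviation of 0 would cause a division by zero).

```python
import numpy as np

def zscore_columns(data):
    """Standardize each column: subtract its mean, divide by its standard deviation."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)  # per-column mean
    std = data.std(axis=0)    # per-column population standard deviation (ddof=0)
    return (data - mean) / std

# Toy values invented for illustration only; not taken from the dating dataset.
toy = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 600.0]])
scaled = zscore_columns(toy)
print(scaled.mean(axis=0))  # each column mean is approximately 0
print(scaled.std(axis=0))   # each column standard deviation is approximately 1
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses, so the two approaches should agree up to floating-point error.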
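Code example:
import numpy as np
from sklearn.preprocessing import StandardScaler

def autoNorm(dataset):
    # Take only the first column; the 0:1 slice keeps it 2-D, as StandardScaler expects
    x = dataset[:, 0:1]
    # Method 1: Z-score with scikit-learn
    std = StandardScaler()
    x_std = std.fit_transform(x)
    print(x_std[2])
    # Method 2: Z-score computed directly from the formula
    print(np.mean(x))   # mean of the column
    print(np.std(x))    # standard deviation of the column
    print((x[2] - np.mean(x)) / np.std(x))

if __name__ == '__main__':
    # file2matrix is not defined in this snippet; it is assumed to be defined elsewhere
    # and to return the feature matrix and the class label vector
    returnMat, classLabelVector = file2matrix('F:\\datingTestSet2.txt')
    autoNorm(returnMat)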
Output:
Dataset preview: