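I tried using linear regression for a classification task, and the accuracy does turn out to be quite high. My approach is:
- The target to predict is 0 or 1, while linear regression outputs a continuous value
- Treat linear regression predictions > 0.5 as 1 and predictions < 0.5 as 0
- Then compare the thresholded predictions with the actual labels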
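
The thresholding and comparison steps above can be sketched in isolation with NumPy. This is only an illustration: the helper names `threshold_predictions` and `accuracy` and the sample arrays are made up for the example, and a prediction of exactly 0.5 (which the rule above leaves undefined) is assumed to count as 0.

```python
import numpy as np

def threshold_predictions(scores, cutoff=0.5):
    """Map continuous regression outputs to 0/1 labels."""
    scores = np.asarray(scores, dtype=float)
    # Values above the cutoff become 1; everything else
    # (including exactly 0.5, by assumption) becomes 0.
    return (scores > cutoff).astype(int)

def accuracy(y_true, scores, cutoff=0.5):
    """Fraction of thresholded predictions that match the true labels."""
    labels = threshold_predictions(scores, cutoff)
    return float(np.mean(labels == np.asarray(y_true)))

# Hypothetical regression outputs and true labels, for illustration only.
raw = np.array([-0.2, 0.1, 0.49, 0.51, 0.9, 1.3])
truth = np.array([0, 0, 1, 1, 1, 1])

print(threshold_predictions(raw))  # [0 0 0 1 1 1]
print(accuracy(truth, raw))        # 5 of 6 predictions match
```

Note that linear regression outputs are not bounded to [0, 1], so values such as -0.2 or 1.3 are possible and are simply mapped to the nearest class by the cutoff.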
Core code (cross-validation)
Fill these in with your own data:
- x_train_std: the standardized training X
- y_train: the training Y
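
If you have no data at hand, one possible way to obtain the two inputs is sketched below. It assumes a synthetic binary dataset from `make_classification` as a stand-in for real data; the sample size, feature count, and split ratio are arbitrary choices for illustration, not values from the original experiment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real data (an assumption for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler()
x_train_std = scaler.fit_transform(X_train)
x_test_std = scaler.transform(X_test)

print(x_train_std.shape, y_train.shape)
```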
import pandas as pd  # for loading your own data
import numpy as np
from sklearn.preprocessing import StandardScaler  # used to produce x_train_std
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.metrics import make_scorer
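def linear_score(true_value, predict):
    # Threshold the continuous predictions: > 0.5 becomes 1, otherwise 0
    # (a prediction of exactly 0.5 is counted as 0). Building a new array
    # avoids modifying the scorer's input in place.
    labels = np.where(np.asarray(predict) > 0.5, 1, 0)
    # Accuracy: fraction of thresholded predictions equal to the true labels.
    return np.mean(labels == np.asarray(true_value))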
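linear_model = LinearRegression()
scoring = {
    'linear_score': make_scorer(linear_score, greater_is_better=True)
}
# random_state only has an effect when shuffle=True; recent scikit-learn
# versions raise an error if it is set without shuffling.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
cv_cross = cross_validate(linear_model, x_train_std, y_train, cv=kfold, scoring=scoring)
print(cv_cross['test_linear_score'].mean())  # mean of the cross-validation scores
print(cv_cross['test_linear_score'].std())   # standard deviation of the cross-validation scores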
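
The code above takes an X that was standardized before cross-validation. A possible variation, sketched here on synthetic data, wraps `StandardScaler` and `LinearRegression` in a pipeline so the scaler is refit on the training part of each fold. The dataset and the scorer name `thresholded_accuracy` are assumptions for the example, and the scores it prints say nothing about the results on the original data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def thresholded_accuracy(y_true, y_pred):
    """Accuracy after mapping regression outputs > 0.5 to 1, otherwise 0."""
    labels = (np.asarray(y_pred) > 0.5).astype(int)
    return float(np.mean(labels == np.asarray(y_true)))

# Synthetic binary dataset (an assumption for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The scaler is fit inside each training fold, then applied to the held-out fold.
model = make_pipeline(StandardScaler(), LinearRegression())
kfold = KFold(n_splits=10, shuffle=True, random_state=0)

results = cross_validate(
    model, X, y, cv=kfold,
    scoring={"linear_score": make_scorer(thresholded_accuracy)},
)
scores = results["test_linear_score"]
print(scores.mean(), scores.std())
```

Fitting the scaler inside the pipeline keeps statistics from the held-out fold out of the preprocessing step, which is the usual reason to prefer this layout when reporting cross-validated accuracy.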