I'm using an SVM to see whether I can take baseball data, classify batted balls, and estimate home runs. When I run the model multiple times I seem to get different results, so I built a simulation that runs the model 100 times, but I don't understand why the results vary or what causes the variation. Can someone explain why this happens? I did set random_state=42.

import pandas as pd
from mlxtend.plotting import plot_decision_regions
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
from sklearn import metrics
import statistics
import numpy as np

result_array = []
players = [488768, 517369, 461314, 477165, 506560, 572114, 641319, 592669, 622534, 605486, 602922, 518466, 572362, 519082, 623182, 595978, 543272]
dfSave = pd.DataFrame(columns=['Mean', 'Max', 'Min', 'Std', 'Accuracy', 'Precision', 'f1_score', 'Recall_Score', 'First_Name', 'Last_Name'])

for i in players:
    batter = i
    df = pd.read_csv('D:baseballData_2016_use.csv')
    df2 = pd.read_csv('D:padres_2016_home.csv')  # Team to test
    dataFilter = df.loc[df['Home_Team'] == 'Orioles']  # Stadium to train model to.
    dataFilter2 = df2.loc[df2['Batter_ID'] == batter]  # Players to test in stadium
    j = 0
    while j <= 100:
        predict = dataFilter2.iloc[:, [4, 5]]
        X = dataFilter.iloc[:, [4, 5]]
        y = dataFilter.iloc[:, 3]
        y = y.astype(int)  # np.integer is an abstract type; use a concrete integer dtype
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
        # Mostly defaults; C, gamma, decision_function_shape and shrinking are overridden
        svclassifier = SVC(C=4, cache_size=200, class_weight=None, coef0=0.0,
                           decision_function_shape='ovo', degree=3, gamma=0.001,
                           kernel='rbf', max_iter=-1, probability=False,
                           random_state=42, shrinking=False, tol=0.001, verbose=False)
        svclassifier.fit(X_train, y_train)
        y_pred = svclassifier.predict(X_test)
        predicted = svclassifier.predict(predict)
        listDf = []
        sum = 0
        # count and print predicted home runs
        for i in predicted:
            if i == 1:
                sum = sum + 1
        result_array.append(sum)
        print(sum)
        j += 1  # advance the simulation counter so the loop terminates
1 Answer
莫回无
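In your code, the randomness comes from train_test_split, which produces a different train/test split on every run. The random_state=42 you passed to SVC does not affect this: with probability=False, SVC does not use it at all.
You can avoid the variation by fixing random_state in the train_test_split call as well.
However, running the model multiple times (as you did) is considered better practice: collect the distribution of the output scores, compute a confidence interval for them, and report that.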
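As a rough illustration of the second suggestion, here is a minimal sketch that summarizes the results of repeated runs as a mean with a normal-approximation confidence interval. The helper name summarize_scores, the z value of 1.96 (roughly 95% coverage, assuming the per-run results are approximately normal), and the example counts are all assumptions made for illustration; they do not come from the question's data. For a reproducible split, the corresponding change in the question's code would be to pass a fixed seed, for example train_test_split(X, y, test_size=0.30, random_state=42).

```python
import math
import statistics


def summarize_scores(scores, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation confidence interval.

    z=1.96 gives roughly 95% coverage, under the assumption that the
    per-run scores are approximately normally distributed.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * sem, mean + z * sem


# Hypothetical home-run counts from repeated runs, for illustration only.
example_counts = [12, 15, 11, 14, 13, 16, 12, 14]
mean, lower, upper = summarize_scores(example_counts)
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

In the question's loop, result_array (collected per player) could be passed to a helper like this in place of example_counts once all runs have finished.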