What is the principle of the kNN algorithm?

Algorithm principle

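k-nearest neighbors (k-Nearest Neighbor, kNN) is arguably the simplest traditional machine learning model. Given a training dataset and a new input instance, it finds the k training instances nearest to that input; whichever class the majority of those k instances belong to is the class assigned to the input.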
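The rule above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code from the repository: the function name `knn_predict`, the use of Euclidean distance, and the toy data are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=5):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training instance
    distances = np.linalg.norm(train_x - query, axis=1)
    # Indices of the k training instances with the smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority class among those k neighbors
    votes = Counter(train_y[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two small clusters in 2-D
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_x, train_y, np.array([0.15, 0.15]), k=3))
print(knn_predict(train_x, train_y, np.array([0.95, 1.0]), k=3))
```

Note that there is no fitting step here: the "model" is just the stored arrays `train_x` and `train_y`, and all the work happens at prediction time.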

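The kNN algorithm has no explicit training process. In the "training phase" it merely stores the samples, so the training time cost is essentially zero; the actual computation is deferred until a test sample arrives.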

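Here k is a hyperparameter, and its choice has a major effect on the result. With a small k, only training instances close (similar) to the input affect the prediction, so the prediction becomes very sensitive to those neighbors: if they happen to be noise points, the prediction will be wrong. With a large k, training instances far from (dissimilar to) the input also take part in the vote, which can likewise lead to wrong predictions. In practice k is usually set to a fairly small value, and cross-validation is commonly used to select the best k. The figure above shows k = 5.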
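As a rough illustration of choosing k by cross-validation, the sketch below uses leave-one-out cross-validation (one particular form of cross-validation) on synthetic data. The helper `loo_accuracy`, the candidate values of k, and the data are assumptions made for this example and are not part of the original post. With scikit-learn, k corresponds to the `n_neighbors` parameter of `KNeighborsClassifier`.

```python
import numpy as np


def loo_accuracy(x, y, k):
    """Leave-one-out accuracy of kNN: each point is classified by its k nearest other points."""
    # Pairwise Euclidean distances, shape (n, n)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    # A held-out point must not vote for itself
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Majority vote over integer labels (ties go to the smaller label)
    preds = np.array([np.bincount(y[row]).argmax() for row in nearest])
    return float(np.mean(preds == y))


# Synthetic two-class data (hypothetical) with overlapping clusters
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

candidate_ks = [1, 3, 5, 7, 9, 15, 25]
scores = {k: loo_accuracy(x, y, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

for k in candidate_ks:
    print("k=%d  accuracy=%.3f" % (k, scores[k]))
print("best k:", best_k)
```

Odd values of k are used so that a two-class vote cannot tie. On a real dataset the candidate list and the number of folds would need to be adapted to the data size.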

Model training

Code: https://github.com/qianshuang/ml-exp

import os

import numpy as np
from sklearn import metrics, neighbors

# process_file, build_vocab, read_category, read_vocab and the paths
# train_dir, test_dir, vocab_dir come from the repository linked above
# and are not shown here.


def train():
    print("start training...")
    # Process the training data
    train_feature, train_target = process_file(train_dir, word_to_id, cat_to_id)
    # Train the model
    model.fit(train_feature, train_target)


def test():
    print("start testing...")
    # Process the test data
    test_feature, test_target = process_file(test_dir, word_to_id, cat_to_id)
    # test_predict = model.predict(test_feature)  # returns the predicted class
    test_predict_proba = model.predict_proba(test_feature)  # probability of each class
    # Index of the most probable class (equals the label id when ids are 0..n-1)
    test_predict = np.argmax(test_predict_proba, 1)

    # accuracy
    true_false = (test_predict == test_target)
    accuracy = np.count_nonzero(true_false) / float(len(test_target))
    print()
    print("accuracy is %f" % accuracy)

    # precision recall f1-score
    print()
    print(metrics.classification_report(test_target, test_predict, target_names=categories))

    # Confusion matrix
    print("Confusion Matrix...")
    print(metrics.confusion_matrix(test_target, test_predict))


if not os.path.exists(vocab_dir):
    # Build the vocabulary
    build_vocab(train_dir, vocab_dir)

categories, cat_to_id = read_category()
words, word_to_id = read_vocab(vocab_dir)

# kNN
model = neighbors.KNeighborsClassifier()

train()
test()

Results:

read_category...
read_vocab...
start training...
start testing...

accuracy is 0.820000

                precision    recall  f1-score   support

      Politics       0.65      0.85      0.74        94
       Finance       0.81      0.94      0.87       115
    Technology       0.96      0.97      0.96        94
         Games       0.99      0.74      0.85       104
 Entertainment       0.99      0.75      0.85        89
       Fashion       0.88      0.67      0.76        91
          Home       0.44      0.78      0.56        89
   Real estate       0.93      0.82      0.87       104
        Sports       1.00      0.98      0.99       116
     Education       0.96      0.65      0.78       104

   avg / total       0.87      0.82      0.83      1000

Confusion Matrix...
[[ 80   4   0   0   0   0   6   3   0   1]
 [  1 108   0   0   0   0   6   0   0   0]
 [  0   0  91   0   0   0   3   0   0   0]
 [  4   0   1  77   0   3  18   0   0   1]
 [  4   3   0   1  67   4  10   0   0   0]
 [  0   0   0   0   1  61  29   0   0   0]
 [  9   5   2   0   0   0  69   3   0   1]
 [  9   3   0   0   0   0   7  85   0   0]
 [  2   0   0   0   0   0   0   0 114   0]
 [ 14  10   1   0   0   1  10   0   0  68]]
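To see how the reported metrics follow from the confusion matrix, the sketch below recomputes accuracy and per-class precision and recall from the matrix printed above. The variable names are chosen for this example; the numbers are copied from the output, with rows taken as true classes and columns as predicted classes (the convention of `metrics.confusion_matrix`).

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([
    [80, 4, 0, 0, 0, 0, 6, 3, 0, 1],
    [1, 108, 0, 0, 0, 0, 6, 0, 0, 0],
    [0, 0, 91, 0, 0, 0, 3, 0, 0, 0],
    [4, 0, 1, 77, 0, 3, 18, 0, 0, 1],
    [4, 3, 0, 1, 67, 4, 10, 0, 0, 0],
    [0, 0, 0, 0, 1, 61, 29, 0, 0, 0],
    [9, 5, 2, 0, 0, 0, 69, 3, 0, 1],
    [9, 3, 0, 0, 0, 0, 7, 85, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0, 114, 0],
    [14, 10, 1, 0, 0, 1, 10, 0, 0, 68],
])

# Correct predictions sit on the diagonal
accuracy = np.trace(cm) / cm.sum()
# Recall: correct predictions divided by the number of true items per class (row sums)
recall = np.diag(cm) / cm.sum(axis=1)
# Precision: correct predictions divided by the number of predictions per class (column sums)
precision = np.diag(cm) / cm.sum(axis=0)

print("accuracy: %.2f" % accuracy)
print("recall:   ", np.round(recall, 2))
print("precision:", np.round(precision, 2))
```

The seventh column explains the weakest row of the report: the seventh class (家居, home) receives 29 items that are truly the sixth class (時尚, fashion) and 18 that are truly the fourth class (游戲, games), which pulls its precision down to 0.44.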


---------------------

Author: gaoyan0335

Source: CSDN

Original: https://blog.csdn.net/gaoyan0335/article/details/86299367
