kNN

kNN: improving match results on a dating site

[Prepare the data] Data-processing function

import numpy as np
import os

def file2matrix(filename):
    # Read a tab-separated file: three numeric features, then a text label.
    with open(filename) as fr:
        arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = np.zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    for line in arrayOLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(label2int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector

def label2int(labelName):
    # Map the text label in the last column to an integer class.
    if labelName == 'didntLike':
        return 0
    elif labelName == 'smallDoses':
        return 1
    elif labelName == 'largeDoses':
        return 2

datingDataMat, datingLabels = file2matrix('datingTestSet.txt')
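As a quick check of the file layout that file2matrix expects, the following minimal sketch parses a few in-memory rows instead of the real file. The helper name parse_lines, the LABELS dictionary, and the three sample rows are assumptions made for illustration; they are not part of the original code.

```python
import io
import numpy as np

# Hypothetical lookup table standing in for label2int.
LABELS = {'didntLike': 0, 'smallDoses': 1, 'largeDoses': 2}

def parse_lines(lines):
    """Parse tab-separated rows: three numeric features, then a text label."""
    rows = [line.strip().split('\t') for line in lines if line.strip()]
    features = np.array([row[:3] for row in rows], dtype=float)
    labels = [LABELS[row[-1]] for row in rows]
    return features, labels

# Illustrative rows in the same layout as datingTestSet.txt.
sample = io.StringIO(
    "40920\t8.326976\t0.953952\tlargeDoses\n"
    "14488\t7.153469\t1.673904\tsmallDoses\n"
    "26052\t1.441871\t0.805124\tdidntLike\n"
)
features, labels = parse_lines(sample)
print(features.shape)  # (3, 3)
print(labels)          # [2, 1, 0]
```

Unlike label2int, the dictionary lookup raises a KeyError on an unknown label rather than silently returning None, which may make malformed rows easier to spot.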

[Analyze the data] Plotting scatter plots of the data

import matplotlib
import matplotlib.pyplot as plt

def arrColor(labels):
    # Map each class label to a marker color.
    colors = []
    for i in labels:
        if i == 0:
            colors.append('r')
        elif i == 1:
            colors.append('y')
        elif i == 2:
            colors.append('g')
    return colors

fig = plt.figure(figsize=(8, 20))
#plt.axis([-1,22,-0.1,1.8])
ax1 = fig.add_subplot(311)
ax1.scatter(datingDataMat[:, 0], datingDataMat[:, 1], c=arrColor(datingLabels))
ax2 = fig.add_subplot(312)
ax2.scatter(datingDataMat[:, 1], datingDataMat[:, 2], c=arrColor(datingLabels))
ax3 = fig.add_subplot(313)
ax3.scatter(datingDataMat[:, 0], datingDataMat[:, 2], c=arrColor(datingLabels))
plt.show()

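The three pairwise scatter plots show that when the first and second columns are used as the x and y axes (the first plot), the three types of people fall largely into separate regions.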

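Note: when all points are drawn from one unsorted array in a single scatter call, that call does not by itself produce a legend entry for each color.
To show a legend, split the data by class and draw each class with its own scatter call, so that each class gets its own legend entry.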

import matplotlib.font_manager as fm
# Microsoft YaHei, used by the original post to render Chinese labels (Windows-specific path).
myfont = fm.FontProperties(fname='C:/Windows/Fonts/msyh.ttf')

def showClassify(datingDataMat, datingLabels, x, y, x_name='', y_name=''):
    # Split the points by class so each class gets its own scatter call and legend entry.
    type1_x = []
    type1_y = []
    type2_x = []
    type2_y = []
    type3_x = []
    type3_y = []
    for i in range(len(datingLabels)):
        if datingLabels[i] == 0:
            type1_x.append(datingDataMat[i][x])
            type1_y.append(datingDataMat[i][y])
        if datingLabels[i] == 1:
            type2_x.append(datingDataMat[i][x])
            type2_y.append(datingDataMat[i][y])
        if datingLabels[i] == 2:
            type3_x.append(datingDataMat[i][x])
            type3_y.append(datingDataMat[i][y])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    # Label the axes on the same Axes object that holds the points.
    ax.set_xlabel(x_name, fontproperties=myfont)
    ax.set_ylabel(y_name, fontproperties=myfont)
    #ax.set_title("pythoner.com", fontproperties=myfont)
    type1 = ax.scatter(type1_x, type1_y, c='r')
    type2 = ax.scatter(type2_x, type2_y, c='y')
    type3 = ax.scatter(type3_x, type3_y, c='g')
    ax.legend((type1, type2, type3), ('Did not like', 'Average charm', 'Very charming'), loc=2, prop=myfont)
    plt.show()

showClassify(datingDataMat, datingLabels, 0, 1, 'Frequent flier miles earned per year', 'Percentage of time spent playing video games')

[Prepare the data] Feature normalization function

def autoNorm(dataSet):
    # Min-max scaling per column: (value - min) / (max - min)
    minValues = dataSet.min(0)
    maxValues = dataSet.max(0)
    ranges = maxValues - minValues
    normDataSet = np.zeros(np.shape(dataSet))
    m = dataSet.shape[0]
    normDataSet = dataSet - np.tile(minValues, (m, 1))
    normDataSet = normDataSet / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minValues

normMat, ranges, minValues = autoNorm(datingDataMat)

array([[ 0.44832535,  0.39805139,  0.56233353],
       [ 0.15873259,  0.34195467,  0.98724416],
       [ 0.28542943,  0.06892523,  0.47449629],
       ...,
       [ 0.29115949,  0.50910294,  0.51079493],
       [ 0.52711097,  0.43665451,  0.4290048 ],
       [ 0.47940793,  0.3768091 ,  0.78571804]])
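The normalization in autoNorm is min-max scaling, which maps each column to the range 0 to 1 via (value - min) / (max - min). The sketch below shows the same computation on a toy array using NumPy broadcasting instead of np.tile. The function name min_max_normalize and the toy values are assumptions made for illustration.

```python
import numpy as np

def min_max_normalize(data):
    """Scale each column to [0, 1]; broadcasting replaces the explicit np.tile."""
    mins = data.min(axis=0)
    ranges = data.max(axis=0) - mins
    return (data - mins) / ranges, ranges, mins

# Toy data (hypothetical): two features on very different scales.
toy = np.array([[0.0, 10.0],
                [5.0, 20.0],
                [10.0, 30.0]])
norm, toy_ranges, toy_mins = min_max_normalize(toy)
print(norm)
# [[0.  0. ]
#  [0.5 0.5]
#  [1.  1. ]]
```

Both versions divide by zero when a column is constant (its range is 0), so a real pipeline would probably need to guard against that case.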

The k-nearest neighbors algorithm

import operator

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every row of dataSet
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat**2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort()
    # Majority vote among the k nearest neighbors
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

classify0([0.28542943, 0.06892523, 0.47449629], normMat, datingLabels, 5)
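To make the distance-and-vote steps of classify0 easier to follow, here is a minimal sketch of the same idea on four hand-made points. The function name knn_predict, the use of collections.Counter for the vote, and the toy points are assumptions made for illustration; this is not the original implementation.

```python
from collections import Counter
import numpy as np

def knn_predict(query, data, labels, k):
    """Return the majority label among the k rows of data closest to query."""
    # Euclidean distances via broadcasting (no np.tile needed)
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical): two points near (1, 1), two near (0, 0).
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
group_labels = ['A', 'A', 'B', 'B']

prediction = knn_predict(np.array([0.0, 0.2]), group, group_labels, 3)
print(prediction)  # B
```

With k = 3, the three nearest points to (0, 0.2) are the two 'B' points and one 'A' point, so the vote goes to 'B'.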

[Test the algorithm] Error-rate test function

def datingClassTest():
    # Hold out the first 10% of the rows as the test set.
    hoRatio = 0.1
    datingDataMat, datingLabels = file2matrix('datingTestSet.txt')
    normMat, ranges, minValues = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:, :], datingLabels[numTestVecs:], 5)
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    print("the total error rate is: %f" % (errorCount / float(numTestVecs)))

datingClassTest()

the total error rate is: 0.040000
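The error rate depends on the choice of k and on which rows are held out. The sketch below repeats the hold-out test for several values of k. Because datingTestSet.txt is not reproduced here, it uses synthetic data (two Gaussian blobs) as a stand-in, so the numbers it prints say nothing about the dating data. The names knn_predict and holdout_error are assumptions made for illustration.

```python
from collections import Counter
import numpy as np

def knn_predict(query, data, labels, k):
    """Majority label among the k rows of data closest to query."""
    distances = np.sqrt(((data - query) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def holdout_error(data, labels, k, ho_ratio=0.1):
    """Test on the first ho_ratio of the rows, using the remaining rows as training data."""
    n_test = int(len(data) * ho_ratio)
    errors = 0
    for i in range(n_test):
        predicted = knn_predict(data[i], data[n_test:], labels[n_test:], k)
        errors += int(predicted != labels[i])
    return errors / n_test

# Synthetic stand-in data (assumption): two separated 2-D Gaussian blobs.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.5, (100, 2)),
                  rng.normal(3.0, 0.5, (100, 2))])
labels = np.array([0] * 100 + [1] * 100)

# Shuffle so the held-out first 10% contains both classes.
order = rng.permutation(len(data))
data, labels = data[order], labels[order]

error_rates = {k: holdout_error(data, labels, k) for k in (1, 3, 5, 7)}
for k, rate in error_rates.items():
    print("k=%d: error rate %f" % (k, rate))
```

Taking the first 10% of the rows as the test set is only meaningful if the rows are in random order, which is why the sketch shuffles the synthetic data first.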

[Use the algorithm] Collecting input and printing the prediction

def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet.txt')
    normMat, ranges, minValues = autoNorm(datingDataMat)
    inArr = np.array([ffMiles, percentTats, iceCream])
    # Normalize the query with the same min and range as the training data.
    classifierResult = classify0((inArr - minValues) / ranges, normMat, datingLabels, 5)
    print("You will probably like this person:", resultList[classifierResult])

classifyPerson()

percentage of time spent playing video games?8
frequent flier miles earned per year?40000
liters of ice cream consumed per year?0.95
You will probably like this person: in large doses


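Copyright notice: this article, titled kNN, was contributed by a site user and reflects only the author's views. To republish it, contact the author and cite the source: http://www.roclinux.cn/b/1687981684a164864.html. This site only provides information storage; it does not own the content and assumes no related legal liability. Content found to involve plagiarism, infringement, or legal violations will be removed once verified.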
