

Posted on 29 February 2024 (author: 制表格初入门)

Data Min Knowl Disc
DOI 10.1007/s10618-014-0360-3

Detecting anomaly collections using extreme feature ranks

Hanbo Dai · Feida Zhu · Ee-Peng Lim · HweeHwa Pang

Received: 24 February 2013 / Accepted: 4 June 2014
© The Author(s) 2014

Abstract Detecting anomaly collections is an important task with many applications, […] anomaly collection, entities […], members of an anomaly collection are […] introduce a novel anomaly definition […] propose a new measure of anomalousness […] there can be a large number of ERACs of various sizes; for simplicity, we first investigate the ERAC detection problem of finding top-K ERACs of a predefined […] tackle the follow-up ERAC expansion problem of uncovering the superset […] algorithms are proposed for both ERAC detection and expansion problems, […] Specifically, in synthetic datasets, both ERAC detection […]

Responsible editor: […]

(✉) The School of Computer Science and Information Engineering, Hubei University, Wuhan, China
e-mail: daihanbo@…

The School of Information Systems, Singapore Management University, Singapore, Singapore
e-mail: fdzhu@…
e-mail: eplim@…
e-mail: hhpang@…

[…] spam dataset, both ERAC detection and expansion algorithms […]DB dataset, both ERAC detection and expansion algorithms identify unusual actor collections that are not easily identified […] Chinese online forum dataset, our ERAC detection algorithm identifies suspicious “water army” […] expansion algorithm successfully […]

Keywords Anomaly collection · Extreme feature rank · Anomaly cluster · Outlier group · Spam detection · Spam cluster

1 Introduction

According to Barnett and Lewis (1994), an anomaly or outlier is a data instance or subset of data instances […] an anomaly can be classified […] anomaly usually lies in a sparse region or is far away from normal ones, whereas an anomaly collection is formed by similar entities, […] In practice, this inconsistency often implies different agendas […] In this paper, we detect anomaly collections by their extreme feature ranks, based on the observation that members in an anomaly collection […] by Fetterly et al. (2004), Castillo et al. (2007) and Gyöngyi et al. (2004), […] For example, they stuff the pages […] also generate pages from similar templates on the fly in order to perform “link spam”. As a result, when measured by those characteristics or features, spammer hosts consistently demonstrate very extreme traits and form an identifiable anomalous collection, […] Fig. 1 shows 30 web hosts {e0, ..., e29} with three host features {f0, f1, f2}, reflecting the aforementioned spamming strategies: f0 represents the average number of popular keywords, f1 is the variance of the word count, […] feature, […] identify {e5, e7, e12} […]

Fig. 1 An example of ERAC. 30 entities {e0, ..., e29} are ranked according to each of the 3 features {f0, f1, f2}. In this example, {e5, e7, e12} is an ERAC.
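The per-feature ranking behind Fig. 1 can be sketched as follows. This is a minimal illustration, not code from the paper, and it rests on several assumptions: rank 1 denotes the most extreme position, each feature declares whether high or low values are extreme (so that ranks can cluster toward the top or the bottom, as with a low word-count variance), ties are broken by entity index, and the feature values are made up. The names `rank_entities` and `rank_table` are hypothetical.

```python
from typing import Dict, List


def rank_entities(values: List[float], high_is_extreme: bool = True) -> List[int]:
    """Rank entities on one feature; rank 1 is the most extreme position.

    high_is_extreme=False treats the smallest value as most extreme, e.g. for a
    feature such as word-count variance where spammers sit at the bottom.
    Ties are broken by entity index so each rank is used exactly once.
    """
    sign = -1.0 if high_is_extreme else 1.0
    order = sorted(range(len(values)), key=lambda i: (sign * values[i], i))
    ranks = [0] * len(values)
    for position, entity in enumerate(order, start=1):
        ranks[entity] = position
    return ranks


def rank_table(
    features: Dict[str, List[float]], high_is_extreme: Dict[str, bool]
) -> Dict[str, List[int]]:
    """Rank every entity on every feature independently."""
    return {
        name: rank_entities(vals, high_is_extreme.get(name, True))
        for name, vals in features.items()
    }


# Hypothetical values for 5 entities (indices 0..4) on two features.
features = {
    "f0": [0.2, 0.9, 0.5, 0.95, 0.1],  # e.g. popular keywords: high is extreme
    "f1": [3.0, 1.0, 2.0, 5.0, 4.0],   # e.g. word-count variance: low is extreme
}
ranks = rank_table(features, {"f0": True, "f1": False})
print(ranks["f0"])  # [4, 2, 3, 1, 5]
print(ranks["f1"])  # [3, 1, 2, 5, 4]
```

Ranking each feature independently keeps the entity's position, not its raw value, as the quantity of interest, which is what lets collections be compared across features with different scales.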

[…] f0 and f2, […] that e5, e7 and e12 collectively display extreme traits across […] example, groups of fraudulent […] fraudulent user would create sufficient low-price transactions with other accomplice accounts in a short time in order to gain credibility, before performing fraud transactions involving large sums of money, according to Chua and Wareham (2004) and Pandit et al. (2007). Consequently, they are likely to rank at extreme positions with respect […] study this kind of anomaly collection, we propose a novel definition, […] is an entity subset clustered toward the top or bottom ranks, […] cannot be easily detected by existing anomaly detection approaches, because they either focus on single point anomalies, or […] of single point anomalies does not always form an ERAC, […] For example, in Fig. 1, e12 is not very extreme by itself although it is part of an ERAC {e5, e7, e12}. In contrast, e8 is very anomalous as a single entity, since it appears at extreme positions on all three features, […] therefore cannot be discovered […] detect ERACs, Dai et al. (2012) […] anomalousness of an ERAC is quantified […] the large number of ERACs of various sizes, Dai et al. (2012) tackle the ERAC detection problem of discovering top-K ERACs with a predefined size limit, which is set to small values for efficiency […], after being offered the top ERACs of a predefined size, users may want to […] For example, in the web spam case, users may find the detected ERAC {e5, e7, e12} of interest as they share the common spamming strategy of using lots of popular keywords, with very little variance in the word count, […] natural to ask: can we detect supersets of this ERAC that are even more anomalous with similar sets of spamming strategies? Therefore, in this paper, we not only explore the ERAC detection problem, but also propose the ERAC expansion problem to uncover the superset […] the ERAC detection problem, ERAC expansion is done without predefined […] summarize our contributions as follows:
– We are the first […] measure the anomalousness of a collection by how extremely ranked it is with respect to any feature set.
– We develop both exact and heuristic algorithms to find the top-K anomalous collections of a predefined size limit […] on different pruning strategies under the feature […]

– […] provide algorithms […] propose the follow-up problem […] design efficient greedy algorithms […] propose an exploratory scheme for searching ERACs, making use of […] our ERAC detection approach on synthetic datasets with injected ERACs and on three real datasets, including a web host graph, […] synthetic datasets, our proposed heuristic algorithm scales well with […] web spam dataset with labeled true spammers, our approach discovers spammer collections that are more anomalous while achieving higher precision, […] movie dataset, we detect unusual […] evaluation shows our approach successfully […] results demonstrate that in the synthetic datasets, the injected ERACs are retrieved with a high success rate; in the web host dataset, the algorithm achieves higher precision than existing methods; in the movie dataset, the expansion reveals larger anomalous actor collections that cannot be discovered by the clustering-based approach; in the Chinese online forum dataset, the expansion […]

[…] introducing the problem formulation in Sects. 3 and 4 presents our ERAC […] Finally, Sect. 8 concludes the paper and discusses limitations and future work.

2 Related work

According to a survey by Chandola et al. (2009), […] of approaches have been proposed, such as a classification-based one by Castillo et al. (2007), a distance-based one by Knorr and Ng (1998), a density-based one by Breunig et al. (2000) and clustering-based ones by Ester et al. (1996) and Guha et al. (1999). Since these approaches assume that anomalies appear in sparse regions or are far away from the normal entities, anomalous collections […] et al. (2009), He et al. (2003) and Loureiro et al. (2004) use clustering-based approaches for anomalous collection detection, assuming that normal entities belong to large and dense clusters, […] According to Duan et al. (2009) and He et al. (2003), anomalous clusters are the smaller ones that

together constitute less than 10% […]ro et al. (2004) […] the assumption […] Furthermore, […] et al. (2010) […] assumes that after data points are projected to some hyperplane, the anomalous points follow distributions […] these assumptions do not always hold, […]-Castro et al. (2011) use a statistical model to detect, in a graph, an anomaly collection in which […] takes the assumption that only one anomalous cluster exists in the whole graph, […] anomalous collections is also related to the task of subgroup discovery proposed by Klösgen (1996) and Wrobel (1997). A survey by Herrera et al. (2011) summarizes the task as discovering the subgroups of the entity population that are statistically “most interesting” […] groups are induced by rules, […] because (i) not all members of an anomaly collection satisfy rules; (ii) […] the anomalousness we use to measure an anomaly collection does not involve the class labels, whereas the interestingness measure used for subgroup discovery usually does.

3 Extreme rank anomalous collection

Dai et al. (2012) have shown that the anomalousness of a collection is better measured directly at the collection level, instead of measuring individual […] adopt that definition […] denote the universal entity set, […] rank(e8) = 1 and rank(e7) = […] extremity index r refers to an extreme region, and S_f(r) […]

Table 1 Notations
E: […]
S: […]
Rank_f(e): […]
p_f(S, r): […]
r_f(S): representative r of S on f
F: […]
r: […]
S_f(r): […]
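One possible reading of the Table 1 notation, applied to the top-K ERAC detection problem with a predefined size limit stated in the introduction, is sketched below. It is an assumption, not the paper's definitions or algorithm: S_f(r) is taken to be the entities ranked within the top r on feature f, r_f(S) the smallest r whose region contains all of S, and `chance` is a hypothetical stand-in for p_f(S, r), namely the probability that |S| entities drawn at random all land in S_f(r). The hypothetical `top_k` scores each collection by the product of these chances over features (treating features as independent) and enumerates every candidate, unlike the pruning-based exact and heuristic algorithms the paper describes. All ranks are made up.

```python
import heapq
from itertools import combinations
from math import comb
from typing import Dict, List, Set, Tuple

Ranks = Dict[str, int]  # entity -> rank on one feature f (1 = most extreme)


def extreme_region(ranks: Ranks, r: int) -> Set[str]:
    """S_f(r), read here as the entities ranked within the top r on f."""
    return {e for e, rank in ranks.items() if rank <= r}


def representative_rank(ranks: Ranks, collection: Set[str]) -> int:
    """r_f(S), read here as the smallest r such that S is inside S_f(r).

    Under that reading it is the worst (largest) rank of any member of S.
    """
    return max(ranks[e] for e in collection)


def chance(ranks: Ranks, collection: Set[str]) -> float:
    """Hypothetical stand-in for p_f(S, r) with r = r_f(S): the probability
    that |S| entities drawn uniformly without replacement all lie in S_f(r)."""
    n, k = len(ranks), len(collection)
    r = representative_rank(ranks, collection)
    return comb(r, k) / comb(n, k)


def top_k(features: Dict[str, Ranks], k: int, max_size: int) -> List[Tuple[str, ...]]:
    """Exhaustively score all collections of size 2..max_size and return the
    k with the smallest product of per-feature chances (most anomalous first)."""
    entities = sorted(next(iter(features.values())))

    def joint_chance(collection: Tuple[str, ...]) -> float:
        p = 1.0
        for ranks in features.values():
            p *= chance(ranks, set(collection))
        return p

    candidates = (
        c for size in range(2, max_size + 1) for c in combinations(entities, size)
    )
    return heapq.nsmallest(k, candidates, key=joint_chance)


# Made-up ranks: {e5, e7, e12} fill the top 3 on f0 and f2 but not on f1.
features = {
    "f0": {"e5": 1, "e7": 2, "e12": 3, "e20": 4, "e21": 5, "e29": 6},
    "f1": {"e5": 4, "e7": 6, "e12": 2, "e20": 1, "e21": 5, "e29": 3},
    "f2": {"e5": 2, "e7": 1, "e12": 3, "e20": 6, "e21": 4, "e29": 5},
}
print(chance(features["f0"], {"e12"}))     # 0.5
print(top_k(features, k=1, max_size=3))    # [('e12', 'e5', 'e7')]
```

In this toy setting e12 alone has a chance of 0.5 on f0 and looks unremarkable, yet {e5, e7, e12} has a chance of 0.05 on both f0 and f2 and is returned first, which mirrors the introduction's point that a member of an ERAC need not be extreme by itself. The enumeration grows combinatorially with the number of entities and the size limit, so it only serves to make the problem statement concrete.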
