DBSCAN Clustering Algorithm Based On Density


2020 7th International Forum on Electrical Engineering and Automation (IFEEA)

Dingsheng Deng
Sichuan University for Nationalities, Kangding, Sichuan, China

Corresponding author's email: dds@scun.edu.cn

978-1-7281-9627-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/IFEEA51475.2020.00199

Abstract: Clustering technology has important applications in data mining, pattern recognition, machine learning and other fields. However, with the explosive growth of data, traditional clustering algorithms find it increasingly difficult to meet the needs of big data analysis. How to improve traditional clustering algorithms and ensure the quality and efficiency of clustering against the background of big data has become an important research topic in artificial intelligence and big data processing. Density-based clustering algorithms can cluster arbitrarily shaped data sets when the data distribution is unknown. DBSCAN is a classical density-based clustering algorithm that is widely used for data clustering analysis because it is simple and efficient. The purpose of this paper is to study the density-based DBSCAN clustering algorithm. The paper first introduces the concept of the DBSCAN algorithm and then carries out performance tests of DBSCAN on three different data sets. From the experimental results it can be concluded that DBSCAN achieves higher homogeneity and diversity when it performs personalized clustering on data sets of non-uniform density, with a broad range of values and gradually increasing sparsity. When the neighborhood distance eps of DBSCAN is [...], [...] classes are generated after clustering.

Keywords: DBSCAN Algorithm; Density Clustering; Machine Learning; Algorithm Research

I. INTRODUCTION

Since the middle of the 20th century, machine learning has developed rapidly in theoretical research. However, owing to the lack of hardware computing performance and data, it was not widely applied [...]. With the rapid development and upgrading of the economy and technology, electronic products have gradually entered everyone's life and become a necessity of daily life. People's interaction with all sorts of machines produces a large amount of data, paving the way for the widespread use of machine learning [...]. Huge social and commercial value is buried in the data generated all the time, and extracting more value from these data has become a common goal of both academia and industry. Machine learning has gradually emerged in the commercial application of data mining, where it has produced excellent results of important commercial value, and it has gradually become an important solution for data mining [...]. Clustering is an important technology in the field of machine learning. It is widely used in data mining scenarios such as commodity recommendation, numerical prediction, pattern recognition and so on [...].

In our daily life we often encounter situations like the following. The marketer of a shopping mall puts the goods with the highest sales volume in the same place in order to increase the chance that they are bought together. Housing sales staff need to understand the characteristics of people who want to buy houses. Medical scholars take measures to deal with a disease, or propose a cure, according to the characteristics of patients with the same disease. Cluster analysis is needed in all of these cases. The explosion of stored and transient data has spurred a thirst for intelligent data processing tools and cutting-edge technologies. New technologies and tools that keep pace with the times can intelligently turn massive amounts of data into information and knowledge that support decision-making. Therefore, with the rise of the concept of big data in recent years, data mining has been applied in a wider range of fields. Clustering is one of the most commonly used techniques in data mining and has been widely used in image processing, market research, pattern recognition and data analysis [...]. The purpose of a clustering algorithm is to discover clusters in data, where a cluster is a collection of data objects whose defining property is that objects in the same set are most similar to each other. Among clustering algorithms, density-based ones are widely used for data clustering analysis because they are simple and efficient, and they can cluster arbitrarily shaped data sets. Given a clustering radius and a density threshold, the DBSCAN algorithm randomly selects core points and clusters by neighborhood expansion.

This paper first introduces the concept of the DBSCAN algorithm and then carries out performance tests of DBSCAN on three different data sets. From the experimental results it can be concluded that DBSCAN achieves higher homogeneity and diversity when it performs personalized clustering on data sets of non-uniform density, with a broad range of values and gradually increasing sparsity. When the neighborhood distance eps of DBSCAN is [...], [...] classes are generated after clustering.

II. DBSCAN CLUSTERING ALGORITHM

A. DBSCAN Clustering Algorithm

Grouping similar data objects in a collection into the same class is usually called clustering, and the resulting collection of data objects is called a cluster E. Dividing a collection of data objects into groups can be considered a form of data compression. One characteristic of clustering is that it is unsupervised, which distinguishes it from classification, which requires high overhead in object modeling. Another characteristic of clustering is its adaptability to changes in data. Clustering can automatically discover and identify the sparse and dense areas of a data set, and discover potential correlations between the global distribution and the data attributes. In the business field, cluster analysis helps market analysts to analyze and describe the
characteristics of customer groups, so as to understand the consumption direction, consumption ability, consumption willingness and so on of different customer groups. Cluster analysis is also helpful for the analysis of geographical data: the analysis of rainfall, geology and other data is closely related to the timely prediction of natural disasters.

The DBSCAN algorithm gives good clustering results in applications and is a typical representative of density-based algorithms. It can find classes of any shape in the data set. DBSCAN requires the user to set two global constant parameters before running: the neighborhood distance and the threshold. The neighborhood distance sets the radius of the neighborhood of a sample point. The threshold sets the minimum number of sample points that must lie within that radius for the point to be marked as a core point. It is worth noting that the neighborhood distance and the threshold are both constants, which are set before the program starts and do not change afterwards. DBSCAN is insensitive to noise points and can be applied to data sets containing noise: it can identify noise points and exclude them from the clustering results. For each class in the clustering result, the density within the class is higher than the density at the edge of the class, and the density of a noise point is lower than that of the edge. According to the distribution characteristics of the data, the algorithm uses the density difference to identify regions of different density and marks the clustering results.

The algorithm steps are as follows:

(1) Enter the minimum radius e and the minimum density threshold minp.

(2) Read the data sequentially from a text file. The text file holds the original two-dimensional X and Y coordinates of the points. The points are stored in a pointList of Point structures, which record information about the input points.

(3) Determine whether a point is a core point. Read the points from pointList in order. If the point is not marked (not yet part of a cluster), calculate the distance between it and all other points. If the distance between the two points is less than or equal to the minimum radius e, put the two points into the tmpLst array (eliminating duplicate points) and count them. If the distance is greater than e, skip that point and proceed to the next one. At the end, if the total count is greater than or equal to the minimum density threshold, the elements in tmpLst are marked as grouped, and the group is put into the resultList array as a cluster (stored as one element). If the point is already marked, it is skipped and the next point is examined. This continues until all points have been traversed once.

(4) Merge the clusters, that is, merge the elements of resultList. The clusters of the core points in resultList are compared with each other. If two clusters (the clusters in which the core points are located) share an element, they are combined into a new cluster. This is repeated until no new clusters are produced.

(5) Output the clustering result and the noise points.

B. Density Peak Point Discovery

The neighborhood of a data point. Given a minimum number of points (minPts), the neighborhood $N_p$ of data point P is defined as the set of minPts data points closest to P.

The local density of a data point. Given a minimum number of points (minPts), the local density $\rho_p$ of data point P is defined as:

$$\rho_p = \frac{minPts}{\max_{x \in N_p} Dist(p, x)} \qquad (1)$$

where $N_p$ is the neighborhood of p, containing the minPts data points closest to P, and $Dist()$ is the distance function. In this paper, the distance between data points is the Euclidean distance. According to formula (1), $\max_{x \in N_p} Dist(p, x)$ is the local radius of P (EPS). The larger the local density of P, the smaller the local radius within which the minimum number of points can be met. In order to measure the difference between the peak density point and the other core points, the difference metric $\delta_p$ of data point P is defined as:

$$\delta_p = \min_{x:\, \rho_x > \rho_p} Dist(p, x) \qquad (2)$$

That is, $\delta_p$ is the minimum distance from P to the points whose local density is greater than that of P. When the local density of P is the largest, P is most likely to be a density core point, and $\delta_p$ is then defined as:

$$\delta_p = \max_{x \in D} Dist(p, x) \qquad (3)$$

III. EXPERIMENTAL DESIGN OF DBSCAN ALGORITHM ON DIFFERENT DATA SETS

A. Data Acquisition

To test the effect of the DBSCAN algorithm, this paper applies DBSCAN clustering to three data sets: the D31 data set, the R15 data set and a credit card user data set. D31 and R15 consist of sample points obeying Gaussian distributions, and the credit card user data set is a real consumption data set of telecom customers. The characteristics of the data sets are shown in Table 1.

TABLE 1. CHARACTERISTICS OF DATA SET

Data set name | Size | Number of classes | Dimension
D31 | [...] | [...] | [...]
R15 | [...] | [...] | [...]
Credit card data set | [...] | [...] | [...]
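The algorithm steps of Section II.A can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The function name dbscan, the index-based return format and the stack-based merging are choices made for this sketch. As in the standard formulation of DBSCAN, every point is tested for core status, and a point counts as a member of its own neighborhood. The parameter eps plays the role of the minimum radius e, and min_pts the role of the density threshold minp.

```python
import math


def dbscan(points, eps, min_pts):
    """Cluster 2-D points; return (clusters, noise) as lists of point indices.

    Hypothetical helper written for illustration only.
    """
    n = len(points)

    # Step (3): collect, for every point, all points within distance eps.
    # Assumption: the point itself is included in its own neighborhood.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [None] * n  # None means "not yet assigned to a cluster"
    clusters = []
    for i in range(n):
        if not is_core[i] or labels[i] is not None:
            continue
        # Step (4): merge the neighborhoods of all core points reachable from i.
        cluster_id = len(clusters)
        members = []
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            members.append(p)
            if not is_core[p]:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster_id
                    stack.append(q)
        clusters.append(sorted(members))

    # Step (5): points that never joined a cluster are noise.
    noise = [i for i in range(n) if labels[i] is None]
    return clusters, noise


# Made-up sample data: two tight groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
clusters, noise = dbscan(sample, eps=1.5, min_pts=3)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
print(noise)     # [6]
```

The pairwise distance computation makes this sketch quadratic in the number of points, which is acceptable for small two-dimensional sets such as the illustration above; a spatial index would be the usual choice for larger data.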

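The local density and difference metric of Section II.B, formulas (1) to (3), can be computed along the following lines. This is a hedged sketch rather than the paper's code: the function name density_peak_metrics is hypothetical, the neighborhood of a point is assumed to exclude the point itself, and the input is assumed to contain more than min_pts distinct points so that the local radius is never zero.

```python
import math


def density_peak_metrics(points, min_pts):
    """Return (rho, delta) lists following formulas (1)-(3).

    Hypothetical helper; assumes len(points) > min_pts and no duplicate points.
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]

    # Formula (1): rho_p = minPts / (distance to the farthest of the
    # min_pts nearest neighbors), i.e. minPts divided by the local radius.
    rho = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:min_pts]
        rho.append(min_pts / nearest[-1])

    delta = []
    for i in range(n):
        # Formula (2): distance to the closest point of strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            # Formula (3): the densest point gets the largest distance in the set.
            delta.append(max(dist[i]))
    return rho, delta


# Made-up sample data: three nearby points and one far away.
rho, delta = density_peak_metrics([(0, 0), (1, 0), (0, 1), (5, 5)], min_pts=2)
print(rho)
print(delta)
```

A point with both a high local density and a large difference metric is a candidate density peak. In the sample above, the point at the origin has the highest density, so its difference metric comes from formula (3).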
B. Experimental Environment

The experiments run on Ubuntu [...] (Linux) with a CPU of [...] GHz, [...] GB of host memory and a hard disk capacity of [...] GB. The development language is Python version [...], and the IDE is PyCharm.

C. Performance Indicators

This paper uses the standard deviation of the result classes and the community difference as indicators. For a result class, if the standard deviation and community difference are small, the result class is highly homogeneous. If the standard deviation and community difference are large, the data within the result class fluctuates greatly and the homogeneity is low.

IV. EXPERIMENTAL RESULTS OF DBSCAN ALGORITHM ON DIFFERENT DATA SETS

A. Experimental Results of D31 and R15 Data Sets

All data sets were fed into the DBSCAN algorithm, and the clustering results were counted after clustering. DBSCAN was first run on the D31 and R15 data sets. Each class in D31 and R15 is generated from a Gaussian distribution, and [...] categories are annotated in the original D31 data set. The clustering results are shown in Table 2 and Figure 1.

TABLE 2. CLUSTERING RESULTS

Indicator | mean | std | min | median | max
Number of in-class sample points | [...] | [...] | [...] | [...] | [...]
Standard deviation of in-class sample points | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]

Figure 1. Clustering results

As can be seen from Figure 1, when the Eps parameter of the DBSCAN algorithm is reduced, the number of result classes after clustering increases, which increases diversity to a certain extent. However, the values of the standard deviation and community difference within the result classes show that the differences within the result classes are very large and unstable. This shows that homogeneity within a class is not considered in the clustering process, and that there is some randomness under the influence of initialization. DBSCAN increases the diversity of results by reducing Eps, at the cost of large intra-class differences in the result classes.

The eigenvalue range of the R15 data set is smaller than that of the D31 data set. It can be seen from the table above that the clustering yields [...] classes, the same as the major categories annotated in the original data. This indicates that DBSCAN is excellent at classifying major categories, but it cannot provide a controllable increase in diversity, and the clustering process is uncontrollable.

B. Experimental Results of Credit Card User Data Set

After clustering the credit card user data set with the DBSCAN algorithm and the CEAV-DBSCAN algorithm, the result class statistics of the DBSCAN clustering are shown in Table 3 and Figure 2.

TABLE 3. CLUSTERING RESULTS

Indicator | mean | std | min | median | max
Number of in-class sample points | [...] | [...] | [...] | [...] | [...]
Standard deviation of in-class sample points | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]

Figure 2. Result class statistics after DBSCAN algorithm clustering

When the neighborhood distance Eps of the DBSCAN algorithm is [...], clustering produces only six classes, which is too few, and the results are gathered in the neighborhood of [...] near the zero axis of the data space. For samples with characteristic values above [...], a neighborhood distance of [...] is too short for clustering, so clustering can only be carried out on data below [...]. When the fixed
neighborhood is too small, effective clustering can only be carried out in the area close to the coordinate axis of the space, and cannot be carried out in the area of the principal coordinate axis.

The neighborhood distance Eps of the DBSCAN algorithm was then adjusted to [...] and the experiment continued. After clustering, we collected the class assessment statistics. With Eps at [...], the algorithm produces [...] classes, more than when Eps is [...], but the total is still small. The results are gathered in the neighborhood of [...] in the middle of the space. For samples with characteristic values above [...], a neighborhood distance of [...] is too short for clustering. For samples with eigenvalues of [...] and below, a neighborhood distance of [...] is a little large. In particular, for samples with eigenvalues less than [...], a neighborhood distance of [...] completely covers the data space, and the data below and around [...] are classified into one category. When Eps is set to [...], the data can only be effectively clustered in the range of [...] to [...]. When the neighborhood is fixed, the data can only be effectively clustered in a certain region of the space and cannot be effectively clustered in other regions. After DBSCAN clustering, we counted the evaluation data of the result classes, as shown in Table 4.

TABLE 4. CLUSTERING RESULTS

Indicator | mean | std | min | median | max
Number of in-class sample points | [...] | [...] | [...] | [...] | [...]
Standard deviation of in-class sample points | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]

The neighborhood distance Eps was then adjusted again, to [...], and the experiment continued. After clustering, we collected the class assessment statistics. With Eps at [...], the algorithm produces [...] classes, more than when Eps ranged from [...] to [...], but the total is still small. The results are gathered in the neighborhood of [...] in the data space. For samples with characteristic values above [...], a neighborhood distance of [...] is too short for clustering. For samples with eigenvalues less than [...], a neighborhood distance of [...] is a little large. In particular, for samples with eigenvalues less than [...], a neighborhood distance of [...] completely covers the data space, and the data below [...] and nearby are grouped into one class, which makes it impossible to cluster the samples in this region effectively. When Eps is [...], effective clustering can only be carried out for data between [...] and [...]. After DBSCAN clustering, we counted the evaluation data of the result classes, as shown in Table 5.

TABLE 5. CLUSTERING RESULTS

Indicator | mean | std | min | median | max
Number of in-class sample points | [...] | [...] | [...] | [...] | [...]
Standard deviation of in-class sample points | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]
Dimension [...] community difference | [...] | [...] | [...] | [...] | [...]

V. CONCLUSION

In the era of big data, human electronic interaction is transformed into streams of data that contain great value. Machine learning has shown excellent results in data mining and has gradually become its main technology. However, the lack of labeled data in actual production makes unsupervised learning more applicable. Clustering is an important technique in unsupervised learning and is widely used in many scenarios, such as commodity recommendation and numerical prediction. In these scenarios, however, the numerical range of the data is very wide and some applications offer customized, personalized services. This requires the clustering algorithm not only to suit data sets of non-uniform density, with a vast numerical range and gradually increasing sparsity, but also to produce diversified results with high homogeneity. Cluster analysis plays an extremely important role in data mining and can make a very important contribution to a large number of data analysis businesses. Data volume is now increasing rapidly, so it is urgent to improve the efficiency and reliability of clustering algorithms at the cluster analysis stage.

ACKNOWLEDGMENT

This paper was financially supported by the Key Project of Natural Science of Sichuan Provincial Education Department (No. [...]ZA[...]), the Applied Demonstration Course Project of Sichuan University for Nationalities (No. sfkc[...]) and the Key Project of Natural Science of Sichuan University for Nationalities (No. XYZB[...]ZA).

REFERENCES

[1] Li S S. An Improved DBSCAN Algorithm Based on the Neighbor Similarity and Fast Nearest Neighbor Query. IEEE Access, PP.
[2] Lee S. A Hybrid Framework using Fuzzy if-then rules for DBSCAN Algorithm. International Journal of Computational Intelligence Research, pp. [...].
[3] Chen G, Cheng Y, Jing W. DBSCAN-PSM: an improvement method of DBSCAN algorithm on Spark. International Journal of High Performance Computing and Networking, pp. [...].
[4] Malik N. Supernova Type Ia Diversity: A Study using DBSCAN Algorithm. International Journal of Advanced Trends in Computer Science and Engineering, pp. [...].
[5] Kazemi-Beydokhti M, Ali Abbaspour R, Mojarab M. Spatio-Temporal Modeling of Seismic Provinces of Iran Using DBSCAN Algorithm. Pure and Applied Geophysics, pp. [...].
[6] Zhang, Wang, Liang. Short-Term Wind Power Prediction Using GA-BP Neural Network Based on DBSCAN Algorithm Outlier Identification. Processes.
[7] Zhang H, Liu P, Guo Y. Blind modulation format identification using the DBSCAN algorithm for continuous-variable quantum key distribution. Journal of the Optical Society of America B.
[8] Zhang T, Ma F. Improved rough k-means clustering algorithm based on weighted distance measure with Gaussian function. International Journal of Computer Mathematics, pp. [...].
[9] Memon K H, Lee D H. Generalised fuzzy c-means clustering algorithm with local information. Fuzzy Sets and Systems, pp. [...].