DBSCAN Clustering Algorithm Based On Density
Dingsheng Deng
Sichuan University for Nationalities, Kangding, Sichuan, China
Corresponding author's email: dds@scun.edu.cn
2020 7th International Forum on Electrical Engineering and Automation (IFEEA) | 978-1-7281-9627-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/IFEEA51475.2020.00199

Abstract-Clustering technology has important applications in data mining, pattern recognition, machine learning and other fields. However, with the explosive growth of data, traditional clustering algorithms find it increasingly difficult to meet the needs of big data analysis. How to improve traditional clustering algorithms and ensure the quality and efficiency of clustering against the background of big data has become an important research topic in artificial intelligence and big data processing. Density-based clustering algorithms can cluster arbitrarily shaped data sets when the data distribution is unknown. DBSCAN is a classical density-based clustering algorithm that is widely used for data clustering analysis because it is simple and efficient. The purpose of this paper is to study the density-based DBSCAN clustering algorithm. The paper first introduces the concept of the DBSCAN algorithm and then carries out performance tests on it with three different data sets. From the experimental results, it can be concluded that the DBSCAN algorithm has higher homogeneity and diversity when it performs personalized clustering on data sets of non-uniform density with a broad value range that become gradually sparser. When the DBSCAN algorithm's neighborhood distance eps is [...], [...] classes are generated after clustering.

Keywords-DBSCAN Algorithm; Density Clustering; Machine Learning; Algorithm Research

I. INTRODUCTION

Since the middle of the 20th century, machine learning has developed rapidly in theoretical research. However, owing to the lack of hardware computing performance and data, it was not widely applied []. With the rapid development and upgrading of the economy and technology, electronic products have gradually entered everyone's life and become a necessity of daily life. The interaction of people with all sorts of machines has produced a large amount of data, paving the way for the widespread use of machine learning []. Huge social and commercial value is buried in the data generated all the time, and digging more value out of these data has become a common goal of both academia and industry. Machine learning has gradually emerged in the commercial application of data mining, has produced remarkable results with important commercial value, and has gradually become an important solution for data mining []. Clustering is an important technology in the field of machine learning. It is used in a wide range of data mining scenarios, such as commodity recommendation, numerical prediction, pattern recognition and so on [].

In daily life we often encounter situations like these: the marketer of a shopping mall puts the goods with the highest sales volume in the same place to increase the chance that they are bought together. Housing sales staff need to understand the characteristics of people who need to buy houses. Based on the characteristics of patients with the same disease, medical scholars take measures to deal with the disease or propose a cure. Cluster analysis is needed in all of these cases. The explosion of stored and transient data has spurred a thirst for intelligent data-processing tools and cutting-edge technologies. New technologies and tools that keep pace with the times can intelligently turn massive amounts of data into information and knowledge that support decision making. Therefore, with the rise of the concept of big data in recent years, data mining has been applied in a wider range of fields. Clustering is one of the most commonly used techniques in data mining and has been widely used in image processing, market research, pattern recognition and data analysis []. The purpose of a clustering algorithm is to discover clusters in data. A cluster is a collection of data objects with the property that objects in the same set are most similar to each other. Among clustering algorithms, density-based ones are widely used for data clustering analysis because they are simple and efficient, and they can cluster arbitrarily shaped data sets. The DBSCAN algorithm uses a given clustering radius and density threshold to randomly select core points and then clusters by neighborhood expansion.

This paper first introduces the concept of the DBSCAN algorithm and then carries out performance tests on it with three different data sets. From the experimental results, it can be concluded that the DBSCAN algorithm has higher homogeneity and diversity when it performs personalized clustering on data sets of non-uniform density with a broad value range that become gradually sparser. When the DBSCAN algorithm's neighborhood distance eps is [...], [...] classes are generated after clustering.

II. DBSCAN CLUSTERING ALGORITHM

A. DBSCAN Clustering Algorithm

Grouping similar data objects in a collection into the same class is usually called clustering, and each resulting collection of data objects is called a cluster. Dividing a collection of data objects into groups can be considered a form of data compression. One characteristic of clustering is that it is unsupervised, unlike classification, which requires high overhead in object modeling. Another characteristic of clustering is its adaptability to changes in data. Clustering can automatically discover and identify the sparse and dense areas of a data set and discover potential correlations between the global distribution and the data attributes. In the business field, cluster analysis helps market analysts to analyze and describe the
licensed use limited to: MINISTERE DE L'ENSEIGNEMENT SUPERIEUR ET DE LA RECHERCHE SCIENTIFIQUE. Downloaded on September 07,2022 at 21:47:02 UTC from IEEE Xplore. Restrict
characteristics of customer groups, so as to gain an understanding of the consumption direction, consumption ability, consumption willingness and so on of different customer groups. Cluster analysis is also helpful for analyzing geographical data: the analysis of rainfall, geology and other data is closely related to the timely prediction of natural disasters.

The DBSCAN algorithm gives good clustering results in applications and is a typical representative of density-based algorithms. It can find classes of any shape in a data set. The DBSCAN algorithm requires the user to set two global constant parameters before running: the neighborhood distance and the threshold. The neighborhood distance sets the radius of the neighborhood of a sample point. The threshold sets the minimum number of sample points that must lie within that radius for a point to be marked as a core point. It is worth noting that the neighborhood distance and the threshold are both constants: they are set before the program starts and do not change afterwards. The DBSCAN algorithm is insensitive to noise points and can be applied to data sets containing them. It can identify noise points and exclude them from the clustering results. For each class in the clustering result, the density within the class is higher than the density at the edge of the class, and the density of a noise point is lower than that of the edge. According to these data distribution characteristics, the algorithm uses the density difference to identify regions of different density and marks the clustering results.

The algorithm steps are as follows:

(1) Enter the minimum radius e and the minimum density threshold minp.

(2) Read the data sequentially. The data are read in order from a text file that holds the original two-dimensional X and Y coordinates of the points and are stored in a pointList of Point structures, which record information about the input points.

(3) Determine whether a point is a core point. Read the points from pointList in order. If the point is not marked (not yet part of a cluster), calculate the distance between it and all other points. If the distance between the two points is less than or equal to the minimum radius e, put the two points (eliminating duplicates) into the tmpLst array and count them. If the distance between the two points is greater than the minimum radius e, skip that point and proceed to the next one. In the end, when the total count is greater than or equal to the minimum density threshold, the elements in tmpLst are marked as grouped, and the group is put into the resultList array as a cluster, stored as one element. If the point is already marked, it is skipped and the judgment continues with the next point, until all the points have been traversed once.

B. Density Peak Point Discovery

The neighborhood of a data point: given a minimum number of points minPts, the neighborhood of data point P is defined as the set $N_p$ of data points closest to P.

The local density of a data point: given a minimum number of points minPts, the local density $\rho_p$ of data point P is defined as

$$\rho_p = \frac{minPts}{\max_{x \in N_p} Dist(p, x)}$$

where $N_p$ is the neighborhood of P, which contains the minPts data points closest to P, and Dist is the distance function. In this paper, the distance between data points is the Euclidean distance. According to this formula, $\max_{x \in N_p} Dist(p, x)$ is the local radius of P (EPS). The larger the local density of P, the smaller the local radius within which the minimum number of points is met. To measure the difference between the density peak point and the other core points, the difference metric $\delta_p$ of data point P is defined as

$$\delta_p = \min_{x:\, \rho_x > \rho_p} Dist(p, x)$$

That is, $\delta_p$ is the minimum distance from P to all points with a local density greater than that of P. When the local density of P is the largest, P is most likely to be a density core point, and $\delta_p$ is then defined as

$$\delta_p = \max_{x \in D} Dist(p, x)$$

III. EXPERIMENTAL DESIGN OF DBSCAN ALGORITHM ON DIFFERENT DATA SETS

A. Data Acquisition

To test the effect of the DBSCAN algorithm, this paper applies DBSCAN clustering to three data sets: the D31 data set, the R15 data set and a credit card user data set. The D31 and R15 data sets consist of sample points obeying Gaussian distributions, and the credit card user data set is a real consumption data set of telecom customers. The characteristics of the data sets are shown in Table [...].

TABLE [...]. CHARACTERISTICS OF DATASET
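The algorithm steps listed in Section II.A can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments. The function names `region_query` and `dbscan`, the label convention (`-1` for noise) and the explicit queue used for neighborhood expansion are assumptions made for illustration. The parameters `eps` and `min_pts` play the roles of the minimum radius e and the minimum density threshold minp, and a point counts as its own neighbor here.

```python
import math

NOISE = -1  # assumed label for points that belong to no cluster


def region_query(points, i, eps):
    """Return the indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label every point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet marked"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already marked, skip to the next point
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:  # neighborhood expansion from the core point
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, keep expanding
        cluster_id += 1
    return labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))
```

With the toy sample above, the two squares of four points each form one cluster, and the isolated point at (50, 50) is labeled as noise. The brute-force distance computation mirrors the description in step (3) and is quadratic in the number of points, so it is only suitable for small data sets.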
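The two quantities of Section II.B, the local density and the difference metric of a data point, can be computed with a short Python sketch. It rests on stated assumptions: the local density is read as minPts divided by the local radius (the distance from P to the farthest of its minPts nearest neighbors), the neighborhood excludes P itself, no two points coincide, and the names `local_density` and `difference_metric` are hypothetical.

```python
import math


def local_density(points, i, min_pts):
    """Local density of points[i]: min_pts divided by its local radius.

    The local radius is the distance to the farthest of the min_pts
    nearest neighbors (the point itself is excluded, by assumption).
    """
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    local_radius = dists[min_pts - 1]
    return min_pts / local_radius


def difference_metric(points, densities, i):
    """Distance from points[i] to the nearest point of higher local density.

    For the point with the largest density there is no such point, so the
    largest distance to any point in the data set is used instead.
    """
    higher = [math.dist(points[i], points[j])
              for j in range(len(points)) if densities[j] > densities[i]]
    if higher:
        return min(higher)
    return max(math.dist(points[i], q) for q in points)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    rho = [local_density(pts, i, min_pts=2) for i in range(len(pts))]
    delta = [difference_metric(pts, rho, i) for i in range(len(pts))]
    for p, r, d in zip(pts, rho, delta):
        print(p, round(r, 3), d)
```

In this toy example the point (1, 0) has both the largest local density and the largest difference metric, which is the pattern a density peak point is expected to show.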
B. Experimental Environment

The experiments were run on Ubuntu, a Linux system, with a CPU of [...] GHz, host memory of [...] GB and a hard disk capacity of [...] GB. The development language is Python, version [...], and the IDE is PyCharm.

C. Performance Indicators

This paper uses the standard deviation and the community difference of the result classes as indicators. For a result class, if the standard deviation and community difference are small, the result class is highly homogeneous. If the standard deviation and community difference are large, the data within the result class fluctuate greatly and the homogeneity is low.

IV. EXPERIMENTAL RESULTS OF DBSCAN ALGORITHM ON DIFFERENT DATA SETS

A. Experimental Results of D31 and R15 Data Sets

All data sets were put into the DBSCAN algorithm, and the clustering results were counted after clustering. The DBSCAN algorithm was first used for experiments on the D31 and R15 data sets. Each class in the D31 and R15 data sets is generated from a Gaussian distribution, and there are [...] categories annotated in the original D31 data set. The clustering results are shown in Table [...] and Figure [...].

As can be seen from Figure [...], after the Eps parameter of the DBSCAN algorithm is reduced, the number of result classes after clustering can increase, which increases the diversity to a certain extent. However, the values of the standard deviation and community difference within the result classes show that the differences within the result classes are very large and unstable. This shows that homogeneity within a class is not considered in the clustering process and that there is some randomness under the influence of initialization. The DBSCAN algorithm increases the diversity of results by reducing Eps, at the cost of large intra-class differences in the result classes.

The range of feature values of the R15 data set is smaller than that of the D31 data set. It can be seen from the above table that the clustering results are [...] classes, the same as the [...] big categories annotated in the original data. This indicates that the DBSCAN algorithm is excellent at classifying big categories, but it cannot deliver a controllable increase in diversity, and the clustering process is uncontrollable.

B. Experimental Results of Credit Card User Data Set

After the credit card user data set was clustered by the DBSCAN algorithm and the CEAVDBSCAN algorithm, the result class statistics after the clustering of the DBSCAN algorithm are shown in Table [...] and Figure [...].

TABLE [...]. CLUSTERING RESULTS

Figure [...]. Result class statistics after DBSCAN algorithm clustering
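The indicators of Section III.C can be made concrete with a small Python sketch. The paper does not give formulas for them, so everything here is an assumption for illustration: each result class is described by its number of sample points and its per-dimension population standard deviation, and these per-class values are then summarized by mean, std, min, median and max. The function names `class_statistics` and `summarize` are hypothetical, and the community difference is not computed because its definition is not stated.

```python
import statistics
from collections import defaultdict


def class_statistics(points, labels, noise=-1):
    """Return {class id: {"size": n, "std_per_dim": [...]}} for all non-noise classes."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        if label != noise:
            groups[label].append(point)
    stats = {}
    for label, members in groups.items():
        dims = list(zip(*members))  # one tuple of coordinates per dimension
        stats[label] = {
            "size": len(members),
            "std_per_dim": [statistics.pstdev(d) for d in dims],
        }
    return stats


def summarize(values):
    """Summarize a list of per-class values across all result classes."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


if __name__ == "__main__":
    pts = [(0, 0), (2, 0), (10, 10), (10, 14), (99, 99)]
    labels = [0, 0, 1, 1, -1]
    per_class = class_statistics(pts, labels)
    print(per_class)
    print(summarize([c["size"] for c in per_class.values()]))
```

Under this reading, a small standard deviation within a class indicates a highly homogeneous result class, and a large one indicates that the data within the class fluctuate greatly.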
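The observation in Section IV.A that reducing Eps can increase the number of result classes can be reproduced on toy data. The sketch below is not the experiment of the paper. It counts DBSCAN clusters as connected groups of core points, where two core points are linked when they lie within `eps` of each other. The function name `count_clusters` and the six sample points are assumptions made for illustration.

```python
import math


def count_clusters(points, eps, min_pts):
    """Count DBSCAN clusters as connected components of core points."""
    n = len(points)
    # Neighborhood of each point: all points within eps (the point itself included).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    seen = set()
    clusters = 0
    for i in sorted(core):
        if i in seen:
            continue
        clusters += 1  # a new cluster starts at an unvisited core point
        seen.add(i)
        stack = [i]
        while stack:
            k = stack.pop()
            for j in neighbors[k]:
                if j in core and j not in seen:
                    seen.add(j)
                    stack.append(j)
    return clusters


if __name__ == "__main__":
    # Two groups of three points on a line, separated by a gap of 2.
    pts = [(0, 0), (1, 0), (2, 0), (4, 0), (5, 0), (6, 0)]
    for eps in (2.5, 1.0):
        print(eps, count_clusters(pts, eps, min_pts=3))
```

With `eps=2.5` the gap between the two groups is bridged and a single class results. With `eps=1.0` the groups separate into two classes. Only the middle point of each group is then a core point, so each class depends on a single core point, which loosely matches the instability noted in the text.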
[...] neighborhood is too small, effective clustering can only be carried out in the area close to the coordinate axes of the space, and cannot be carried out in the area of the principal coordinate axis.

The neighborhood distance Eps of the DBSCAN algorithm was then adjusted to [...] and the experiment continued. After clustering, we compiled the evaluation data of the result classes. When Eps is [...], the algorithm yields [...] classes, which is more than when Eps is [...], but the total is still small. With a neighborhood of [...], the results gather in the middle of the space. As one would expect, for samples with feature values above [...], a neighborhood distance of [...] is too short and clustering is difficult. For samples with feature values below [...], a neighborhood distance of [...] is somewhat large. In particular, for samples with feature values below [...], a neighborhood distance of [...] completely covers the data space, and the data below [...] and around [...] are classified into one category. When Eps is set at [...], the data can be clustered effectively only in the range of [...] to [...]. When the neighborhood is fixed, the data can be clustered effectively only in a certain region of the space and not in other regions. After the clustering of the DBSCAN algorithm, we counted the evaluation data of the result classes, as shown in Table [...].

TABLE [...]. CLUSTERING RESULTS
Columns: mean, std, min, median, max
Rows: Number of in-class sample points; Standard deviation of in-class sample points; Dimension [...] community difference; Dimension [...] community difference

The neighborhood distance Eps of the DBSCAN algorithm was then adjusted to [...] and the experiment continued. After clustering, we compiled the evaluation data of the result classes. When Eps is [...], the algorithm yields [...] classes, which is more than when Eps is between [...] and [...], but the total is still small. With a neighborhood of [...], the results gather in the [...] area of the space. As one would expect, for samples with feature values above [...], a neighborhood distance of [...] is too short and clustering is difficult. For samples with feature values below [...], a neighborhood distance of [...] is somewhat large. In particular, for samples with feature values below [...], a neighborhood distance of [...] completely covers the data space, and the data below [...] and nearby are grouped into one class, which makes effective clustering impossible for samples in this region. When Eps is [...], effective clustering can be carried out only for data between [...] and [...]. After the clustering of the DBSCAN algorithm, we counted the evaluation data of the result classes, as shown in Table [...].

TABLE [...]. CLUSTERING RESULTS
Columns: mean, std, min, median, max
Rows: Number of in-class sample points; Standard deviation of in-class sample points; Dimension [...] community difference; Dimension [...] community difference

V. CONCLUSION

In the era of big data, human interaction with electronic devices is transformed into a series of data that contains great value. Machine learning has shown excellent results in data mining and has gradually become its main technology. However, the lack of labeled data in actual production makes unsupervised learning more suitable. Clustering is an important technique in unsupervised learning and is widely used in many scenarios, such as commodity recommendation and numerical prediction. In these scenarios, however, the numerical range of the data is very wide, and some of them involve customized, personalized services. This requires a clustering algorithm that not only suits data sets of non-uniform density with a broad value range that become gradually sparser, but also produces diversified results with high homogeneity. Cluster analysis plays an extremely important role in data mining and can make a very important contribution to a large number of data analysis businesses. The data volume is now increasing rapidly, so it is urgent to improve the efficiency and reliability of clustering algorithms in the clustering analysis stage.

ACKNOWLEDGMENT

This paper was financially supported by the Key Project of Natural Science of Sichuan Provincial Education Department (No. [...]ZA[...]), the Applied Demonstration Course Project of Sichuan University for Nationalities (No. sfkc[...]) and the Key Project of Natural Science of Sichuan University for Nationalities (No. XYZB[...]ZA).

REFERENCES

[1] Li S. S. An Improved DBSCAN Algorithm Based on the Neighbor Similarity and Fast Nearest Neighbor Query. IEEE Access.
[2] Lee S. A Hybrid Framework using Fuzzy if-then rules for DBSCAN Algorithm. International Journal of Computational Intelligence Research.
[3] Chen G, Cheng Y, Jing W. DBSCAN-PSM: an improvement method of DBSCAN algorithm on Spark. International Journal of High Performance Computing and Networking.
[4] Malik N. Supernova Type Ia Diversity: A Study using DBSCAN Algorithm. International Journal of Advanced Trends in Computer Science and Engineering.
[5] Kazemi-Beydokhti M, Ali Abbaspour R, Mojarab M. Spatio-Temporal Modeling of Seismic Provinces of Iran Using DBSCAN Algorithm. Pure and Applied Geophysics.
[6] Zhang, Wang, Liang. Short-Term Wind Power Prediction Using GA-BP Neural Network Based on DBSCAN Algorithm Outlier Identification. Processes.
[7] Zhang H, Liu P, Guo Y. Blind modulation format identification using the DBSCAN algorithm for continuous-variable quantum key distribution. Journal of the Optical Society of America B.
[8] Zhang T, Ma F. Improved rough k-means clustering algorithm based on weighted distance measure with Gaussian function. International Journal of Computer Mathematics.
[9] Memon K H, Lee D H. Generalised fuzzy c-means clustering algorithm with local information. Fuzzy Sets and Systems.