0% found this document useful (0 votes)

82 views

Fuzzy Extensions of The DBScan Clustering Algorithm

Uploaded by

Nghiem Quoc Anh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

82 views

Fuzzy Extensions of The DBScan Clustering Algorithm

Uploaded by

Nghiem Quoc Anh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 12

Soft Comput

DOI 10.1007/s00500-016-2435-0

METHODOLOGIES AND APPLICATION

Fuzzy extensions of the DBScan clustering algorithm

Dino Ienco1,2 · Gloria Bordogna3

© Springer-Verlag Berlin Heidelberg 2016

Abstract The DBSCAN algorithm is a well-known density- ous ones, thus allowing to generate clusters with both fuzzy
based clustering approach particularly useful in spatial data cores and fuzzy overlapping borders. Our proposals are com-
mining for its ability to find objects’ groups with heteroge- pared w.r.t. state of the art fuzzy clustering methods over
neous shapes and homogeneous local density distributions real-world datasets.
in the feature space. Furthermore, it can be suitable as scal-
ing down approach to deal with big data for its ability to Keywords Fuzzy clustering · Density-based clustering ·
remove noise. Nevertheless, it suffers for some limitations, DBSCAN clustering
mainly the inability to identify clusters with variable den-
sity distributions and partially overlapping borders, which is
often a characteristics of both scientific data and real-world
1 Introduction
data. To this end, in this work, we propose three fuzzy exten-
sions of the DBSCAN algorithm to generate clusters with
The advent of the big data era has launched new challenges
distinct fuzzy density characteristics. The original version of
to the research community who reacted either by introducing
DBSCAN requires two precise parameters (min Pts and ) to
new algorithms or by extending existing algorithms to man-
define locally dense areas which serve as seeds of the clus-
age large datasets. Specifically, the first approaches focus on
ters. Nevertheless, precise values of both parameters may be
the “scaling up” objective to deal with big data sets. Never-
not appropriate in all regions of the dataset. In the proposed
theless, they risk to become useless in a short time, due the
extensions of DBSCAN, we define soft constraints to model
data continuous growth. CISCO1 estimated that the data on
approximate values of the input parameters. The first exten-
the Internet will increase at a compound annual growth rate
sion, named Fuzzy Core DBSCAN, relaxes the constraint on
of 25% by the year 2017. Thus, to deal with datasets continu-
the neighbourhood’s density to generate clusters with fuzzy
ously growing in size it will be necessary to frequently scale
core points, i.e. cores with distinct density; the second exten-
up algorithms. The second kind of approach aims at scaling
sion, named Fuzzy Border DBSCAN, relaxes to allow the
down, i.e. at synthesizing, the data sets by reducing their size,
generation of clusters with overlapping borders. Finally, the
and to use existing algorithms on the reduced data. Although
third extension, named Fuzzy DBSCAN subsumes the previ-
scaling down may risk to cancel important information it has
Communicated by V. Loia. the chance of reducing the datasets by eliminating noise or
redundant data. Clustering techniques can be categorized as
B Dino Ienco scaling down approaches, since their objective is to identify
dino.ienco@irstea.fr groups of items within the dataset with common character-
Gloria Bordogna istics in a feature space, while removing outliers and noise
bordogna.g@irea.cnr.it which are considered uninteresting for further analysis.
1 IRSTEA, UMR TETIS, Montpellier, France
2 LIRMM, Montpellier, France 1 http://www.cisco.com/c/en/us/solutions/collateral/service-provider/
3 CNR IREA, Milan, Italy global-cloud-index-gci/Cloud_Index_White_Paper.html.

123
D. Ienco, G. Bordogna

Many real-world applications such as social network soft clustering approaches have been defined which gener-
community identification and satellite image analysis need ate clusters with fuzzy overlapping boundaries (Pal et al.
effective means to identify regions that are characterized 2005; Ji et al. 2014; Yager and Filev 1994). Most of the
by locally dense areas in the feature space representing the soft clustering approaches detect fuzzy clusters with same
objects of interests. For instance, such regions may repre- shape, with each object of the dataset belonging to all clus-
sent communities of users linked by the friend of friend ters to a distinct degree. Moreover, even the fuzzy extensions
relationships in social networks, ecosystems may appear as of DBSCAN generate fuzzy clusters with the same charac-
regions characterized by homogeneous feature’ values in teristics of fuzziness, i.e. all clusters have same fain borders
satellite images. To detect such objects, density-based clus- (Ulutagaya and Nasibov 2012; Smiti and Eloudi 2013). In this
tering algorithms have been widely applied. They evaluate paper, we investigate new extensions of the DBSCAN algo-
a local criterion to group objects: clusters are regarded as rithm, defined within the framework of fuzzy set theory, with
regions in the feature space where the objects are densely dis- the aim to cope with the limitation of both classic DBSCAN
tributed and separated by regions with sparse objects, which and soft clustering algorithms. The idea is to define distinct
are considered noise or outliers. DBSCAN extensions capable to manage approximate values
Indeed, in many real applications, such as in satellite of the input parameters, thus less sensible to the input parame-
image analysis, one needs to cope with noise invariably ter setting, and capable to detect possibly fuzzy overlapping
affecting data. Furthermore, one does not have any knowl- clusters with distinct density characteristics and profiles.
edge about the number of clusters, the possible clusters’ There are several real applications in which it could be
shapes and the objects distribution in the feature space. useful specifying approximate values of the input parameters
Finally, crisp clustering algorithms fail to detect the variable and detecting fuzzy clusters with both distinct shapes and dis-
and fuzzy nature of cluster borders which are often faint and tinct density profiles. Consider the detection of communities
overlapped one another. Among the proposed crisp density- of users in a social network based on the FoF relationships:
based clustering algorithms DBSCAN (Ester et al. 1996) is while one can specify easily that the users must have a given
a well-known and widely applied approach as it does not number of degrees of separation from other users on the net-
require to specify in input the number of clusters, it can detect work to belong to the community, it may be questionable
clusters of any shape, and it can remove noisy points. Fur- defining the precise minimum number of users that consti-
thermore, this algorithm is suitable to process big data when tutes a community. In this case, it can be useful to apply a
adopting a spatial index, such as an R-tree, since its com- clustering algorithm, as in our first extension of DBSCAN
plexity varies as O(N ∗ log N ) (Sander et al. 1998). in Bordogna and Ienco (2014), by allowing the specifica-
Nevertheless, it suffers for some drawbacks: first, to drive tion of an approximate density, i.e. an approximate number
the process, this algorithm needs two numeric input para- of users and detecting non-overlapping fuzzy communities
meters, minPts, i.e. the neighbourhood density, and , i.e. where a user can belong to a community to a degree.
the distance to define the local neighbourhood size, which On the other side, in the case, one has to detect stars and
together define the desired local density of the generated clus- galaxies in astronomical optical images, appearing with a
ters. Specifically, minPts is a positive integer specifying the crisp nucleus and faint borders, it can be easier to specify
minimum number of objects that must exist within a max- an approximate local neighbourhood size, as in our second
imum distance in the feature space in order for an object extension of DBSCAN proposed in this paper, and thus detect-
to belong to a cluster. Second, the results of DBSCAN are ing objects with crisp core and faint border.
strongly dependent on the setting of these input parameters Last but not least, there are applications in which objects
that must be chosen with great accuracy (Ester et al. 1996) are characterized by distinct local densities and faint, possi-
by considering both the scale of the dataset and the closeness bly overlapping, borders, such as in remote sensing images,
of the objects in order to achieve both speed and effective- where distinct ecosystems have distinct densities of trees
ness of the results. To set the right values of these parameters (objects) and they may appear merged one another; in this
one generally engages a trial and error exploratory phase in case, it would be useful to allow specifying both an approx-
which the algorithm is run several times with distinct val- imate neighbourhood density, and an approximate local
ues of the input parameters. These repeated trials are costly neighbourhood size to generate fuzzy overlapping clusters
when dealing with big data volumes. A final drawback of as in our third extension of DBSCAN.
DBSCAN is that it detects clusters with sharp boundaries, In the literature, several fuzzy extensions of DBSCAN have
which is a common limitation of all crisp clustering algo- been proposed with the objective of leveraging the setting of
rithms when used to group objects whose distribution has a the precise input parameters (Ulutagaya and Nasibov 2012).
faint and smooth density profile in the feature space. They Nevertheless, none of them have tackled the objective of gen-
draw crisp boundaries to separate clusters, which are often erating fuzzy clusters modelling distinct kind of fuzziness as
somewhat arbitrary. To cope with undesired crisp boundaries, we do in this paper. We leverage the setting of either one or

123
Fuzzy extensions of the DBScan clustering algorithm

both the input parameters of DBSCAN by allowing the speci- simultaneously belong to several clusters (Ji et al. 2014; Pal
fication of soft constraints on both the number of objects and et al. 2005).
the closeness (reachability) between objects. Specifically, the Density-based clustering algorithms grow clusters around
precise value minPts is replaced by a soft constraint defined seeds located in regions of the feature space which are locally
by a pair (min Ptsmin , min Ptsmax ) that specifies an approx- dense of objects. DBSCAN (Ester et al. 1996) is one of the
imate minimum number of objects for defining a cluster, i.e. most popular density-based methods used in data mining due
there is a tolerance on the crisp limit min Ptsmax defined by to both its ability to detect irregularly shaped clusters by
min Ptsmax − min Ptsmin ; in the same way, the precise dis- copying with noise data, and to its relatively low complexity
tance , is replaced by a soft constraints (min , max ) on the that varies as O(N ∗ log N ) when adopting a spatial index,
closeness of objects so that again on the crisp limit min we thus making it suitable to process big data (Sander et al.
have a tolerance defined by max − min . 1998). Nevertheless, its effectiveness in detecting clusters is
The three extensions of DBSCAN generate clusters with strongly dependent on the parameters setting, and this is the
either a fuzzy core, i.e. clusters whose elements are associated main reason that leads to its soft extensions. Besides this
with a numeric membership degree in [0,1] but not overlap- motivation we argue that, in order to properly adopt a soft
ping one another, clusters with fuzzy overlapping border and density-based clustering approach with respect to another
a crisp core, and clusters with both fuzzy core and over- one, one should be able to understand the properties of the
lapping borders. Having three extensions producing clusters generated soft clusters. This is the reason that leads us to
with distinct fuzzy and overlapping properties one can choose define three distinct extensions of DBSCAN each one gener-
the most appropriate for task to accomplish. ating fuzzy clusters with distinct characteristics.
Furthermore, fuzzy clusters allow several advantages: for Ulutagaya and Nasibov (2012) reports a survey of the main
instance, with a single run of the clustering it is possible to fuzzy density-based clustering algorithms, while Shamshir-
summarize several distinct runs of the original approach by band et al. (2014) presents a study in which they show that
specifying distinct thresholds on the membership degrees of density-based clustering algorithm coupled with fuzzy logic
the objects to the clusters. For this reason, it can be employed can efficiently deal with the task of intrusion detection in
as intelligent reduction strategy for big data. In our case, this wireless sensor networks.
allows an easy exploration of the spatial distribution of the The most cited paper (Nasibov and Ulutagay 2009) pro-
objects avoiding the tedious exact setting of the DBSCAN poses a fuzzy extension of the DBSCAN, named fuzzy
parameters (Ester et al. 1996). neighbourhood FN-DBSCAN, whose main characteristic is to
The paper is organized as follow: Sect. 2 discusses related use a fuzzy neighbourhood size definition. In this approach,
work, in Sect. 3 we recall the classic DBSCAN algorithm. the authors address the difficulty of the user in setting the
The clustering algorithm generating clusters with fuzzy core values of the input parameter when the distances of the
points Fuzzy Core DBSCAN, firstly introduced in Bordogna points are in distinct scales, as it happens in astronomical
and Ienco (2014), the extension generating clusters with images. Thus, they first normalize the distances between all
fuzzy overlapping borders Fuzzy Border DBSCAN and the points in [0, 1], and then they allow computing distinct mem-
most general strategy generating fuzzy overlapping clusters bership degrees on the distance to delimit the neighbourhood
(Fuzzy DBSCAN) are introduced in Sect. 4. of points, i.e. the decaying of the membership degrees as a
After the definitions of the three algorithms, Sect. 6 function of the distance. Then, they select as belonging to the
discusses and compares the performance of the different fuzzy neighbourhood of a point only those points belonging
approaches over real-world data sets in comparison with to the support of the membership function. This extension
those yielded by Fuzzy C-Means and Soft-DBSCAN fuzzy of DBSCAN uses a level-based neighbourhood set, instead
clustering algorithms. Section 7 concludes and summarizes of a distance-based neighbourhood size, and it uses the con-
the main achievements. cept of fuzzy cardinality, instead of classical cardinality, for
identifying core points. This last choice causes the creation
(within the same run of the algorithm) of fuzzy clusters with
very heterogeneous local density characteristics: both fuzzy
2 Related work clusters with cores having a huge number of sparse points
(points located at the border of the local neighbourhood of
The relevant works to our proposal are those related to the each other), and fuzzy clusters with small cores, constituted
literature on soft density-based clustering algorithms. Soft only by a few close points. This approach can be consid-
clustering algorithms are modelled within either fuzzy set, ered dual to our first extension, the Fuzzy Core DBSCAN
probability theory or possibilistic typicalities to allow assign- algorithm (Bordogna and Ienco 2014), since we fuzzify the
ing objects to clusters with a full or a partial membership minimum number of points min Pts defining the local neigh-
degree, in this latter case with the possibility for an object to bourhood density, while the distance is maintained crisp.

123
D. Ienco, G. Bordogna

As a consequence, the membership degree of a point to the heterogeneous. To remove noise they first map the distance of
fuzzy core depends on the number of points in its crisp neigh- any point from its k-neighbours and rank the distance values
bourhood. By this choice, and the computation of the local in decreasing order; then they determine the threshold θ on
density based on the classic set cardinality, a point is assigned the distance which corresponds to the first minimum on the
to only one cluster to a distinct extent, thus generating non- ordered values. All points in the first ranked positions having
overlapping clusters with possibly fuzzy cores. Clusters may a distance above the thresholds θ are deemed noisy points
have cores with faint profiles reflecting low density of the and are removed, while the remaining points will belong to
clusters’ nucleus. a cluster. Only these latter points are clustered with the clas-
FNDBSCAN (Parker and Downs 2013) is closer to our sec- sic DBSCAN providing as input parameters min Pts = K
ond extension, named FBorder, in which we fuzzify only the and = θ . By adopting this same procedure, we can imple-
membership of objects belonging to the border of clusters, ment the proposed algorithms: we can determine the most
this way allowing their partial overlapping. Nevertheless, appropriate distance M ax = θ (which delimits the support
differently than FNDBSCAN, FBorder grows clusters’ cores of the membership function defining the approximate size of
around points characterized by homogeneous local density, the local neighbourhood). This way, depending on the data
thus generating clusters with crisp, not overlapping and set, we remove noise and then apply one of the proposed
homogeneously dense cores. algorithms on the remaining points.
Kriegel and Pfeifle (2005) algorithm is employed to clus- Finally, the extension of DBSCAN with fuzzy logic
ter objects whose position is ill-known. The authors propose reported in Shamshirband et al. (2014) shares with our exten-
the FDBSCAN algorithm in which a fuzzy distance mea- sion the idea of generating clusters with distinct fuzziness
sure is defined as the probability that an object is directly properties, as specified by the fuzzy rules: specifically, a
density-reachable from another objects. This problem can hybrid clustering method is introduced, namely a density-
be modelled by our third extension, named FDBScan, that based fuzzy imperialist competitive clustering algorithm
allows defining the local neighbourhood density of any object (D-FICCA), to detect malicious behaviours in wireless sen-
by specifying an approximate number of objects within an sor networks (WSNs) with the aim to enhance the accuracy
approximate maximum distance, thus capturing the uncer- of malicious detection. A density-based clustering algorithm
tainty on the positions of the moving objects, and generating helps to improve the imperialist competitive algorithm for
fuzzy clusters with both faint cores and fuzzy overlapping the formation of arbitrary cluster shapes as well as han-
border. Finally, the most recent soft extension of DBSCAN dling noise. The fuzzy logic controller is introduced to avoid
has been proposed in Smiti and Eloudi (2013) where the possible errors of the worst imperialist action selection strat-
authors combine the classic DBSCAN with the Fuzzy C- egy. The results demonstrate that the proposed framework
Means algorithm (Bezdek et al. 1984) proposing a method achieves higher detection accuracy compared to existing
called soft-DBSCAN. They detect seeds points by the classic approaches.
DBSCAN and in a second phase they compute the degrees of
membership to the clusters around the seeds by relying on
the Fuzzy C-Means clustering algorithm. A similar objective 3 Classic DBScan algorithm
of selecting the seeds to feed the Fuzzy C-Means is pursued
by the mountain method proposed in Yager and Filev (1994). For sake of clarity in the following, we will consider a set
Nevertheless, these extensions do not grow the clusters of objects represented in multidimensional feature space. We
by applying density-reachable criteria as in our proposed can figure out these objects as either cars, taxi cabs, airplanes
approaches. Distinct density characteristics of clusters: faint represented in the feature space defined by their geographic
cores and not overlapping distributions are modelled by coordinates (both 2D or 3D). DBSCAN can be applied to
Fuzzy Core DBSCAN; semi-overlapping distributions with group these objects based on their local densities in the fea-
homogeneous dense cores by our Fuzzy Border DBSCAN ture space. For example, this makes it possible to identify
extension; finally, faint cores and semi-overlapping distri- traffic jams of cars on the roads.
butions by the third extension Fuzzy DBSCAN. Specifically, DBSCAN assigns points of the feature space
Another important issue when using a clustering algorithm defined on R × R × R · · · × R to particular clusters or des-
on big data is its scalability. In this respect, Parker et al. (2010) ignates them as outliers or noise if they are not sufficiently
proposes a scalable implementation of the FN-DBSCAN, close to other points. It determines cluster assignments by
named SFN-DBSCAN, with the objective of improving the assessing the local density at each point using two para-
efficiency when dealing with big data sets. Another efficient meters: distance radius () and minimum number of points
implementation is proposed in Ester et al. (1996). It tackles (min Pts) that must exists within the neighbourhood of
the problem of clustering a huge number of objects strongly the point. A single point which meets the minimum den-
affected by noise when the scale distributions of objects are sity criterion, namely it has min Pts located within distance

123
Fuzzy extensions of the DBScan clustering algorithm

, is designated a core point. Formally, given a set of points Fuzzy Core DBSCAN, for short FCore, is obtained by con-
P = ( p1 , p2 , . . . , pn ), p is a core point if at least a minimum sidering crisp distances and by introducing an approximate
number min Pts of points exist s.t p j ∈ P and || p− p j || < , value of the minimum cardinality of the local neighbourhood
where ||x|| is the Euclidean distance in the n-dimensional fea- of a point. This can be done by substituting the value min Pts
ture space. Two core points pi and p j with i = j belong to with a soft constraint defined by a monotonic non-decreasing
the same cluster c if || pi − p j || < . Both are core points of membership function on the domain of the positive integers.
c ( pi , p j ∈ cor e(c)). All the points that are not core points This soft constraint specifies the minimum approximate num-
and lie within the maximum distance from a core point of ber of points that are required in the local neighbourhood of a
a cluster c are defined as border points of c: p ∈ / cor e(c) is point for starting the generation of a fuzzy core. Let us define
a border point of c if ∃ pi ∈ cor e(c) with || p − pi || < . the piecewise linear membership function as follows:
Finally, the points that are not part of any cluster are consid- ⎧
⎪
⎨1, if n̂ ≥ M pts Max
ered noisy points: p ∈ / cor e(c) is noise if ∀c, pi ∈ cor e(c) n̂−M pts Min
μmin P (n̂) , if M pts Min < n̂ < M pts Max
with || p − pi || < . In the following, the classic DBSCAN ⎪ M pts Max −M pts Min
⎩
algorithm is formalized: 0, if n̂ ≤ M pts Min
(1)
Algorithm 1 DBSCAN(P,,Min Pts)
Require: P : dataset of points
Require: : the maximum distance around a point defining its local neighbourhood This membership function gives the value 1 when the
Require: Min Pts : minimum local density, in points, around a point to be a candidate number n̂ of elements in the neighbourhood of a point is
core point
1: C = 0 greater than M pts Max , a value 0 when n̂ is below M pts Min
2: Cluster s = ∅ and intermediate values when n̂ is in between M pts Min and
3: for all p ∈ P s.t. p is unvisited do
4: mark p as visited M pts Max .
5: neighboursPts = regionQuery( p, )
6: if (si zeo f (neighbour s Pts) <= Min Pts ) then Since users may find it difficult to specify the two values
7: mark p as NOISE M pts Min and M pts Max in the case of big data, we can try
8: else
9: C = next cluster to automatically suggest two appropriate values. This can
10: Cluster s = Cluster s ∪ expandCluster( p, neighbour s Pts, C, , Min Pts ) be done by mapping the number of points of the data sets
11: end if
12: end for which are at a maximum distance among each other below
13: return Cluster s , for increasing values of . This function is monotonically
not decreasing: we then suggest the values of the functions
Algorithm 2 ex pandCluster ( p, neighbour s Pts, C, ,
corresponding to the first two flexes as the appropriate values
Min Pts)
Require: p: the point just marked as visited
of M pts Min and M pts Max .
Require: neighbour s Pts : the neighbourhood of p Another approach is to allow a user to specify two percent-
Require: C : the actual cluster
Require: the distance around a point to compute its local density
age values, %M pts Min and %M pts Max on the total dataset
Require: Min Pts : local density, in points, defining the minimum cardinality of the size, measured in number of objects, and then convert these
neighbourhood of a point to be a candidate core point
1: add p to cluster C percentages to determine M pts Min and M pts Max as follows:

2: for all p ∈ neighbour s Pts do M pts Min = round(%M pts Min ∗N ) and pts Max = round

3: if p is not visited then

(%M pts Max ∗N ), in which N is the total number of objects
4: mark p as visited
in the data set and r ound(m) returns the closest integer to m.

5: neighbour s Pts = regionQuery( p , )
6:

if si zeo f (neighbour s Pts ) > Min Pts then
Let us now define the fuzzy core. Considering a set P of
7: neighbour s Pts = neighbour s Pts ∪ neighbour s Pts
N objects represented by N points p1 , p2 , . . . , p N in the n-
8: end if dimensional space R n , so that each pi has the coordinates
9: end if
10: if p is not yet member of any cluster then xi1 , xi2 , . . . , xin .

11: add p to cluster C Given a point p ∈ P, if n̂ points pi exist in the local neigh-
12: end if
13: end for bourhood of point p, i.e. with pi − p < , s.t. μmin P (n̂) >
14: return C 0 then p is a fuzzy core point with membership degree
to the fuzzy core given by Fuzzycor e( p) = μ Min P (n̂) If
two fuzzy core points pi , p j with Fuzzycor e( pi ) > 0 and
4 Generating clusters with distinct fuzzy Fuzzycor e( p j ) > 0 ∃ with i = j s.t. pi − p < then
characteristics they belong to the same cluster c ( pi , p j ∈ c) and both
are fuzzy core points of c, ( pi , p j ∈ f uzzycor e(c)) with
4.1 Generating clusters with fuzzy cores membership degrees f uzzycor ec ( pi ) = Fuzzycor e( pi )
and f uzzycor ec ( p j ) = Fuzzycor e( p j ). They belong to the
The first extension of the classic DBSCAN algorithm we pro- cluster with membership degree μc ( pi ) = Fuzzycor e( pi )
posed in Bordogna and Ienco (2014), named and μc ( p j ) = Fuzzycor e( p j ).

123
D. Ienco, G. Bordogna

A point p of a cluster c is a border point if it is not a fuzzy point p is crisp, while we introduce a fuzzy assignment (line
core point and ∃ pi ∈ f cor e(c) s.t. pi − p < then p 1) modelled by the fuzzy function μ Min P () defined in Eq. 1.
gets a membership degree to c defined as: The same function is employed when a new fuzzy core point
is detected (line 8). Also in this case, firstly we verify the

density around a given point p w.r.t. M pts Min and then, if
μc ( p) = min pi ∈neighcor e( p) f uzzycor ec ( pi ) (2) the point satisfies the soft constraint to a positive degree, we
add the point to the fuzzy core of cluster C with its associated
where neighcor e( p) = { pi s.t. f uzzycor ec ( pi ) > 0 ∧ membership value.
pi − p < }.
Finally, points p that are neither fuzzy core points nor
border points are considered as noisy points Algorithm 3 Fuzzy Core DBSCAN(P,,M pts M in,
Notice that, the points belonging to a cluster c get dis- M pts Max )
Require: P : dataset of points
tinct membership values to the cluster reflecting the number Require: : the maximum distance around a point defining the point neighbourhood
of their neighbours within a maximum distance . This def- Require: M pts Min , M pts Max : soft constraint on the density around a point to be a
candidate fuzzy core point
inition allows generating fuzzy clusters with a fuzzy core, 1: C = 0
where the membership degrees represent the variable cluster 2: Cluster s = ∅
3: for all p ∈ P s.t. p is unvisited do
density. 4: mark p as visited
Moreover, a border point p can partially belong to a single 5: nPts = regionQuery( p, )
6: if (si zeo f (n Pts) ≤ M pts Min ) then
cluster c since its membership degree is upperbounded by the 7: mark p as NOISE
8: else
minimum membership degree of its neighbours fuzzy core 9: C = next cluster
points. Notice that, this algorithm does not generate overlap- 10: Cluster s = Cluster s ∪ expandClusterFuzzyCore
( p, n Pts, C, , M pts Min , M pts Max )
ping fuzzy clusters, but the support of the fuzzy clusters is 11: end if
still a crisp partition as in the classic DBSCAN: ci ∩ c j = 12: end for
13: return Cluster s
Further property, the FCore DBSCAN reduces to the clas-
sic DBSCAN when the input values M pts Min = M pts Max :
in this case FCore DBSCAN produces the same results of
the classic DBSCAN with min Pts = M pts Min = M pts Max
and same distance . In fact, the level-based soft condition Algorithm 4 ex pandCluster FuzzyCor e( p, n Pts, C, ,
imposed by μmin P is indeed a crisp condition μ Min P (x) ∈ M pts Min , M pts Max )
{0, 1} on the minimum number of points defining the local Require: p: the point just marked as visited
Require: n Pts : the points in the neighbourhood of p
density of the neighbourhood: μmin P (n̂) = 0 when the num- Require: C : the actual cluster
ber of points n̂ within a maximum distance of any point p is Require: the distance around a point to compute its density
Require: M pts Min , M pts Max : soft constraint on the density around a point to be a
less than min Pts = M pts Min = M pts Max , on the contrary candidate fuzzy core point
μmin P (n̂) = 1. In this case, the membership degrees of all 1: add p to C with membership Fuzzycor e( p) = μ Min P (|n Pts|)

2: for all p ∈ n Pts do
fuzzy core points is 1, and thus the fuzzy core reduces to a
3: if p is not visited then
crisp core as in the classic DBSCAN.
4: mark p as visited

The border points are thus defined as in the classic 5: n Pts = regionQuery( p , )

approach too, since their membership degrees are the mini- 6: if si zeo f (n Pts ) > M pts Min then

7: n Pts = n Pts ∪ n Pts
mum of the membership degrees of the core points in their 8:

add p to C with membership Fuzzycor e( p ) = μ Min P (|n Pts |)

neighbourhood, which in the crisp case is always 1. 9: end if

10: if p is not yet member of any cluster then
The fuzzy procedure is sketched in Algorithms 3 and 4. 11:

add p to C (as border point)
Considering the outer loop of the process (Algorithm 3), the 12: end if
difference with the original version (Algorithm 1) lies at line 13: end if
14: end for
6. 15: return C
In the fuzzy version, a point is marked as NOISE if its
neighbourhood size is less than or equal to M pts Min , other-
wise it will be a fuzzy core point with a given membership
value. Once the point is recognized as fuzzy core point 4.2 Generating clusters with overlapping fuzzy border
the procedure ex pandCluster FuzzyCor e is called (Algo- and classic core points
rithm 4).
As in the classical DBSCAN, this procedure is devoted to A second extension of DBSCAN, named
find all the reachable points from p and to mark them as core Fuzzy Border DBSCAN (FBorder), can be defined by allow-
or border points. In the original version, the assignment of the ing the specification of an approximate value of the maximum

123
Fuzzy extensions of the DBScan clustering algorithm

distance instead of asking for a precise numeric parame- c. While a point p is considered as noise if ∀c pi ∈
ter and in defining a soft constraint with a monotonic not cor e(c) s.t. μdist ( pi , p) > 0.
increasing membership function on the positive real domain The strategy is outlined in Algorithm 5 and 6. The outer
of distance values. The soft constraint defines the concept of loop (Algorithm 5) starts the process. Given a point, the
fuzzy neighbourhood size, so that a point can belong to the neighbourhood is selected considering Min . If the Min Pts
fuzzy neighbourhood of another point to a degree in (0,1]. constraint is not satisfied the point is initially marked as
This allows computing a gradual membership to the clusters. NOISE otherwise the creation of a new cluster begins, and the
Differently than the proposal of Nasibov and Ulutagay procedure ex pandCluster Fuzzy Bor der is called. Algo-
(2009) we allow to specify the membership function on the rithm 6 tries to expand the current cluster C as much as
distance as a soft constraint with piecewise linear shape possible. The difference with the original version of DBSCAN
defined by two values Min and Max so that when the lies in the way the border points are managed and detected.
distance is smaller than Min the membership degree is max- Here, we employ a temporary structure f uzzy Bor der Pts
imum (1), when it is greater than Max its membership is null to collect the current set of border points. Border points are
(0) and it decreases linearly when it is in between Min and points with density lower than Min Pts (line 6) but, differ-
Max : ently from the original algorithm, a point can be a border
⎧ point if it is reasonably at a distance from the cluster in
⎪
⎨1, if p − pi || ≤ Min between Min and Max . To verify this second condition,
Max − p− pi ||
μdist ( p, pi) = , if Min < p − pi || < Max we query the neighbourhood of a point p for both Min and
⎪ Max − Min
⎩
0, if p − pi || > Max Max distances (line 2 and 8). Formula 4 specifies that the
(3) membership of a border point is the minimum of the mem-
berships μdist between the point and all the core points of
the cluster directly reachable. In order to compute the min-
In this definition, p − pi || can be defined as either the imum, we need first to detect all core points of the cluster
Euclidean distance or the complement of a cosine similarity and then compute the μc (·) for all the border points (line 15–
distance or any other distance measure more suitable in the 18). Line 15 is particularly important because a point that
application context. was inserted in the temporary structure f uzzy Bor der Pts,
We can then redefine a core point of a cluster with fuzzy successively can verify the condition to be a core point. The
border: given a point p if at least a number min Pts of points difference between the two sets ( f uzzy Bor der Pts and C)
P = { p1 , . . . , pmin Pts }∃ s.t. ∀ pi ∈ P, μdist ( pi, p) = 1 then guarantees that only the border points are considered after
p is a core point. line 15. Note that when min = max this extension reduces
If two core points pi p j with i = j and μdist ( pi , p j ) = 1 to the classic DBSCAN algorithm, since a point will get from
then pi , p j belongs to c, i.e. they define a cluster c with fuzzy Eq. 1 either a zero or a full (1) membership degree to the clus-
border and are core points of c, i.e. pi , p j ∈ cor e(c) and thus ter. This extension is very similar to the approach proposed
they get a membership degree to the cluster μc ( p) = 1. in Nasibov and Ulutagay (2009), since we fuzzify the input
A point p of a cluster that is not a core point is a parameter too. Nevertheless, in our proposal, the core is
fuzzy border point if it satisfies the following: ∀ p s.t. p ∈ / still crisp and not fuzzy as in Nasibov and Ulutagay (2009).
cor e(c) and pi ∈ cor e(c) and μdist ( pi , p) > 0 then p gets Further, differently than in the previous cited paper, minPts
a membership degree to the fuzzy border of cluster c defined is still a numeric value that defines the local density of a core
as: as in the classic DBSCAN. This allows generating fuzzy clus-
ters with a crisp core, and a fuzzy border. More clearly, in
this extension of DBSCAN, all generated clusters have cores
μc ( p) = min pi ∈neighcor e( p) μdist ( p, pi ) (4) with same density but that may differ for the density of their
border, which may have faint overlapping profiles.
where neighcor e( p) = { pi ∈ cor e(c) ∧ μdist ( p, pi ) > 0}
This definition allows generating fuzzy clusters with faint
borders. 4.3 Generating clusters with fuzzy cores and
Moreover, a point p can partially belong to the fuzzy overlapping fuzzy border
borders of more clusters at the same time with distinct mem-
bership values. This allows generating fuzzy clusters with In this subsection, we introduce how to model fuzziness over
overlapping boundaries, i.e. semi-overlapping fuzzy clus- both cores and borders in order to subsume the previous pro-
ters. This is guaranteed by the condition for the selection posed approaches into what is named Fuzzy DBSCAN, i.e.
of the points to be evaluated as border points of clus- F D B Scan. The two soft constraints defined in (1) and (3)
ters which requires that μc ( p) < 1 for each generated replace both min Pts and to allow the definition of the

123
D. Ienco, G. Bordogna

Algorithm 5 Fuzzy Border DBSCAN(P, Max , Min , If μmin P (dens( p)) > 0 then the point p belongs to the
Mint Pts) fuzzycore of a certain cluster with a membership degree
Require: P : dataset of points Fuzzycor e( p) = μmin P (dens( p))
Require: Min Pts : the minimum density around a point to be a candidate core point
Require: Min , Max : soft constraint on the distance around a point defining the point If μmin P (dens( p)) = 0, then p is a border or a noise
fuzzy neighbourhood size point.
1: C = 0
2: Cluster s = ∅ If in the local neighbourhood of a fuzzy core point pi there
3: for all p ∈ P s.t. p is unvisited do exists another fuzzy core point p j , then a cluster c is gener-
4: mark p as visited
5: nPts = regionQuery( p, Min ) ated: ∃ pi , p j , s.t. μdist ( pi , p j ) > 0∧ Fuzzycor e( pi ) > 0∧
6: if (si zeo f (n Pts) ≤ Min Pts ) then
7: mark p as NOISE Fuzzycor e( p j ) > 0 then f uzzycor ec ( pi ) =
8: else Fuzzycor e( pi ) ∧ f uzzycor ec ( p j ) = Fuzzycor e( p j ).
9: C = next cluster
10: Cluster s = Cluster s ∪ expandClusterFuzzyBorder A point p that is not a fuzzy core point is a fuzzy border
( p, n Pts, C, Max , Min , Min Pts ) point of a cluster c, if it satisfies the following condition:
11: end if
12: end for ∃ pi and ∃ p s.t. f uzzycor e( pi ) = 0 ∧ μdist ( p, pi ) > 0 ∧
13: return Cluster s f uzzycor ec ( p) > 0.
If a point is a border point it cannot be a fuzzy core point
Algorithm 6 ex pandCluster Fuzzy Bor der ( p, n Pts, C, of any cluster:
Max , Min , Min Pts)
Require: p: the point just marked as visited c s.t. f uzzycor ec ( p) > 0
Require: n Pts : the points in the neighbourhood of p
Require: C : the actual cluster
Require: Max , Min : soft constraint on the distance around a point defining its fuzzy If all the conditions are respected we define p as a fuzzy
neighbourhood size
Require: Min Pts : the density around a point to be considered a core point border point of a cluster c with a membership function to the
1: add p to C (as core point) cluster defined as:
2: f uzzy Bor der Pts = r egion Quer y( p, Max ) \ n Pts

3: for all p ∈ n Pts do

4: mark p as visited μb( p) = min pi ∈neigh f cor e( p) (min( f uzzycor ec ( pi ),

5: n Pts = regionQuery( p , Min ) μdist ( p, pi ))) (6)

6: if si zeo f (n Pts ) > Min Pts then

7: n Pts = n Pts ∪ n Pts
8:

f uzzy Bor der Pts = r egion Quer y( p , Max ) \ n Pts
where neigh f cor e( p) = { pi s.t. f uzzycor ec ( pi ) > 0 ∧

9: f uzzy Bor der Pts = f uzzy Bor der Pts ∪ f uzzy Bor der Pts μdist ( p, pi ) > 0}

10: add p to C (as core) The procedures are described in Algorithms 7 and 8.
11: else
12: f uzzy Bor der Pts = f uzzy Bor der Pts ∪ p The general schema is similar to the original DBSCAN.
13: end if The main difference concerns the decision between core
14: end for
15: f uzzy Bor der Pts = f uzzy Bor der Pts \ C and border points which is made by considering μmin P (·)

16: for all p ∈ f uzzy Bor der Pts do and the possibility of a point to belong to multiple clus-

17: add p to C (as border point) with membership μc ( p ) Equation (4)
18: end for ters. Note that, this algorithm reduces to either FCore when
19: return C M pts Min = M pts Max or to FBorder when Min = Max

fuzzy local density and the fuzzy local neighbourhood size Algorithm 7 Fuzzy DBSCAN(P, Max , Min ,M pts Max ,
of points respectively: M pts Min )
Require: P : dataset of points
Require: Min , Max : soft constraint on the distance around a point defining the point
– a soft constraint specified by two values (M ptsmin ≤ fuzzy neighbourhood size
Require: M pts Min , M pts Max : soft constraint on the density around a point to be
M ptsmax ) on the Natural domain defines a fuzzy local considered as fuzzy core point
1: C = ∅
dense region; 2: Cluster s = ∅
– a soft constraint specified by a pair (min ≤ max ) on the 3: for all p ∈ P s.t. p is unvisited do
4: mark p as visited
positive reals defines the local fuzzy neighbourhood size 5: nPts = regionQuery( p, Max )
of a point. 6: dens(p) = as in equation (5)
7: if (μmin P (dens( p)) == 0) then
8: mark p as NOISE
9: else
We define the local density of a point p as follows: 10: C = next cluster
11: Cluster s = Cluster s ∪ expandClusterFuzzy
( p, n Pts, C, Max , Min , M pts Max , M pts Min )
dens( p) = μdist ( p, pi ) (5) 12: end if
13: end for
pi ∈neigh( p,max ) 14: return Cluster s

where neigh( p, Max ) = { pi s.t. pi − p < Max }

123
Fuzzy extensions of the DBScan clustering algorithm

Algorithm 8 ex pandCluster Fuzzy( p, n Pts, C, Max , Min , Table 1 Dataset characteristics

M pts Max , M pts Min ) Dataset No. of instances No. of attributes No. of classes
Require: p: the point just marked as visited
Require: n Pts : the points in the fuzzy neighbourhood of p Breast 699 9 2
Require: C : the actual cluster
Require: Max , Min : soft constraint on the distance around a point to compute its Diabetes 768 8 2
fuzzy neighbourhood
Require: M pts Min , M pts Max : soft constraint on the density around a point to be
Ecoli 336 7 8
considered a fuzzy core point Glass 214 9 6
1: add p to C (as core) with membership μ Min P (dens( p))
Iris 150 4 3
2: for all p ∈ n Pts do

3: mark p as visited Parkinsons 197 23 2

4: if μmin P (dens( p )) > 0 then Vehicle 846 18 4

5: n Pts = regionQuery( p , Max )

6: n Pts = n Pts ∪ n Pts
7: add p to C (as core) with membership μ Min P (dens( p ))
8: else quantities need to be summed up, the final space complexity
9: add p to C (as border point) with membership computed by equation (6)
10: end if is O(n 2 ).
11: end for
12: return C

6 Experiments

5 Computational complexity In this section, we discuss the evaluation of our pro-

posals on real-world datasets by comparing them w.r.t.
In this section, we introduce a discussion about the com- state of the art soft clustering approaches. We choose as
putational complexity of the different approaches we pro- competitors the Fuzzy C-Means algorithm (Bezdek et al.
pose. Regarding the time complexity of the three pro- 1984) (FCM) due to its popularity, and the (FN-DBSCAN)
posals: Fuzzy Core DBSCAN, Fuzzy Border DBSCAN and (Smiti and Eloudi 2013) (Soft-DBSCAN) as representa-
Fuzzy DBSCAN, all of them have the same complexity of tive of fuzzy density-based approaches extending DBSCAN.
the original DBSCAN. In the DBSCAN algorithm, the com- The comparison includes all the three fuzzy DBSCAN
putational time is mainly influenced by the number of time extensions we introduced: Fuzzy Core DBSCAN (FCore),
the function r egion Quer y(·, ·) is invoked. If we support this Fuzzy Border DBSCAN (FBorder) and Fuzzy DBSCAN
operation with a spatial indexing structure like an R-Tree, (FDBScan). In order to benchmark all the different
we can avoid a linear search and perform such operation in approaches, we use datasets from the UCI Machine Learn-
O(log n), where n is the number of elements in the dataset. In ing Repository2 whose characteristics are reported in Table 1.
the Fuzzy variants, it can happen to traverse the same object More in detail, we use seven datasets with different character-
multiple times because we can reach it from different start- istics (number of instances, number of features and number
ing points. This means that the r egion Quer y(·, ·) can be of classes). We summarize the data characteristics in Table 1.
applied more than once for the same element. To avoid extra The behaviour of the different algorithms is evaluated by
computation, we can simply employ an hash table to store computing both external and internal measures of validity
the set of retrieved elements. Before performing the costly of the results, which express the conformity of the results
r egion Quer y(·, ·) action, we check if the neighbour points with the a-priori classifications (external measures) and the
are already in the hash table, otherwise we perform the query optimization of an objective function (internal measures).
and we store the results in the hash table for future use. In For the FCM algorithm we use the implementation available
the case of the original DBSCAN algorithm, the worst case under the R 3 statistical computing software. We set the value
complexity is O(n 2 ) that involves the case in which no spa- of the fuzzification parameter m equals to 2, that controls the
tial indexing structure is employed or the parameter are not fuzziness of cluster boundaries, and the number of clusters
carefully set (e.g. all points are within a distance less than ). equals to the number of classes. This way we drive the FCM
Considering our three fuzzy extensions, for all three cases, clustering with correct information, thus favourably biasing
the computational complexity is the same (O(n 2 )) as the one its results. We run the FCM 50 times and then average the
of the original DBSCAN since the general schemas are very results thus obtained.
similar. From a practical point of view, we have observed For the Soft-DBSCAN approach we vary the M pts para-
that the three approaches behave similarly with an average meter between 2 and 15 and the threshold between 0.1 and
computational complexity lower than the worst case. 1.0 with step of 0.05.
Regarding the space complexity, the materialization of the
pointwise distance matrix implies a cost of O(n 2 ), while in 2 https://archive.ics.uci.edu/ml/datasets.html.
the worst case, the hash table can be O(n 2 ). Since the two 3 http://www.r-project.org/.

123
D. Ienco, G. Bordogna

For FCore, FBorder and FDBScan we vary the soft con- where
straints considering all the possible values combination in
the previous intervals. For each method we retain the solu- i∈C j ∩Dcl μi j
FPrecision(C j , cl) =
tion with the least number of noise points for, in principle, |C j |
the used datasets should not contain noise.
and

6.1 Internal and external clustering validity measures i∈C j ∩Dcl μi j
FRecall(C j , cl) =
|Dcl |
The clustering results are assessed under both internal and
external validity measures. As internal criteria we choose the and the final Fuzzy F-Measure is defined as:
Partition Coefficient (Guillén et al. 2007) and the Fuzzy Per-
formance Index (Smiti and Eloudi 2013), while we employ |C j |
× Fuzzy F-Measure(C j , cl)
the Fuzzy F-Measure as external one (Suanmali et al. 2009), |D|
C j ∈C
which is a combination of Recall and Precision.
We define with D the dataset, with |D| the size of the
Each cluster C j is associated with the class cl that maximizes
dataset, with Dcl the instances of the dataset belonging to
the corresponding Fuzzy F-Measure(C j , cl). The final solu-
class cl, and with C the obtained cluster solution. We indicate
tion is a weighted sum between the Fuzzy F-Measure of a
with μi j the membership degree of the ith object to the jth
cluster C j and its importance considering the clustering solu-
cluster.
tion.
The Partition Coefficient (Guillén et al. 2007) is calculated
as follows:
6.2 Results

|D| |C|
1 We report the evaluation results of the different approaches
PC = × μi2j
|D| in Tables 2, 3 and 4. Table 2 shows the results in term of
i=1 j=1
Fuzzy F-Measure. We can observe that, most of the time,
our proposals outperform the competitors. Considering the
This internal evaluation measure allows to compute the Breast and (Iris) datasets, FCM obtains the highest score,
amount of overlap between clusters. High values of this mea- while our strategies still obtain reasonable and competitive
sure indicate more cluster cohesion and density. results. Regarding the comparison among the three differ-
As second internal measure we employ the Fuzzy Partition ent fuzzy extensions we proposed, we can observe that the
Index (Smiti and Eloudi 2013). This measure is defined as: FBorder strategy always reaches the same or the best score
⎛ ⎞ in term of Fuzzy F-Measure w.r.t. the other extensions. This
|D|
|C| model, contrary to the others we proposed, allows a fuzzi-
|C| μi2j
FPI = 1 − × ⎝1 − ⎠ ness degree only for border points, while it considers that core
|C| − 1 |D|
i=1 j=1 points can belong to only one cluster. The empirical results
underline that the assumption behind the FBorder well fits
This measure evaluates the degree of separation of the the underlined data distribution of the real-world benchmark
fuzzy partition produced by the clustering algorithm. More we considered.
in detail, the Fuzzy Partition Index quantifies the average Tables 3 and 4 summarize the results in term of Partition
cohesion of the clusters according to the membership func- Coefficient and Fuzzy Partition Index of the different algo-
tion of the element of each cluster. Also for this measure, rithms. Both measures highlight the quality of our new fuzzy
high values indicate more cluster cohesion. DBSCAN extensions. We can see that all the three extensions
The external measure we use is the Fuzzy F-Measure yield high values for the internal measures and outperform
(Suanmali et al. 2009). This measure is a fuzzy adaptation any of the competitors.
of the standard F-Measure commonly involved to compare In order to explain this result, we deeply inspected the
clustering results with the reference classification. First of all different clustering solutions. We observed that, first, the
we define the Fuzzy F-Measure for a cluster C j given a class Soft-DBSCAN and the FCM algorithms assign each object
cl as: to more clusters than the FCore, FBorder and FDBScan.
Second, for an object its membership values distribution over
all fuzzy clusters has often a multi-modal shape for both the
FPrecision(C j , cl) ∗ FRecall(C j , cl) Soft-DBSCAN and the FCM. This means that, several clus-
FMeasure(C j , cl) = 2 ×
FPrecision(C j , cl) + FRecall(C j , cl) ters share high membership values for the same object. This

123
Fuzzy extensions of the DBScan clustering algorithm

Table 2 Achieved Fuzzy

Dataset FCore FBorder FDBScan Soft-DBSCAN FCM
F-Measure of the different
methods over the UCI datasets Breast 0.76 0.79 0.77 0.42 0.88
Diabetes 0.73 0.75 0.71 0.56 0.57
Ecoli 0.60 0.61 0.61 0.32 0.38
Glass 0.55 0.74 0.48 0.25 0.35
Iris 0.78 0.78 0.78 0.39 0.8
Parkinsons 0.79 0.84 0.78 0.46 0.6
Vehicle 0.41 0.41 0.41 0.30 0.34
The best results yielded by one of the tested algorithms on each dataset are marked in bold

Table 3 Achieved Partition

Dataset FCore FBorder FDBScan Soft-DBSCAN FCM
Coefficient of the different
methods over the UCI datasets Breast 0.95 0.98 0.97 0.58 0.84
Diabetes 0.96 0.98 0.97 0.50 0.57
Ecoli 1.0 1.0 1.0 0.38 0.31
Glass 0.89 0.81 0.92 0.46 0.43
Iris 1.0 1.0 1.0 0.68 0.74
Parkinsons 0.97 0.79 0.97 0.41 0.61
Vehicle 1.0 1.0 1.0 0.53 0.42
The best results yielded by one of the tested algorithms on each dataset are marked in bold

Table 4 Achieved Fuzzy

Dataset FCore FBorder FDBScan FN-DBSCAN FCM
Performance Index of the
different methods over the UCI Breast 0.9 0.96 0.94 0.37 0.68
datasets
Diabetes 0.92 0.96 0.94 0 0.14
Ecoli 1.0 1.0 1.0 0.07 0.21
Glass 0.78 0.72 0.84 0.28 0.43
Iris 1.0 1.0 1.0 0.36 0.61
Parkinsons 0.94 0.72 0.94 0.12 0.22
Vehicle 1.0 1.0 1.0 0.04 0.23

is not the case for our fuzzy DBSCAN extensions where, in contain core points with different membership values, thus
theory, an object can belong to multiple clusters but, in prac- allowing to detect clusters with heterogeneous densities of
tice, it has membership degrees greater than zero for a limited their nucleus with a single run of the algorithm. The second
number of clusters (usually no more than two or three clus- extension, Fuzzy Border DBSCAN, allows generating semi-
ters), which seems a reasonable characteristics of real data overlapping clusters with fuzzy border and homogeneous
distributions. dense cores. The third extension, Fuzzy DBSCAN, combines
the previous ones thus detecting clusters with both fuzzy core
and fuzzy border points, i.e. heterogeneous dense cores and
7 Conclusion overlapping borders.
The main novelty of the proposal is the intent to con-
In this contribution, we presented three fuzzy extensions of trol distinct fuzzification characteristics of the clusters that
the DBSCAN clustering algorithm, to the aim of modelling can be generated when using a clustering algorithm, thus
distinct density-based characteristics of the objects spatial suiting distinct application domains, such as user commu-
distributions in the feature space. The main characteristics nity detection in social networks with partial membership
of these algorithms are the definition of distinct soft con- either to disjoint communities Fuzzy Core DBSCAN or to
straints to specify the approximate local density of points semi-overlapped communities Fuzzy Border DBSCAN, and
needed for generating a cluster. Specifically, the first exten- ecosystems detection in satellite images Fuzzy DBSCAN.
sion, Fuzzy Core DBSCAN allows assigning a core point to Furthermore, besides leveraging the specification of the
a cluster with a membership value; in doing so, clusters can precise input, the proposals supply with a single run a solu-

123
D. Ienco, G. Bordogna

tion that summarizes multiple runs of the original classic Ji Z, Xia Y, Sun Q, Cao G (2014) Interval-valued possibilistic fuzzy
DBSCAN algorithm. Experimental comparison w.r.t. state of c-means clustering algorithm. Fuzzy Sets Syst 253:138–156
Kriegel H, Pfeifle M (2005) Density-based clustering of uncertain data.
the art fuzzy clustering approaches over real-world datasets In: KDD’05 vol 17, pp 672–677
underlined the higher quality of the results produced by our Nasibov EN, Ulutagay G (2009) Robustness of density-based cluster-
proposals, which better model the fuzzy characteristics of the ing methods with various neighborhood relations. Fuzzy Sets Syst
real datasets. 160(24):3601–3615
Pal NR, Pal K, Keller JM, Bezdek JC (2005) A possibilistic fuzzy c-
means clustering algorithm. IEEE Trans Fuzzy Syst 13(4):517–
Compliance with ethical standards
530
Parker J, Downs J (2013) Footprint generation using fuzzy-
Conflict of interest Dino Ienco and Gloria Bordogna declares that they neighborhood clustering. Geoinformatica 17:283–299
have no conflict of interest. Parker J, Hall L, Kandel A (2010) Scalable fuzzy neighborhood dbscan.
In: IEEE-fuzzy, pp 1–8
Ethical standard This article does not contain any studies with human Sander J, Ester M, Kriegel H, Xu X (1998) Density-based clustering in
participants or animals performed by any of the authors. spatial databases: the algorithm GDBSCAN and its applications.
Data Min Knowl Discov 2(2):169–194
Shamshirband S, Amini A, Anuar NB, Kiah LM, Wah TY, Furnell
S (2014) D-ficca: a density-based fuzzy imperialist competitive
References clustering algorithm for intrusion detection in wireless sensor net-
works. Measurement 55:212–226
Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering Smiti A, Eloudi Z (2013) Soft dbscan: improving dbscan clustering
algorithm. Comput Geosci 10(2):191–203 method using fuzzy set theory. Hum Syst Interact 1:380–385
Bordogna, G., Ienco, D.: Fuzzy core dbscan clustering algorithm. In: Suanmali L, Salim N, Binwahlan MS (2009) Fuzzy logic based method
IPMU, pp. 100–109 (2014) for improving text summarization. CoRR arXiv:0906.4690
Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm Ulutagaya G, Nasibov E (2012) Fuzzy and crisp clustering methods
for discovering clusters in large spatial databases with noise. KDD based on the neighborhood concept: a comprehensive review. J
160:226–231 Intell Fuzzy Syst 23:1–11
Guillén A, González J, Rojas I, Pomares H, Herrera LJ, Valenzuela Yager R, Filev D (1994) Approximate clustering via the mountain
O, Prieto A (2007) Using fuzzy logic to improve a clustering method. IEEE Trans Syst Man Cybern 24(8):1279–1284
technique for function approximation. Neurocomputing 70(16–
18):2853–2860

123

Design of Fixed Bed Adsorption Columns
No ratings yet
Design of Fixed Bed Adsorption Columns
9 pages
Dbscan: Fast Density-Based Clustering With R: Michael Hahsler Matthew Piekenbrock
No ratings yet
Dbscan: Fast Density-Based Clustering With R: Michael Hahsler Matthew Piekenbrock
28 pages
Density Based Clustering
No ratings yet
Density Based Clustering
25 pages
A Fast DBSCAN Algorithm for Big Data Based on Efficient Density
No ratings yet
A Fast DBSCAN Algorithm for Big Data Based on Efficient Density
12 pages
An Empirical Evaluation of Density-Based Clustering Techniques
No ratings yet
An Empirical Evaluation of Density-Based Clustering Techniques
8 pages
32. DBSCAN - A simple fast DBSCAN algorithm for big data Author Shaoyuan Weng, Jin Gou and Zongwen Fan
No ratings yet
32. DBSCAN - A simple fast DBSCAN algorithm for big data Author Shaoyuan Weng, Jin Gou and Zongwen Fan
16 pages
20 - 1 - ML - Unsup - 03 - Dbscan Hdbscan
No ratings yet
20 - 1 - ML - Unsup - 03 - Dbscan Hdbscan
21 pages
Fuzzy DBScan
No ratings yet
Fuzzy DBScan
12 pages
Density ML
No ratings yet
Density ML
51 pages
An Improvement of DBSCAN Algorithm To Analyze Cluster For Large Dataset
No ratings yet
An Improvement of DBSCAN Algorithm To Analyze Cluster For Large Dataset
5 pages
Choosing DBSCAN Parameters
No ratings yet
Choosing DBSCAN Parameters
11 pages
SSRN Id3768295
No ratings yet
SSRN Id3768295
7 pages
Author's Accepted Manuscript: Pattern Recognition
No ratings yet
Author's Accepted Manuscript: Pattern Recognition
41 pages
DBSCAN AND OPTICS
No ratings yet
DBSCAN AND OPTICS
28 pages
Density-Based Methods: DBSCAN: Density-Based Clustering Based On Connected Regions With High Density
No ratings yet
Density-Based Methods: DBSCAN: Density-Based Clustering Based On Connected Regions With High Density
3 pages
ML Exp 9
No ratings yet
ML Exp 9
5 pages
Multi Density DBScan
No ratings yet
Multi Density DBScan
8 pages
Data Mining - Density Based Clustering
No ratings yet
Data Mining - Density Based Clustering
8 pages
KDD96 037
No ratings yet
KDD96 037
6 pages
DBSCAN Clustering in ML _ Density Based Clustering
No ratings yet
DBSCAN Clustering in ML _ Density Based Clustering
5 pages
Clustering Algorithm (Dbscan) : Vishal Bharti Computer Science Dept. GC, Cuny
No ratings yet
Clustering Algorithm (Dbscan) : Vishal Bharti Computer Science Dept. GC, Cuny
27 pages
Bde Dbscan
No ratings yet
Bde Dbscan
11 pages
Density Based
No ratings yet
Density Based
27 pages
DBSCAN_An_Assessment_of_Density_Based_Cl
No ratings yet
DBSCAN_An_Assessment_of_Density_Based_Cl
5 pages
GDBSCAN
No ratings yet
GDBSCAN
30 pages
Applying SR-Tree Technique in DBSCAN Clustering Algorithm
No ratings yet
Applying SR-Tree Technique in DBSCAN Clustering Algorithm
4 pages
Autoepsdbscan: Dbscan With Eps Automatic For Large Dataset: Manisha Naik Gaonkar & Kedar Sawant
No ratings yet
Autoepsdbscan: Dbscan With Eps Automatic For Large Dataset: Manisha Naik Gaonkar & Kedar Sawant
6 pages
DBSCAN Clustering
No ratings yet
DBSCAN Clustering
6 pages
Sihem Jebari
No ratings yet
Sihem Jebari
10 pages
Cluster Analysis
No ratings yet
Cluster Analysis
22 pages
Comparison of Density-Based Clustering Algorithms: Mariam Rehman
No ratings yet
Comparison of Density-Based Clustering Algorithms: Mariam Rehman
5 pages
M6
No ratings yet
M6
23 pages
DBSCAN Clustering
No ratings yet
DBSCAN Clustering
17 pages
Dbscan: Presented By: Garrett Poppe
No ratings yet
Dbscan: Presented By: Garrett Poppe
22 pages
DBSCAN.docx
No ratings yet
DBSCAN.docx
7 pages
Birch
No ratings yet
Birch
6 pages
Ktustudents - In: 1. Hierarchical Methods
No ratings yet
Ktustudents - In: 1. Hierarchical Methods
21 pages
Clustering
No ratings yet
Clustering
12 pages
4.6 Dbscan
No ratings yet
4.6 Dbscan
27 pages
VDBSCAN
No ratings yet
VDBSCAN
4 pages
DIP Lab 13 DBSCAN Clustering
No ratings yet
DIP Lab 13 DBSCAN Clustering
6 pages
A Survey of Some Density Based Clustering Techniques PDF
No ratings yet
A Survey of Some Density Based Clustering Techniques PDF
5 pages
Clustering Density Based
No ratings yet
Clustering Density Based
14 pages
DBSCAN Clustering Python
No ratings yet
DBSCAN Clustering Python
4 pages
A Distribution-Based Clustering Algorithm For Mining in Large Spatial Databases
No ratings yet
A Distribution-Based Clustering Algorithm For Mining in Large Spatial Databases
8 pages
Knn Block Dbscan
No ratings yet
Knn Block Dbscan
15 pages
ST-DBSCAN: An Algorithm For Clustering Spatial-Temporal Data
No ratings yet
ST-DBSCAN: An Algorithm For Clustering Spatial-Temporal Data
14 pages
Chatgpt Unit - 4
No ratings yet
Chatgpt Unit - 4
4 pages
USL3
No ratings yet
USL3
19 pages
14_DBSCAN
No ratings yet
14_DBSCAN
7 pages
DBSCAN Clustering Algorithm: Presented by
No ratings yet
DBSCAN Clustering Algorithm: Presented by
22 pages
DBSCAN
No ratings yet
DBSCAN
42 pages
Lecture 6
No ratings yet
Lecture 6
55 pages
Lecture 11 DBSCAN
No ratings yet
Lecture 11 DBSCAN
6 pages
DBSCAN
No ratings yet
DBSCAN
18 pages
UNIT-6 DBSCAN Clustering
No ratings yet
UNIT-6 DBSCAN Clustering
6 pages
A_Theoretical_Analysis_of_Density_Peaks_Clustering_and_the_Component-Wise_Peak-Finding_Algorithm
No ratings yet
A_Theoretical_Analysis_of_Density_Peaks_Clustering_and_the_Component-Wise_Peak-Finding_Algorithm
12 pages
DBSCAN Algorithm
No ratings yet
DBSCAN Algorithm
15 pages
Journal of Parallel and Distributed Computing
No ratings yet
Journal of Parallel and Distributed Computing
13 pages
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
Contextual Image Classification: Understanding Visual Data for Effective Classification
From Everand
Contextual Image Classification: Understanding Visual Data for Effective Classification
Fouad Sabry
No ratings yet
Lecture - 02 - Fluid Statics
No ratings yet
Lecture - 02 - Fluid Statics
6 pages
Information Processing Approach To Cognitive Psychology
No ratings yet
Information Processing Approach To Cognitive Psychology
11 pages
Quadratic Equation
No ratings yet
Quadratic Equation
5 pages
Performing Mathematical Operation and Performance Analysis in MATLAB
No ratings yet
Performing Mathematical Operation and Performance Analysis in MATLAB
3 pages
Stability Analysis of Box-Girder Cable-Stayed Bridges
No ratings yet
Stability Analysis of Box-Girder Cable-Stayed Bridges
6 pages
5. Math 121 Notes Sets II
No ratings yet
5. Math 121 Notes Sets II
8 pages
Case Analysis: Alumni Giving: Felicisimo, Jaurigue, Laset, Santos, Tiu
No ratings yet
Case Analysis: Alumni Giving: Felicisimo, Jaurigue, Laset, Santos, Tiu
34 pages
Most Important Question For Class 9 of Areas of Parallelograms and Triangles
No ratings yet
Most Important Question For Class 9 of Areas of Parallelograms and Triangles
12 pages
Chapter 1 - Crystal Structure - Part 1 PDF
No ratings yet
Chapter 1 - Crystal Structure - Part 1 PDF
45 pages
Ezy-Guard Smart Z-Post (Australia)
No ratings yet
Ezy-Guard Smart Z-Post (Australia)
26 pages
Grade 9 TERM 2 2023-2024
No ratings yet
Grade 9 TERM 2 2023-2024
25 pages
Lecture 4-Counting Techniques and Probability
No ratings yet
Lecture 4-Counting Techniques and Probability
26 pages
Sinusoids and Phasors Compress
No ratings yet
Sinusoids and Phasors Compress
38 pages
3-Classification of Clothing Using Midlevel Layers
No ratings yet
3-Classification of Clothing Using Midlevel Layers
17 pages
CSE211 - Project - Jump Search Algorithm: S. M Farhan Ishrak (1810615)
No ratings yet
CSE211 - Project - Jump Search Algorithm: S. M Farhan Ishrak (1810615)
6 pages
Stats Exam 1 Cheat Sheet
No ratings yet
Stats Exam 1 Cheat Sheet
3 pages
Birla Institute of Technology & Science, Pilani Course Handout Part A: Content Design
No ratings yet
Birla Institute of Technology & Science, Pilani Course Handout Part A: Content Design
8 pages
SBE Assignment
No ratings yet
SBE Assignment
16 pages
Distributive Property
No ratings yet
Distributive Property
4 pages
Finite Automata
No ratings yet
Finite Automata
41 pages
Final Draft Ict Diploma November 2024 Exam Timetable
No ratings yet
Final Draft Ict Diploma November 2024 Exam Timetable
5 pages
project PROPOSAL final
No ratings yet
project PROPOSAL final
9 pages
Structural Calculation For Gl1 Beam: 1. Description
No ratings yet
Structural Calculation For Gl1 Beam: 1. Description
13 pages
C Chapter 10
No ratings yet
C Chapter 10
32 pages
4.constraints
No ratings yet
4.constraints
14 pages
CATIA-CADAM Interface
No ratings yet
CATIA-CADAM Interface
53 pages
CS 228: Logic in Computer Science: Krishna. S
No ratings yet
CS 228: Logic in Computer Science: Krishna. S
53 pages
Haphazard Selection: Is It Time To Change Audit Standards?: Tom - Hall@uta - Edu
No ratings yet
Haphazard Selection: Is It Time To Change Audit Standards?: Tom - Hall@uta - Edu
43 pages
Newton Three Laws-Winfield
No ratings yet
Newton Three Laws-Winfield
9 pages