N-19248
Abstract: 3D shape feature learning plays a pivotal role in both industry and academia. PointCNN is one of the excellent neural networks for classification of 3D object databases. Instead of selecting representative points arbitrarily as in PointCNN, the clustering-enhanced PointCNN proposed in this paper makes the selection of representative points more logical and efficient for point cloud classification learning. The proposed clustering-based selection approach is able to distinguish more features and capture more details from 3D shapes. Both K-Means and Gaussian Mixture Model (GMM) clustering methods are applied during the point selection stage. Both methods have been tested on several public data sets, and the results substantiate the superior classification accuracy with comparable training time.
I. INTRODUCTION
A. Motivation and Background
Deep learning, which usually refers to deep artificial neural networks, has achieved great performance in numerous applications [1]. In particular, the convolutional neural network [2], proposed by Kunihiko Fukushima, is widely applied in machine learning problems such as natural language processing [3] and image classification [4].
3D shape learning can be widely applied in 3D printing, topography, intelligent manufacturing, and quality control [5], [6], [7]. It can also be used for 3D scene recognition and object segmentation. In 3D shape learning, we use ModelNet [8] as the 3D shape data set for training and testing; it plays a role similar to that of ImageNet [9] in the image identification field. However, unlike 2D image learning, 3D object learning cannot use regular data directly, on account of the diversity and specificity of 3D object data. In addition, a 3D point cloud is not distributed uniformly, so we cannot use the convolution transformation directly. Due to these difficulties, the accuracy of 3D classification is not yet satisfactory.
B. Literature Review
In order to solve the problems in 3D classification, pre-processing of 3D data is necessary, and various pre-processing methods have been developed.
Voxelization is a commonly used pre-processing method, which is similar to pixelation in 2D image deep learning [10]. VoxNet [11], using this method, has achieved 92% test accuracy on ModelNet10 and 83% test accuracy on ModelNet40. Y. Wang [12] proposed a network using pixel data and achieved 88.66% test accuracy on ModelNet10 and 82.66% test accuracy on ModelNet40.
Another method is to transform a 3D object model into multi-view figures. Hang Su [13] proposed MVCNN (Multi-View Convolutional Neural Network), which can achieve 86.3% accuracy on ModelNet40. Some neural networks adopting similar methods have also been proposed, such as MVRNN (Multi-View Recurrent Neural Network) [14] and MVCNN-New [15].
Although the performance of these methods is reasonable, they cannot satisfy industry requirements given the wide usage of 3D point cloud data. Because of the development of industrial 3D cameras, the point cloud has become the main data structure in industry. At the same time, a point cloud can be used for scene recognition and segmentation directly [16]. Therefore, researchers have been paying more attention to point cloud learning. In recent years, some significant networks with point cloud data input have been drawing growing interest. PointNet [17] is a typical point cloud learning network, which achieves 89.2% accuracy on ModelNet40. T-Net is an ingenious design for handling invariance of point cloud learning in PointNet. PointNet++ [18], an improvement of PointNet, achieves 90.7% accuracy on ModelNet40. However, the classification accuracy of all these approaches is not satisfactory enough, and 3D shape classification remains an open problem.
C. Contribution and Organization
In this paper, we propose a clustering-enhanced method to improve the classification accuracy of PointCNN. We use clustering algorithms (K-Means and Gaussian Mixture Model) to select the representative points of the X-Conv transformation, which is the kernel operation in PointCNN. The experimental results show that this method can increase the classification accuracy compared to the original PointCNN.
The paper is organized as follows. In Section II, related work on point cloud learning and clustering methods is introduced. In Section III, we describe the mathematical procedure of X-Conv and clustering-enhanced PointCNN. Section IV includes our experimental results. Section V concludes this paper and discusses some future work.
The work described in the paper was jointly sponsored by Open Fund of State Key Laboratory of Intelligent Manufacturing System Technology, Natural Science Foundation of Shanghai (18ZR1420100), and National Natural Science Foundation of China (61703274).
Yikuan Yu, Yu Zheng, and Xinyi Le (corresponding author) are with Shanghai Key Laboratory of Advanced Manufacturing Environment, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (yyyykkkk1995, lexinyi@sjtu.edu.cn).
Fei Li is with State Key Laboratory of Intelligent Manufacturing System Technology, Beijing Institute of Electronic System Engineering, Beijing 100854, China, and also with Beijing Complex Product Advanced Manufacturing Research Center, Beijing Simulation Center, Beijing 100854, China.
Min Han is with Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian 116023, China.
978-1-7281-2009-6/$31.00 ©2019 IEEE paper N-19248.pdf
Fig. 1. (a) CAD Model and (b) Point Cloud of a Teapot
II. RELATED WORK
A. Point Cloud Learning
A point cloud is a set of data points in space (of any dimension), which usually measures a number of points on the external surface of an object. A 3D point cloud can be produced from other 3D data, as in Figure 1.
A point cloud can be regarded as a set of unordered points $\{P_i \in X^d \mid i = 1, \ldots, n\}$ in a $d$-dimensional space, where each point $P_i$ is a vector of its coordinates $(x_i^1, \ldots, x_i^d)$ plus, if necessary, its $m$-dimensional feature channels $F_i = (f_i^1, \ldots, f_i^m)$, such as color, material and diaphaneity [19].
The point cloud classification task is given by the function $f(\{P_i \mid i = 1, \ldots, n\})$, where $f : X^{n \times (d+m)} \to Y^k$. The output of this function is a $k$-dimensional vector which represents the probabilities of the $k$ classes. We always take the class with the highest probability as the classification result.
Therefore, 3D point cloud learning has the following three properties:
• Disorder. The order of the points in the input set must not affect the final result of deep learning. The mathematical expression is:
$$f(P_{a_1}, \ldots, P_{a_n}) = f(P_{b_1}, \ldots, P_{b_n}) \tag{1}$$
where $\{P_{a_i} \mid i = 1, \ldots, n\}$ and $\{P_{b_i} \mid i = 1, \ldots, n\}$ are two different orderings of the same points.
• Invariance. This property usually refers to rotation invariance. The point cloud location data of an object are expressed differently in different coordinate systems, but the learner should eliminate the variance caused by rotation. The mathematical expression is:
$$f(P_{x_1}, \ldots, P_{x_n}) = f(P_{y_1}, \ldots, P_{y_n}) \tag{2}$$
where $\{P_{x_i} \in X^d \mid i = 1, \ldots, n\}$ and $\{P_{y_i} \in Y^d \mid i = 1, \ldots, n\}$ are any two different expressions of an object in their respective coordinates. Moreover, $X^d$ and $Y^d$ are linearly equivalent $d$-dimensional coordinate systems.
• Interaction. Relationships among points make 3D learning more complicated. Both global and local relationships (which can also be called features) should be considered in the learning network.
B. PointCNN and X-Conv
Due to the great performance of CNNs on 2D image classification, various convolutional neural networks for 3D point clouds have gradually appeared in recent years. A convolutional neural network can describe and learn the local relationships of a point cloud. The input of image classification is a regular pixel matrix with color channels, and the convolution operation can be applied to the pixel matrix directly, as in Figure 2. However, it is hard to scan a point cloud with a convolutional kernel due to its irregular structure, as in Figure 3.
Fig. 2. Convolution Processing of CNN for 2D Image Feature Extraction.
Fig. 3. Conventional Convolution Processing Failure on Point Cloud.
PointCNN was proposed by Li [20]. This network introduces an operator called "X-Conv" to replace conventional convolution. PointCNN for classification is structured as several hierarchical X-Conv layers and one SoftMax layer, as in Figure 4. Therefore, the X-Conv operator plays a crucial role in PointCNN.
In Figure 4, blue points represent input points (with or without features), and the first X-Conv layer turns the input points into fewer representative points (orange points) with richer features. The second X-Conv layer repeats this processing with different X-Conv operator parameters. After each X-Conv layer, the number of points decreases and the features become richer.
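The reduction from input points to fewer representative points, each with a local region around it, can be illustrated with a minimal NumPy sketch. It is not the X-Conv operator itself (there is no learned transformation here). It only shows arbitrary (random) selection of representative points followed by gathering a neighborhood for each one. The k-nearest-neighbor grouping, the function names `sample_representatives` and `knn_neighborhoods`, and the sizes 1024, 128 and 16 are assumptions made for illustration.

```python
import numpy as np

def sample_representatives(points, num_rep, rng):
    """Pick num_rep representative point indices uniformly at random, without repeats."""
    return rng.choice(points.shape[0], size=num_rep, replace=False)

def knn_neighborhoods(points, rep_idx, k):
    """For each representative, return the indices of its k nearest input points."""
    reps = points[rep_idx]                            # (num_rep, d)
    diff = reps[:, None, :] - points[None, :, :]      # (num_rep, n, d)
    dist2 = np.sum(diff ** 2, axis=-1)                # squared Euclidean distances
    return np.argsort(dist2, axis=1)[:, :k]           # (num_rep, k)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))                    # toy cloud: n = 1024, d = 3
rep_idx = sample_representatives(cloud, 128, rng)     # 1024 points -> 128 representatives
neigh_idx = knn_neighborhoods(cloud, rep_idx, 16)     # 16 neighbors per representative
# Express each local region relative to its representative point.
local_regions = cloud[neigh_idx] - cloud[rep_idx][:, None, :]
print(local_regions.shape)  # (128, 16, 3)
```

Replacing `sample_representatives` with a clustering-based choice is the kind of change the paper proposes; the neighborhood gathering step would stay the same in this sketch.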
Fig. 4. PointCNN (Used for Multi-Class Classification) Structure.
The main effect of X-Conv is feature extraction, like convolution in 2D deep learning. Figure 5 compares traditional convolution and X-Conv. This figure shows that X-Conv can extract the local regions around representative points and learn the information of each region. This processing performs the same function as traditional convolution.
C. Clustering Methods
When dealing with big data, a proper clustering method can help to find internal rules [21]. Clustering is an unsupervised learning process, whereas classification is a supervised learning process. Classification uses known features to decide the affiliation of a sample, and clustering is a method for obtaining these features. Clustering is therefore widely applied for data preprocessing.
Frequently used clustering methods include K-Means, GMM (Gaussian Mixture Model), Mean Shift and Graph Community Detection. In our model, K-Means and GMM are adopted.
K-Means is a well-understood method [22]. Its procedure is shown in Algorithm 1.
Algorithm 1 K-Means Approach
Input: number of clusters $K$ and data set $S \subset \mathbb{R}^n$
Output: clusters $S_i \subseteq S$, $i = 1, \ldots, K$
1: Select $K$ points $K_i$ ($i = 1, \ldots, K$) in $\mathbb{R}^n$ randomly
2: while the iteration limit has not been reached do
3:   For each $P \in S$, assign $P$ to $S_i$ with $i = \arg\min_j \|P - K_j\|$
4:   Let $K_i = \bar{S}_i$ (the mean of the points in $S_i$)
5: return $S_1, \ldots, S_K$
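The steps of Algorithm 1 can be written as a short NumPy sketch. This is an illustrative implementation under stated assumptions, not the code used in the experiments: the initial centers are drawn from the data points rather than from arbitrary positions in $\mathbb{R}^n$, the stopping rule is a fixed iteration count `num_iters`, and an empty cluster simply keeps its previous center. The function name `k_means` and the toy data are hypothetical.

```python
import numpy as np

def assign(S, centers):
    """Step 3: index of the nearest center for every point in S."""
    dist2 = np.sum((S[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.argmin(dist2, axis=1)

def k_means(S, K, num_iters=20, seed=0):
    """K-Means following Algorithm 1. S is an (N, n) array; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Step 1 (assumption): initial centers are K distinct data points.
    centers = S[rng.choice(S.shape[0], size=K, replace=False)].copy()
    for _ in range(num_iters):            # Step 2: fixed iteration budget
        labels = assign(S, centers)       # Step 3: nearest-center assignment
        for i in range(K):                # Step 4: center <- mean of its cluster
            members = S[labels == i]
            if len(members) > 0:          # an empty cluster keeps its old center
                centers[i] = members.mean(axis=0)
    return assign(S, centers), centers    # Step 5: final clusters and centers

# Toy data: two blobs in R^3.
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 3)),
    rng.normal(loc=5.0, scale=0.1, size=(50, 3)),
])
labels, centers = k_means(data, K=2)
```

One plausible way to turn the resulting centers into representative points of a cloud, which this section does not confirm, would be to take the input point nearest to each center.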
Fig. 7. Examples of 3D Models in ModelNet.
• CIFAR-10. This dataset (Figure 8) includes 60,000 color images from 10 categories (6,000 images per category) with a 50,000/10,000 training/testing split. The dataset is divided into five training batches and one test batch, each with 10,000 images. The test batch contains exactly 1,000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5,000 images from each class.
• MNIST. It is a subset of NIST (National Institute of Standards and Technology) consisting of handwritten digits, as in Figure 9. It has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image. MNIST is widely used as a sanity check for image classification networks.
• TU Berlin. TU Berlin Sketch, which is proposed by Eitz et al. [26], has 20,000 human sketches from 250
B. Experiment Results
Performance on the classification task is usually regarded as the standard for evaluating a network. A learner which achieves great performance on the classification task can also be adapted successfully to other tasks (e.g. segmentation). We evaluate clustering-enhanced PointCNN on the classification task of ModelNet40, CIFAR-10, MNIST and TU Berlin using two clustering methods: K-Means and Gaussian Mixture Model (GMM). We use TensorFlow (1.12) to run this classification task with an NVIDIA 1080Ti GPU.
We summarize our classification results, including running time and accuracy, in Table I and Table II. On each dataset, we trained the different algorithms for the same number of iterations. We compared the K-Means-enhanced PointCNN, GMM-enhanced PointCNN, initial PointCNN (random selection) and PointNet on the four datasets.
According to the aforementioned experimental results, our network (K-Means and GMM) has better classification performance than the initial PointCNN on some datasets. Our method sacrifices some running efficiency but achieves better classification performance. Moreover, GMM-enhanced PointCNN has more stable performance than K-Means-enhanced PointCNN.
TABLE II
RUNNING TIME OF CLASSIFICATION USING FOUR NETWORKS