IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 33, NO. 1, FEBRUARY 2020

Deep Learning for Classification of Chemical Composition of Particle Defects

Jared O'Leary, Kapil Sawlani, and Ali Mesbah
Abstract—Manual classification of particle defects on semiconductor wafers is labor-intensive, which leads to slow solutions and longer learning curves on product failures while being prone to human error. This work explores the promise of deep learning for the classification of the chemical composition of these defects to reduce analysis time and inconsistencies due to human error, which in turn can enable systematic root cause analysis for sources of semiconductor defects. We investigate a deep convolutional neural network (CNN) for defect classification based on a combination of scanning electron microscopy (SEM) images and energy-dispersive x-ray (EDX) spectroscopy data. SEM images of sections of semiconductor wafers that contain particle defects are fed into a CNN in which the defects' EDX spectroscopy data is merged directly with the CNN's fully connected layer. The proposed CNN classifies the chemical composition of semiconductor wafer particle defects with an industrially pragmatic accuracy. We also demonstrate that merging spectral data with the CNN's fully connected layer significantly improves classification performance over CNNs that only take either SEM image data or EDX spectral data as an input. The impact of training data collection and augmentation on CNN performance is explored and the promise of transfer learning for improving training speed and testing accuracy is investigated.

Index Terms—Convolutional neural networks, defect classification, semiconductor manufacturing, particle defects, chemical composition, transfer learning, data augmentation, outlier detection.

Manuscript received November 5, 2019; revised December 13, 2019; accepted December 30, 2019. Date of publication January 1, 2020; date of current version February 3, 2020. This work was supported by Lam Research Corporation. (Corresponding author: Ali Mesbah.) Jared O'Leary and Ali Mesbah are with the Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720 USA. Kapil Sawlani is with the Deposition Product Group (Digital Initiative), Lam Research Corporation, Fremont, CA 94538 USA. Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSM.2019.2963656

I. INTRODUCTION

A. Background

SEMICONDUCTOR devices are found in almost every facet of modern life, from smartphones to ultra-high definition television sets. To sustain the ever-increasing demand for lower cost, faster computing power, and/or higher memory capacity devices, the design of semiconductor manufacturing processes is becoming more complex and requires the introduction of new hardware components, complex assemblies, new manufacturing methods, and close control of cleaning and handling techniques [1], [2], [3]. The introduction of new aspects of the manufacturing process can be a major source of unwanted particles depositing on wafers. In addition, changes in existing process conditions can result in particle generation depending on the reaction dynamics of the system. These particle defects on semiconductor wafers can be one of the many causes of product failure. In fact, over 75% of the total chip defects seen in standard semiconductor manufacturing processes are due to particle defects [4], [5], [6].

To understand the impact of manufacturing tools on the defects they produce, the common practice in the semiconductor equipment industry is to run thousands of wafers per year per characterization tool, such as scanning electron microscopy and optical scattering tools. Each wafer can contain tens of defects. As manufacturing processes, and therefore particle defect compositions, become more complex, the number of defects greatly increases. As a result, the time spent by process and productivity engineers to classify defects becomes excessively large. In addition, manual classification techniques are prone to human error. Clearly, implementing automated classification techniques is required to improve both defect classification accuracy and efficiency.

B. Motivation

In any defect analysis study, the ultimate goal is to eliminate the sources producing these defects in order to maximize the yield on the wafer. While specific steps and sequences may vary depending on the level of inspection desired for a given application, the first step of a general defect analysis workflow (see Fig. 1) is to measure a wafer before and after a certain step in the manufacturing process using optical scattering techniques. The data from the surface scattering tool provides a snapshot of the changes that occurred in that step by producing wafer maps based on the size distribution of the defects and their locations on the wafer. In several instances, looking at this global picture can provide some guidance into the source of the defects. For example, a scratch may indicate mechanical contact during operations, or, depending on the chamber geometry, agglomeration of the defects in one area may indicate that chamber parts require maintenance. Many authors have recently provided machine learning solutions with high accuracy for these problems using convolutional neural networks [7], adaptive balancing generative adversarial networks [8], randomized general regression networks [9], and support vector machines [10].
Fig. 1. Semiconductor defect classification workflow. To fully characterize the defects on a semiconductor wafer, the wafer’s defect map must first be
determined. Next, the morphologies and chemical spectra of individual particle defects are examined. Due to morphology similarity and peak overlap among
different defect classes, it is necessary to combine EDX spectral and SEM image data from steps 2 and 3 to fully characterize a wafer’s defects. Several
automated classification techniques exist for each of the three steps in this process, such as convolutional neural networks (CNNs), support vector machines
(SVM), randomized general regression networks (RGRNs), and binning techniques. However, no automated classification technique exists that combines
multiple steps in this workflow. The main contribution of this paper is automating the classification of particle defects on semiconductor wafers based on
combined information of the defect SEM image and chemical spectra obtained using EDX spectroscopy.
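As a rough illustration of the combined SEM/EDX input described in this caption, the following minimal PyTorch sketch shows one way to concatenate an EDX spectrum with a CNN's flattened convolutional features before the fully connected layers. The layer sizes, the assumed 128x128 SEM image resolution, the 2048-bin spectrum length, and the eight-class output are illustrative assumptions rather than the authors' exact architecture.

# Illustrative sketch only: a CNN whose fully connected head receives both the
# flattened convolutional features of an SEM image and the raw EDX spectrum.
# All dimensions below are assumptions, not values taken from the paper.
import torch
import torch.nn as nn

class SemEdxCNN(nn.Module):
    def __init__(self, n_classes=8, spectrum_len=2048):
        super().__init__()
        # Convolutional feature extractor for the (grayscale) SEM image.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        conv_features = 32 * 32 * 32  # for an assumed 128x128 input image
        # Fully connected head that also receives the EDX spectrum.
        self.fc = nn.Sequential(
            nn.Linear(conv_features + spectrum_len, 256),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(256, n_classes),
        )

    def forward(self, image, spectrum):
        x = self.conv(image)                 # (N, 32, 32, 32)
        x = torch.flatten(x, start_dim=1)    # (N, 32*32*32)
        x = torch.cat([x, spectrum], dim=1)  # merge the EDX spectrum with the conv features
        return self.fc(x)                    # class logits

# Example forward pass with random tensors shaped like a batch of SEM images and EDX spectra.
model = SemEdxCNN()
logits = model(torch.randn(4, 1, 128, 128), torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 8])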
TABLE I
Number of defects per class. Defects belonging to the same class may occasionally have slightly different compositions (e.g., AlOxFy and AlOxFyNz). The class labels were provided by Lam Research Corp.
TABLE II
CNN training and validation accuracies for 5-fold cross-validation. The mean and variance of the training and validation accuracies among the five models created during 5-fold cross-validation are given. Accuracy data is also given for baseline comparison CNNs that either do not include SEM-image data or do not include EDX-spectroscopy data. The low variance in training and validation accuracies suggests the consistency of the proposed CNN strategy. Note the large performance difference between the CNN that includes both image and spectral data and its image-only baseline.
TABLE III
Summary of CNN classification accuracy on the test data set. Top-1 and Top-3 classification accuracies are given for the test data set for the CNN and its SEM image-only and EDX spectra-only baselines. The first row represents the test accuracy of a "stratified dummy" classifier, which generates predictions by respecting the training data set's class distribution.
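For reference, a "stratified dummy" baseline of the kind listed in Table III can be reproduced in spirit with scikit-learn's DummyClassifier, which draws predictions at random according to the class frequencies of the training labels. The labels below are synthetic placeholders; only the construction of the baseline mirrors the table's description.

# Sketch of a "stratified dummy" baseline: predictions are sampled from the
# class distribution observed in the training labels. The data is synthetic.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

class_probs = [0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05]  # assumed imbalance
rng = np.random.default_rng(0)
y_train = rng.choice(8, size=1000, p=class_probs)
y_test = rng.choice(8, size=200, p=class_probs)

# Features are ignored by the dummy classifier, so a zero placeholder suffices.
dummy = DummyClassifier(strategy="stratified", random_state=0)
dummy.fit(np.zeros((len(y_train), 1)), y_train)
y_pred = dummy.predict(np.zeros((len(y_test), 1)))
print(f"Stratified dummy Top-1 accuracy: {accuracy_score(y_test, y_pred):.3f}")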
A. CNN Performance
The mean and variance of the 5-fold cross validation
training and validation accuracies are reported in Table II. The
small variances of training and validation model accuracies
suggest the consistency of the proposed CNN. Discrepancies
between training and validation accuracies arise because dropout, which makes the CNN more robust [41], is applied during training but disabled when validation accuracies are calculated.
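A minimal sketch of this cross-validation bookkeeping is shown below. It assumes a model whose forward pass takes a batch of SEM images and a batch of EDX spectra, uses stratified 5-fold splitting (an assumption), and switches the model to evaluation mode so that dropout is disabled when validation accuracy is computed; the authors' actual training loop is not reproduced.

# Sketch of 5-fold cross-validation with dropout active during training
# (model.train()) and disabled for validation accuracy (model.eval()).
# model_factory, images, spectra, and labels are placeholders supplied by the caller.
import numpy as np
import torch
from sklearn.model_selection import StratifiedKFold

def evaluate(model, images, spectra, labels):
    model.eval()  # disables dropout while computing accuracy
    with torch.no_grad():
        preds = model(images, spectra).argmax(dim=1)
    return (preds == labels).float().mean().item()

def cross_validate(model_factory, images, spectra, labels, epochs=10):
    """images, spectra, labels are torch tensors; labels hold integer class ids."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    train_accs, val_accs = [], []
    for train_idx, val_idx in skf.split(np.zeros(len(labels)), labels.numpy()):
        model = model_factory()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):
            model.train()  # dropout active during training
            optimizer.zero_grad()
            loss = loss_fn(model(images[train_idx], spectra[train_idx]), labels[train_idx])
            loss.backward()
            optimizer.step()
        train_accs.append(evaluate(model, images[train_idx], spectra[train_idx], labels[train_idx]))
        val_accs.append(evaluate(model, images[val_idx], spectra[val_idx], labels[val_idx]))
    # Mean and variance of the per-fold accuracies, as reported in Table II.
    return np.mean(train_accs), np.var(train_accs), np.mean(val_accs), np.var(val_accs)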
The Top-1 and Top-3 accuracies are next reported for the
test data set in Table III. Clearly, the CNN that uses image
and spectroscopy data outperforms the baseline CNNs that
use either only SEM image data or only EDX spectral data.
For example, Fig. 11 shows a comparison of the classifica-
tion probabilities output for an example Fe-Ni/O defect. The
image-only CNN misclassifies the example defect as an SiO2
defect, while the spectra-only CNN misclassifies the exam-
ple defect as an SiOx Cy defect. Meanwhile, the combined
SEM/EDX CNN correctly classifies the defect with very high
confidence. A closer look at Fig. 11 reveals that the image-only
CNN determined the two highest-probability classes as SiO2
and Fe-Ni/O, while the spectra-only CNN determined the three
highest probability classes as SiO2 , SiOx Cy , and Fe-Ni/O.
The EDX spectrum clearly shows strong oxygen and silicon peaks, a weak carbon peak, and weak, semi-overlapping nickel and iron peaks. The combined SEM image and EDX spectra CNN clearly identifies some correlation between the contours seen in the SEM image and the shape of the EDX spectra to correctly classify the defect with high confidence.

Fig. 11. Classification probabilities for an Fe-Ni/O defect. A top-view, centered SEM image and an EDX spectrum for the example defect. (top) Comparison of classification probabilities of the example defect for CNNs trained with only SEM image data, only EDX spectral data, and both SEM image and EDX spectral data. (bottom) The CNN that only uses SEM image data misclassifies the example defect as SiO2, while the CNN that uses only EDX spectral data misclassifies the defect as SiOxCy. The CNN that uses both SEM image and EDX spectroscopy correctly classifies the defect as Fe-Ni/O. The combined data CNN clearly identifies a correlation between the image and spectral features to create a high-probability classification.

The combined SEM image and EDX spectra CNN yields
a greater than 99% Top-3 accuracy. The high performance
of the CNN indicates that the proposed defect classification
framework meets the previously specified 95% requirement
for pragmatic viability of defect chemical composition classification in the semiconductor industry.

A closer examination of the combined SEM image data and EDX spectral data CNN's performance reveals that certain classes in the test data set are much more accurately classified than others. The confusion matrix for the CNN is shown in Table IV. The table shows that certain defects are classified with greater than 90% accuracies (e.g., Al/OxFy/Nz, SiO2, Y). Meanwhile, SiOxCy and CuO/S defects are often misclassified as SiO2 and Fe-Ni/O defects, respectively. For example, the CNN only correctly classified 29.3% of SiOxCy defects while incorrectly classifying SiOxCy as SiO2 61.2% of the time.
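The reported quantities (Top-1 and Top-3 accuracy, and a confusion matrix whose rows are percentages of each true class) can be computed as sketched below; the probability matrix is a synthetic stand-in for the CNN's softmax outputs, and the eight-class setup is an assumption that mirrors the class list in Table I.

# Sketch of the metrics used above: Top-1/Top-3 accuracy and a row-normalized
# confusion matrix (percent of each true class). The predictions are synthetic.
import numpy as np
from sklearn.metrics import confusion_matrix, top_k_accuracy_score

rng = np.random.default_rng(1)
n_classes = 8
y_true = rng.integers(0, n_classes, size=500)
probs = rng.dirichlet(np.ones(n_classes), size=500)  # stand-in for softmax outputs

top1 = top_k_accuracy_score(y_true, probs, k=1, labels=np.arange(n_classes))
top3 = top_k_accuracy_score(y_true, probs, k=3, labels=np.arange(n_classes))
print(f"Top-1: {top1:.3f}, Top-3: {top3:.3f}")

# Confusion matrix with each row expressed as a percentage of that true class,
# matching the layout described for Tables IV and V.
cm = confusion_matrix(y_true, probs.argmax(axis=1), labels=np.arange(n_classes))
cm_pct = 100.0 * cm / cm.sum(axis=1, keepdims=True)
print(np.round(cm_pct, 1))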
TABLE IV
Confusion matrix for the CNN on the test data set. The CNN model chosen from 5-fold cross-validation was used to classify defects in the test data set. The confusion matrix shows the percentage of classes that were correctly classified (the diagonal entries) and the percentage of classes that were incorrectly classified. Note that defects belonging to the CuO/S and SiOxCy classes are often incorrectly classified as members of the Fe-Ni/O and SiO2 classes, respectively.
TABLE V
Confusion matrix for the CNN using 150 defects per class. The CNN was retrained with 150 defects per class and then used to classify the defects in the test data set. Classification accuracy decreases for heavily populated classes (e.g., Al, SiO2). Classification accuracy significantly increases for CuO/S and SiOxCy, which were previously often incorrectly classified as Fe-Ni/O and SiO2, respectively (see Table IV for comparison). Note that two classes have between 100 and 150 available defects for training (Y, CuO/S), and the maximum number of available defects was used for CNN training in these cases (i.e., 136 for CuO/S and 111 for Y). Further note that the 150 defects per class were randomly selected before data augmentation.
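The class-capping step described in this caption (randomly selecting at most 150 defects per class before augmentation, and keeping every available defect for smaller classes) can be sketched as follows; the label array is a synthetic placeholder.

# Sketch of capping each class at 150 defects before data augmentation,
# keeping all defects for classes with fewer than 150 examples.
import numpy as np

def subsample_per_class(labels, cap=150, seed=0):
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) > cap:
            idx = rng.choice(idx, size=cap, replace=False)
        keep.append(idx)
    return np.sort(np.concatenate(keep))

labels = np.random.default_rng(2).integers(0, 8, size=5000)  # synthetic class labels
selected = subsample_per_class(labels, cap=150)
print(np.bincount(labels[selected]))  # at most 150 defects per class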
TABLE VI
Confusion matrix for the outlier detection strategy. The outlier detection strategy was used to determine the presence of defects in the outlier detection test data set. The confusion matrix shows the percentage of classes that were correctly classified (the diagonal entries) and the percentage of classes that were incorrectly classified. Note that only SEM images with very small defects were incorrectly classified.

Although such complex network architectures could possibly lead to enhanced performance on a larger, more uniform data set, implementing such complex nets on this data set would likely lead to overfitting. As such, we cannot claim that the designed CNN is by any means "state-of-the-art"; instead, it primarily serves as a proof of concept for the idea of using a CNN to classify real industrial defects based on a combination of SEM image and EDX spectral data. The earlier provided 95% Top-3 accuracy metric was a target metric for pragmatic industrial utility.
TABLE VII
Transfer learning performance summary. Top-1 and Top-3 classification accuracies are given for the test data set, as well as training times that are normalized in comparison to the training time for the original CNN. Here, the training time for the original CNN is set to 1. CNN TLv1 involves retraining all of the original CNN after initializing the weights and biases in the convolutional layer based on a model with 4 defect classes. CNN TLv2 involves the same weight and bias initialization, yet only the fully connected layer is retrained. Both CNN TLv1 and TLv2 are faster, yet slightly less accurate than the original CNN. CNN TLv2 slightly outperforms CNN TLv1 in training speed and accuracy.
These results provide evidence for the efficacy of easily implementable, computationally cheap detection strategies for this particular problem. It is possible that more advanced noise filtering methods or types of image pre-processing could further improve performance. Meanwhile, some of the inherently more advanced outlier detection methods mentioned earlier (e.g., principal component analysis) may yield better performance as well. The main purpose of including outlier detection in this paper is to demonstrate that, although the defect-free and defect-containing images should be separated, the problem is not particularly difficult to solve.

As process changes occur, it is possible that particle defects become progressively smaller and render the previously calculated defect threshold area mean and variance invalid. It is further possible that defect size variance increases to the point at which the recommended three standard deviation back-off is no longer appropriate. As a result, statistical process control methods [68], [69] can be implemented to monitor outlier detection efficacy. In this way, time-frames at which threshold defect areas should be recalculated can be identified.
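A minimal sketch of the thresholding logic implied by this discussion is given below, under the assumption that the "three standard deviation back-off" refers to a defect-area threshold set three standard deviations below the mean defect area estimated from labeled training images; the paper's full outlier detection pipeline (image pre-processing, noise filtering, and so on) is not reproduced here, and the area values are made up.

# Sketch of an area-threshold check for separating defect-free from
# defect-containing SEM images. The back-off direction (mean minus three
# standard deviations) is an assumption made for illustration.
import numpy as np

def area_threshold(training_defect_areas, n_sigma=3.0):
    areas = np.asarray(training_defect_areas, dtype=float)
    return areas.mean() - n_sigma * areas.std()

def contains_defect(candidate_area, threshold):
    # Flag an image as defect-containing if its largest detected blob
    # exceeds the backed-off area threshold.
    return candidate_area >= threshold

train_areas = [410.0, 385.0, 520.0, 460.0, 395.0, 505.0]  # made-up areas in pixels
thr = area_threshold(train_areas)
print(thr, contains_defect(120.0, thr), contains_defect(450.0, thr))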
D. Transfer Learning Analysis

Testing accuracy and training times of the original CNN and the two CNNs trained using transfer learning strategies (i.e., CNN TLv1 and CNN TLv2) are reported in Table VII. The weights and biases for CNN TLv1 were initialized by training the original CNN with the 4 classes with the fewest numbers of defects (i.e., CuO/S, Organic, SiOxCy, and Y). CNN TLv1 was then trained for all 8 classes with 2 epochs. CNN TLv2 was initialized with the same weights and biases, yet only its fully connected layers were retrained for all 8 classes. CNN TLv1 and TLv2 perform comparably to the original CNN. However, CNN TLv2 slightly outperforms CNN TLv1 in terms of both training speed and accuracy. The results of CNN TLv2 indicate that once convolutional layers have been trained with a set of SEM images of particle defects, these weights and biases extract features well enough to be applied to separate, much larger groups of defects. Further note that the accuracy of CNN TLv1 eventually converges to that of the original CNN, but does so after between 6 and 8 epochs of training, which does not significantly reduce training time. Therefore, when training future CNNs for new classes of defects, it will be more efficient and accurate to only re-train the fully connected layers following the strategy outlined for CNN TLv2.
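A minimal sketch of the CNN TLv2-style procedure (reuse convolutional weights from a model trained on the four smallest classes, freeze them, and retrain only the fully connected layers for all eight classes) is shown below. The TinySemEdxCNN class is a small stand-in for the combined SEM/EDX network rather than the authors' architecture, and no training loop is reproduced.

# Sketch of CNN TLv2-style transfer learning: copy and freeze convolutional
# weights from a 4-class model, then optimize only the fully connected head.
import torch
import torch.nn as nn

class TinySemEdxCNN(nn.Module):
    """Minimal stand-in for a combined SEM-image/EDX-spectrum classifier."""
    def __init__(self, n_classes, spectrum_len=2048):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5, padding=2), nn.ReLU(), nn.AdaptiveAvgPool2d(8))
        self.fc = nn.Sequential(
            nn.Linear(8 * 8 * 8 + spectrum_len, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, image, spectrum):
        x = torch.flatten(self.conv(image), start_dim=1)
        return self.fc(torch.cat([x, spectrum], dim=1))

pretrained_4class = TinySemEdxCNN(n_classes=4)  # assume this was already trained elsewhere
tlv2 = TinySemEdxCNN(n_classes=8)
tlv2.conv.load_state_dict(pretrained_4class.conv.state_dict())  # reuse conv weights and biases
for p in tlv2.conv.parameters():
    p.requires_grad = False  # freeze the convolutional layers
optimizer = torch.optim.Adam(
    (p for p in tlv2.parameters() if p.requires_grad), lr=1e-3)  # FC head only
print(sum(p.numel() for p in tlv2.parameters() if p.requires_grad), "trainable parameters")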
VI. CONCLUSION AND FUTURE WORK

We design a deep convolutional neural network for the classification of the chemical composition of particle defects on semiconductor wafers. The CNN takes one centered, top-view SEM image of a given particle defect as an input. The CNN then merges its fully connected layer with that defect's corresponding EDX spectral data. The CNN represents the first example of a CNN that uses multiple data types (i.e., image and spectral data) for semiconductor defect classification. This CNN was able to classify semiconductor defects with a high, industrially pragmatic accuracy that seemed to be primarily limited by the imbalance and low numbers of defects belonging to certain, similar classes. The strong accuracy of the proposed CNN suggests that merging spectroscopy data, or potentially other object metadata, directly with the fully connected layer of a CNN can greatly improve that CNN's classification capabilities. It is expected that larger training data sets with a more uniform distribution among defect classes will improve classification accuracy. Data augmentation methods separate from rotation can also be explored (e.g., translation, re-scaling, light-scattering) to further increase classification accuracy. From a practical implementation standpoint, transfer learning studies showed that when new defects are introduced, only the fully connected layers of the CNN need to be retrained to account for such defects.

ACKNOWLEDGMENT

The authors would like to thank several members of Lam Research Corporation who have provided data, insight, and direction for this work. They acknowledge the contributions of Lam's metrology team (N. Tran, B. Skyberg, and H. Li) in the collection of defect data from various review SEM/EDX tools, as well as the engineers from different product groups (C. La, S. Zhang, and A. Radocea) who helped organize labels for the defects to perform the supervised learning study in this work. Neural network discussions with H. Li and S. Riggs were useful. K. Sawlani would also like to acknowledge the guidance and mentoring provided by R. Gottscho, K. Ashtiani, M. Danek, K. Wells, K. Hansen, E. Gurer, D. Pirkle, Y. Feng, R. Roberts, and several others from Lam Research.

REFERENCES

[1] H. C. Pfeiffer, "PREVAIL: IBM's e-beam technology for next generation lithography," in Proc. Emerg. Lithograph. Technol. IV, vol. 3997, 2000, pp. 206–214.
[2] L. Harriott, "Next generation lithography," Mater. Today, vol. 2, no. 2, pp. 9–12, 1999.
[3] Y. Gomei, "Cost analysis on the next-generation lithography technology," in Proc. Emerg. Lithograph. Technol. III, vol. 3676, 1999, pp. 1–9.
[4] D. A. Drabold and S. K. Estreicher, Theory of Defects in Semiconductors. Heidelberg, Germany: Springer, 2007.
[5] F. A. Aziz, I. H. Ahmad, N. Zulkifli, and R. M. Yusuff, "Particle reduction at metal deposition process in wafer fabrication," in Manufacturing System. Rijeka, Croatia: InTech, 2012.
[6] S. H. Park, S. Kim, and J.-G. Baek, "Kernel-density-based particle defect management for semiconductor manufacturing facilities," Appl. Sci., vol. 8, no. 2, p. 224, 2018.
[7] T. Nakazawa and D. V. Kulkarni, "Wafer map defect pattern classification and image retrieval using convolutional neural network," IEEE Trans. Semicond. Manuf., vol. 31, no. 2, pp. 309–314, May 2018.
[8] J. Wang, Z. Yang, J. Zhang, Q. Zhang, and W.-T. K. Chien, "AdaBalGAN: An improved generative adversarial network with imbalanced learning for wafer defective pattern recognition," IEEE Trans. Semicond. Manuf., vol. 32, no. 3, pp. 310–319, Aug. 2019.
[9] F. Adly, P. D. Yoo, S. Muhaidat, Y. Al-Hammadi, U. Lee, and M. Ismail, "Randomized general regression network for identification of defect patterns in semiconductor wafer maps," IEEE Trans. Semicond. Manuf., vol. 28, no. 2, pp. 145–152, May 2015.
[10] R. Baly and H. Hajj, "Wafer classification using support vector machines," IEEE Trans. Semicond. Manuf., vol. 25, no. 3, pp. 373–383, Aug. 2012.
[11] G. Tello, O. Y. Al-Jarrah, P. D. Yoo, Y. Al-Hammadi, S. Muhaidat, and U. Lee, "Deep-structured machine learning model for the recognition of mixed-defect patterns in semiconductor fabrication processes," IEEE Trans. Semicond. Manuf., vol. 31, no. 2, pp. 315–322, May 2018.
[12] K. Nakata, R. Orihara, Y. Mizuoka, and K. Takagi, "A comprehensive big-data-based monitoring system for yield enhancement in semiconductor manufacturing," IEEE Trans. Semicond. Manuf., vol. 30, no. 4, pp. 339–344, Nov. 2017.
[13] K. B. Lee, S. Cheon, and C. O. Kim, "A convolutional neural network for fault classification and diagnosis in semiconductor manufacturing processes," IEEE Trans. Semicond. Manuf., vol. 30, no. 2, pp. 135–142, May 2017.
[14] T. Nakazawa and D. V. Kulkarni, "Anomaly detection and segmentation for wafer defect patterns using deep convolutional encoder–decoder neural network architectures in semiconductor manufacturing," IEEE Trans. Semicond. Manuf., vol. 32, no. 2, pp. 250–256, May 2019.
[15] C.-F. Chien, S.-C. Hsu, and Y.-J. Chen, "A system for online detection and classification of wafer bin map defect patterns for manufacturing intelligence," Int. J. Prod. Res., vol. 51, no. 8, pp. 2324–2338, 2013.
[16] F.-L. Chen and S.-F. Liu, "A neural-network approach to recognize defect spatial pattern in semiconductor fabrication," IEEE Trans. Semicond. Manuf., vol. 13, no. 3, pp. 366–373, Aug. 2000.
[17] T. Yuan, S. J. Bae, and J. I. Park, "Bayesian spatial defect pattern recognition in semiconductor fabrication using support vector clustering," Int. J. Adv. Manuf. Technol., vol. 51, nos. 5–8, pp. 671–683, 2010.
[18] T. Yuan, W. Kuo, and S. J. Bae, "Detection of spatial defect patterns generated in semiconductor fabrication processes," IEEE Trans. Semicond. Manuf., vol. 24, no. 3, pp. 392–403, Aug. 2011.
[19] K. W. Tobin, Jr., S. S. Gleason, T. P. Karnowski, and S. L. Cohen, "Feature analysis and classification of manufacturing signatures based on semiconductor wafer maps," in Proc. Mach. Vis. Appl. Ind. Inspection V, vol. 3029, 1997, pp. 14–25.
[20] T. P. Karnowski, K. W. Tobin, Jr., S. S. Gleason, and F. Lakhani, "Application of spatial signature analysis to electrical test data: Validation study," in Proc. Metrol. Inspection Process Control Microlithography XIII, vol. 3677, 1999, pp. 530–541.
[21] M.-J. Wu, J.-S. R. Jang, and J.-L. Chen, "Wafer map failure pattern recognition and similarity ranking for large-scale data sets," IEEE Trans. Semicond. Manuf., vol. 28, no. 1, pp. 1–12, Feb. 2015.
[22] S. Cheon, H. Lee, C. O. Kim, and S. H. Lee, "Convolutional neural network for wafer surface defect classification and the detection of unknown defect class," IEEE Trans. Semicond. Manuf., vol. 32, no. 2, pp. 163–170, May 2019.
[23] A. Y. Ng, M. I. Jordan, and Y. Weiss, "On spectral clustering: Analysis and an algorithm," in Proc. Adv. Neural Inf. Process. Syst., 2002, pp. 849–856.
[24] C.-H. Wang, "Recognition of semiconductor defect patterns using spectral clustering," in Proc. IEEE Int. Conf. Ind. Eng. Eng. Manag., 2007, pp. 587–591.
[25] M. Egmont-Petersen, D. de Ridder, and H. Handels, "Image processing with neural networks—A review," Pattern Recognit., vol. 35, no. 10, pp. 2279–2301, 2002.
[26] M. R. G. Meireles, P. E. M. Almeida, and M. G. Simões, "A comprehensive review for industrial applicability of artificial neural networks," IEEE Trans. Ind. Electron., vol. 50, no. 3, pp. 585–601, Jun. 2003.
[27] J. Paola and R. Schowengerdt, "A review and analysis of backpropagation neural networks for classification of remotely-sensed multi-spectral imagery," Int. J. Remote Sens., vol. 16, no. 16, pp. 3033–3058, 1995.
[28] W. E. Reddick, J. O. Glass, E. N. Cook, T. D. Elkin, and R. J. Deaton, "Automated segmentation and classification of multispectral magnetic resonance images of brain using artificial neural networks," IEEE Trans. Med. Imag., vol. 16, no. 6, pp. 911–918, Dec. 1997.
[29] J. Gu et al., "Recent advances in convolutional neural networks," Pattern Recognit., vol. 77, pp. 354–377, May 2018.
[30] S. Srinivas et al., "A taxonomy of deep convolutional neural nets for computer vision," Front. Robot. AI, vol. 2, p. 36, Jan. 2016.
[31] W. Rawat and Z. Wang, "Deep convolutional neural networks for image classification: A comprehensive review," Neural Comput., vol. 29, no. 9, pp. 2352–2449, 2017.
[32] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, p. 436, 2015.
[33] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[34] D. Yu, H. Wang, P. Chen, and Z. Wei, "Mixed pooling for convolutional neural networks," in Proc. Int. Conf. Rough Sets Knowl. Technol., 2014, pp. 364–375.
[35] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[36] Y. LeCun et al., "Handwritten digit recognition with a back-propagation network," in Proc. Adv. Neural Inf. Process. Syst., 1990, pp. 396–404.
[37] Y. LeCun et al., "Backpropagation applied to handwritten zip code recognition," Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[38] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[39] B. Xu, N. Wang, T. Chen, and M. Li, "Empirical evaluation of rectified activations in convolutional network," CoRR, vol. abs/1505.00853, 2015. [Online]. Available: http://arxiv.org/abs/1505.00853
[40] M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2007, pp. 1–8.
[41] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," CoRR, vol. abs/1207.0580, 2012. [Online]. Available: http://arxiv.org/abs/1207.0580
[42] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014. [Online]. Available: https://arxiv.org/abs/1409.1556v6
[43] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.
[44] S. Borra and A. Di Ciaccio, "Methods to compare nonparametric classifiers and to select the predictors," in New Developments in Classification and Data Analysis. Heidelberg, Germany: Springer, 2005, pp. 11–19.
[45] S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell, "Understanding data augmentation for classification: When to warp?" in Proc. Int. Conf. Digit. Image Comput. Techn. Appl. (DICTA), 2016, pp. 1–6.
[46] J. Ding, B. Chen, H. Liu, and M. Huang, "Convolutional neural network with data augmentation for SAR target recognition," IEEE Geosci. Remote Sens. Lett., vol. 13, no. 3, pp. 364–368, Mar. 2016.
[47] P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in Proc. IEEE 7th Int. Conf. Document Anal. Recognit., 2003, p. 958.
[48] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, and J. Schmidhuber, "Flexible, high performance convolutional neural networks for image classification," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), vol. 22. Barcelona, Spain, 2011, p. 1237.
[49] D. C. Ciresan, U. Meier, and J. Schmidhuber, "Multi-column deep neural networks for image classification," CoRR, vol. abs/1202.2745, 2012. [Online]. Available: http://arxiv.org/abs/1202.2745
[50] J. W. Johnson, "Adapting mask-RCNN for automatic nucleus segmentation," CoRR, vol. abs/1805.00500, 2018. [Online]. Available: http://arxiv.org/abs/1805.00500
[51] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.
[52] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 779–788.
[53] S. S. Raj, K. S. Kannan, and K. Manoj, "Principal component analysis for outlier detection," Res. Rev. J. Stat., vol. 7, no. 1, pp. 62–68, 2018.
[54] I. T. Jolliffe and J. Cadima, "Principal component analysis: A review and recent developments," Philos. Trans. Roy. Soc. A Math. Phys. Eng. Sci., vol. 374, no. 2065, 2016, Art. no. 20150202.
[55] R. Muthukrishnan and M. Radha, "Edge detection techniques for image segmentation," Int. J. Comput. Sci. Inf. Technol., vol. 3, no. 6, p. 259, 2011.
[56] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[57] S. Mika, G. Ratsch, J. Weston, B. Scholkopf, and K.-R. Mullers, "Fisher discriminant analysis with kernels," in Proc. IEEE Neural Netw. Signal Process. IX IEEE Signal Process. Soc. Workshop, 1999, pp. 41–48.
[58] N. R. Pal and S. K. Pal, "A review on image segmentation techniques," Pattern Recognit., vol. 26, no. 9, pp. 1277–1294, 1993.
[59] P. Dhankhar and N. Sahu, "A review and research of edge detection techniques for image segmentation," Int. J. Comput. Sci. Mobile Comput., vol. 2, no. 7, pp. 86–92, 2013.
[60] E. S. Olivas, J. D. M. Guerrero, M. M. Sober, J. R. M. Benedito, and A. J. S. Lopez, Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods and Techniques—2 Volumes. Hershey, PA, USA: IGI Glob., 2009.
[61] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3320–3328.
[62] K. He, X. Zhang, S. Ren, and J. Sun, "Identity mappings in deep residual networks," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 630–645.
[63] S. Xie, R. B. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 1492–1500.
[64] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4700–4708.
[65] J. Chen, S. Sathe, C. Aggarwal, and D. Turaga, "Outlier detection with autoencoder ensembles," in Proc. SIAM Int. Conf. Data Min., 2017, pp. 90–98.
[66] M. M. Lau and K. H. Lim, "Investigation of activation functions in deep belief network," in Proc. IEEE 2nd Int. Conf. Control Robot. Eng. (ICCRE), 2017, pp. 201–206.
[67] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, vol. 30, 2013, p. 3.
[68] J. F. MacGregor and T. Kourti, "Statistical process control of multivariate processes," Control Eng. Pract., vol. 3, no. 3, pp. 403–414, 1995.
[69] J. S. Oakland, Statistical Process Control. Oxford, U.K.: Routledge, 2007.