Extreme multi-label classification is a rapidly growing research area with many applications. In this paper we propose a system design for extreme multi-label text classification (XMTC) applied to query classification in the e-commerce domain. Search query classification is more challenging than conventional document classification because queries are usually very short and often ambiguous. We design a hybrid model that uses a deep neural network for long queries and a Naive Bayes model for short queries. We formulate and apply new data augmentation techniques and create new evaluation metrics that are more suitable for the extreme multi-label task in e-commerce. We also design end-to-end system-level evaluation methods to address the difficulty of human judgment caused by the extremely large label space. We compare our deep neural network model with the state-of-the-art method on our real e-commerce data and observe about a 15% improvement in the F1 score. The end-to-end system evaluation results show that our new system improves query classification performance for a variety of query sets. In particular, for the torso and tail queries in e-commerce, we see 0.3% and 1.1% improvements in the NDCG score.
Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, 2018
The lack of high-quality labeled training data has been one of the critical challenges facing many industrial machine learning tasks. To tackle this challenge, we propose a semi-supervised learning method that uses unlabeled data and user feedback signals to improve the performance of ML models. The method employs a primary model Main and an auxiliary evaluation model Eval, which are trained iteratively by automatically generating labeled data from unlabeled data and/or user feedback signals. The proposed approach is applied to different text classification tasks. We report results on both the publicly available Yahoo! Answers dataset and our e-commerce product classification dataset. The experimental results show that the proposed method reduces the classification error rate by 4% to 15% across various experimental setups and datasets. A detailed comparison with other semi-supervised learning approaches is also presented later in the paper. The results from various text classification tasks demonstrate that our method outperforms those developed in previous related studies.
2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236), 2001
... An overview of World Wide Web caching Mingkuan Liu Fei-Yue Wang Zeng, D. Lizhi Yang Dept. of Syst. & Ind. ... Abstract This paper studied the state of the art in web caching schemes and techniques. An introduction to web caching was presented in the first part. ...
Papers by Mingkuan Liu