Paper 3: Online News Classification Using Machine Learning

Uploaded by

sasobaid

ABSTRACT

The paper addresses the growing demand for automatically organizing large amounts of
unstructured online data, particularly news articles. It uses supervised machine learning to sort
these articles into categories such as politics, sports, and entertainment. Several classifiers were
tested on a dataset of 75,000 articles, and the Naive Bayes classifier performed best, achieving 93%
accuracy, which indicates that it is well suited to this task.

INTRODUCTION

The paper highlights the rapid growth of digital content and the difficulty of organizing
unstructured online data efficiently. It explains that automatic text classification is essential for
applications such as search engines, content summarization, and question-answering systems. The
authors use supervised learning to cope with the variety of sources, writing styles, and
vocabularies found in news articles. Their goal is to personalize content for users by sorting
articles into categories such as crime, sports, politics, and entertainment.

TECHNIQUES USED

The paper uses the following techniques:

1. Data Preprocessing: tokenization with Python NLTK, stop-word removal, and label encoding
to convert category names to numerical labels.
2. Dataset Preparation: a dataset of 75,000 news articles from HuffPost, split
70% for training and 30% for testing.
3. Train-Test Splits and Cross-Validation: 10-fold cross-validation to reduce
bias in the performance estimate.
4. Evaluation Metrics: accuracy, precision, and recall.
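
The preprocessing and splitting steps listed above can be sketched in a few lines of plain Python. This is only an illustration, not the paper's code: the paper used NLTK, whereas the regex tokenizer, the tiny stop-word set, and all function names below are assumptions made so that the example runs without extra downloads.

```python
import random
import re

# Tiny illustrative stop-word set (an assumption); NLTK's English list is far larger.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def encode_labels(labels):
    """Map each distinct category name to an integer, in sorted order."""
    mapping = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(round(len(shuffled) * test_fraction))
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n, k=10):
    """Yield (train, validation) index lists; each index is validated once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation
```

With the default `test_fraction=0.3`, `train_test_split` reproduces the 70/30 split described above, and `k_fold_indices(n, k=10)` gives the index sets for 10-fold cross-validation. How the paper combined the two is not stated in this summary.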

MODELS USED

The paper compares the following models:

1. Naive Bayes (NB): best-performing model, with 93% accuracy.
2. Logistic Regression (LR): moderate accuracy (81%).
3. Support Vector Machine (SVM): lower accuracy than NB (76%).
4. k-Nearest Neighbors (KNN): lowest accuracy (72%).
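
To make the best-performing model concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. This summary does not say which Naive Bayes variant or library the paper used, so the variant, the class name, and the method names are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, documents, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {
            w for counts in self.word_counts.values() for w in counts
        }
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            for word in tokens:
                if word in self.vocabulary:  # skip words unseen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a word that never occurred in one class from forcing that class's probability to zero.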

RESULTS

The results were as follows:

1. Naive Bayes achieved the highest scores, with accuracy, precision, and recall all at 93%.
2. Logistic Regression achieved 81% accuracy.
3. SVM and KNN underperformed, with 76% and 72% accuracy, respectively.
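
For reference, the three reported metrics can be computed as in the sketch below. This summary does not say how the paper averaged precision and recall across categories, so the macro average shown here is an assumption, as are the function names.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision and recall."""
    labels = sorted(set(y_true))
    scores = [precision_recall(y_true, y_pred, label) for label in labels]
    mean_precision = sum(p for p, _ in scores) / len(labels)
    mean_recall = sum(r for _, r in scores) / len(labels)
    return mean_precision, mean_recall
```

A macro average weights every category equally, so a rare category counts as much as a common one. A weighted or micro average would give different numbers on an imbalanced dataset.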

USE OF AI FOR ARTICLE CLASSIFICATION

The use of AI in the paper can be summarized as follows:

1. Machine learning algorithms were used for single-label classification.
2. Data preprocessing, feature extraction, and vectorization were used to prepare the textual data.
3. AI enabled the categorization of articles into predefined labels based on their content.
4. The models were trained and evaluated on a large dataset to achieve robust predictions.
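
The vectorization step mentioned above turns each tokenized article into a fixed-length numeric vector. The sketch below shows one common choice, a bag-of-words count vector. The summary does not name the vectorizer the paper used, so this choice and the function names are assumptions.

```python
from collections import Counter


def build_vocabulary(token_lists):
    """Assign a column index to every distinct token, in sorted order."""
    words = sorted({w for tokens in token_lists for w in tokens})
    return {w: i for i, w in enumerate(words)}


def vectorize(tokens, vocabulary):
    """Return a count vector; tokens outside the vocabulary are skipped."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokens).items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector
```

The vocabulary should be built from the training articles only, so that test articles are encoded with the same columns and unseen words are simply ignored.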

CONCLUSION

The study highlights the effectiveness of Naive Bayes for classifying news articles, as well as
the importance of text preprocessing and dataset quality in achieving high classification
accuracy. Future improvements may include extending the work to regional languages and
experimenting with more sophisticated algorithms.
