Paper 3: Online News Classification Using Machine Learning

Uploaded by sasobaid

ABSTRACT

The paper addresses the growing need to organize large volumes of unstructured online data
automatically, particularly news articles. It applies supervised machine learning to sort
these articles into categories such as politics, sports, and entertainment. Several classifiers
were tested on a dataset of 75,000 articles; the Naive Bayes classifier performed best,
reaching 93% accuracy.

INTRODUCTION

The paper highlights the rapid growth of digital content and the difficulty of organizing
unstructured online data efficiently. It explains that automatic text classification is essential for
applications such as search engines, content summarization, and question-answering systems. The
authors use supervised learning to cope with the variety of sources, writing styles, and
vocabularies found in news articles. Their goal is to personalize content for users by sorting
articles into categories such as crime, sports, politics, and entertainment.

TECHNIQUES USED

The paper uses the following techniques:

1. Data Preprocessing: Tokenization using Python NLTK, stop-word removal, and label encoding
to convert categorical data to numerical labels.
2. Dataset Preparation: A dataset of 75,000 news articles from HuffPost, split
70% for training and 30% for testing.
3. Train-Test Splits and Cross-Validation: 10-fold cross-validation to reduce
bias.
4. Evaluation Metrics: Accuracy, precision, and recall, as reported in the results.
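
The preprocessing and splitting steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' code: the regular-expression tokenizer and the short `STOP_WORDS` set are stand-ins for the NLTK tokenizer and stop-word list mentioned in the paper, and all function names are hypothetical.

```python
import random
import re

# Assumption: a tiny hand-written stop-word list; NLTK ships a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "on", "to", "is", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def encode_labels(labels):
    """Map each distinct category name to an integer, in sorted order."""
    mapping = {name: i for i, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping

def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n_items, k=10):
    """Yield (train_indices, validation_indices) for each of k folds."""
    for fold in range(k):
        validation = [i for i in range(n_items) if i % k == fold]
        training = [i for i in range(n_items) if i % k != fold]
        yield training, validation

tokens = remove_stop_words(tokenize("The Senate votes on the new budget"))
encoded, mapping = encode_labels(["sports", "politics", "sports"])
train, test = train_test_split(range(10))
folds = list(k_fold_indices(20, k=10))
```

With a 70/30 split, 10 items yield 7 training and 3 test items. In the cross-validation helper, each of the 10 folds holds out a different tenth of the data for validation, so every item is validated exactly once.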

MODELS USED

The paper evaluates four models:

1. Naive Bayes (NB): Best-performing model, with 93% accuracy.
2. Logistic Regression (LR): Moderate accuracy (81%).
3. Support Vector Machine (SVM): Lower accuracy than NB (76%).
4. k-Nearest Neighbors (KNN): Lowest accuracy (72%).
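
To make the best-performing model concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. The summary does not say which Naive Bayes variant or library the authors used, so the variant, the class name `NaiveBayesClassifier`, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        self.total_docs = len(labels)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocabulary)
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                # Add the smoothed log likelihood of the word given the class.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (hypothetical, already tokenized).
train_docs = [
    ["team", "wins", "match"],
    ["player", "scores", "goal"],
    ["senate", "passes", "bill"],
    ["president", "signs", "bill"],
]
train_labels = ["sports", "sports", "politics", "politics"]

model = NaiveBayesClassifier().fit(train_docs, train_labels)
politics_guess = model.predict(["senate", "bill"])
sports_guess = model.predict(["player", "wins"])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing keeps a word that never appeared in a class from forcing that class's probability to zero.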

RESULTS

The results were as follows:

1. Naive Bayes achieved the highest accuracy, precision, and recall, at 93%.
2. Logistic Regression achieved 81% accuracy.
3. SVM and KNN underperformed, with 76% and 72% accuracy, respectively.
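
The three metrics named above can be computed directly from true and predicted labels. The sketch below shows one common way to do it, with precision and recall calculated per class. The function names and the five-article example are hypothetical, and the summary does not state how the authors averaged per-class scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels for five articles.
y_true = ["sports", "sports", "politics", "politics", "crime"]
y_pred = ["sports", "politics", "politics", "politics", "crime"]

overall_accuracy = accuracy(y_true, y_pred)
politics_precision, politics_recall = precision_recall(y_true, y_pred, "politics")
sports_precision, sports_recall = precision_recall(y_true, y_pred, "sports")
```

In this example one sports article is mislabeled as politics, which lowers recall for sports and precision for politics while leaving the other two per-class scores at 1.0.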

USE OF AI FOR ARTICLE CLASSIFICATION

The paper's use of AI can be summarized as follows:

1. Machine learning algorithms were used for single-label classification.
2. Data preprocessing, feature extraction, and vectorization were used to prepare the textual data.
3. AI enabled the categorization of articles into predefined labels based on their content.
4. The models were trained and evaluated on a large dataset to achieve robust predictions.
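
As an illustration of the vectorization step, the sketch below turns tokenized articles into bag-of-words count vectors. The summary does not specify which feature representation the paper used, so the bag-of-words choice and the helper names `build_vocabulary` and `vectorize` are assumptions.

```python
from collections import Counter

def build_vocabulary(tokenized_docs):
    """Assign each distinct token a column index, in sorted order."""
    words = sorted({t for doc in tokenized_docs for t in doc})
    return {word: i for i, word in enumerate(words)}

def vectorize(tokens, vocabulary):
    """Return a count vector with one entry per vocabulary word."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokens).items():
        if word in vocabulary:  # skip words outside the vocabulary
            vector[vocabulary[word]] = count
    return vector

# Hypothetical tokenized articles.
docs = [
    ["team", "wins", "match"],
    ["senate", "passes", "bill", "bill"],
]
vocabulary = build_vocabulary(docs)
vectors = [vectorize(doc, vocabulary) for doc in docs]
```

Each article becomes a fixed-length numeric vector, which is the form that classifiers such as Logistic Regression, SVM, and KNN require as input.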

CONCLUSION

The study highlights the effectiveness of Naive Bayes for classifying news articles, as well as
the importance of text preprocessing and dataset quality in achieving high classification
accuracy. Future improvements may include extending the work to regional languages and
experimenting with more sophisticated algorithms.
