
University of the Immaculate Conception

Jacinto St., Davao City

BA 315 - BUSINESS INTELLIGENCE AND ANALYTICS

Prepared by: RODIBEE B. ROJO
             ROLAN SEAN U. MAGAWAY
             RICHARD D. SISION

Submitted to: RAYMOND PIDOR

GROUP ASSIGNMENT 5: TEXT, WEB, AND SOCIAL MEDIA ANALYTICS

Instructions:

Examine how textual data can be captured automatically using Web-based technologies. Once
captured, what are the potential patterns that you can extract from these unstructured data sources?

Answer:

The technique of extracting data from the web automatically is called web scraping. It is used to capture large amounts of data from websites, data that would otherwise be accessible only through a web browser.

Web data is of great use to e-commerce portals, media companies, and research firms; data scientists, and sometimes even government agencies, also make use of these techniques.

There are several web-based technologies used to carry out this kind of data extraction. Some of them are as follows:
• DaaS (Data as a Service) providers
• In-house data extraction tools
• Freely available data extraction tools

How does this work?

To extract data from websites, we make use of bots called crawlers. A crawler always starts from a URL that we provide to it. Hence, the first thing to do in data extraction is to decide which website we want to crawl and pass its URL to the crawler as the “seed URL”.
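For illustration, a minimal Python sketch of this first step might look like the following (the seed URL is a placeholder, and the requests and BeautifulSoup libraries are assumed to be installed):

# A minimal sketch of the first step: fetching a seed URL and collecting links.
# The URL below is a placeholder, not a real crawl target.
import requests
from bs4 import BeautifulSoup

SEED_URL = "https://example.com"  # hypothetical seed URL

response = requests.get(SEED_URL, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Collect the links found on the seed page; these become the next URLs to crawl.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(f"Found {len(links)} links on the seed page")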

Once the crawler has fetched the seed URL, the next step is to feed the bot the directions to follow while crawling the website. These directions help the crawler look for the data that is required.
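These directions can be expressed as a small set of rules, for example which link patterns the crawler may follow and which page elements hold the required data. A hedged sketch, with purely hypothetical URL patterns and CSS selectors:

# A sketch of the "directions" given to the crawler: which link patterns it may
# follow and which CSS selectors point at the required data points.
# The patterns and selectors are illustrative assumptions, not a real site's layout.
import re

CRAWL_RULES = {
    # Only follow links whose path looks like a product page.
    "follow": re.compile(r"^/products/\d+$"),
    # Selectors naming the data points to extract from each page.
    "fields": {
        "title": "h1.product-title",
        "price": "span.price",
        "description": "div.description",
    },
}

def should_follow(href: str) -> bool:
    """Return True if a link matches the crawl rules."""
    return bool(CRAWL_RULES["follow"].match(href))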

After the directions are set, we need to tell the crawler how deep into the website it should go so that it reaches the pages from which the data needs to be extracted. Once the crawler knows the depth of the site, its next job is to compile all of those pages and save them to a repository.
The crawler then scrapes through the saved pages and extracts only the required data points; at this stage we instruct it to pick out only the data that is needed and ignore the rest.
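A rough sketch of such a depth-limited crawl, saving pages to a local folder that stands in for the repository and then extracting only a couple of hypothetical data points:

# A sketch of a depth-limited crawl that saves fetched pages into a local
# "repository" folder and then picks out only the required data points.
# The selectors and on-disk layout are illustrative assumptions.
import pathlib
import requests
from bs4 import BeautifulSoup

REPO = pathlib.Path("pages")   # simple file-based repository for raw pages
REPO.mkdir(exist_ok=True)
MAX_DEPTH = 2                  # how deep into the website the crawler may go

def crawl(url, depth=0, seen=None):
    """Fetch a page, save it to the repository, and follow its links up to MAX_DEPTH."""
    seen = seen if seen is not None else set()
    if depth > MAX_DEPTH or url in seen:
        return seen
    seen.add(url)
    html = requests.get(url, timeout=10).text
    (REPO / f"page_{len(seen)}.html").write_text(html, encoding="utf-8")
    for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
        # Kept simple here; a real crawler would also restrict links to the same domain.
        if a["href"].startswith(url):
            crawl(a["href"], depth + 1, seen)
    return seen

def extract(html):
    """Scrape only the required data points from a saved page and ignore the rest."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1")
    price = soup.select_one("span.price")   # hypothetical selector
    return {"title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None}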

After all of this data has been captured at the crawler’s end, it needs to be checked and de-duplicated so that repeated records are removed from the freshly scraped data. This cleanup step also deletes unwanted HTML tags and any stray text that was scraped along with the relevant data.
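A small sketch of this cleanup step, using made-up records to show tag stripping and de-duplication:

# Cleanup sketch: strip leftover HTML tags from scraped fields and remove
# duplicate records. The sample records are invented for illustration.
from bs4 import BeautifulSoup

raw_records = [
    {"title": "<b>Blue Mug</b>", "price": "$4.99"},
    {"title": "Blue Mug", "price": "$4.99"},        # duplicate once tags are stripped
    {"title": "Red Mug", "price": "$5.49"},
]

def clean(record):
    """Remove unwanted HTML tags and surrounding whitespace from every field."""
    return {k: BeautifulSoup(v, "html.parser").get_text(strip=True)
            for k, v in record.items()}

cleaned = [clean(r) for r in raw_records]

# De-duplicate by turning each record into a hashable tuple of its items.
deduplicated = [dict(t) for t in {tuple(sorted(r.items())) for r in cleaned}]
print(deduplicated)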

The final step is to structure the data into a machine-readable format so that it can be imported into a database and used by the analytics system.
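A minimal sketch of this structuring step, writing hypothetical cleaned records out as JSON and CSV so a database or analytics system can import them (file names and fields are illustrative):

# Structuring sketch: export cleaned records as JSON and CSV for downstream import.
import csv
import json

records = [
    {"title": "Blue Mug", "price": "$4.99"},
    {"title": "Red Mug", "price": "$5.49"},
]

# Machine-readable JSON for an analytics system or document store.
with open("products.json", "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

# CSV for easy import into a relational database or spreadsheet.
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(records)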

Hence, this is how data is captured automatically from websites using these web-based techniques; the extracted data can be saved in formats such as XML, JSON, CSV, HTML, and TXT.
