UID: 2021700028, 2021700038, 2021700057, 2021700021
Experiment No. 2
PROBLEM STATEMENT: Data Collection - Select the social media platforms of your choice (Twitter, Facebook, LinkedIn, YouTube, web blogs, etc.), connect to them, and capture social media data for business (scraping, crawling, parsing).
THEORY:
1. Scraping
Theory: Scraping, or web scraping, refers to the automated process of extracting data from
websites. This technique involves sending requests to a website and parsing the HTML content
to retrieve specific pieces of information.
Example: Suppose you want to collect the latest headlines from a news website. You can use a
scraping tool to visit the site, parse the HTML, and extract the text of the headlines.
Working:
● Request: A request is sent to the website's server using libraries like requests in
Python.
● Response: The server returns the HTML content of the page.
● Parsing: Libraries like BeautifulSoup or lxml are used to parse the HTML and
extract the desired data, such as tags containing headlines, prices, or other elements.
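A minimal sketch of this request-and-parse workflow is shown below. It assumes a hypothetical news page at https://example.com/news whose headlines sit in <h2 class="headline"> tags; the URL and the CSS selector are placeholders, not part of the experiment's actual code.
import requests
from bs4 import BeautifulSoup

# Hypothetical news page; the URL and selector below are placeholders
url = 'https://example.com/news'

# Request: fetch the page's HTML from the server
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parsing: extract the text of every headline element
soup = BeautifulSoup(response.text, 'html.parser')
headlines = [h.get_text(strip=True) for h in soup.select('h2.headline')]

for headline in headlines:
    print(headline)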
2. Crawling
Theory: Crawling is the automated, systematic browsing of websites, in which a program starts from one or more seed URLs and follows hyperlinks from page to page to discover and collect content at scale.
Example: A search engine like Google uses crawlers to visit billions of web pages, following links from one page to another to index the content for search results.
Working:
● Seed URLs: The crawler starts from an initial list of pages to visit.
● Fetching: Each queued page is downloaded, just as in scraping.
● Link Extraction: Hyperlinks found on the page are added to the queue of pages to visit.
● Repetition: The crawler keeps fetching and following links until a stopping condition (such as a page or depth limit) is reached, as sketched below.
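The following is a minimal crawler sketch of that loop. The seed URL, the page limit, and the same-site restriction are illustrative assumptions; a real crawler should also respect robots.txt and rate limits.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Hypothetical seed URL and stopping condition for this sketch
seed_url = 'https://example.com'
to_visit = [seed_url]
visited = set()
max_pages = 20

while to_visit and len(visited) < max_pages:
    url = to_visit.pop(0)
    if url in visited:
        continue
    try:
        # Fetching: download the page, just as in scraping
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        continue
    visited.add(url)

    # Link extraction: queue new links found on the page
    soup = BeautifulSoup(response.text, 'html.parser')
    for link in soup.find_all('a', href=True):
        absolute = urljoin(url, link['href'])
        if absolute.startswith(seed_url) and absolute not in visited:
            to_visit.append(absolute)

print(f"Crawled {len(visited)} pages")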
3. Parsing
Theory: Parsing is the process of analyzing a string of symbols, either in natural language, code,
or data formats, to understand its structure and extract meaningful information. In web data
collection, parsing typically refers to the extraction and processing of data from HTML, XML,
or JSON formats.
Example: When you receive JSON data from an API, you need to parse it to extract specific
fields, such as a video title or number of views from YouTube's API response.
Working:
● HTML Parsing: Tools like BeautifulSoup are used to parse HTML and extract
specific tags or attributes.
● JSON Parsing: JSON data can be parsed using native methods in most programming
languages. In Python, json.loads() is used to convert a JSON string into a
dictionary.
● XML Parsing: Similar to HTML, XML data can be parsed to extract structured
information, using libraries like ElementTree.
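A small illustration of these three parsers follows; the sample JSON, HTML, and XML strings are made up for demonstration only.
import json
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

# JSON parsing: json.loads() converts a JSON string into a dictionary
json_text = '{"title": "Sample video", "views": 1200}'
record = json.loads(json_text)
print(record['title'], record['views'])

# HTML parsing: BeautifulSoup extracts specific tags or attributes
html_text = '<html><body><h1>Latest headline</h1></body></html>'
soup = BeautifulSoup(html_text, 'html.parser')
print(soup.find('h1').get_text())

# XML parsing: ElementTree extracts structured information
xml_text = '<video><title>Sample video</title><views>1200</views></video>'
root = ET.fromstring(xml_text)
print(root.find('title').text, root.find('views').text)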
Practical Use Case in Business:
● Scraping: Collect pricing information from competitors' websites to adjust your pricing
strategy.
● Crawling: Gather product reviews from multiple e-commerce sites to analyze customer
sentiment.
● Parsing: Extract and analyze data from an API to monitor real-time changes in market
trends.
CODE:
import requests
import json

# YouTube Data API v3 endpoint for video details
api_url = 'https://www.googleapis.com/youtube/v3/videos'
params = {
    'part': 'snippet,contentDetails,statistics,status',
    'id': 'hKR57pX7-fY',
    'key': 'AIzaSyAUmLri_CuNCZN5toPEzDX6gfk5yRwL2TM'
}

# Send the request and parse the JSON response
response = requests.get(api_url, params=params)

if response.status_code == 200:
    data = response.json()
    print(json.dumps(data, indent=4))

    video_data = data['items'][0]
    title = video_data['snippet']['title']
    description = video_data['snippet']['description']
    published_at = video_data['snippet']['publishedAt']
    channel_title = video_data['snippet']['channelTitle']
    tags = video_data['snippet'].get('tags', [])
    duration = video_data['contentDetails']['duration']
    definition = video_data['contentDetails']['definition']
    caption_status = video_data['contentDetails']['caption']
    # Dislike counts are no longer exposed by the API, so default to 'N/A'
    views = video_data['statistics'].get('viewCount')
    likes = video_data['statistics'].get('likeCount')
    dislikes = video_data['statistics'].get('dislikeCount', 'N/A')
    comment_count = video_data['statistics'].get('commentCount')
    privacy_status = video_data['status']['privacyStatus']
    license_status = video_data['status']['license']

    print(f"Title: {title}")
    print(f"Description: {description}")
    print(f"Published At: {published_at}")
    print(f"Channel Title: {channel_title}")
    print(f"Tags: {tags}")
    print(f"Duration: {duration}")
    print(f"Definition: {definition}")
    print(f"Caption Status: {caption_status}")
    print(f"Views: {views}")
    print(f"Likes: {likes}")
    print(f"Dislikes: {dislikes}")
    print(f"Comment Count: {comment_count}")
    print(f"Privacy Status: {privacy_status}")
    print(f"License Status: {license_status}")
else:
    print(f"Request failed with status code {response.status_code}")
OUTPUT:
{
    "kind": "youtube#videoListResponse",
    "etag": "fW5nFjrg_8GRFUUSXOCOQIzthQ0",
    "items": [
        {
            "kind": "youtube#video",
            "etag": "9PKFxU5RVlM5ljTxq7JXZPdyu_Q",
            "id": "hKR57pX7-fY",
            "snippet": {
                "publishedAt": "2024-08-21T13:00:58Z",
                "channelId": "UCtxD0x6AuNNqdXO9Wp5GHew",
                "title": "This is how I manage the pressure of being Cristiano",
                "description": "Cristiano explains what it's like to be the most watched person in the world and what that responsibility entails",
                "thumbnails": {
                    "default": {
                        "url": "https://i.ytimg.com/vi/hKR57pX7-fY/default.jpg",
                        "width": 120,
                        "height": 90
                    },
                    "medium": {
                        "url": "https://i.ytimg.com/vi/hKR57pX7-fY/mqdefault.jpg",
                        "width": 320,
                        "height": 180
                    },
                    "high": {
                        "url": "https://i.ytimg.com/vi/hKR57pX7-fY/hqdefault.jpg",
                        "width": 480,
                        "height": 360
                    },
                    "standard": {
                        "url": "https://i.ytimg.com/vi/hKR57pX7-fY/sddefault.jpg",
                        "width": 640,
                        "height": 480
                    },
                    "maxres": {
                        "url": "https://i.ytimg.com/vi/hKR57pX7-fY/maxresdefault.jpg",
                        "width": 1280,
                        "height": 720
                    }
                },
                "channelTitle": "UR \u00b7 Cristiano",
                "categoryId": "22",
                "liveBroadcastContent": "none",
                "localized": {
                    "title": "This is how I manage the pressure of being Cristiano",
                    "description": "Cristiano explains what it's like to be the most watched person in the world and what that responsibility entails"
                },
                "defaultAudioLanguage": "pt"
            },
            "contentDetails": {
                "duration": "PT1M9S",
                "dimension": "2d",
                "definition": "hd",
                "caption": "true",
                "licensedContent": true,
                "contentRating": {},
                "projection": "rectangular"
            },
            "status": {
                "uploadStatus": "processed",
                "privacyStatus": "public",
                "license": "youtube",
                "embeddable": true,
                "publicStatsViewable": true,
                "madeForKids": false
            },
            "statistics": {
                "viewCount": "3835717",
                "likeCount": "643439",
                "favoriteCount": "0",
                "commentCount": "20451"
            }
        }
    ],
    "pageInfo": {
        "totalResults": 1,
        "resultsPerPage": 1
    }
}
Title: This is how I manage the pressure of being Cristiano
Description: Cristiano explains what it's like to be the most watched person in the world
and what that responsibility entails
Published At: 2024-08-21T13:00:58Z
Channel Title: UR · Cristiano
Tags: []
Duration: PT1M9S
Definition: hd
Caption Status: true
Views: 3835717
Likes: 643439
Dislikes: N/A
Comment Count: 20451
Privacy Status: public
License Status: youtube
CONCLUSION: The experiment effectively demonstrated the process of collecting data from various social media platforms for business analysis. Despite technical challenges, the ability to capture and analyze diverse data types provides valuable insights for informed decision-making and strategic planning.