Commit 74d573e

added youtube extractor tutorial
1 parent cda0d19 commit 74d573e

4 files changed: +80 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -34,4 +34,5 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [How to Make a Process Monitor in Python](https://www.thepythoncode.com/article/make-process-monitor-python). ([code](general/process-monitor))
 - [How to Make a Screen Recorder in Python](https://www.thepythoncode.com/article/make-screen-recorder-python). ([code](general/screen-recorder))
 - [How to Access Wikipedia in Python](https://www.thepythoncode.com/article/access-wikipedia-python). ([code](general/wikipedia-extractor))
+- [How to Extract YouTube Data in Python](https://www.thepythoncode.com/article/get-youtube-data-python). ([code](general/youtube-extractor))

general/youtube-extractor/README.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# [How to Extract YouTube Data in Python](https://www.thepythoncode.com/article/get-youtube-data-python)
To run this:
- `pip3 install -r requirements.txt`
- Run the script with the URL of a video:
    ```
    python extract_video_info.py https://www.youtube.com/watch?v=jNQXAC9IVRw
    ```
    **Output:**
    ```
    Title: Me at the zoo
    Views: 75910120

    Description: The first video on YouTube. Maybe it's time to go back to the zoo?sub2sub kthxbai -- fast and loyal if not i get a subs back i will unsubs your cahnnel(Credit: The name of the music playing in the background is Darude - Sandstorm)

    Published on Apr 23, 2005
    Likes: 2337841
    Dislikes: 81211

    Channel Name: jawed
    Channel URL: https://www.youtube.com/channel/UC4QobU6STFB0P71PMvOGN5A
    Channel Subscribers: 616K
    ```
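
The sample output above reports the subscriber count as the abbreviated string `616K`, while views and likes are integers. If a numeric value is wanted there as well, one option is a small converter along the following lines. This is only a sketch: `parse_abbreviated_count` and its suffix table are hypothetical names that are not part of this commit, and the result is approximate because the page has already rounded the figure.

```python
from decimal import Decimal

# Hypothetical suffix table; assumes the page only uses K, M and B.
_SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}


def parse_abbreviated_count(text):
    """Convert a string such as '616K' or '1.2M' to an approximate integer."""
    cleaned = text.strip().replace(",", "").upper()
    if not cleaned:
        raise ValueError("empty count")
    multiplier = 1
    if cleaned[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    # Decimal avoids the rounding surprises of float multiplication.
    return int(Decimal(cleaned) * multiplier)
```

With this helper, `parse_abbreviated_count("616K")` yields `616000`, and a plain string such as `"950"` passes through as `950`.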
Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
import requests
from bs4 import BeautifulSoup as bs


def get_video_info(url):
    # download HTML code
    content = requests.get(url)
    # create beautiful soup object to parse HTML
    soup = bs(content.content, "html.parser")
    # initialize the result
    result = {}
    # video title
    result['title'] = soup.find("span", attrs={"class": "watch-title"}).text.strip()
    # video views (converted to integer)
    result['views'] = int(soup.find("div", attrs={"class": "watch-view-count"}).text[:-6].replace(",", ""))
    # video description
    result['description'] = soup.find("p", attrs={"id": "eow-description"}).text
    # date published
    result['date_published'] = soup.find("strong", attrs={"class": "watch-time-text"}).text
    # number of likes as integer
    result['likes'] = int(soup.find("button", attrs={"title": "I like this"}).text.replace(",", ""))
    # number of dislikes as integer
    result['dislikes'] = int(soup.find("button", attrs={"title": "I dislike this"}).text.replace(",", ""))
    # channel details
    channel_tag = soup.find("div", attrs={"class": "yt-user-info"}).find("a")
    # channel name
    channel_name = channel_tag.text
    # channel URL
    channel_url = f"https://www.youtube.com{channel_tag['href']}"
    # number of subscribers as str
    channel_subscribers = soup.find("span", attrs={"class": "yt-subscriber-count"}).text.strip()
    result['channel'] = {'name': channel_name, 'url': channel_url, 'subscribers': channel_subscribers}
    return result


if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="YouTube Video Data Extractor")
    parser.add_argument("url", help="URL of the YouTube video")

    args = parser.parse_args()
    # parse the video URL from command line
    url = args.url

    data = get_video_info(url)

    # print in nice format
    print(f"Title: {data['title']}")
    print(f"Views: {data['views']}")
    print(f"\nDescription: {data['description']}\n")
    print(data['date_published'])
    print(f"Likes: {data['likes']}")
    print(f"Dislikes: {data['dislikes']}")
    print(f"\nChannel Name: {data['channel']['name']}")
    print(f"Channel URL: {data['channel']['url']}")
    print(f"Channel Subscribers: {data['channel']['subscribers']}")
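
The view count in the script is extracted with `text[:-6]`, which assumes the element text always ends in the six characters ` views`. A more tolerant alternative, sketched here under the assumption that the count is the first run of digits and commas in the text, is to pull the number out with a regular expression. `parse_count` is a hypothetical helper and is not part of this commit.

```python
import re


def parse_count(text):
    """Return the first comma-grouped integer found in `text`."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the thousands separators before converting.
    return int(match.group().replace(",", ""))
```

The same helper could stand in for the `.replace(",", "")` calls used for likes and dislikes, since it also accepts a bare number such as `"2,337,841"`.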
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
requests
bs4
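
The requirements file lists the two packages the script imports. As a quick way to confirm that the install step worked before running the extractor, the standard library can report which modules cannot be found. The helper name `missing_modules` is hypothetical and only illustrates the idea.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # `requests` and `bs4` are the import names used by the extractor script.
    missing = missing_modules(["requests", "bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

An empty list means both imports at the top of the script should succeed.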
