Scraping Instagram with Python (using Selenium and Beautiful Soup)
This article shows how to scrape Instagram to download images and get information on posts from a public profile page or a hashtag page. The code uses both Selenium and Beautiful Soup to scrape Instagram images without the hassle of providing account details or authentication tokens.

1. Import dependencies

pip install selenium, and download ChromeDriver from the following link: http://chromedriver.chromium.org/

from selenium import webdriver
from bs4 import BeautifulSoup as bs
import time
import re
from urllib.request import urlopen
import json
from pandas.io.json import json_normalize
import pandas as pd
import numpy as np
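Note: in recent pandas versions json_normalize was moved to the top level, and the pandas.io.json import above is deprecated. A minimal sketch, assuming pandas 1.0 or later and a made-up sample dict:

import pandas as pd

# pandas >= 1.0 exposes json_normalize directly on the pandas namespace;
# in step 4 below, pd.json_normalize(posts) would replace json_normalize(posts)
sample = {'shortcode_media': {'shortcode': 'Bxyz123'}}
x = pd.json_normalize(sample)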

2. Open the web browser: Selenium uses ChromeDriver to open the profile for a given username (public user). For example -

username = 'pickuplimes'
browser = webdriver.Chrome('/path/to/chromedriver')
browser.get('https://www.instagram.com/' + username + '/?hl=en')
pagelength = browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
If you want to open a hashtag page -

hashtag = 'food'
browser = webdriver.Chrome('/path/to/chromedriver')
browser.get('https://www.instagram.com/explore/tags/' + hashtag)
pagelength = browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
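Note: webdriver.Chrome('/path/to/chromedriver') is the Selenium 3 style. A minimal sketch for Selenium 4, assuming the same locally downloaded chromedriver (the path is a placeholder):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 wraps the driver path in a Service object
browser = webdriver.Chrome(service=Service('/path/to/chromedriver'))
browser.get('https://www.instagram.com/' + username + '/?hl=en')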

3. Parse HTML source page: Open the source page and use Beautiful Soup to parse it. Go through the body of the HTML, extract the link for each image on the page, and append it to an empty list 'links'.

links = []
source = browser.page_source
data = bs(source, 'html.parser')
body = data.find('body')
script = body.find('span')
for link in script.findAll('a'):
    if re.match("/p", link.get('href')):
        links.append('https://www.instagram.com' + link.get('href'))
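If the span wrapper isn't present in the markup Instagram serves you, the same extraction can be run over every anchor on the page instead; a minimal sketch using the Beautiful Soup object from above:

links = []
source = browser.page_source
data = bs(source, 'html.parser')
# scan all anchors rather than relying on a specific <span> wrapper
for link in data.find_all('a', href=True):
    if re.match("/p", link['href']):
        links.append('https://www.instagram.com' + link['href'])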

Remember that by default Selenium only loads the first page. If you want to scroll through further pages and get more images, divide the scroll height by a number and run the parsing code multiple times. This adds the new links from each page to the list. For example -

pagelength = browser.execute_script("window.scrollTo(0, document.body.scrollHeight/1.5);")
links = []
source = browser.page_source
data = bs(source, 'html.parser')
body = data.find('body')
script = body.find('span')
for link in script.findAll('a'):
    if re.match("/p", link.get('href')):
        links.append('https://www.instagram.com' + link.get('href'))

# The sleep is required: without it Instagram may interrupt the script and the page won't scroll further.
time.sleep(5)
pagelength = browser.execute_script("window.scrollTo(document.body.scrollHeight/1.5, document.body.scrollHeight/3.0);")
source = browser.page_source
data = bs(source, 'html.parser')
body = data.find('body')
script = body.find('span')
for link in script.findAll('a'):
    if re.match("/p", link.get('href')):
        links.append('https://www.instagram.com' + link.get('href'))

This may not be the most efficient way to scroll through pages. I haven't tried other methods, but you could instead check end_cursor and has_next_page = True or False in the JSON data and loop through it. As a simpler variant, the scroll-and-parse step above can also be wrapped in a plain loop, as sketched below.
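A minimal sketch of that loop variant, reusing the parsing code from above; the iteration count and sleep time are arbitrary assumptions, and a set is used to avoid duplicate links:

links = set()
for _ in range(5):  # scroll a few times; adjust as needed
    source = browser.page_source
    data = bs(source, 'html.parser')
    # scan every anchor on the currently loaded page
    for link in data.find_all('a', href=True):
        if re.match("/p", link['href']):
            links.add('https://www.instagram.com' + link['href'])
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)  # give Instagram time to load the next batch of posts
links = list(links)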

4. Get information for each image on the page: To get more details of each image, such as who posted it, the post type, the image URL, the image caption, and the number of likes and comments, open the source page of each image (from the 'links' list in the previous code) and extract the JSON script into a pandas dataframe.

result = pd.DataFrame()
for i in range(len(links)):
    try:
        page = urlopen(links[i]).read()
        data = bs(page, 'html.parser')
        body = data.find('body')
        script = body.find('script')
        raw = script.text.strip().replace('window._sharedData =', '').replace(';', '')
        json_data = json.loads(raw)
        posts = json_data['entry_data']['PostPage'][0]['graphql']
        posts = json.dumps(posts)
        posts = json.loads(posts)
        x = pd.DataFrame.from_dict(json_normalize(posts), orient='columns')
        x.columns = x.columns.str.replace("shortcode_media.", "")
        result = result.append(x)
    except:
        pass  # skip posts that fail to load or parse

Then just check for duplicates -

result = result.drop_duplicates(subset='shortcode')
result.index = range(len(result.index))

The columns you get might be slightly different for a user profile page and a hashtag page. Check out the columns and filter whatever you need, for example as below.
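A minimal sketch of that filtering step; the two column names kept here are simply the ones used later in this article (other columns differ between profile and hashtag pages):

# keep only the columns needed for the download step
result = result[['shortcode', 'display_url']]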

5. Download images from the pandas dataframe: Use the requests library to download images from 'display_url' in the pandas 'result' dataframe and store them with their respective shortcode as the file name.

(Important note: Remember that you should respect authors' rights when you download copyrighted content. Do not use images/videos from Instagram for commercial purposes.)

import os
import requests

result.index = range(len(result.index))
directory = "/directory/you/want/to/save/images/"
for i in range(len(result)):
    r = requests.get(result['display_url'][i])
    with open(directory + result['shortcode'][i] + ".jpg", 'wb') as f:
        f.write(r.content)
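A slightly more defensive variant of the same loop; the status-code check, timeout and per-request pause are assumptions, not part of the original article:

import time
import requests

for i in range(len(result)):
    r = requests.get(result['display_url'][i], timeout=10)
    if r.status_code == 200:  # only save responses that actually returned content
        with open(directory + result['shortcode'][i] + ".jpg", 'wb') as f:
            f.write(r.content)
    time.sleep(1)  # be polite and avoid hammering the server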

Thanks for reading and I hope you find this article useful. If you
have any questions, I’d be more than happy to discuss.
