Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media

What We Instagram:
A First Analysis of Instagram Photo Content and User Types

Yuheng Hu, Lydia Manikonda, Subbarao Kambhampati


Department of Computer Science, Arizona State University, Tempe AZ 85281
{yuhenghu, lmanikon, rao}@asu.edu

Abstract

Instagram is a relatively new form of communication where users can easily share their updates by taking photos and tweaking them using filters. It has seen rapid growth in the number of users as well as uploads since it was launched in October 2010. Although it is the most popular photo capturing and sharing application, it has attracted relatively little attention from the research community. In this paper, we present both qualitative and quantitative analyses of Instagram. We use computer vision techniques to examine the photo content. Based on that, we identify the different types of active users on Instagram using clustering. Our results reveal several insights about Instagram that have not been studied before, including: 1) eight popular photo categories, 2) five distinct types of Instagram users in terms of their posted photos, and 3) a user's audience (number of followers) is independent of his/her shared photos on Instagram. To our knowledge, this is the first in-depth study of content and users on Instagram.

1 Introduction

Instagram, a mobile photo (and video) capturing and sharing service, has quickly emerged as a new medium in the spotlight in recent years. It provides users an instantaneous way to capture and share their life moments with friends through a series of (filter-manipulated) pictures and videos. Since its launch in October 2010, it has attracted more than 150 million active users, with an average of 55 million photos uploaded per day and more than 16 billion photos shared so far (Instagram 2013). The extraordinary success of Instagram corroborates the recent Pew report stating that photos and videos have become key social currencies online (Rainie, Brenner, and Purcell 2012).

Despite its popularity, little research to date has focused on Instagram¹. Fundamental and critical questions remain open and untouched, such as: What types of photos and videos do people usually post on Instagram? What are the differences between users in terms of their posted photos? How are these differences between users' photos related to other user characteristics, such as the number of followers? We argue that Instagram deserves attention from the research community comparable to the attention given to Twitter and other social media platforms (Naaman, Boase, and Lai 2010; Ellison and others 2007). A deep understanding of Instagram is important because it can yield insights into social, cultural, and environmental issues surrounding people's activities (through the lens of their photos). After all, a picture is worth a thousand words (in contrast, Twitter is mainly a text-based communication platform).

To address this gap, in this exploratory study we aim to acquire an initial understanding of the types of photos shared by individuals on Instagram. To this end, we first crawl a large collection of photos and user profiles using the Instagram API. Next, with the help of computer vision techniques and human coders, we conduct both quantitative and qualitative analyses of user activity on Instagram. Our analysis reveals several insights about Instagram photos and users. First, we find that Instagram photos can be roughly categorized into eight types based on their content: self-portraits, friends, activities, captioned photos (pictures with embedded text), food, gadgets, fashion, and pets, where the first six types are much more popular. Second, we discover five distinct types of users based on the photos they post. Lastly, we find no strong correlation between the types of users and their characteristics (e.g., number of followers). This indicates that the size of a user's audience (followers) is independent of his/her shared photos on Instagram.

To the best of our knowledge, this is the first paper to conduct a deep analysis of photo content and of user activities and types on Instagram. In summary, the main contributions of this paper are:

• A characterization of the content of photos shared on Instagram.

• An examination of how the content of photos is related to user types and characteristics.

¹ We are aware of the small body of research on Instagram. Among the handful of existing studies, McCune investigated people's motivations for using Instagram through a survey study of 23 Instagram users (McCune 2011). Other researchers have applied visualization and cultural analytics to Instagram photos from different cities in the world to trace their social and cultural differences (Hochman and Manovich 2013; Silva et al. 2013).

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Figure 1: Interfaces of Instagram. (a) Instagram app homepage; (b) transforming a photo using filters.

2 Background

Instagram (Fig. 1) is a popular photo (and video) capturing and sharing mobile application, with more than 150 million registered users since its launch in October 2010. It offers its users a unique way to post pictures and videos using their smartphones, apply different manipulation tools (16 filters) to transform the appearance of an image, and share them instantly on multiple platforms (e.g., Twitter) in addition to the user's Instagram page. It also allows users to add captions and hashtags (using the # symbol) to describe the pictures and videos, and to tag or mention other users with the @ symbol (which effectively creates a link from their posts to the referenced user's account) before posting them.

In addition to its photo capturing and manipulation functions, Instagram provides Twitter-like social connectivity: a user can follow any number of other users, called "friends", while the users following an Instagram user are called "followers". Instagram's social network is asymmetric, meaning that if user A follows B, B need not follow A back. Users can also set their privacy preferences so that their posted photos and videos are available only to their followers, in which case becoming a follower requires the user's approval. By default, images and videos are public, which means they are visible to anyone using the Instagram app or website. Users consume photos and videos mostly by viewing a core page showing a "stream" of the latest photos and videos from all their friends, listed in reverse chronological order. They can also favorite ("like") or comment on these posts. Such actions appear on the referenced user's "Updates" page so that users can keep track of likes and comments on their posts. Given these functions, we regard Instagram as a kind of social awareness stream (Naaman, Boase, and Lai 2010), like other social media platforms such as Facebook and Twitter.

3 Approach

Our analysis, based on Instagram data collected using the Instagram API, consists of a qualitative categorization of Instagram photos and a quantitative examination of users' characteristics with respect to their photos. The data includes profile information, photos, captions and tags associated with photos, and users' social networks (friends and followers). Below, we first provide details about the dataset we used, and then discuss how we developed a coding scheme for categorizing the photos and how the coding was carried out.

3.1 Data Collection Methodology

To obtain a random sample of Instagram users and retrieve their public photos, we first collected the IDs of users who had media (photos or videos) on Instagram's public timeline, which displays a subset of the Instagram media that is most popular at the moment. This process yielded a set of 37 unique users. Careful examination of each user showed that these users were mostly celebrities (which may explain why their posts were popular). We then crawled the IDs of both their followers and their friends, and merged these two lists into one unified list of 95,343 unique seed users. Next, we built a random sample of regular active Instagram users from this seed list.

Specifically, we operationalized regular active users as those who 1) are not organizations, brands, or spammers, and 2) have at least 30 friends, at least 30 followers, and at least 60 posted photos.² In practice, we found 13,951 users (14.6% of the seed users) who satisfied these criteria. From these, we randomly selected 50 users and downloaded their profiles, their 20 most recent photos (we cannot download photos randomly due to limitations of the Instagram API), and their social networks (lists of friends and followers). We sampled only 50 users because we code their photos manually, which is not feasible for a large number of users. This sample allows us to make estimates at a 95% confidence level with a ±13% confidence interval for typical users, which is accurate enough for the analysis in this paper (i.e., the sample is representative).

² It is worth noting that during our crawling process, many users (about 9.4%) changed their privacy settings from public to private, which made their profiles and photos unretrievable.

3.2 Content Categories and Coding Process

To characterize the types of photos posted on Instagram, we used a grounded approach to thematize and code (i.e., categorize) a sample of 200 photos out of the 1,000 photos we obtained (50 users × 20 photos per user). Coming up with good, meaningful content categories is known to be challenging, especially for images, since they contain much richer features than text. Therefore, as an initial pass, we used computer vision techniques to get a quick overview of which categories exist. Specifically, we first used the classical Scale Invariant Feature Transform (SIFT) algorithm (Lowe 1999) to detect and extract local discriminative features from the photos in the sample; each SIFT feature is described by a 128-dimensional descriptor vector. Following the standard image vector quantization approach (i.e., SIFT feature clustering (Szeliski 2011)), we obtained a codebook vector for each photo³. Finally, we clustered the photos' codebook vectors with k-means.

³ A photo I of a dog can have 125 SIFT features corresponding to the dog's eyes, legs, ears, and so on, which are expressed in terms of a codebook vector of size n as $I = \langle C_1 : f_1, C_2 : f_2, C_3 : f_3, \ldots, C_n : f_n \rangle$, where $\sum_{i=1}^{n} f_i = 125$ and $C_i$ is the cluster of all the features about a specific characteristic of an object in the image.
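The quantization pipeline in Section 3.2 can be sketched as follows: SIFT descriptors, then a visual codebook, then k-means over the per-photo codebook vectors. This is a minimal, dependency-free illustration under stated assumptions, not the authors' code. It assumes the 128-dimensional SIFT descriptors have already been extracted (for example with OpenCV's `cv2.SIFT_create().detectAndCompute`). The codebook size, the farthest-first seeding, and the helper names `kmeans`, `codebook_vector` and `cluster_photos` are hypothetical choices, since the paper reports neither its codebook size nor its k-means settings. `codebook_vector` returns the counts f_i of footnote 3, so its entries sum to the number of descriptors in the photo.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))


def kmeans(points: List[Vector], k: int, max_iters: int = 100, seed: int = 0) -> List[Vector]:
    """Lloyd's k-means with farthest-first seeding; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(points[rng.randrange(len(points))])]
    while len(centroids) < k:
        # Farthest-first seeding: next seed is the point farthest from all current seeds.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def codebook_vector(descriptors: List[Vector], codebook: List[Vector]) -> List[int]:
    """Quantize one photo: f_i counts the photo's descriptors nearest to codeword C_i."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    return counts


def cluster_photos(photos: List[List[Vector]], codebook_size: int, n_clusters: int) -> List[int]:
    """Build a visual codebook, quantize every photo, then k-means the photos."""
    pooled = [d for descriptors in photos for d in descriptors]
    codebook = kmeans(pooled, codebook_size)
    vectors = [[float(f) for f in codebook_vector(desc, codebook)] for desc in photos]
    centroids = kmeans(vectors, n_clusters)
    return [nearest(v, centroids) for v in vectors]


if __name__ == "__main__":
    rng = random.Random(42)

    def fake_photo(center: float, n_descriptors: int = 20) -> List[Vector]:
        # Hypothetical stand-in for SIFT output: 128-d descriptors around one center.
        return [[rng.gauss(center, 1.0) for _ in range(128)] for _ in range(n_descriptors)]

    photos = [fake_photo(0.0) for _ in range(3)] + [fake_photo(5.0) for _ in range(3)]
    print(cluster_photos(photos, codebook_size=4, n_clusters=2))
```

With the paper's setting, `n_clusters` would be 15. The demo instead uses two synthetic "kinds" of photo so that the grouping is easy to check. Photos are compared by Euclidean distance between codebook vectors, as the text describes.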
The k-means step yielded 15 clusters of photos, where the similarity between two photos is measured by the Euclidean distance between their codebook vectors. These clusters served as an initial set of our coding categories, with each photo belonging to exactly one category.

To further improve the quality of this automated categorization, we asked two human coders who are regular Instagram users to independently examine the photos in each of the 15 categories. They analyzed the affinity of the themes within and across categories, and manually adjusted categories where necessary (i.e., moved photos to a more appropriate category, or merged two categories whose themes overlapped). Finally, through a discussion session in which the two coders exchanged their coding results, discussed their categories, and resolved their conflicts, we arrived at an 8-category coding scheme (see Table 1) on which both coders agreed, i.e., Fleiss' kappa κ = 1. It is important to note that the stated goal of our coding was to provide a manual, descriptive evaluation of photo content, not to hypothesize about the motivation of the user posting the photos.

Table 1: 8 Photo Categories
Friends: users posing with other friends; at least two human faces are in the photo
Food: food, recipes, cakes, drinks, etc.
Gadget: electronic goods, tools, motorbikes, cars, etc.
Captioned Photo: pictures with embedded text, memes, and so on
Pet: animals such as cats and dogs that are the main objects in the picture
Activity: both outdoor and indoor activities, and places where activities happen, e.g., concerts, landmarks
Selfie: self-portraits; only one human face is present in the photo
Fashion: shoes, costumes, makeup, personal belongings, etc.

Figure 2: Proportion of Categories (bar chart; y-axis: proportion of all categories, 0 to 0.25; x-axis: Friends, Food, Gadget, Captioned Photo, Pet, Activities, Selfies, Fashion).
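The coder agreement in Section 3.2 is reported as Fleiss' kappa: κ = 1 for the final scheme and κ = 0.75 for the initial independent coding. A minimal sketch of the standard computation follows. The input format and the example labels are hypothetical. The statistic compares the observed agreement P̄ with the chance agreement P_e, which is derived from the overall category shares: κ = (P̄ - P_e) / (1 - P_e).

```python
from collections import Counter
from typing import Hashable, List, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings[i] lists the category each rater gave item i, e.g. ("Pet", "Pet")
    when both coders put photo i in the Pet category.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    per_item: List[float] = []
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # P_i: fraction of ordered rater pairs that agree on this item.
        agreeing_pairs = sum(c * c for c in counts.values()) - n_raters
        per_item.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    p_bar = sum(per_item) / n_items  # observed agreement
    # P_e: agreement expected by chance, from the overall category shares p_j.
    total = n_items * n_raters
    p_e = sum((t / total) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings use a single category")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical codes from two coders for four photos.
    coded = [("Pet", "Pet"), ("Food", "Food"), ("Selfie", "Friends"), ("Selfie", "Selfie")]
    print(round(fleiss_kappa(coded), 3))  # 0.652
```

In the demo the coders agree on three of the four photos, so P̄ = 0.75. Chance agreement is P_e = 0.28125, which gives κ = 15/23 ≈ 0.652. Perfect agreement over at least two categories always yields κ = 1.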
Based on our 8-category coding scheme, the two coders independently categorized the remaining 800 photos according to their main themes and, where available, their descriptions and hashtags. For example, if a photo shows a girl with her dog and its description is "look at my cute dog", the photo is categorized as "Pet". The coders were asked to assign a single category to each photo (i.e., we avoided dual assignment). The initial Fleiss' kappa was κ = 0.75. To resolve the remaining discrepancies between coders, we asked a third-party judge to view the unresolved photos and assign them to the most appropriate categories.

4 Analysis

This section presents our analysis of photo content and user types on Instagram. Our main objective is to develop a deeper understanding of the types of photos and active users on Instagram. Specifically, we aim to address the following research questions:

• RQ1: What kinds of photos do people usually post on Instagram?

• RQ2: How do users differ based on the types of images they post?

• RQ3: How are these differences in users' photo content related to their number of followers?

We start with RQ1. Fig. 2 shows the proportions of the photo categories. As shown in this figure, nearly half (46.6%) of the photos in our dataset belong to the Selfies and Friends categories, with slightly more self-portraits (24.2% vs. 22.4%). We also notice that Pet and Fashion are the least popular categories, with less than 5% of the total number of images. This is consistent with some recent reports in popular news media⁴. Three other categories (Food, Gadget, and Captioned photo) each contribute more than 10% but are approximately equal to one another. This is in line with the conventional wisdom that Instagram is mostly used for self-promotion and social networking with friends.

⁴ http://newsfeed.time.com/2013/12/02/this-collar-camera-lets-your-pet-take-pics-and-post-them-to-instagram/ and http://digiday.com/brands/fashion-brands-instagram/

We further narrow down this analysis to bolster these findings. Fig. 3 shows the distribution of users in each category with respect to their engagement (i.e., the number of photos a user posted in that category). For example, 22% of users posted 6-8 photos coded in the Friends category, and 26% of users posted 3-5 photos coded in the Food category. Interestingly, both Pet and Fashion have a very high standard deviation of 0.5. In contrast, the Selfies and Friends categories show very low standard deviations (SD = 0.11 and SD = 0.124, respectively). This difference indicates that, regardless of engagement, user proportions are more equitably distributed for the Selfie and Friends photo categories, whereas posting photos about pets and fashion has high variance.
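The two RQ1 quantities can be computed from per-user category counts, as sketched below. These are the category shares plotted in Fig. 2 and the per-category user bins plotted in Fig. 3. The input format (one dictionary of counts per user), the category keys, and the function names are assumptions made for illustration. Bin 5 is taken to start at 12 photos so that it does not overlap Bin 4 (9-11 photos).

```python
from typing import Dict, List, Optional, Tuple

CATEGORIES = ["Friends", "Food", "Gadget", "Captioned Photo",
              "Pet", "Activity", "Selfie", "Fashion"]

# Fig. 3 bins as (low, high) photo counts; high=None means open-ended.
# Assumption: Bin 5 starts at 12 photos so that the bins do not overlap.
BINS: List[Tuple[int, Optional[int]]] = [(0, 2), (3, 5), (6, 8), (9, 11), (12, None)]


def category_proportions(users: List[Dict[str, int]]) -> Dict[str, float]:
    """Share of all coded photos that fall in each category (as plotted in Fig. 2)."""
    totals = {c: sum(u.get(c, 0) for u in users) for c in CATEGORIES}
    grand_total = sum(totals.values())
    return {c: totals[c] / grand_total for c in CATEGORIES}


def bin_index(count: int) -> int:
    """Index of the Fig. 3 bin that a per-user photo count falls into."""
    for i, (low, high) in enumerate(BINS):
        if count >= low and (high is None or count <= high):
            return i
    raise ValueError(f"negative photo count: {count}")


def bin_distribution(users: List[Dict[str, int]], category: str) -> List[float]:
    """Fraction of users whose photo count in `category` lands in each bin (cf. Fig. 3)."""
    histogram = [0] * len(BINS)
    for u in users:
        histogram[bin_index(u.get(category, 0))] += 1
    return [h / len(users) for h in histogram]


if __name__ == "__main__":
    # Hypothetical per-user category counts (each user has 20 coded photos).
    users = [{"Selfie": 10, "Friends": 10}, {"Food": 4, "Selfie": 16}]
    print(category_proportions(users)["Selfie"])  # 0.65
    print(bin_distribution(users, "Selfie"))      # [0.0, 0.0, 0.0, 0.5, 0.5]
```

A user with no photos in a category lands in Bin 1 (0-2 photos). This is why rarely posted categories such as Pet concentrate most users in the first bin.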
Figure 3: Proportion of users w.r.t. content categories (stacked bars for Friends, Food, Gadget, Captioned photo, Pet, Activities, Selfies, and Fashion; y-axis: distribution of bins w.r.t. category, 0-100%). Bin1 contains 0-2 photos; Bin2 contains 3-5 photos; Bin3 contains 6-8 photos; Bin4 contains 9-11 photos; Bin5 contains more than 11 photos.

Figure 4: Clustering users based on the categories of their photos (y-axis: density of category w.r.t. cluster, 0 to 0.6; one bar per category, Friends, Food, Gadget, Captioned photo, Pet, Activity, Selfies, and Fashion, within each cluster). C1 to C5 represent five different user clusters: C1 (n=11, 22%), C2 (n=7, 14%), C3 (n=7, 14%), C4 (n=3, 6%), and C5 (n=22, 44%).

Next, we address RQ2. We investigate whether there exist different types of users on Instagram based on the content they post. We first create an 8-dimensional vector for each user (one dimension per photo category), where each dimension represents the proportion of the user's photos in the corresponding category. We then use k-means clustering to group the users. We run the clustering multiple times to determine the best k (the number of clusters), choosing the k whose root mean square error is minimized.

Fig. 4 shows the clustering results, which distinguish 5 types of users. Within each cluster, the histograms indicate the proportion of each of the 8 content categories. Instagram users clearly exhibit distinctive characteristics in terms of the photos they share. For example, there exist "selfies-lovers" (C4) who post self-portraits almost exclusively (C4's entropy is H(x) = 1.4). Similarly, people in C2 post mostly captioned photos whose embedded text mentions quotes, mottos, poetry, or even popular hashtags (C2's entropy is H(x) = 1.6). On the other hand, there are common users, as in C1, who focus (slightly) more on posting photos of food but post other categories of photos as well. Therefore, C1's entropy is the highest (H(x) = 1.96). It is also interesting that people in C5 (22 users in total) care about their friends as seriously as about themselves. They post nearly equal numbers of photos in both categories while ignoring the other categories (C5's entropy is H(x) = 1.54).

To answer RQ3, we examine whether the type of user directly correlates with the user's number of followers. In other words, do "selfies-lovers" (C4) attract significantly more followers than common users in C1? To this end, we perform two-tailed t-tests on the follower distributions of the different user clusters. For all the other types of users, we cannot reject the null hypothesis that the number of followers is independent of the user cluster (two-tailed t-test; p-value = 0.171). Since our analysis shows no statistically significant correlation between the number of followers and the type of user, we conclude that the size of a user's audience (followers) is independent of the type of the user (characterized in terms of the user's shared photos on Instagram).

5 Conclusions and Future Work

In this paper, we performed an analysis of photos and users on Instagram, the fastest growing social media application. To our knowledge, this is the first paper to conduct such an analysis on Instagram data. We have shown how the image data was handled and analyzed to answer three fundamental research questions about Instagram. Our analysis shows that there are broadly 8 different photo categories on Instagram. Based on the content users post, the analysis derives 5 different types of users (or user clusters). Through statistical significance tests, we also showed that there is no direct relationship between the number of followers and the type of user, characterized in terms of the user's shared photos. As part of our future work, we want to extend this work by incorporating other Instagram features, such as the user's bio, hashtags, comments, and social network. We also plan to analyze sentiments and events associated with the photos and their associated text (Hu, Wang, and Kambhampati 2013).

Acknowledgements

This research is supported in part by the ONR grants N00014-13-1-0176, N0014-13-1-0519, ARO grant W911NF-13-1-0023 and a Google Research Grant.

References

Ellison, N. B., et al. 2007. Social network sites: Definition, history, and scholarship. JCMC.
Hochman, N., and Manovich, L. 2013. Zooming into an Instagram city: Reading the local through social media. First Monday.
Hu, Y.; Wang, F.; and Kambhampati, S. 2013. Listening to the crowd: Automated analysis of events via aggregated Twitter sentiment. In IJCAI.
Instagram. 2013. Instagram statistics. http://instagram.com/press.
Lowe, D. G. 1999. Object recognition from local scale-invariant features. In ICCV.
McCune, Z. 2011. Consumer production in social media networks: A case study of the Instagram iPhone app. Dissertation, University of Cambridge.
Naaman, M.; Boase, J.; and Lai, C.-H. 2010. Is it really about me?: Message content in social awareness streams. In CSCW.
Rainie, L.; Brenner, J.; and Purcell, K. 2012. Photos and videos as social currency online. Pew Internet & American Life Project.
Silva, T. H.; Melo, P. O.; Almeida, J. M.; Salles, J.; and Loureiro, A. A. 2013. A picture of Instagram is worth more than a thousand words: Workload characterization and application. In DCOSS. IEEE.
Szeliski, R. 2011. Computer vision: Algorithms and applications. Springer.
