You Are What Apps You Use: Demographic Prediction Based On User's Apps



Eric Malmi
Verto Analytics and Aalto University, Espoo, Finland
eric.malmi@aalto.fi

Ingmar Weber
Qatar Computing Research Institute, Doha, Qatar
iweber@qf.org.qa

arXiv:1603.00059v1 [cs.SI] 29 Feb 2016
This is a pre-print of an article appearing at ICWSM 2016.

Abstract

Understanding the demographics of app users is crucial, for example, for app developers who wish to target their advertisements more effectively. Our work addresses this need by studying the predictability of user demographics based on the list of a user’s apps, which is readily available to many app developers. We extend previous work on the problem on three fronts: (1) We predict new demographics (age, race, and income) and analyze the most informative apps for four demographic attributes included in our analysis. The most predictable attribute is gender (82.3 % accuracy), whereas the hardest to predict is income (60.3 % accuracy). (2) We compare several dimensionality reduction methods for high-dimensional app data, finding that an unsupervised method yields superior results compared to aggregating the apps at the app category level, but the best results are obtained simply by using the raw list of apps. (3) We look into the effect of the training set size and the number of apps on the predictability and show that both of these factors have a large impact on the prediction accuracy. The predictability increases, or in other words, a user’s privacy decreases, the more apps the user has used, but somewhat surprisingly, after 100 apps, the prediction accuracy starts to decrease.

Introduction

In 2014, 60 % of internet traffic was estimated to come from mobile devices¹, of which 51 % was attributed to apps². As the importance of mobile apps continues to rise, some have even declared that “apps are the new Web”³. Though claims of the Web’s demise are probably exaggerated, the number of available mobile apps continues to increase and with it, one would expect, the importance of apps for the wider Web ecosystem.

At the same time, most academic studies looking at “online users” still concentrate on website visits with, by comparison, much less attention being given to mobile apps. In this paper, we study the predictability of six demographic attributes based on the list of used apps.

The most apparent applications of demographic prediction methods are in marketing. For instance, an app developer might be interested in understanding which user segments are underrepresented when designing new ad campaigns for the app. On the other hand, computational social scientists studying the behavior of people as observed through an app, like Twitter, need to understand how representative users of the app are as a sample of the whole population.

Studying the predictability of demographics also points out privacy implications of users allowing apps to access their list of installed apps. Many users undoubtedly do not carefully review the permissions that the apps they install require, and even fewer understand the scope of the information that can be inferred from the data accessible by the apps.

The only previous studies on demographic prediction based on lists of apps that we are aware of are (Seneviratne et al. 2014; 2015). In the latter work, only gender prediction is studied and the dataset comprises 218 users. We have obtained a dataset of 3 760 users, which allows us to perform more fine-grained analyses, e.g., looking into the effect of app count on the predictability, and to obtain statistically more reliable results.

¹ http://smallbiztrends.com/2014/07/online-traffic-report-mobile.html
² Another 2015 report by Morgan Stanley put this fraction closer to 33 % though, with mobile web browsing dominating mobile traffic: http://tinyurl.com/jstkty7
³ http://tinyurl.com/3a3ru2o

Material

Our dataset contains the demographic attributes and a list of apps for 3 760 Android users. While (Seneviratne et al. 2015) analyze the lists of installed apps, we study the lists of apps used at least once within a period of one month in 2015. Some very rarely used apps are probably missing from the latter list, but nevertheless, the lists can be expected to be highly correlated.

The average number of apps per user is 82.6 and the number of unique apps is 8 840. Apps with fewer than ten users have been discarded to remove all personally identifiable information. The dataset is from Verto Analytics, who have provided us with a subsample of their media-measurement panel from the US. The panelists were recruited with the target of getting a representative sample of the US population. Each panelist has installed a meter app which tracks their app usage, and in return, the user is paid for providing the data.
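As an illustration of the data preparation described in the Material section, the following minimal sketch discards apps with fewer than ten users and stores each user as a sparse binary bag of apps. The function name `build_bag_of_apps`, the input layout (a dict from user id to a set of app names), and the toy data are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def build_bag_of_apps(user_apps, min_users=10):
    """Return (vocabulary, rows) for a binary bag-of-apps representation.

    user_apps: dict mapping a user id to the set of apps that user has used.
    Apps used by fewer than `min_users` users are discarded.
    Each row lists the sorted column indices of the apps a user has used,
    which is a compact way to store a sparse binary vector.
    """
    # Count, for every app, how many distinct users have used it.
    app_counts = Counter(app for apps in user_apps.values() for app in set(apps))
    kept = sorted(app for app, count in app_counts.items() if count >= min_users)
    vocabulary = {app: idx for idx, app in enumerate(kept)}
    rows = {
        user: sorted(vocabulary[app] for app in set(apps) if app in vocabulary)
        for user, apps in user_apps.items()
    }
    return vocabulary, rows


# Hypothetical toy data: three users, threshold lowered to 2 for illustration.
toy = {
    "u1": {"Snapchat", "ESPN", "Rare App"},
    "u2": {"Snapchat", "Pinterest"},
    "u3": {"ESPN", "Pinterest", "Snapchat"},
}
vocabulary, rows = build_bag_of_apps(toy, min_users=2)
print(vocabulary)
print(rows)
```

Storing only the indices of the apps a user has used keeps memory proportional to the number of (user, app) pairs rather than to the full users-by-apps matrix.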
Table 1. Demographic prediction accuracy based on a user’s apps. Classes have been binarized and balanced. The AUC (Web) column shows the prediction performance based on visited websites from (Goel, Hofman, and Sirer 2012).

Attribute   Classes               Accuracy   AUC     AUC (Web)
Gender      Male vs. Female       82.3 %     0.901   0.84
Age         18–32 vs. 33–100      77.1 %     0.850   0.85*
Race        White vs. Non-white   72.7 %     0.801   0.83
Married     Married vs. Single    72.5 %     0.792   NA
Children    0 vs. ≥ 1 children    63.5 %     0.688   NA
Income      ≤ $40K vs. > $40K     60.3 %     0.645   0.75*

Methods

When choosing a suitable prediction method, it is important to consider the following characteristics of the dataset: (1) feature vectors (bags–of–apps) are binary and very sparse, (2) the number of features, 8 840, is larger than the number of datapoints, 3 760, and (3) the dependent variables (user demographics) can be treated as categorical.

Logistic regression is a natural choice for this type of problem. We also tested support vector machines with different kernels and random forests, but both the results and the running times were inferior. While logistic regression can be adapted to multi-class problems, we instead binarize the demographic variables and balance the classes. This allows us to compare the predictability of different demographic variables.

Results

Next, we show results related to three different aspects of demographic prediction, namely, (1) the predictability of different demographics, (2) the effects of the training set size and the number of a user’s apps, and (3) the effect of various dimensionality reduction methods.

How Much is Revealed by Which Apps?

Classification accuracies for six different demographic attributes are shown in Table 1. They are computed using ten-fold cross-validation. The most predictable attribute is gender, whereas the household income of a user is the most difficult to tell based on the list of the user’s apps. The results are surprisingly similar to the AUC scores reported by (Goel, Hofman, and Sirer 2012), who employed visited websites as the features instead of apps. In their work, the attributes marked with ‘*’ had a slightly different binarization threshold compared to ours. The receiver operating characteristic for gender, given in Figure 1a, shows that for half of the users, the gender can be predicted with 97 % accuracy.

By studying the coefficients of the logistic model, we can analyze the contribution of individual apps to the predictions. The coefficients with the largest absolute values are the most predictive ones. In Table 2, we have listed these apps for four different demographics along with the coefficients, the shares and the numbers of users who have used the app⁴. Many of the results are not surprising; for instance, period tracking apps are good predictors for gender, whereas dating apps are more informative about the marital status.

⁴ Note that for a given app, the number of users, n, might vary over demographics as some users have been removed when balancing the classes.

How Much Does the Size Matter?

We also study the gender prediction accuracy as a function of the training set size in Figure 1b. An absolute improvement of more than ten percentage points can be obtained by increasing the training set size from 100 users to 2 300 users. Error bars show the standard deviations of the accuracies over 100 balanced random subsamples per given number of train users.

(Seneviratne et al. 2015) report an accuracy of 69.8 % for a dataset of 174 users, 50 % of which are used for training (their original dataset is 218 users but they undersample the majority class to balance the classes). To be able to benchmark against this result, we take 300 balanced random subsamples of 174 users and run 2-fold cross-validation for these samples. We obtain a comparable average accuracy of (68±5) % even though we use neither content-based features derived from app descriptions nor numeric features, as done in (Seneviratne et al. 2015). This suggests that the bag–of–apps features alone can provide competitive performance. Furthermore, these features can be extracted more easily, without having access to an API for scraping the Google Play store.

A user who has installed a hundred apps probably reveals more of herself than a user with five apps. Thus it is relevant to ask how quickly privacy is lost when the number of apps increases. In Figure 1c, we tackle this question by binning the users according to the number of apps they have used and showing the prediction accuracy averaged over all demographic attributes of all users in a bin. The standard errors are given by $\sqrt{p(1 - p)/n}$. The results show that the accuracy increases by about ten percentage points going from 20 apps to 100 apps but after that, somewhat surprisingly, the accuracy starts to decrease. To test whether the decrease is statistically significant, we perform an independent two-sample t test with the following null hypothesis: “The overall demographic prediction accuracy is not higher for the users with 50-150 apps compared to the users with more than 150 apps.” We can reject the null hypothesis (p = 0.014), which shows that using a lot of apps at least once per month actually increases privacy.

Dimensionality Reduction

Due to the high dimensionality of feature vectors (8 840 unique apps), we study three different dimensionality reduction approaches.

The first method, adopted by (Seneviratne et al. 2015), considers only the apps installed by at least 10 % of the users (125 apps in total). With this approach the gender prediction accuracy drops from 82.3 % to 73.6 %, and even with the dataset of 174 users, the accuracy decreases from 68 % to 65 %. It is important to use the full list of apps since some of the apps might be reliable predictors even though they are rare (think, for example, of a rare period tracking app).
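The first dimensionality reduction method (keeping only apps used by at least 10 % of the users) can be sketched as below. This is a minimal illustration under stated assumptions: the function name `filter_by_user_share`, the dict-of-sets input layout, and the toy users are hypothetical, and the code only shows the filtering step, not the classifier trained afterwards.

```python
from collections import Counter


def filter_by_user_share(user_apps, min_share=0.10):
    """Keep only apps used by at least `min_share` of all users.

    Returns (filtered, popular): the per-user app sets restricted to the
    popular apps, and the set of popular apps itself.
    """
    n_users = len(user_apps)
    counts = Counter(app for apps in user_apps.values() for app in set(apps))
    popular = {app for app, c in counts.items() if c / n_users >= min_share}
    filtered = {user: set(apps) & popular for user, apps in user_apps.items()}
    return filtered, popular


# Hypothetical toy data: 20 users, everyone uses "Email", every second user
# uses "Snapchat", and a single user (5 %) uses "Period Tracker".
toy = {
    f"u{i}": {"Email"} | ({"Snapchat"} if i % 2 == 0 else set())
    for i in range(20)
}
toy["u0"].add("Period Tracker")

filtered, popular = filter_by_user_share(toy, min_share=0.10)
print(sorted(popular))
print(sorted(filtered["u0"]))
```

In this toy example the rare but potentially very informative "Period Tracker" feature is removed by the filter, which mirrors the concern raised in the text about dropping rare apps.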
Table 2. The most predictive apps for different demographic attributes along with the logistic regression coefficients (Coef), the fractions of app users with the demographic attribute (Share), and the numbers of app users (n).

Gender (Male) | Age (33–100) | Married (Married) | Income (≥ $50K)
Coef Share n App name | Coef Share n App name | Coef Share n App name | Coef Share n App name
0.81 85 % 150 ESPN | 0.53 80 % 42 Great Clips Online Check-in | 0.55 67 % 200 Zillow Real Estate & Rentals | 0.58 75 % 141 Fitbit
0.73 80 % 142 Geek - Smarter Shopping | 0.48 53 % 1687 Email | 0.44 67 % 622 Walmart | 0.45 66 % 205 LinkedIn
0.63 78 % 277 Tinder | 0.46 58 % 318 New Words With Friends | 0.44 60 % 823 Pinterest | 0.41 65 % 41 com.ws.dm
0.59 80 % 172 Fallout Shelter | 0.44 80 % 65 BINGO Blitz | 0.44 74 % 39 Gospel Library | 0.37 52 % 141 LG Android QuickMemo+
0.56 86 % 106 WatchESPN | 0.43 60 % 380 iHeartRadio - Music & Radio | 0.40 59 % 91 USAA Mobile | 0.37 58 % 191 Redbox
0.52 72 % 190 Clash of Clans | 0.41 54 % 197 Field Agent | 0.40 80 % 63 ClassDojo | 0.36 72 % 22 Like Parent
0.52 97 % 41 Grindr - Gay chat, meet & date | 0.40 55 % 690 Lookout Security & Antivirus | 0.38 60 % 123 ESPN | 0.34 66 % 63 Peel Smart Remote
0.49 84 % 96 Yahoo Fantasy Football & More | 0.40 92 % 41 DoubleUCasino | 0.37 82 % 28 Deer Hunter 2014 | 0.34 61 % 220 Yelp

Gender (Female) | Age (18–32) | Married (Single) | Income (≤ $40K)
Coef Share n App name | Coef Share n App name | Coef Share n App name | Coef Share n App name
-1.03 76 % 736 Pinterest | -1.17 78 % 1066 Snapchat | -0.89 70 % 810 Snapchat | -0.43 66 % 136 Job Search
-0.73 84 % 182 Etsy | -0.52 59 % 113 Perk Word Search | -0.78 89 % 114 POF Free Dating App | -0.43 63 % 97 Security policy updates
-0.61 97 % 79 Period Tracker | -0.49 64 % 88 Summoners War | -0.73 85 % 219 Tinder | -0.37 78 % 23 Solitaire
-0.54 96 % 58 Period Calendar / Tracker | -0.46 59 % 98 Clash of Kings | -0.66 98 % 69 OkCupid Dating | -0.35 67 % 79 Prize Claw 2
-0.50 76 % 346 Cartwheel by Target | -0.45 86 % 90 iFunny :) | -0.48 72 % 269 Tumblr | -0.34 72 % 51 ScreenPay- Get Paid to Unlock
-0.49 66 % 258 Wish - Shopping Made Fun | -0.45 81 % 158 GroupMe | -0.42 72 % 205 SoundCloud - Music & Audio | -0.33 78 % 56 MeetMe
-0.49 74 % 325 Checkout 51 - Grocery Coupons | -0.42 80 % 68 GIPHY for Messenger | -0.41 65 % 331 Uber | -0.33 62 % 77 Foursquare
-0.45 74 % 178 Photo Grid - Collage Maker | -0.42 80 % 183 Vine | -0.41 89 % 69 MeetMe | -0.32 56 % 73 Microsoft Word
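To make the ranking behind Table 2 concrete, the sketch below sorts a handful of the gender-model coefficients from the table and lists the most positive (male) and most negative (female) apps. It also converts a coefficient into an odds multiplier, using the standard logistic regression property that a binary feature with coefficient c multiplies the predicted odds by exp(c). The helper names `most_predictive` and `odds_multiplier` are hypothetical, and only a small subset of the table is used.

```python
import math


def most_predictive(coefficients, k=3):
    """Return the k most positive and the k most negative (app, coef) pairs."""
    ranked = sorted(coefficients.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative coefficients first
    positive = ranked[::-1][:k]    # most positive coefficients first
    return positive, negative


def odds_multiplier(coef):
    """Factor by which having the app multiplies the odds of the positive class."""
    return math.exp(coef)


# A few gender-model coefficients copied from Table 2 (positive class: male).
gender_coefs = {
    "ESPN": 0.81,
    "Geek - Smarter Shopping": 0.73,
    "Tinder": 0.63,
    "Pinterest": -1.03,
    "Etsy": -0.73,
    "Period Tracker": -0.61,
}

male_apps, female_apps = most_predictive(gender_coefs, k=2)
print(male_apps)
print(female_apps)
print(round(odds_multiplier(gender_coefs["ESPN"]), 2))
```

Under this reading, the ESPN coefficient of 0.81 corresponds to roughly a 2.2-fold increase in the odds of the "male" class, all other features being equal.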

Figure 1. Demographic prediction results. (a) ROC curve for gender prediction (area = 0.90), with false positive rate on the horizontal axis and true positive rate on the vertical axis; ’Male’ is treated as the positive class. (b) Effect of training set size on gender prediction: number of train users vs. gender prediction accuracy (%). (c) Effect of user’s app count averaged over all demographics: number of apps per user vs. demographic prediction accuracy (%).
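The error bars in Figure 1c rely on a standard error for the accuracy within each bin. Assuming this is the usual binomial standard error, sqrt(p(1 - p)/n), for an accuracy p measured over n predictions, it can be computed as in the following sketch. The function names and the example bin (100 predictions, 70 of them correct) are hypothetical.

```python
import math


def accuracy_standard_error(p, n):
    """Binomial standard error of an accuracy p estimated from n predictions."""
    return math.sqrt(p * (1.0 - p) / n)


def bin_accuracy(correct_flags):
    """Return (accuracy, standard error) for a list of 0/1 correctness flags."""
    n = len(correct_flags)
    p = sum(correct_flags) / n
    return p, accuracy_standard_error(p, n)


# Hypothetical bin: 100 predictions, 70 of which were correct.
p, se = bin_accuracy([1] * 70 + [0] * 30)
print(p, se)
```

Because the standard error shrinks with the square root of n, bins containing few users (for example, users with very many apps) have visibly wider error bars.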

The second method, also adopted by (Seneviratne et al. 2015), aggregates the installed apps to category level based on Google Play categorization. In our dataset, there are apps from 48 categories. We take the number of apps in each category as the features, which yields an accuracy of 74.6 %.

The third method employs the Truncated Singular Value Decomposition (TSVD). (Hu et al. 2007) also employ TSVD, but instead of using the SVD components directly as features for predicting the demographics of web users, they adopt a recommender system approach. Setting the number of dimensions to 48, we obtain a gender prediction accuracy of 76.9 %. This shows that rather than using the Google Play categories of the apps, it is better to use the same number of SVD components learned in an unsupervised manner. However, the performance is clearly worse compared to not using any dimensionality reduction, and even by increasing the number of SVD components, we were unable to exceed the performance of the logistic regression with all features, although with 500 components, the accuracy is already 81.8 %.

In conclusion, none of the explored dimensionality reduction methods helped us to improve the gender prediction accuracy. We should also note that although TSVD can help to reduce the data dimensionality to about one-tenth of the original without losing much in accuracy, this still does not necessarily help with the space complexity of the method. The reason is that unlike the SVD components, the original bag–of–apps features are very sparse and the logistic regression implementation we use⁵ supports sparse matrices.

⁵ http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Related Work

Characterizing the demographics of Twitter users has been studied by (Mislove et al. 2011), who infer geography, gender, and race of the users based on self-reported locations and the names of the users. They find large deviations from the demographic distribution of the overall population. (Duggan et al. 2015) provide a more extensive demographic comparison of five social media platforms based on telephone interviews. (Goel, Hofman, and Sirer 2012) look into the demographics and behavior of web users, whereas (Weber and Jaimes 2011) study the same for search-engine users.

Demographic prediction based on a user’s apps has been previously studied by (Seneviratne et al. 2015), who predict the users’ gender. In their previous work (Seneviratne et al. 2014), they also predict language, country, relationship status, and whether the user is a parent, but instead of predicting these attributes directly, they first predict which apps are associated with the attributes and then check whether a user has apps corresponding to a given demographic attribute. We extend these works by studying new demographics (age, race, and income), showing that increasing the training dataset size drastically improves the prediction accuracy, and comparing various dimensionality reduction methods for the app data.

Others have studied demographic prediction, e.g., based on website visits (Hu et al. 2007; Goel, Hofman, and Sirer 2012), social network features (Brea et al. 2014; Al Zamal, Liu, and Ruths 2012), call patterns (Sarraute, Blanc, and Burroni 2014), Twitter followers (Culotta, Ravi, and Cutler 2015) and profiles (Chen et al. 2015), and location data (Riederer et al. 2015). Related to demographic prediction, (Chittaranjan, Blom, and Gatica-Perez 2013) investigate the predictability of personality traits based on apps and other smartphone usage features. There was also an app for predicting personality based on installed apps⁶.

⁶ http://www.idigitaltimes.com/what-do-your-apps-say-about-you-new-app-iphone-here-tell-you-410883

Conclusions and Discussion

We studied the demographic prediction problem based on the list of used apps. Large differences in the predictability were observed between the six demographic attributes studied in this work, gender being the most predictable and income being the hardest to predict. The apps contributing the most to the predictions were identified for four of the attributes, revealing some expected patterns: dating apps are used, although not exclusively, by single people, and high-income people are more likely to use LinkedIn, whereas lower-income people prefer an app called Job Search.

We also studied various dimensionality reduction methods for high-dimensional app data (8 840 unique apps), finding that SVD yields superior results compared to aggregating the apps at the app category level, but the best results are obtained simply by using the raw list of apps. Finally, we looked into the effect of the training set size and the number of apps on the predictability and showed that both of these factors can have an impact of over ten percentage points on the prediction accuracy. Interestingly, the predictability increases the more apps the user has used, but after 100 apps, the prediction accuracy starts to decrease. The accuracy drop from users with 50-150 apps to users with more than 150 apps was found to be statistically significant.

Several interesting questions are left for future work. First, we note that demographic attributes are most likely not independent, and therefore, predicting the attributes simultaneously, employing multi-label prediction techniques, could improve the performance. Second, we plan to study the demographics of various popular apps to understand potential biases in their userbases compared to the whole population. Third, it would be interesting to study the usage patterns of different demographic groups (as done previously in the context of web search (Weber and Castillo 2010)) to better understand the effects of demographic biases.

Acknowledgments

We would like to thank Verto Analytics for providing the dataset, Timo Smura for useful discussions, and Janaki Koirala for conducting some of the initial experiments.

References

Al Zamal, F.; Liu, W.; and Ruths, D. 2012. Homophily and latent attribute inference: Inferring latent attributes of Twitter users from neighbors. In Proc. ICWSM.

Brea, J.; Burroni, J.; Minnoni, M.; and Sarraute, C. 2014. Harnessing mobile phone social network topology to infer users demographic attributes. In Proc. SNA-KDD.

Chen, X.; Wang, Y.; Agichtein, E.; and Wang, F. 2015. A comparative study of demographic attribute inference in Twitter. In Proc. ICWSM.

Chittaranjan, G.; Blom, J.; and Gatica-Perez, D. 2013. Mining large-scale smartphone data for personality studies. Personal and Ubiquitous Computing 17(3):433–450.

Culotta, A.; Ravi, N. K.; and Cutler, J. 2015. Predicting the demographics of Twitter users from website traffic data. In Proc. ICWSM.

Duggan, M.; Ellison, N. B.; Lampe, C.; Lenhart, A.; and Mary, M. 2015. Social media update 2014. http://www.pewinternet.org/2015/01/09/social-media-update-2014/

Goel, S.; Hofman, J. M.; and Sirer, M. I. 2012. Who does what on the web: A large-scale study of browsing behavior. In Proc. ICWSM.

Hu, J.; Zeng, H.-J.; Li, H.; Niu, C.; and Chen, Z. 2007. Demographic prediction based on user’s browsing behavior. In Proc. WWW.

Mislove, A.; Lehmann, S.; Ahn, Y.-Y.; Onnela, J.-P.; and Rosenquist, J. N. 2011. Understanding the demographics of Twitter users. In Proc. ICWSM.

Riederer, C.; Zimmeck, S.; Phanord, C.; Chaintreau, A.; and Bellovin, S. M. 2015. “I don’t have a photograph, but you can have my footprints”: revealing the demographics of location data. In Proc. ICWSM.

Sarraute, C.; Blanc, P.; and Burroni, J. 2014. A study of age and gender seen through mobile phone usage patterns in Mexico. In Proc. ASONAM, 836–843. IEEE.

Seneviratne, S.; Seneviratne, A.; Mohapatra, P.; and Mahanti, A. 2014. Predicting user traits from a snapshot of apps installed on a smartphone. ACM SIGMOBILE Mobile Computing and Communications Review 18(2):1–8.

Seneviratne, S.; Seneviratne, A.; Mohapatra, P.; and Mahanti, A. 2015. Your installed apps reveal your gender and more! ACM SIGMOBILE Mobile Computing and Communications Review 18(3):55–61.

Weber, I., and Castillo, C. 2010. The demographics of web search. In Proc. SIGIR.

Weber, I., and Jaimes, A. 2011. Who uses web search for what: and how. In Proc. WSDM, 15–24.
