Commit df9189c

recompile twitter post with cache=FALSE to try to avoid image collision
1 parent ae81019 commit df9189c

5 files changed (+78, -73 lines changed)

_posts/2016-12-21-r-twitter.md

Lines changed: 78 additions & 73 deletions
@@ -59,31 +59,31 @@ head(gvl_twitter_df)
  {% highlight text %}
- text
- 1 RT @TEN_GOP: Hillary voter: set the #Greenville black church on fire and spray painted 'Vote Trump'. \n\nTrump supporters: raised $180K to re…
- 2 Aviation MX Jobs Sheet Metal-LHM-GREENVILLE https://t.co/0B104F2S1f
- 3 Why are people from all over the world moving to Greenville, SC? https://t.co/Ck7FLxHiA2 Orion Realty 864-631-2663 https://t.co/SqFbO5rO5M
- 4 @gibbshundred chillin' in the Wisconsin refrigerator #beerforthought #beer @ Greenville, Wisconsin https://t.co/Vz0fey2pgO
- favorited favoriteCount replyToSN created truncated
- 1 FALSE 0 <NA> 2016-12-22 22:01:13 FALSE
- 2 FALSE 0 <NA> 2016-12-22 22:00:57 FALSE
- 3 FALSE 0 <NA> 2016-12-22 22:00:43 FALSE
- 4 FALSE 0 GibbsHundred 2016-12-22 22:00:34 FALSE
- replyToSID id replyToUID
- 1 <NA> 812055418193186816 <NA>
- 2 <NA> 812055350438428672 <NA>
- 3 <NA> 812055292322181120 <NA>
- 4 <NA> 812055254678245377 2225686310
- statusSource
- 1 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
- 2 <a href="http://www.hootsuite.com" rel="nofollow">Hootsuite</a>
- 3 <a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>
- 4 <a href="http://instagram.com" rel="nofollow">Instagram</a>
- screenName retweetCount isRetweet retweeted longitude latitude
- 1 lisalicious12 2593 TRUE FALSE <NA> <NA>
- 2 AviationMXjobs 0 FALSE FALSE <NA> <NA>
- 3 kenpujdak 0 FALSE FALSE <NA> <NA>
- 4 kev_tosh 0 FALSE FALSE <NA> <NA>
+ text
+ 1 Can you recommend anyone for this #job? Delivery Driver - https://t.co/DRHtAzXQYO #Transportation #Greenville, NC #Hiring #CareerArc
+ 2 @marcelllamariee @ashlynmariieee shoulda went to Greenville I would have went and we would have fucked shit up
+ 3 I guess I'll go to Greenville today <ed><U+00A0><U+00BD><ed><U+00B8><U+0098>
+ 4 RT @sportsguymarv: In a loss to Cook Co., Brittany Davis from Greenville High scrambled a Triple-Double. 52 12 10. || #A1Skills #GHSA #High<U+0085>
+ favorited favoriteCount replyToSN created truncated
+ 1 FALSE 0 <NA> 2016-12-23 17:55:48 FALSE
+ 2 FALSE 0 marcelllamariee 2016-12-23 17:54:59 FALSE
+ 3 FALSE 0 <NA> 2016-12-23 17:54:24 FALSE
+ 4 FALSE 0 <NA> 2016-12-23 17:53:34 FALSE
+ replyToSID id replyToUID
+ 1 <NA> 812356044567375872 <NA>
+ 2 812292443509026816 812355837901553664 777670063751135233
+ 3 <NA> 812355693948837888 <NA>
+ 4 <NA> 812355480425263104 <NA>
+ statusSource
+ 1 <a href="http://www.tweetmyjobs.com" rel="nofollow">TweetMyJOBS</a>
+ 2 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
+ 3 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
+ 4 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
+ screenName retweetCount isRetweet retweeted longitude latitude
+ 1 tmj_NC_transp 0 FALSE FALSE -77.3936674 35.6096532
+ 2 s_danielss16 0 FALSE FALSE <NA> <NA>
+ 3 __GorgeousNiq 0 FALSE FALSE <NA> <NA>
+ 4 QuayBrizzy 14 TRUE FALSE <NA> <NA>
  [ reached getOption("max.print") -- omitted 2 rows ]
  {% endhighlight %}

@@ -106,23 +106,28 @@ print(gvl_twitter_unique %>% select(text))
  {% highlight text %}
- text
- 1 Aviation MX Jobs Sheet Metal-LHM-GREENVILLE https://t.co/0B104F2S1f
- 2 Why are people from all over the world moving to Greenville, SC? https://t.co/Ck7FLxHiA2 Orion Realty 864-631-2663 https://t.co/SqFbO5rO5M
- 3 @gibbshundred chillin' in the Wisconsin refrigerator #beerforthought #beer @ Greenville, Wisconsin https://t.co/Vz0fey2pgO
- 4 Greenville, NC 5:00 PM Temp: 57.8ºF Dew: 41.3ºF Pressure: 1019.4mb Rain: 0.00 in #encwx #ncwx https://t.co/TRgZS3ZJmp
- 5 Bon Jovi's #ThisHouseIsNotForSaleTour kicks off at @BSWArena Greenville February 8! Tix: https://t.co/LFkpYxEIQ0 https://t.co/PLrIWhfk8J
- 6 The Greenville mall lines are equivalent to Carowinds lines.
- 7 #flowers in greenville sc turbo tax for realtors
- 8 I think them upstate folk accent more country than Sumter we aren't that bad them damn Greenville area folks country as a couch on the porch
- 9 @TheBuns1194 come out retirement and transfer back to good ol greenville lol <ed><U+00A0><U+00BD><ed><U+00B8><U+0082>
- 10 Can you recommend anyone for this #job in #Greenville, NC? https://t.co/j25SnMwtTE #SONIC #Hospitality #Hiring #CareerArc
- 11 We're #hiring! Click to apply: Customer Service- Retail Sales - https://t.co/q7rFALqoT5 #Job #CustomerService #Greenville, NC #Jobs
- 12 Can you recommend anyone for this #job? Rehab Director - https://t.co/gzxnns476V #Healthcare #Greenville, SC #Hiring
- 13 Greenville, SC playing cards https://t.co/Yb0Yz40MaY
- 14 Oh, yeah I'm back in Greenville.
- 15 Want to work in #Greenville, SC? View our latest opening: https://t.co/EI2orG2G0N #Job #Accounting #Jobs #Hiring #CareerArc
- 16 Want to work in #Greenville, SC? View our latest opening: https://t.co/MScE6OeHjh #Job #Sales #Jobs #Hiring #CareerArc
+ text
+ 1 Can you recommend anyone for this #job? Delivery Driver - https://t.co/DRHtAzXQYO #Transportation #Greenville, NC #Hiring #CareerArc
+ 2 @marcelllamariee @ashlynmariieee shoulda went to Greenville I would have went and we would have fucked shit up
+ 3 I guess I'll go to Greenville today <ed><U+00A0><U+00BD><ed><U+00B8><U+0098>
+ 4 @Uber_Support We need one of these in Greenville SC!!! Where are you guys?\nWhen are you opening a Greenlight station here? <ed><U+00A0><U+00BD><ed><U+00B8><U+008A>
+ 5 Greenville Police Beat 12-23-16 https://t.co/Lt4rbams5z
+ 6 #market risk solutions greenville cancer center
+ 7 Can you recommend anyone for this #job in #Greenville, SC? https://t.co/Xt7hznlGgA #Purchasing #Hiring #CareerArc
+ 8 Check out my #listing in #Liberty #SC #realestate #realtor https://t.co/CuzFTjp5ge https://t.co/sD1VypKlTu
+ 9 Join the Aerotek team! See our latest #job opening here: https://t.co/gZ6jEIqOyN #Manufacturing #Greenville, SC #Hiring #CareerArc
+ 10 @Sie_SoSweet I'm driving to greenville and I don't feel like stopping <ed><U+00A0><U+00BD><ed><U+00B8><U+0082>
+ 11 Here's a little behind the scenes peek of 2017's 1st Off the Grid Greenville, A Visual Guide to Local Favorites... https://t.co/M33IWBDAZe
+ 12 #greenville electric company best western plus muskoka inn huntsville on canada
+ 13 Registered Nurse - $5,000 Sign On Bonus - Vidant Home Hospice - 925571 in Greenville, NC https://t.co/xONI21XSDq #job
+ 14 Apply to this job: Population Health Analyst Job - 926645 in Greenville, NC https://t.co/6J5IeqNak8 #job
+ 15 Greenville may be the TU tonight . <ed><U+00A0><U+00BE><ed><U+00B4><U+0094><ed><U+00A0><U+00BE><ed><U+00B4><U+0094>
+ 16 I had the privilege of meeting Lottie Gibson back when James Akers Jr ran for Greenville County<U+0085> https://t.co/UzsRoWMNdr
+ 17 Done been to Greenville 3 times this week<ed><U+00A0><U+00BD><ed><U+00B8><U+00BC><ed><U+00A0><U+00BD><ed><U+00B8><U+0082>
+ 18 Greenville <ed><U+00A0><U+00BD><ed><U+00B3><U+008D>
+ 19 What nail place in Greenville is the best bc every one I go to effs up my nails
+ 20 I'm at @Walmart Supercenter in Greenville, TX https://t.co/uYSIZPA4VE
+ 21 Goodbye Greenville, take off in 20, New York in an hour! <ed><U+00A0><U+00BD><ed><U+00BB><U+00A9> #TakeOff #NewYorkCity #BigAppleChristmas https://t.co/upN8kaGONK
  {% endhighlight %}

  The thing to notice here is that there are several different Greenvilles, which makes analysis of the local area pretty hard. Many of the tweets could be about Greenville, NC or Greenville, SC. In this particular dataset, there was even a Greenville Road in California (where there was a car fire). Rather than play a filtering game, it may be better to apply some knowledge specific to the area. For instance, local tweets are often tagged with `#yeahthatgreenville`. So we will search again for the `#yeahthatgreenville` hashtag (and add a few more tweets as well). This time, we'll keep retweets:
@@ -186,13 +191,13 @@ head(tweet_words)
  {% highlight text %}
- id word
- 1 812052233693097990 rt
- 1.1 812052233693097990 greenville_sc
- 1.2 812052233693097990 not
- 1.3 812052233693097990 a
- 1.4 812052233693097990 bad
- 1.5 812052233693097990 way
+ id word
+ 1 812352595159355392 the
+ 1.1 812352595159355392 wall
+ 1.2 812352595159355392 gods
+ 1.3 812352595159355392 mountains
+ 1.4 812352595159355392 amp
+ 1.5 812352595159355392 cities
  {% endhighlight %}

  I used the `select` function from `dplyr` to keep only the `id` and `text` fields. The `unnest_tokens()` function creates a long dataset with one row per word, where the `word` column replaces `text`. All the other fields remain unchanged. We can now easily create a bar chart of the most frequently used words:
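As a minimal sketch of the tokenizing step described above, the following uses two made-up tweets (the `toy_tweets` data frame and its contents are assumptions for illustration, not the post's data). It splits the text into one lowercase word per row and then counts the words, which is the table a bar chart would be drawn from.

```r
library(dplyr)
library(tidytext)

# Two invented tweets standing in for the real search results
toy_tweets <- tibble(
  id   = c("1", "2"),
  text = c("Greenville is lovely today", "Not a bad way to spend a day")
)

# One row per word; the text column is replaced by a word column
toy_words <- toy_tweets %>%
  select(id, text) %>%
  unnest_tokens(word, text)

# Word frequencies, most common first
toy_counts <- toy_words %>%
  count(word, sort = TRUE)
```

By default `unnest_tokens()` lowercases the words, which is why later joins against lowercase word lists work without extra cleaning.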
@@ -227,7 +232,7 @@ head(stop_words)
  {% highlight text %}
- # A tibble: 6 × 2
+ # A tibble: 6 × 2
  word lexicon
  <chr> <chr>
  1 a SMART
@@ -258,13 +263,13 @@ head(tweet_words_interesting)
  {% highlight text %}
- id word
- 1 811749098273538048 stretch
- 2 811749098273538048 volunteer
- 3 811749098273538048 strengthen
- 4 811749098273538048 community
- 5 811567485363294208 community
- 6 811749098273538048 becausey
+ id word
+ 1 812352595159355392 wall
+ 2 812352159916367873 wall
+ 3 812351889681633280 wall
+ 4 812352595159355392 gods
+ 5 812352159916367873 gods
+ 6 812351889681633280 gods
  {% endhighlight %}

  The `anti_join` function is probably not familiar to most data scientists or statisticians. In a sense, it is the opposite of a merge. Basically, the command above matches the rows of `tweet_words` against `my_stop_words`, and then _removes_ the rows that found a match, leaving only the rows in `tweet_words` (the `id` and `word`) that do not match anything in `my_stop_words`. This is desirable because our `my_stop_words` dataset contains words we _do not_ want to analyze.
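To make the behavior of `anti_join` concrete, here is a small sketch with invented data (`toy_words` and `toy_stop_words` are hypothetical stand-ins for `tweet_words` and `my_stop_words`, and joining on `word` is an assumption about the shared column).

```r
library(dplyr)

# Hypothetical tokenized tweets
toy_words <- tibble(
  id   = c("1", "1", "1", "2"),
  word = c("the", "wall", "of", "mountains")
)

# Hypothetical stop-word list
toy_stop_words <- tibble(word = c("the", "of"))

# Keep only the rows of toy_words whose word is NOT in the stop-word list
toy_interesting <- anti_join(toy_words, toy_stop_words, by = "word")
```

Only the columns of the first data frame come back, so nothing from the stop-word table is added to the result.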
@@ -297,7 +302,7 @@ head(bing_lex)
  {% highlight text %}
- # A tibble: 6 × 2
+ # A tibble: 6 × 2
  word sentiment
  <chr> <chr>
  1 2-faced negative
@@ -320,13 +325,13 @@ head(gvl_sentiment)
  {% highlight text %}
- id word sentiment
- 1 811749098273538048 stretch <NA>
- 2 811749098273538048 volunteer <NA>
- 3 811749098273538048 strengthen <NA>
- 4 811749098273538048 community <NA>
- 5 811567485363294208 community <NA>
- 6 811749098273538048 becausey <NA>
+ id word sentiment
+ 1 812352595159355392 wall <NA>
+ 2 812352159916367873 wall <NA>
+ 3 812351889681633280 wall <NA>
+ 4 812352595159355392 gods <NA>
+ 5 812352159916367873 gods <NA>
+ 6 812351889681633280 gods <NA>
  {% endhighlight %}

  Once you get to this point, sentiment analysis can start fairly easily:
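The following sketch shows one way the word-level tally could look, assuming the sentiment column was attached with a left join (which would explain the `<NA>` values above). The `toy_words` and `toy_lexicon` tables are invented for illustration and are not the Bing lexicon.

```r
library(dplyr)

# Hypothetical tokenized tweets
toy_words <- tibble(
  id   = c("1", "1", "2", "2", "2"),
  word = c("wall", "good", "bad", "good", "great")
)

# Hypothetical sentiment lexicon
toy_lexicon <- tibble(
  word      = c("good", "great", "bad"),
  sentiment = c("positive", "positive", "negative")
)

# Words missing from the lexicon get NA for sentiment
toy_sentiment <- left_join(toy_words, toy_lexicon, by = "word")

# Count words per sentiment, ignoring words without a sentiment
toy_counts <- toy_sentiment %>%
  filter(!is.na(sentiment)) %>%
  group_by(sentiment) %>%
  summarise(n = n())
```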
@@ -339,11 +344,11 @@ gvl_sentiment %>% filter(!is.na(sentiment)) %>% group_by(sentiment) %>% summaris
  {% highlight text %}
- # A tibble: 2 × 2
+ # A tibble: 2 × 2
  sentiment n
  <chr> <int>
- 1 negative 14
- 2 positive 78
+ 1 negative 19
+ 2 positive 83
  {% endhighlight %}

  There are many more positive words than negative words, so the mood tilts positive in our crude analysis. We can also group by tweet, and see whether there are more positive or negative tweets:
@@ -359,15 +364,15 @@ gvl_sent_anly2
  {% highlight text %}
- # A tibble: 3 × 2
+ # A tibble: 3 × 2
  sentiment n
  <chr> <dbl>
- 1 negative 1.000000
- 2 positive 1.344828
- 3 <NA> 6.733668
+ 1 negative 1.055556
+ 2 positive 1.338710
+ 3 <NA> 6.040404
  {% endhighlight %}

- On average, there are 1.3448276 positive words per tweet and 1 negative words per tweet, if you accept the assumptions of the above analysis.
+ On average, there are 1.3387097 positive words per tweet and 1.0555556 negative words per tweet, if you accept the assumptions of the above analysis.
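A per-tweet average like the one above could be computed along these lines. This is a sketch under assumptions: the `toy_sentiment` data is invented, and the two-stage grouping is a guess at the shape of the calculation, not the post's actual code.

```r
library(dplyr)

# Hypothetical word-level sentiment table (NA = word not in the lexicon)
toy_sentiment <- tibble(
  id        = c("1", "1", "2", "2", "2"),
  word      = c("wall", "good", "bad", "good", "great"),
  sentiment = c(NA, "positive", "negative", "positive", "positive")
)

toy_avg <- toy_sentiment %>%
  # First count words of each sentiment within each tweet
  group_by(id, sentiment) %>%
  summarise(n = n(), .groups = "drop") %>%
  # Then average those per-tweet counts for each sentiment
  group_by(sentiment) %>%
  summarise(n = mean(n))
```

In this sketch a tweet only contributes to a sentiment's average if it contains at least one word of that sentiment, so the averages can never fall below 1.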

  There is, of course, a lot more that can be done, but this will get you started. For some more sophisticated ideas you can check [Julia Silge's analysis of Reddit data](http://juliasilge.com/blog/Reddit-Responds/), for instance. Another kind of analysis looking at sentiment and emotional content can be found [here](https://mran.microsoft.com/posts/twitter.html) (with the caveat that it uses the predecessor to `dplyr` and thus runs somewhat less efficiently). Finally, it would probably be useful to supplement the above sentiment data frames with situation-specific sentiment analysis, such as making `goallllllll` in the above a positive word.

figures/unnamed-chunk-12-1.png (2.97 KB)
figures/unnamed-chunk-5-1.png (-3.37 KB)
figures/unnamed-chunk-7-1.png (263 Bytes)
figures/unnamed-chunk-8-1.png (11 Bytes)
