Wikipedia:Wikipedia Signpost/2019-09-30/Recent research

<noinclude>{{Signpost draft
|blurb = And other recent research publications
|Ready-for-copyedit = Yes
|Copyedit-done = No
|Final-approval = No <!--Should only be used by EiC -->
}}
{{Wikipedia:Wikipedia Signpost/Templates/RSS description
|1=<!-- LEAVE BLANK to use "<title>: <blurb>" (using title and blurb from above), or replace with a custom description for the RSS feed -->
}}{{Wikipedia:Wikipedia Signpost/Templates/Signpost-header|||}}</noinclude>
{{Wikipedia:Wikipedia Signpost/Templates/Signpost-article-header-v2
|{{{1|Wikipedia's role in assessing credibility of news sources; using wikis against procrastination; OpenSym 2019 report}}}|By [[User:Groceryheist|Nate TeBlunthuis]], [[User:Isaac_(WMF)|Isaac Johnson]], and [[User:HaeB|Tilman Bayer]]
}}

=== "Reducing Procrastination While Improving Performance: A Wiki-powered Experiment with Students" ===

:''Reviewed by [[User:Groceryheist|Nate TeBlunthuis]]''

In this research,<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340813| isbn = 9781450363198| page = 10:1-10 | last1 = Balderas| first1 = Antonio| last2 = Capiluppi| first2 = Andrea| last3 = Palomo-Duarte| first3 = Manuel| last4 = Malizia| first4 = Alessio| last5 = Dodero| first5 = Juan Manuel| title = Reducing Procrastination While Improving Performance: A Wiki-powered Experiment with Students|book-title= Proceedings of the 15th International Symposium on Open Collaboration| location = New York, NY, USA| series = OpenSym '19| date = 2019 | url = https://opensym.org/wp-content/uploads/2019/08/os19-paper-A10-balderas.pdf}}</ref> presented at last month's [[OpenSym]] conference, Balderas and colleagues experimented with having British university students in a computer science course use a [[MediaWiki]] wiki to turn in and collaborate on written assignments. They were interested in developing a pedagogical intervention to combat [[procrastination]] by students, which they describe as an "ethically questionable" behavior.

They created a new version of a six-week software engineering course with the goal of reducing procrastination and evaluated it as an experiment comparing the time management and grade performance of students between two consecutive years of the course. In the first year, the students were to turn in their assignments all at once at the end of the course and were free to use whatever software they wished. In the second year, the students were trained in MediaWiki, used it to complete weekly assignments, and would be penalized for not finishing the work on time. This second group of students procrastinated less (in the first year 16% of students handed in late work, compared to only 4% in the second year) and achieved better grades (in the first year many more students received a 'B' than an 'A', but the opposite was true in the second year).

I think this study achieved its goal of demonstrating that MediaWiki may be a useful pedagogical tool because edit history data can make it easy for instructors to monitor when students worked on their assignments. The course instructors used an open-source tool called [https://app.assembla.com/spaces/wikiassignmentmonitor/subversion/source "WikiAssignmentMonitor"] that extracts data from the MediaWiki database and generates a spreadsheet showing how much progress a student made on each assignment, week by week or hour by hour. The researchers used this tool to track whether students completed work on time.

That said, the study also suffers from limitations in its experimental design. Mainly, several things other than the use of a wiki or the schedule of deadlines changed between the two courses. In particular, the assignments themselves were not exactly the same from one year to the next; however, the researchers saw similar grade improvements for every assignment, even the ones that didn't change. Different software version control systems were also used, but it seems more plausible that the switch to MediaWiki and weekly deadlines explains the findings than this unrelated change. Importantly, it isn't possible from their study to say how much of the improvement should be attributed to the use of MediaWiki versus changing the schedule from one final deadline to six weekly deadlines.

Despite these limitations, I thought it was interesting to see an educational application of wikis that didn't rely heavily on collaboration, but instead on other affordances of the MediaWiki software that can be useful to instructors. The instructors didn't have to require students to turn in their work each week; they could simply consult the WikiAssignmentMonitor report to check students' progress. Moreover, they could see students make progress on assignments over time at levels of granularity not normally available to course instructors. For instance, they could see whether a given student completed an assignment in one session or over many sessions. This paper made me curious about how this kind of monitoring would influence student behavior even if it weren't a factor in their grades.

''(Compare also the [[Wiki Education Foundation]]'s dashboard: https://dashboard.wikiedu.org/ )''

=== The Importance of Wikipedia in Assessing News Source Credibility ===

:''Reviewed by [[User:Isaac_(WMF)|Isaac Johnson]]''

"How the Interplay of Google and Wikipedia Affects Perceptions of Online News Sources" by Annabel Rothschild, Emma Lurie, and Eni Mustafaraj of [[Wellesley_College|Wellesley College]], published in the 2019 [[Computational_journalism#Computational_Journalism_conferences|Computation and Journalism Symposium]], focuses on how readers determine the quality of a given news source based on information provided through Google's rich search results.<ref>{{cite journal |last1=Rothschild |first1=Annabel |last2=Lurie |first2=Emma |last3=Mustafaraj |first3=Eni |title=How the Interplay of Google and Wikipedia Affects Perceptions of Online News Sources |journal=Computation + Journalism Symposium |date=2019 |url=https://emmalurie.github.io/docs/cplusj2019-interplay.pdf |access-date=29 September 2019}}</ref> This is a particularly timely study as this summer it was reported that, for the first time, over half of searches on Google are not resulting in clicks to links<ref group=supp>{{cite web |last1=Fishkin |first1=Rand |title=Less than Half of Google Searches Now Result in a Click |url=https://sparktoro.com/blog/less-than-half-of-google-searches-now-result-in-a-click/ |website=SparkToro |access-date=29 September 2019 |date=13 August 2019}}</ref>{{ndash}}i.e. Google Search has become progressively more efficient at satisfying the needs of its users without the user ever visiting the sites providing the content that is surfaced via Google. This means that Google Search increasingly sets the context in which readers evaluate the quality of information they read.

Rothschild et al. conduct two studies. The first involved interviews with 30 undergraduate students as they assessed the credibility of three news sources: ''[[The Durango Herald]]'', ''[[The Tennessean]]'', and ''The Christian Times''. Many of the participants indicated that they used Google as the primary medium through which they evaluated a source. As a result, in the second study, Rothschild et al. recruited 66 individuals through [[Amazon Mechanical Turk]] to evaluate the credibility of two news sources ([[ProPublica]] and [[Newsmax]]) through the [[Knowledge_Graph|Knowledge Panel]] alone. Both studies indicated that information surfaced by Google from Wikipedia about the news sources figured heavily in readers' assessments.

This work highlights the incredible value that Wikipedia provides to the world and tech platforms, in particular for helping readers assess the credibility of news sources. Readers use Wikipedia, as surfaced via Google, for this purpose, but sites like YouTube and Facebook also surface Wikipedia links about a source as a means of supporting fact-checking.<ref group=supp>{{cite magazine |last1=Matsakis |first1=Louise |title=Youtube, Facebook, and Google Can't Expect Wikipedia to Cure the Internet |url=https://www.wired.com/story/youtube-wikipedia-content-moderation-internet/ |magazine=Wired |access-date=29 September 2019 |language=en |date=16 March 2018}}</ref> This work also points towards statements on Wikidata that are particularly important for assessing the quality of a source{{ndash}}namely awards that a publication has earned, social media presence, geographic context, and establishment date.

The paper closes by noting that despite the value that Wikipedia, as surfaced by Google, provides to readers, many news sources do not yet have a knowledge panel appearing when you search for them. It mentions the [[Wikipedia:WikiProject_Newspapers|Newspapers on Wikipedia project]] (which had been inspired by early results from their research) as a valuable initiative for addressing this gap with many potential benefits beyond supporting credibility assessments within Google Search.

=== "Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics" ===

:''Reviewed by [[User:Isaac_(WMF)|Isaac Johnson]]''

"Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics" by Włodzimierz Lewoniewski, Krzysztof Węcel, and Witold Abramowicz of [[Poznań_University_of_Economics_and_Business|Poznań University of Economics and Business]], published in the [[MDPI]] journal ''Computers'',<ref>{{cite journal |last1=Lewoniewski |first1=Włodzimierz |last2=Węcel |first2=Krzysztof |last3=Abramowicz |first3=Witold |title=Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics |journal=Computers |date=14 August 2019 |volume=8 |issue=3 |pages=60 |doi=10.3390/computers8030060 |language=en |doi-access=free }}</ref> examines the challenge of aggregating Wikipedia page views according to topic and comparing the quality and popularity of these topics across languages.

[[File:Figure 4 from Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics.png|650px|center|alt=Heatmap showing a distribution of content by topic in 55 Wikipedia editions.|thumb|Figure 4 from the paper: share of articles per category in 55 language versions of Wikipedia.]]

From a methodological standpoint, comprehensively labeling Wikipedia articles according to a relatively small number of topics is quite challenging. This problem has inspired many approaches and taxonomies (e.g., [[mw:Topic:Ub3g57qa9gflrlrc|ORES drafttopic]], [[mw:Wikimedia_Research/Showcase#March_2018|Using Wikipedia categories for research]], [[wikitech:Wikidata_Concepts_Monitor#WDCM_Taxonomy|Wikidata Concepts Monitor]]). This work explores two approaches: 1) automatic mapping of the existing category network on Wikipedia to [[en:Category:Main_topic_classifications|high-level categories as identified by English Wikipedia]], and 2) topic as determined by a mixture of [[DBPedia|DBPedia]] and [[wikidata:Property:P31|Wikidata]] classes. Figure 4 from the paper (shown here) gives the resulting proportion of articles in each topic, using the category network method.

There are a lot of data and visualizations in this paper that I would encourage the reader to view for themselves. The authors also expose their results through the website [[Wikirank.net|WikiRank]].

''(See also [https://meta.wikimedia.org/wiki/Special:Search?search=lewoniewski&prefix=Research%3ANewsletter%2F20&fulltext=Search+past+issues&fulltext=Search&ns0=1&ns12=1&ns200=1&ns202=1 earlier coverage] of related publications by some of the same authors)''

[[File:View over Skövde from Billingen hill.jpg|thumb|500px|center|The city of Skövde (Sweden), location of this year's OpenSym conference]]

===OpenSym 2019===

:''Report by [[User:HaeB|Tilman Bayer]]''

====First literature survey of Wikidata quality research====

Among the [https://twitter.com/WikiResearch/status/1164529820337881089 takeaways] presented from this overview of 28 papers that have covered this area since Wikidata's launch in 2012 (some comparing it with other structured data projects such as [[DBpedia]] or [[YAGO (database)|YAGO]]):<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340822| isbn = 9781450363198| pages = 17:1–17:11| last1 = Piscopo| first1 = Alessandro| last2 = Simperl| first2 = Elena| title = What We Talk About when We Talk About Wikidata Quality: A Literature Survey|book-title= Proceedings of the 15th International Symposium on Open Collaboration| location = New York, NY, USA| series = OpenSym '19| date = 2019|url=https://opensym.org/wp-content/uploads/2019/08/os19-paper-A17-piscopo.pdf}}</ref>

* Many papers have examined the completeness of Wikidata's data, but few its accuracy.
* The high availability (server uptime) of wikidata.org is a relevant quality aspect for many users.
* "Wikidata outperforms similar projects in many dimensions."

===="Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features"====

From the abstract:<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340831| isbn = 9781450363198| pages = 13:1–13:8| last1 = Schmidt| first1 = Manuel| last2 = Zangerle| first2 = Eva| title = Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features|book-title= Proceedings of the 15th International Symposium on Open Collaboration| location = New York, NY, USA| series = OpenSym '19| date = 2019|url=https://opensym.org/wp-content/uploads/2019/08/os19-paper-A13-schmidt.pdf}}</ref>

<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">"... we extend [the] previous line of research on [automated] article quality classification by extending the set of features with novel content and edit features (e.g., document embeddings of articles). We propose a classification approach utilizing gradient boosted trees based on this novel, extended set of features extracted from Wikipedia articles. Based on an established dataset containing Wikipedia articles and quality classes, we show that our approach is able to substantially outperform previous approaches (also including recent deep learning methods [cf. previous coverage: '[[m:Research:Newsletter/2017/May#Improved_article_quality_predictions_with_deep_learning|Improved article quality prediction with deep learning]]'])."</blockquote>

"Document embeddings" refers to mapping each article to a vector in a vector space of "500 latent dimensions" (analogous to [[word embedding]]s), resulting in "a numeric, latent representation of the document content, its context, and semantics. We hypothesize that adding this comprehensive article representation can be leveraged for getting a better representation of the contents of an article and hence, its quality."

The edit-related features include the timestamps of the article's last 100 edits, and "the vector differences between the [[tf/idf]] vectors of the last 100 versions of the article."

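For readers curious what such "vector differences" look like in practice, here is a minimal sketch in plain Python; the `tfidf_vectors` helper and the sample revision texts are invented for illustration and are not taken from the paper:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute simple tf-idf vectors over a vocabulary shared by all docs."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted(set(w for doc in tokenized for w in doc))
    n = len(docs)
    # Document frequency: in how many revisions each word appears
    df = {w: sum(w in doc for doc in tokenized) for w in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([
            (counts[w] / len(doc)) * math.log(n / df[w]) for w in vocab
        ])
    return vectors

# Hypothetical texts standing in for successive revisions of one article
revisions = [
    "wikipedia is a free encyclopedia",
    "wikipedia is a free online encyclopedia",
    "wikipedia is a free online encyclopedia anyone can edit",
]

vecs = tfidf_vectors(revisions)
# One difference vector per edit: revision i+1 minus revision i
diffs = [
    [b - a for a, b in zip(vecs[i], vecs[i + 1])]
    for i in range(len(vecs) - 1)
]
print(len(diffs))  # 2 difference vectors for 3 revisions
```

Each difference vector highlights which terms an edit added or removed, which is the kind of signal the classifier can use as an edit feature.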
''(See also tweets from the presentation: [https://twitter.com/frimelle/status/1164475513924182016], [https://twitter.com/WikiResearch/status/1164473626332225536])''

===="When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata"====

This paper<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340826| isbn = 9781450363198| pages = 16:1–16:9| last1 = Kaffee| first1 = Lucie-Aimée| last2 = Endris| first2 = Kemele M.| last3 = Simperl| first3 = Elena| title = When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata|book-title= Proceedings of the 15th International Symposium on Open Collaboration| location = New York, NY, USA| series = OpenSym '19| date = 2019|url=https://opensym.org/wp-content/uploads/2019/08/os19-paper-A16-kaffee.pdf}}</ref> examines the work on [[d:Help:Label|labels in Wikidata]] (i.e. the most common name of an item in a particular language, typically but not always coinciding with the title of the corresponding Wikipedia article in that language, if it exists).

From the conclusions:<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;"> "We identify three types of editors: registered editors, bots, and anonymous editors. Bots contributed to the most number of labels for specific languages while registered users tend to contribute more to multilingual labels, i.e., translation. The hybrid approach of Wikidata, of humans and bots editing the knowledge graph alongside, supports the collaborative work towards the completion of the knowledge graph."</blockquote>

===="Approving automation: analyzing requests for permissions of bots in Wikidata"====

From the paper's conclusions:<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340833| isbn = 9781450363198| conference = Proceedings of the 15th International Symposium on Open Collaboration| pages = 15| last1 = Farda-Sarbas| first1 = Mariam| last2 = Zhu| first2 = Hong| last3 = Nest| first3 = Marisa Frizzi| last4 = Müller-Birn| first4 = Claudia| title = Approving automation: analyzing requests for permissions of bots in Wikidata| date = 2019-08-20|url=https://opensym.org/wp-content/uploads/2019/08/os19-paper-A15-farda-sarbas.pdf}}</ref>
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">"We studied the formal process of requesting bot rights in Wikidata [...] The RfPs [ [[:d:Wikidata:Requests for permissions/Bot|requests for permission]] ] were studied mainly from two perspectives: 1) What information is provided during the time the bot rights are requested and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms and sitelinks into Wikidata, as well as the main source of bot edits have their roots in Wikipedia. This contrasts with Wikipedia where bots are performing mostly maintenance tasks. Our findings also show that most of the RfPs were approved and a small number of them were unsuccessful mainly because operators had withdrawn or there was no activity from the operators."</blockquote>

===="Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers"====

From the abstract and paper<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340829| isbn = 9781450363198| pages = 14:1–14:14| last1 = TeBlunthuis| first1 = Nathan| last2 = Bayer| first2 = Tilman| last3 = Vasileva| first3 = Olga| title = Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers|book-title= Proceedings of the 15th International Symposium on Open Collaboration| location = New York, NY, USA| series = OpenSym '19| date = 2019|url=https://opensym.org/wp-content/uploads/2019/08/os19-paper-A14-teblunthuis.pdf}}</ref> (co-authored by this reviewer):

[[File:The distribution of dwell times across 242 language editions of Wikipedia (Figure 2 from 'Dwelling on Wikipedia').png|thumb|The median time readers spend on a Wikipedia article is around 25 seconds]]

<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">"In 2017, the Wikimedia Foundation began measuring the time readers spend on a given page view (dwell time), enabling a more detailed understanding of [Wikipedia] reading patterns. In this paper, we validate and model this new data source and, building on existing findings, use regression analysis to test hypotheses about how patterns in reading time vary between global contexts.

Consistent with prior findings from self-report data, our complementary analysis of behavioral data provides evidence that Global South readers are more likely to use Wikipedia to gain in-depth understanding of a topic. [...] The median reading time [across all Wikipedias, globally] is 25 seconds and the 75th percentile is 75.1 seconds. [...] Based on our data, we estimate that humanity spent about 672,349 years reading Wikipedia from November 2017 through October 2018."</blockquote>

''(See also: [[:File:Dwelling_on_Wikipedia_slides_from_Opensym_2019.pdf|slides]] and [https://twitter.com/frimelle/status/1164484802847731714 tweet] from the presentation, [[m:Research:Reading time|project page on Meta-wiki]], [[phab:T230642|planned public data release]], [[wikimania:2019:Research/Dwelling_on_Wikipedia_Investigating_time_spent_by_global_encyclopedia_readers|Wikimania presentation]]/[https://twitter.com/JeanFred/status/1162672221875253252 tweet])''

===="Visualization of the Evolution of Collaboration and Communication Networks in Wikis"====

This paper<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340834| isbn = 9781450363198| pages = 11–1–11:10| last1 = Faqir| first1 = Youssef El| last2 = Arroyo| first2 = Javier| last3 = Serrano| first3 = Abel| title = Visualization of the Evolution of Collaboration and Communication Networks in Wikis| |
This paper<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340834| isbn = 9781450363198| pages = 11–1–11:10| last1 = Faqir| first1 = Youssef El| last2 = Arroyo| first2 = Javier| last3 = Serrano| first3 = Abel| title = Visualization of the Evolution of Collaboration and Communication Networks in Wikis|book-title= Proceedings of the 15th International Symposium on Open Collaboration| location = New York, NY, USA| series = OpenSym '19| date = 2019|url=https://opensym.org/wp-content/uploads/2019/08/os19-paper-A11-faqir.pdf}}</ref> presented applications of the "WikiChron" tool, available as a demo for various (non-Wikimedia) wikis at http://wikichron.science/ (with source code available [https://github.com/Grasia/WikiChron on GitHub]). It was also the subject of a [[wikimania:2019:Research/Analyzing_the_evolution_of_wikis_with_WikiChron|presentation at this year's Wikimania]]. |
====Wikitribune navigating "challenges of collaborative evidence-based journalism"==== |
This paper<ref>{{Cite conference| publisher = ACM| doi = 10.1145/3306446.3340818| isbn = 9781450363198| pages = 12–1–12:10| last1 = O'Riordan| first1 = Sheila| last2 = Kiely| first2 = Gaye| last3 = Emerson| first3 = Bill| last4 = Feller| first4 = Joseph| title = Do You Have a Source for That?: Understanding the Challenges of Collaborative Evidence-based Journalism|book-title= Proceedings of the 15th International Symposium on Open Collaboration| location = New York, NY, USA| series = OpenSym '19| date = 2019|url=https://opensym.org/wp-content/uploads/2019/08/os19-paper-A12-oriordan.pdf}}</ref> examined [[Wikitribune]], a for-profit but freely licensed news site launched in 2017. While Wikitribune is (despite the name) not based on a wiki, its model of open collaboration between professional journalists and volunteers, as well as the fact that it was launched by Wikipedia founder [[Jimmy Wales]], made it a fitting subject for OpenSym. |
Among the potential barriers to volunteer participation on WikiTribune identified by the researchers, particularly in its initial version, were the website's design (which emphasized readability over editability) and its real-names policy. Over time, the project's model morphed from closed to hybrid to more open (a shift that also involved the departure of all paid journalists). Some data from the project's first year, as [https://twitter.com/frimelle/status/1164470845194088450 highlighted in the presentation]: the vast majority of articles (79%) were written by paid staff; articles tended to be UK-centric, drew little engagement in the comments, and averaged nine revisions and six different contributors.
===Conferences and events=== |
See the [[mw:Wikimedia Research/Showcase|page of the monthly Wikimedia Research Showcase]] for videos and slides of past presentations. |
===Other recent publications=== |
''Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, [[m:Research:Newsletter#How to contribute|are always welcome]].'' |
:<small>''Compiled by [[User:HaeB|Tilman Bayer]]''</small> |
;Papers from [https://www.icwsm.org/2019/program/accepted-papers/ ICWSM 2019]: |
===="Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start"==== |
From the abstract<ref>{{Cite arXiv | last1 = Yazdanian| first1 = Ramtin| last2 = Zia| first2 = Leila| last3 = Morgan| first3 = Jonathan| last4 = Mansurov| first4 = Bahodir| last5 = West| first5 = Robert| title = Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start | date = 2019-04-08|eprint=1904.03889| class = cs.IR}}</ref> of this paper (which received the "Outstanding Problem-Solution Paper" award at the conference): |
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">"Standard recommender systems [...] rely on users' histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions -- the so-called cold-start problem. The present paper addresses the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We assess the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines."</blockquote> |
''See also [[m:Research:Voice and exit in a voluntary work environment/Elicit new editor interests|project page on Meta-wiki]]'' |
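The core idea, turning a newcomer's questionnaire answers into a preference vector and scoring candidate articles against it, can be illustrated with a toy sketch. The topics, articles, and weights below are invented for illustration; the paper's real questionnaires are mined automatically from article text and edit histories rather than hand-written:

```python
import numpy as np

# Toy questionnaire-based cold-start recommendation. Every name and
# number here is hypothetical; only the general scheme (answers ->
# preference vector -> ranked articles) reflects the paper's setting.

TOPICS = ["history", "biology", "music", "mathematics"]

# Candidate articles described by hand-assigned topic weights.
ARTICLES = {
    "Byzantine Empire": np.array([0.9, 0.0, 0.1, 0.0]),
    "CRISPR":           np.array([0.0, 1.0, 0.0, 0.1]),
    "Baroque music":    np.array([0.3, 0.0, 0.9, 0.0]),
    "Group theory":     np.array([0.0, 0.0, 0.2, 1.0]),
}

def recommend(answers, k=2):
    """answers: per-topic interest from the questionnaire (0 or 1)."""
    user = np.array(answers, dtype=float)
    scored = {title: float(vec @ user) for title, vec in ARTICLES.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# A newcomer who says "yes" to history and music:
print(recommend([1, 0, 1, 0]))  # ['Baroque music', 'Byzantine Empire']
```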
====Shocks make both newcomers and experienced editors contribute more==== |
From the abstract:<ref>{{Cite journal| issn = 2334-0770| volume = 13| issue = 1| pages = 560–571| last1 = Zhang| first1 = Ark Fangzhou| last2 = Wang| first2 = Ruihan| last3 = Blohm| first3 = Eric| last4 = Budak| first4 = Ceren| last5 = Robert Jr.| first5 = Lionel P.| last6 = Romero| first6 = Daniel M.| title = Participation of New Editors after Times of Shock on Wikipedia| journal = Proceedings of the International AAAI Conference on Web and Social Media| date = 2019-07-06| doi = 10.1609/icwsm.v13i01.3253| s2cid = 96439496|url=https://aaai.org/ojs/index.php/ICWSM/article/view/3253}}</ref>
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;"> |
"[We study] participation following shocks that draw attention to an article. Such events can be recruiting opportunities due to increased attention; but can also pose a threat to the quality and control of the article and drive away newcomers. [We examine] shocks generated by drastic increases in attention as indicated by data from Google trends. We find that participation following such events is indeed different from participation during normal times–both newcomers and incumbents participate at higher rates during shocks. We also identify collaboration dynamics that mediate the effects of shocks on continued participation after the shock. The impact of shocks on participation is mediated by the amount of negative feedback given to newcomers in the form of reverted edits and the amount of coordination editors engage in through edits of the article’s talk page." |
</blockquote>

===="Crosslingual Document Embedding As Reduced-Rank Ridge Regression"==== |
From the abstract: <ref>{{Cite conference| publisher = ACM| doi = 10.1145/3289600.3291023| isbn = 9781450359405| pages = 744–752| last1 = Josifoski| first1 = Martin| last2 = Paskov| first2 = Ivan S.| last3 = Paskov| first3 = Hristo S.| last4 = Jaggi| first4 = Martin| last5 = West| first5 = Robert| title = Crosslingual Document Embedding As Reduced-Rank Ridge Regression|book-title= Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining| location = New York, NY, USA| series = WSDM '19| date = 2019}} {{closed access}} [https://dlab.epfl.ch/people/west/pub/Josifoski-Paskov-Paskov-Jaggi-West_WSDM-19.pdf Author's copy]</ref> |
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;"> |
"... For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia."</blockquote> |
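The technique named in the title can be sketched on its own: one common closed-form construction for reduced-rank ridge regression fits an ordinary ridge solution and then projects it onto the top-r right singular directions of the fitted values. A minimal numpy sketch with random stand-in data (the paper's exact formulation and its document features may differ):

```python
import numpy as np

# Reduced-rank ridge regression sketch: ridge fit, then rank truncation
# via the SVD of the fitted values. Data is random and illustrative.

def reduced_rank_ridge(X, Y, lam=1.0, rank=2):
    n, d = X.shape
    # Ordinary ridge solution: (X'X + lam*I) W = X'Y
    W_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    # Project onto the top-r right singular directions of the fits.
    _, _, Vt = np.linalg.svd(X @ W_ridge, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]
    return W_ridge @ P  # coefficient matrix of rank <= `rank`

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # e.g. source-language document features
Y = rng.normal(size=(100, 10))  # e.g. target-space embedding targets
W = reduced_rank_ridge(X, Y, lam=0.5, rank=2)
print(np.linalg.matrix_rank(W))  # 2
```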
===="Framing the Holocaust Online: Memory of the Babi Yar Massacres on Wikipedia"==== |
From the abstract<ref>{{cite journal |first=Mykola |last=Makhortykh |date=2017 |title=Framing the Holocaust Online: Memory of the Babi Yar Massacres on Wikipedia |journal=Digital Icons: Studies in Russian, Eurasian and Central European New Media |url=http://www.digitalicons.org/issue18/framing-the-holocaust-online-memory-of-the-babi-yar-massacres/ |issue=18 |pages=67–94}}</ref>
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;"> |
"The article explores how a notorious case of Second World War atrocities in Ukraine – the [[Babi Yar|Babi Yar massacres]] of 1941-1943 – is represented and interpreted on Wikipedia. Using qualitative content analysis, it examines what frames and content features are used in different language versions of Wikipedia to transcribe the traumatic narrative of Babi Yar as an online encyclopedia entry. It also investigates how these frames are constructed by scrutinizing the process of collaborative frame-building on discussion pages of Wikipedia and exploring how Wikipedia users employ different power play strategies to promote their vision of the events at Babi Yar."</blockquote> |
''(See also "Framing the Holocaust in popular knowledge" below, and related earlier coverage: "[[m:Research:Newsletter/2014/October#Holocaust_articles_compared_across_languages|Holocaust articles compared across languages]]")'' |
===="Framing the Holocaust in popular knowledge: 3 articles about the Holocaust in English, Hebrew and Polish Wikipedia"==== |
From the abstract<ref> |
{{Cite journal| doi = 10.11649/a.2016.012| issn = 2300-0783| volume = | issue = 8| pages = 29–49| last = Wolniewicz-Slomka| first = Daniel| title = Framing the Holocaust in popular knowledge: 3 articles about the Holocaust in English, Hebrew and Polish Wikipedia| journal = Adeptus| date = 2016-12-22}}</ref>: |
<blockquote style="padding-left:1.0em; padding-right:1.0em; background-color:#eaf8f4;">" |
... the article conducts a content analysis of three articles, in three different languages [...]: “[[Auschwitz-Birkenau|Auschwitz-Birkenau Camp]]”, “[[Jedwabne pogrom|The Pogrom in Jedwabne]]”, and “[[Righteous Among the Nations]]”. [...] Analyzing how the articles fulfill each of the roles in the different languages, the research hypothesis is that the framing of the phenomena will differ between the versions, and each version will follow pillars of the collective memory of the Holocaust in its respective country. Findings, however, are not in complete compliance with this hypothesis."</blockquote> |
''(See also "Framing the Holocaust Online" above, and related earlier coverage: "[[m:Research:Newsletter/2014/October#Holocaust_articles_compared_across_languages|Holocaust articles compared across languages]]")'' |
:Supplementary references and notes: |
{{Reflist|30em|group=supp}} |
<!--END OF ARTICLE --> |
<!--END OF ARTICLE --> |
{{Wikipedia:Wikipedia Signpost/Templates/Signpost-block-end-v2}} |
{{Wikipedia:Wikipedia Signpost/Templates/Signpost-article-end-v2}} |
<noinclude>{{Wikipedia:Signpost/Template:Signpost-article-comments-end||2019-08-30|2019-10-31}}</noinclude> |
Latest revision as of 02:43, 6 January 2024
Wikipedia's role in assessing credibility of news sources; using wikis against procrastination; OpenSym 2019 report
A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.
"Reducing Procrastination While Improving Performance: A Wiki-powered Experiment with Students"
- Reviewed by Nate TeBlunthuis
In this research,[1] presented at last month's OpenSym conference, Balderas and colleagues experimented with a MediaWiki wiki for British university students in a computer science course to turn in and collaborate on written assignments. They were interested in developing a pedagogical intervention to combat procrastination by students, which they describe as an "ethically questionable" behavior.
They created a new version of a six week software engineering course with the goal of reducing procrastination and evaluated it as an experiment comparing the time management and grade performance of students between two consecutive years of the course. In the first year, the students were to turn in their assignments all at once at the end of the course and were free to use whatever software they wished. In the second year, the students were trained in MediaWiki, used it to complete weekly assignments, and would be penalized for not finishing the work on time. This second group of students procrastinated less (in the first year 16% of students handed in late work, compared to only 4% in the second year) and achieved better grades (in the first year many more students received a 'B' than an 'A', but the opposite was true in the second year).
I think this study achieved its goal of demonstrating that MediaWiki may be a useful pedagogical tool because edit history data can make it easy for instructors to monitor when students worked on their assignments. The course instructors used an open-source software called "WikiAssignmentMonitor" that extracts data from the MediaWiki database and generates a spreadsheet showing how much progress a student made on each assignment every week or hour. The researchers used this tool to track whether students completed work on time.
That said, the study also suffers limitations in its experimental design. Mainly, several other things changed between the two courses other than the use of a wiki or the schedule of deadlines. In particular, the assignments themselves were not exactly the same from one year to the next. However, they saw similar grade improvements for every assignment, even the ones that didn't change. Also, different software version control systems were used, but it seems more plausible that changing to MediaWiki and weekly deadlines explains their findings compared to this unrelated change. Importantly, it isn't possible from their study to say how much of the improvement should be attributed to the use of MediaWiki or to changing the schedule from 1 final deadline to 6 weekly deadlines.
Despite these limitations, I thought it was interesting to see an educational application of wikis that didn't rely heavily on collaboration, but instead on other affordances of the MediaWiki software that can be useful to instructors. They didn't have to require students turn in their work each week, they could just look at the WikiAssignmentMonitor report to check student's progress. Moreover, they could see students make progress on assignments over time at levels of granularity not normally available to course instructors. For instance, they could see whether a given student completed an assignment in one session instead of over many sessions. This paper made me curious about how this kind of monitoring would influence student behavior even if it wasn't a factor in their grades.
(Compare also the Wiki Education Foundation's dashboard: https://dashboard.wikiedu.org/ )
The Importance of Wikipedia in Assessing News Source Credibility
- Reviewed by Isaac Johnson
"How the Interplay of Google and Wikipedia Affects Perceptions of Online News Sources" by Annabel Rothschild, Emma Lurie, and Eni Mustafaraj of Wellesley College, published in the 2019 Computation and Journalism Symposium, focuses on how readers determine the quality of a given news source based on information provided through Google's rich search results.[2] This is a particularly timely study as this summer it was reported that, for the first time, over half of searches on Google are not resulting in clicks to links[supp 1]–i.e. Google Search has become progressively more efficient at satisfying the needs of their users without the user ever visiting the sites providing the content that is surfaced via Google. This means that Google Search increasingly sets the context in which readers evaluate the quality of information they read.
Rothschild et al. conduct two studies. The first involved interviews with 30 undergraduate students as they assessed the credibility of three news sources: The Durango Herald, The Tennessean, and The Christian Times. Many of the participants indicated that they used Google as the primary medium through which they evaluated a source. As a result, in the second study, Rothschild et al. recruited 66 individuals through Amazon Mechanical Turk to evaluate the credibility of two news sources (ProPublica and Newsmax) through the Knowledge Panel alone. Both studies indicated that information surfaced by Google from Wikipedia about the news sources figured heavily in readers' assessments.
This work highlights the incredible value that Wikipedia the provides to the world and tech platforms, in particular for helping readers assess the credibility of news sources. Readers use Wikipedia, as surfaced via Google, for this purpose, but sites like Youtube and Facebook also surface Wikipedia links about a source as a means of supporting fact-checking.[supp 2] This work also points towards particularly important statements on Wikidata for assessing the quality of a source -- namely awards that a publication has earned, social media presence, geographic context, and establishment date.
The paper closes by noting that despite the value that Wikipedia, as surfaced by Google, provides to readers, many news sources do not yet have a knowledge panel appearing when you search for them. It mentions the Newspapers on Wikipedia project (which had been inspired by early results from their research) as a valuable initiative for addressing this gap with many potential benefits beyond supporting credibility assessments within Google Search.
Wikipedia Topic Assessment
- Reviewed by Isaac Johnson
"Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics" by Włodzimierz Lewoniewski, Krzysztof Węcel, and Witold Abramowicz of Poznan ́University, published by MDPI Computers,[3] examines the challenge of aggregating Wikipedia page views according to topic and comparing the quality and popularity of these topics across languages.
From a methodological standpoint, comprehensively labeling Wikipedia articles according to a relatively small number of topics is quite challenging. This problem has inspired many approaches and taxonomies (e.g., ORES drafttopic, Using Wikipedia categories for research, Wikidata Concepts Monitor). This work explores two approaches: 1) automatic mapping of the existing category network on Wikipedia to high-level categories as identified by English Wikipedia, and, 2) topic as determined by a mixture of DBPedia and Wikidata classes. Figure 4 from the paper (shown here) shows the results for proportion of articles in each topic (using the category network method).
There are a lot of data and visualizations in this paper that I would encourage the reader to view for themselves. The authors also expose their results through the website WikiRank.
(See also earlier coverage of related publications by some of the same authors)
OpenSym 2019
- Report by Tilman Bayer
The fifteenth edition of the annual OpenSym conference took place in Skövde, Sweden last month. The event was launched in 2005 as "WikiSym", focusing exclusively on research about wikis, but over time came to include other forms of "open collaboration" and was renamed to OpenSym several years ago. Many papers presented at this year's OpenSym (see proceedings) studied open source software collaboration, but a substantial part were still focused on Wikipedia, Wikidata and other wikis. Apart from Balderas et al.'s paper on wikis and procrastination (reviewed above), these were:
First literature survey of Wikidata quality research
Among the takeaways presented from this overview of 28 papers which covered this area since Wikidata's launch in 2012 (some comparing it with other structured data projects such as DBpedia or YAGO):[4]
- Many papers have examined the completeness of Wikidata's data, but few its accuracy.
- The high availability (server uptime) of wikidata.org is a relevant quality aspect for many users.
- "Wikidata outperforms similar projects in many dimensions."
"Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features"
From the abstract:[5]:
"... we extend [the] previous line of research on [automated] article quality classification by extending the set of features with novel content and edit features (e.g., document embeddings of articles). We propose a classification approach utilizing gradient boosted trees based on this novel, extended set of features extracted from Wikipedia articles. Based on an established dataset containing Wikipedia articles and quality classes, we show that our approach is able to substantially outperform previous approaches (also including recent deep learning methods [cf. previous coverage: 'Improved article quality prediction with deep learning'])."
"Document embeddings" refers to mapping each article to a vector in a vector space of "500 latent dimensions" (analogous to word embeddings), resulting in "a numeric, latent representation of the document content, its context, and semantics. We hypothesize that adding this comprehensive article representation can be leveraged for getting a better representation of the contents of an article and hence, its quality."
The edit-related features include the timestamps of the article's last 100 edits, and "the vector differences between the tf/idf vectors of the last 100 versions of the article."
(See also tweets from the presentation: [1], [2])
"When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata"
This paper[6] examines the work on labels in Wikidata (i.e. the most common name of an item in a particular language, typically but not always coinciding with the title of the corresponding Wikipedia article in that language, if it exists).
From the conclusions:"We identify three types of editors: registered editors, bots, and anonymous editors. Bots contributed to the most number of labels for specific languages while registered users tend to contribute more to multilingual labels, i.e., translation. The hybrid approach of Wikidata, of humans and bots editing the knowledge graph alongside, supports the collaborative work towards the completion of the knowledge graph."
"Approving automation: analyzing requests for permissions of bots in Wikidata"
From the paper's conclusions:[7]
"We studied the formal process of requesting bot rights in Wikidata [...] The RfPs [ requests for permission ] were studied mainly from two perspectives: 1) What information is provided during the time the bot rights are requested and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms and sitelinks into Wikidata, as well as the main source of bot edits have their roots in Wikipedia. This contrasts with Wikipedia where bots are performing mostly maintenance tasks. Our findings also show that most of the RfPs were approved and a small number of them were unsuccessful mainly because operators had withdrawn or there was no activity from the operators."
"Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers"
From the abstract and paper[8] (co-authored by this reviewer):
"In 2017, the Wikimedia Foundation began measuring the time readers spend on a given page view (dwell time), enabling a more detailed understanding of [Wikipedia] reading patterns. In this paper, we validate and model this new data source and, building on existing findings, use regression analysis to test hypotheses about how patterns in reading time vary between global contexts.
Consistent with prior findings from self-report data, our complementary analysis of behavioral data provides evidence that Global South readers are more likely to use Wikipedia to gain in-depth understanding of a topic. [...] The median reading time [across all Wikipedias, globally] is 25 seconds and the 75th percentile is 75.1 seconds. [...] Based on our data, we estimate that humanity spent about 672,349 years reading Wikipedia from November 2017 through October 2018."
(See also: slides and tweet from the presentation, project page on Meta-wiki, planned public data release, Wikimania presentation/tweet)
"Visualization of the Evolution of Collaboration and Communication Networks in Wikis"
This paper[9] presented applications of the "WikiChron" tool, available as a demo for various (non-Wikimedia) wikis at http://wikichron.science/ (with source code available on GitHub). It was also the subject of a presentation at this year's Wikimania.
=== Wikitribune navigating "challenges of collaborative evidence-based journalism" ===
This paper[10] examined Wikitribune, a for-profit but freely licensed news site launched in 2017. While Wikitribune is (despite the name) not based on a wiki, its model of open collaboration between professional journalists and volunteers, as well as the fact that it was launched by Wikipedia founder Jimmy Wales, made it a fitting subject for OpenSym.
Among the potential barriers to volunteer participation on Wikitribune identified by the researchers - in particular in its initial version - were the website's design (emphasizing readability over editability) and a real names policy. Over time, the project's model morphed from closed to hybrid to more open (a shift that also involved the departure of all paid journalists). Some data from the project's first year, as highlighted in the presentation: the vast majority of articles (79%) were written by the paid staff. Articles tended to be UK-centric, had low engagement in the comments, and averaged nine revisions and six different contributors.
== Conferences and events ==
See the page of the monthly Wikimedia Research Showcase for videos and slides of past presentations.
== Other recent publications ==
Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.
:''Compiled by [[User:HaeB|Tilman Bayer]]''
=== Papers from ICWSM 2019 ===
"Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start"
From the abstract[11] of this paper (which received the "Outstanding Problem-Solution Paper" award at the conference):
"Standard recommender systems [...] rely on users' histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions -- the so-called cold-start problem. The present paper addresses the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We assess the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines."
(See also: project page on Meta-wiki)
==== Shocks make both newcomers and experienced editors contribute more ====
From the abstract:[12]
"[We study] participation following shocks that draw attention to an article. Such events can be recruiting opportunities due to increased attention; but can also pose a threat to the quality and control of the article and drive away newcomers. [We examine] shocks generated by drastic increases in attention as indicated by data from Google trends. We find that participation following such events is indeed different from participation during normal times–both newcomers and incumbents participate at higher rates during shocks. We also identify collaboration dynamics that mediate the effects of shocks on continued participation after the shock. The impact of shocks on participation is mediated by the amount of negative feedback given to newcomers in the form of reverted edits and the amount of coordination editors engage in through edits of the article’s talk page."
"Crosslingual Document Embedding As Reduced-Rank Ridge Regression"
From the abstract:[13]
"... For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia."
Tweet by one of the authors:
"Try Cr5, our new model for crosslingual document embedding! Input: text in any of 28 languages Output: language-independent vector representation, so you can compare text across langs. Pre-trained model and API: https://github.com/epfl-dlab/Cr5 "
=== Other publications ===
"Framing the Holocaust Online: Memory of the Babi Yar Massacres on Wikipedia"
From the abstract:[14]
"The article explores how a notorious case of Second World War atrocities in Ukraine – the Babi Yar massacres of 1941-1943 – is represented and interpreted on Wikipedia. Using qualitative content analysis, it examines what frames and content features are used in different language versions of Wikipedia to transcribe the traumatic narrative of Babi Yar as an online encyclopedia entry. It also investigates how these frames are constructed by scrutinizing the process of collaborative frame-building on discussion pages of Wikipedia and exploring how Wikipedia users employ different power play strategies to promote their vision of the events at Babi Yar."
(See also "Framing the Holocaust in popular knowledge" below, and related earlier coverage: "Holocaust articles compared across languages")
"Framing the Holocaust in popular knowledge: 3 articles about the Holocaust in English, Hebrew and Polish Wikipedia"
From the abstract:[15]
" ... the article conducts a content analysis of three articles, in three different languages [...]: “Auschwitz-Birkenau Camp”, “The Pogrom in Jedwabne”, and “Righteous Among the Nations”. [...] Analyzing how the articles fulfill each of the roles in the different languages, the research hypothesis is that the framing of the phenomena will differ between the versions, and each version will follow pillars of the collective memory of the Holocaust in its respective country. Findings, however, are not in complete compliance with this hypothesis."
(See also "Framing the Holocaust Online" above, and related earlier coverage: "Holocaust articles compared across languages")
== References ==
- ^ Balderas, Antonio; Capiluppi, Andrea; Palomo-Duarte, Manuel; Malizia, Alessio; Dodero, Juan Manuel (2019). "Reducing Procrastination While Improving Performance: A Wiki-powered Experiment with Students" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. pp. 10:1–10. doi:10.1145/3306446.3340813. ISBN 9781450363198.
- ^ Rothschild, Annabel; Lurie, Emma; Mustafaraj, Eni (2019). "How the Interplay of Google and Wikipedia Affects Perceptions of Online News Sources" (PDF). Computation + Journalism Symposium. Retrieved 29 September 2019.
- ^ Lewoniewski, Włodzimierz; Węcel, Krzysztof; Abramowicz, Witold (14 August 2019). "Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics". Computers. 8 (3): 60. doi:10.3390/computers8030060.
- ^ Piscopo, Alessandro; Simperl, Elena (2019). "What We Talk About when We Talk About Wikidata Quality: A Literature Survey" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. pp. 17:1–17:11. doi:10.1145/3306446.3340822. ISBN 9781450363198.
- ^ Schmidt, Manuel; Zangerle, Eva (2019). "Article Quality Classification on Wikipedia: Introducing Document Embeddings and Content Features" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. pp. 13:1–13:8. doi:10.1145/3306446.3340831. ISBN 9781450363198.
- ^ Kaffee, Lucie-Aimée; Endris, Kemele M.; Simperl, Elena (2019). "When Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. pp. 16:1–16:9. doi:10.1145/3306446.3340826. ISBN 9781450363198.
- ^ Farda-Sarbas, Mariam; Zhu, Hong; Nest, Marisa Frizzi; Müller-Birn, Claudia (2019). "Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. p. 15. doi:10.1145/3306446.3340833. ISBN 9781450363198.
- ^ TeBlunthuis, Nathan; Bayer, Tilman; Vasileva, Olga (2019). "Dwelling on Wikipedia: Investigating Time Spent by Global Encyclopedia Readers" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. pp. 14:1–14:14. doi:10.1145/3306446.3340829. ISBN 9781450363198.
- ^ Faqir, Youssef El; Arroyo, Javier; Serrano, Abel (2019). "Visualization of the Evolution of Collaboration and Communication Networks in Wikis" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. pp. 11:1–11:10. doi:10.1145/3306446.3340834. ISBN 9781450363198.
- ^ O'Riordan, Sheila; Kiely, Gaye; Emerson, Bill; Feller, Joseph (2019). "Do You Have a Source for That?: Understanding the Challenges of Collaborative Evidence-based Journalism" (PDF). Proceedings of the 15th International Symposium on Open Collaboration. OpenSym '19. New York, NY, USA: ACM. pp. 12:1–12:10. doi:10.1145/3306446.3340818. ISBN 9781450363198.
- ^ Yazdanian, Ramtin; Zia, Leila; Morgan, Jonathan; Mansurov, Bahodir; West, Robert (2019-04-08). "Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start". arXiv:1904.03889 [cs.IR].
- ^ Zhang, Ark Fangzhou; Wang, Ruihan; Blohm, Eric; Budak, Ceren; Robert Jr., Lionel P.; Romero, Daniel M. (2019-07-06). "Participation of New Editors after Times of Shock on Wikipedia". Proceedings of the International AAAI Conference on Web and Social Media. 13 (1): 560–571. doi:10.1609/icwsm.v13i01.3253. ISSN 2334-0770. S2CID 96439496.
- ^ Josifoski, Martin; Paskov, Ivan S.; Paskov, Hristo S.; Jaggi, Martin; West, Robert (2019). "Crosslingual Document Embedding As Reduced-Rank Ridge Regression". Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. WSDM '19. New York, NY, USA: ACM. pp. 744–752. doi:10.1145/3289600.3291023. ISBN 9781450359405. Author's copy
- ^ Makhortykh, Mykola (2017). "Framing the Holocaust Online: Memory of the Babi Yar Massacres on Wikipedia". Digital Icons: Studies in Russian, Eurasian and Central European New Media (18): 67–94.
- ^ Wolniewicz-Slomka, Daniel (2016-12-22). "Framing the Holocaust in popular knowledge: 3 articles about the Holocaust in English, Hebrew and Polish Wikipedia". Adeptus (8): 29–49. doi:10.11649/a.2016.012. ISSN 2300-0783.
Supplementary references and notes:
- ^ Fishkin, Rand (13 August 2019). "Less than Half of Google Searches Now Result in a Click". SparkToro. Retrieved 29 September 2019.
- ^ Matsakis, Louise (16 March 2018). "Youtube, Facebook, and Google Can't Expect Wikipedia to Cure the Internet". Wired. Retrieved 29 September 2019.
Discuss this story