On Whether Tweets Predict Elections
A review of the literature indicates “No”
Damien Bourke
School of Information Technology & Mathematical Sciences
University of South Australia
Mawson Lakes, Australia
XXXXXXXXXX
Abstract—Twitter is an easy to use source of data, and several researchers have used the data contained within tweets to make election predictions. However, Twitter users are not a representative sample of the general population, and thus using tweets as a source for predictions should be undertaken with caution, if used at all. Although certain models produced thus far claim to predict election results, a closer examination finds that they lack methodological rigor, thus casting their results in doubt. Improvements in existing models may lead to better results, and ideas for future, more robust models are discussed. However, Twitter should, at most, complement rather than replace existing forecasting methods.
Keywords—twitter; election results; prediction
Introduction
Twitter (www.twitter.com) is a microblogging platform on which users publish tweets, which are visible to their followers. A tweet is limited to 140 characters and thus conveys less information than either a Facebook post or a blog entry (neither of which faces the same 140-character constraint). Although figures have not been updated since August 2013, Twitter users have been estimated to send approximately 500 million tweets per day [1]. There are approximately 313 million users [2], who fall into three categories.
The first type of users is individuals representing themselves, which is the most common type of account. However, it must be noted that accounts can be shared between users and/or devices. For example, after Donald Trump was nominated as the Republican presidential candidate, his official account, @realDonaldTrump, had posts sent from at least two devices: at least one Android mobile telephone and at least one iPhone. As the attitude and content of the messages varied between the two devices, it is suspected that the Android messages were sent by Donald Trump himself, while the iPhone messages were sent by his campaign staff (the iPhone-based messages were notably more restrained than the more puerile Android-based messages) [3].
The second type of users is individuals who represent organizations. Posts made on behalf of corporations or political parties are written by one or more (unknown and potentially unknowable) individuals representing that corporation or party. For example, it is unclear who authors the tweets from @secularparty; similarly, it is unclear how many individuals post on behalf of this account for the Secular Party of Australia.
The third type of users is automated (botnet) accounts. For example, @RealPressSecBot “Checks for new @realDonaldTrump tweets every 5 min & transforms them into correct Presidential statement format.” [4]. One estimate puts the total number of botnet accounts in the millions [5], although the actual number of such accounts is unknown.
Source of Twitter Data
With regard to election campaign periods, a review of 127 studies [6] found that data were obtained from three main sources on Twitter: candidates or parties, the general public, and mediated events.
Tweets from political parties or candidates fell into two basic categories: broadcasting messages, and replies to messages. Previous research found that most tweets from politicians or political parties adopted a broadcast style; these tweets were intended for one-way dissemination of content from the sender to the receiver. Such tweets include retweets, and represent posting about a topic rather than responding to a topic. Thus, these types of tweets focused on self-promotion [7], impression management [8], information dissemination [9], and party mobilization [10]. By way of example, a self-promotional tweet is shown in Fig. 1, where the tweet from an organization user is directed to both @ScottMorrisonMP and the Twitter feeds based on the hashtags #qt (Question Time in the Australian Federal Parliament) and #auspol (Australian politics).
This Govt has a plan to drive the economy forward and get expenditure under control - @ScottMorrisonMP #qt #auspol
Fig. 1 Tweet from Liberal Party of Australia (@LiberalAus), 13-Sep-2016
However, more recent research indicates that usage of Twitter has moved toward a more conversational and dyadic style of interaction; an increasing percentage of tweets contain replies and mentions [11] (both of which use the “@” feature to direct tweets to the specified users). An example is shown in Fig. 2, where the tweet from an organization is directed to two individuals, @PaulineHansonOZ and @LyleShelton, while at the same time the post is observable to all followers of @aussexparty and is available in the Twitter feed for #auspol (Australian politics).
Absolute silence from @PaulineHansonOz and @LyleShelton on the Christian extremists who vandalised a war memorial in Queensland. #auspol
Fig. 2 Tweet from Australian Sex Party (@aussexparty), 03-Mar-2017
Tweets from the general public also fell into two basic categories: messages to parties or candidates, and messages about parties or candidates. Messages to parties or candidates involve mentioning the user account by prefixing their account with “@”, as shown in Fig. 3, where the tweet from an individual user is directed to @NSWLabor and is available in the Twitter feed for #abortion.
Dear @NSWLabor I feel betrayed as a woman that the party has male MPs in it who denied my bodily autonomy today. Voting Green. #abortion
Fig. 3 Tweet from Jackie McMillan (@MissDissentEats) mentioning Australian Labor Party, 11-May-2017
Messages about parties or candidates may include the name of the party or politician either in free text or as a hashtag. Tweets containing free-text references to political parties may be difficult to find in the Twitter data stream, as shown in Fig. 4: although the tweet is clearly about the SA Liberals, it was not directed to their Twitter account, @LiberalSAHQ. The tweet is, however, available in the Twitter feeds for #auspol (Australian politics) and #saparli (South Australian Parliament).
The SA Liberals first response to securing our energy future - it's all about coal #auspol #saparli https://t.co/K5SiOeYVDm
Fig. 4 Tweet from Australian Labor Party, South Australia Branch (@alpsa) concerning the Liberal Party of Australia, 14-Mar-2017
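The distinction between messages to a party (an “@” mention) and messages about a party (a free-text reference) can be illustrated with a minimal sketch. The function name classify_tweet, the regular expressions, and the lists of handles and party names are illustrative assumptions and are not drawn from any of the studies reviewed; real handle and hashtag matching on Twitter is more involved than these simple patterns.

```python
import re

# Assumed, simplified patterns: a handle or hashtag is "@" or "#" followed
# by word characters, and must not be preceded by a word character.
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def classify_tweet(text, party_handles, party_names):
    """Return the mentions, hashtags, and party references found in a tweet.

    party_handles and party_names are hypothetical lookup lists supplied
    by the analyst; matching is case-insensitive.
    """
    mentions = {m.lower() for m in MENTION_RE.findall(text)}
    hashtags = {h.lower() for h in HASHTAG_RE.findall(text)}
    lowered = text.lower()
    return {
        "mentions": sorted(mentions),
        "hashtags": sorted(hashtags),
        # "to": the tweet is directed at a party account via an @ mention
        "to": sorted(h for h in party_handles if h.lower() in mentions),
        # "about": the party is named in free text only
        "about": sorted(n for n in party_names if n.lower() in lowered),
    }


handles = ["NSWLabor", "LiberalSAHQ"]
names = ["SA Liberals"]

fig3 = ("Dear @NSWLabor I feel betrayed as a woman that the party has male "
        "MPs in it who denied my bodily autonomy today. Voting Green. #abortion")
fig4 = ("The SA Liberals first response to securing our energy future - "
        "it's all about coal #auspol #saparli")

print(classify_tweet(fig3, handles, names))
print(classify_tweet(fig4, handles, names))
```

Under these assumptions, the tweet in Fig. 3 is found through its mention of @NSWLabor, whereas the tweet in Fig. 4 is found only because “SA Liberals” happens to be in the list of names; a study that collects by account mention alone would miss it.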
Tweets from mediated events represent user interaction with television programs such as Q&A [12] or American Idol [13]. An example using the #QandA hashtag is shown in Fig. 5, where a tweet from an organization user is made available to the Twitter feed for #QandA (the Australian Broadcasting Corporation’s television program Q&A).
Ultimately the political class is distrustful of the arts because of sarcasm criticism ridicule of Govt decisions. #QandA
Fig. 5 Tweet from Seniors United Party of Australia (@AVoice4Seniors) using #QandA, 13-Mar-2017
Adopting Technology
As technology changes, politicians adapt to and adopt it. For example, Roosevelt’s voice was a match for radio [14], Kennedy’s charisma commanded television [15], and Obama embraced Twitter [16]. The trend continues; all 17 Republican contenders for the presidential nomination used Twitter [17], with Donald Trump’s Twitter feed scoring highest on three constructs most likely to resonate with Republican voters: grandiosity, informality, and dynamism. A further study [3] observed that Trump’s tweets, which are mainly simple and repetitious and overwhelmingly negative and insulting, frequently use exclamation marks and capitalized words.
Prediction
Various trends in the areas of the economy, public opinion and population health have been identified by applying predictive analytics to social media data; for example, using Google Trends for financial trading [18] and internet searches [19]. Such mining is not perfect [20], and such models are modified as their results are compared to other sources of data. Twitter has been used to make predictions in areas such as whether or not a movie will be successful [21], the marketability of consumer goods [22], the rise and fall of the stock market [23], and the location and spread of pandemics [24]. The success of using data from Twitter to make predictions about real-world activities led researchers to investigate the extent to which Twitter could be used to forecast election results.
Predicting Election Results
Forecasting election results from Twitter started with a study of the 2009 German election, which concluded that the mere number of tweets about a party was a predictor of electoral success for that party [16] (i.e., it was unimportant whether the tweets were positive or negative; quantity was more important than quality). These results were soon rebutted [25], and the original authors offered a response to that criticism [26], watering down the predictive capabilities of their model. Thus began the debate over the extent to which Twitter data could and should be used to forecast election results. This debate is far from settled, although subsequent studies [27], [28], [29], [30], [31] have questioned the extent to which Twitter can provide accurate forecasting.
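The volume-based approach described above can be sketched as follows: each party’s share of tweets is treated as its predicted vote share, and the prediction is scored against the actual result using the mean absolute error. This is only an illustration of the general idea; the function names, party labels, tweet counts, and vote shares are all invented for the example and do not reproduce the data or method of any cited study.

```python
from collections import Counter


def mention_shares(tweet_parties):
    """Convert a list of party labels (one per tweet) into percentage shares."""
    counts = Counter(tweet_parties)
    total = sum(counts.values())
    return {party: 100.0 * n / total for party, n in counts.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, across the actual parties."""
    gaps = [abs(predicted.get(party, 0.0) - share)
            for party, share in actual.items()]
    return sum(gaps) / len(gaps)


# Hypothetical data: 100 tweets, each labelled with the party it refers to.
tweets = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
# Hypothetical election result (percentage of votes).
votes = {"A": 40.0, "B": 35.0, "C": 25.0}

predicted = mention_shares(tweets)
print(predicted)
print(round(mean_absolute_error(predicted, votes), 2))
```

Note that the sentiment of each tweet plays no role in this calculation, which is precisely the feature of the volume-based model that later studies questioned.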
Following that initial German study, data from Twitter has been used to predict elections in different countries, including: Brazil (2010) [32], France (2012) [33], Ireland (2011) [27], Italy (2011) [33], (2013) [34], Netherlands (2011) [28], New Zealand (2011) [35], Norway (2013) [11], Spain (2011) [36], Singapore (2011) [29], the United Kingdom (2010) [37], (2015) [38], and the United States of America (2010) [39], [36].
One concern with studies using data from Twitter is that most are post hoc and do not produce true forecasts (results issued before the election) [40]. Indeed, on several occasions, election-related tweets have been subjected to post-processing, with resulting claims that the data might have been able to make correct predictions [30]. By 2012, no paper had made a forward prediction [41]; however, a more recent paper [38] did make a forward prediction (and, interestingly, that prediction was found to be incorrect).
Sourcing Data
The studies reviewed obtained their data from Twitter via a mixture of four methods, listed here in decreasing order of labor intensity: manually, using web-scrapes, using Twitter’s API, or utilizing third-party data sets. Manual transcription involves typing content from the Twitter website into a format useful for analysis. Web-scraping typically involves obtaining data in a format which must be cleaned in order to identify useful attributes. Using Twitter’s API allows data to be extracted into analysis tools such as R. Finally, third-party software products allow for complete data sets to be acquired.
A comparison of the results returned by the various collection methods has not taken place [6]. Moreover, there is no agreed-upon measure for ‘counting votes’: some studies count tweets, others count users, and others perform (varying types of) sentiment analysis [41]. Thus, it is difficult to compare and contrast the merits of using one collection method over another. Given the absence of rigor in collecting and analyzing Twitter data, Twitter studies risk being as meaningless as the 1936 Literary Digest presidential poll [42], whereby a privileged and non-representative sample of the population (Literary Digest subscribers with telephones) was used to unsuccessfully predict the results of the presidential election.
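To illustrate why the choice of ‘vote-counting’ measure matters, the following minimal sketch computes party shares from the same toy data in two ways: once counting every tweet, and once counting each user only once per party. The records, user identifiers, and function names are hypothetical and serve only to show how a single prolific account can shift a tweet-based count.

```python
from collections import Counter


def shares(labels):
    """Turn an iterable of party labels into fractional shares."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {party: n / total for party, n in counts.items()}


def share_by_tweets(records):
    """Every tweet counts as one 'vote' for the party it refers to."""
    return shares(party for _user, party in records)


def share_by_users(records):
    """Each (user, party) pair counts once, however often the user tweets."""
    return shares(party for _user, party in set(records))


# Hypothetical (user, party) records: user u1 tweets about party A seven times.
records = [("u1", "A")] * 7 + [("u4", "A"), ("u2", "B"), ("u3", "B")]

print(share_by_tweets(records))
print(share_by_users(records))
```

In this invented example, party A leads comfortably when tweets are counted but is level with party B when users are counted, so two studies of the same data could report different ‘results’ depending solely on the measure chosen.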
Measuring Data
Moving beyond counting the mere number of tweets, later studies introduced various research methods to both complement and supplement simple statistics, including: adding candidate attributes [43], comparing number of