The Twitter Explorer: a Framework for Observing Twitter through Interactive Networks
Armin Pournaki, Felix Gaisbauer, Sven Banisch, Eckehard Olbrich
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
We present an open-source interface for scientists to explore Twitter data through interactive network visualizations. Combining data collection, transformation and visualization in one easily accessible framework, the twitter explorer connects distant and close reading of Twitter data through the interactive exploration of interaction networks and semantic networks. By lowering the technological barriers of data-driven research, it aims to attract researchers from various disciplinary backgrounds and facilitates new perspectives in the thriving field of computational social science.
INTRODUCTION
Due to its public-by-default nature and the possibility of retrieving data sets conveniently via an API, Twitter has become a widely used source for the observation and analysis of political debates [1, 2], sentiments [3], brand communication [4], or natural disasters [5], to name a few. Different kinds of interactions on Twitter [6] are often represented in the form of networks, such as retweet networks [1, 7], mention networks [7], follower networks [8] or co-hashtag networks [9]. Twitter data is therefore being used extensively to address a wide variety of research questions. However, the analysis is usually carried out by a tech-savvy community oriented towards quantitative analysis. This is partly because the processes of data collection and transformation can be an obstacle for scientists with other disciplinary backgrounds. We introduce an easily accessible integrated framework that combines data acquisition, transformation and visualization, aiming to open the field of data-driven research to a broader spectrum of researchers.
PREVIOUS WORK
There exists a wide range of tools for collecting Twitter data. DMI-TCAT [10] provides an extensive suite for streaming real-time tweets, but there is no function to search tweets from the past. twarc [11] is a command-line interface for streaming and searching tweets. Neither of the two tools includes the generation of interactive networks.
Gephi [12] is a powerful software suite for network analysis. There exists a plug-in for the collection of Twitter data [13] where users can filter real-time tweets and visualize them in the Gephi interface. In contrast to these solutions, the twitter explorer provides an open framework that combines data collection, transformation and visualization and allows users to explore the collected Twitter corpus interactively.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732942. The authors would like to thank Vasco Asturiano for the fast replies to questions about his force-graph library.
ARCHITECTURE
The twitter explorer consists of three components:
◦ The collector, a Streamlit-powered [14] application, provides a graphical user interface for the Twitter Search API and saves the collected data for further processing.
◦ The visualizer, a Streamlit-powered application, provides a graphical user interface for the generation of interaction networks and semantic networks based on the collected data and saves the interactive networks.
◦ The explorer interface allows users to interact with the networks and explore the underlying metadata of nodes and links.
Each of these components is conceived in a modular way, which facilitates adding new features to the twitter explorer (see Fig. 1).
DATA ACQUISITION: THE COLLECTOR
In the collector, the user interacts with the Twitter Search API [15], giving access to a limited set of tweets from the last 7 days.
Authentication
Since 2018, users need to apply for a Twitter Developer Account in order to access the API [16]. Since the collector makes direct API calls, this step is necessary for its usage. There are developer accounts specific to academic research [17]. The user can then create app tokens which will allow the twitter explorer to connect to the API via Application-only authentication (OAuth 2.0) [18].
Collection
The collection of tweets is initiated by a keyword string, following the rules of a Twitter Advanced Search. The free API comes with limitations: users can only make a limited number of requests per 15 minutes [19]. The tweets are continuously stored until all tweets that the Search API provides are collected. The corpus is then ready to be passed on to the next interface.
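The collection loop described above (request pages until the Search API is exhausted, wait when the request quota is used up, store tweets as they arrive) can be sketched as follows. This is a minimal illustration and not the twitter explorer's actual code: `fetch_page`, `RateLimited` and `collect` are hypothetical names, and the real API request is deliberately left to the caller so that the sketch makes no claims about endpoint details.

```python
import json
import time
from typing import Callable, Optional, Tuple

# Hypothetical signature: given a query and a cursor, return (tweets, next_cursor).
# A real implementation would wrap an authenticated Search API request.
FetchPage = Callable[[str, Optional[str]], Tuple[list, Optional[str]]]


class RateLimited(Exception):
    """Raised by the (hypothetical) fetch function when the request quota is used up."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry in {retry_after}s")
        self.retry_after = retry_after


def collect(query: str, fetch_page: FetchPage, out_path: str,
            sleep: Callable[[float], None] = time.sleep) -> int:
    """Page through all results for `query`, appending one JSON object per line."""
    cursor = None
    total = 0
    with open(out_path, "a", encoding="utf-8") as out:
        while True:
            try:
                tweets, cursor = fetch_page(query, cursor)
            except RateLimited as err:
                sleep(err.retry_after)  # wait for the rate-limit window to reset
                continue                # then retry the same page
            for tweet in tweets:
                out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            total += len(tweets)
            if cursor is None:  # no further pages: the search is exhausted
                return total
```

Writing each tweet as soon as its page arrives means that an interrupted collection still leaves a usable json lines file behind.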
DATA TRANSFORMATION: THE VISUALIZER
The visualizer creates interactive network visualizations from the collected corpus. Here one can distinguish between interaction networks (with users as nodes) and semantic networks (with words or concepts as nodes). The twitter explorer currently supports the creation of retweet networks as interaction networks and hashtag co-occurrence networks as semantic networks. Several data aggregation methods allow for exploration of the network at different scales of complexity.
Fig. 1. The twitter explorer framework. The collector (left), after having set up the credentials, allows for connection to the Twitter Search API and saves the collected tweets in jsonl format. They are then passed on to the visualizer (middle), where the user can get an overview of the content and then create the retweet and hashtag networks. The interactive networks are generated as html files that can be explored in the web browser. The modular structure of the three components facilitates the development of new features, which are suggested by the light grey boxes.
Twitter timeline
The data is presented as a timeline, where tweet counts are plotted over time. The user can get a feeling for the overall salience of the chosen keyword, and possible peaks can hint towards special events. The timeline is generated as an interactive plot.
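The counting step behind such a timeline can be sketched as follows. The sketch assumes that each tweet carries a `created_at` string in the format used by Twitter's v1.1 API; the function name `tweet_counts` and the hourly bucket are illustrative choices, not taken from the twitter explorer itself.

```python
from collections import Counter
from datetime import datetime

# Assumption: timestamps look like "Wed Oct 10 20:19:24 +0000 2018" (v1.1 API).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_counts(tweets, bucket="%Y-%m-%d %H:00"):
    """Count tweets per time bucket (hourly by default), in chronological order."""
    counts = Counter(
        datetime.strptime(t["created_at"], CREATED_AT_FORMAT).strftime(bucket)
        for t in tweets
    )
    # The bucket labels sort lexicographically in time order.
    return sorted(counts.items())
```

The resulting list of (bucket, count) pairs can be handed to any plotting library; a daily timeline is obtained with `bucket="%Y-%m-%d"`.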
Interaction networks
There are several ways of interaction on Twitter: retweets, mentions, replies, following, likes, quotes and direct messages. Not all of them are accessible through the API. We focus on retweet interaction, which can be represented as a directed network in which nodes are users and a link is drawn from node i to j if i retweets j. The twitter explorer’s visualizer provides an interface for creating retweet networks which includes the following features:
Community detection.
In order to find densely connected clusters of a network, it has become common practice to employ community detection algorithms. The twitter explorer currently supports the Louvain [20] and InfoMap [21] algorithms.
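The retweet network defined at the start of this subsection (a link from i to j if i retweets j) has to be assembled from the collected tweets before any community detection can be run. The following is a minimal sketch rather than the twitter explorer's actual code. It assumes tweet objects in the v1.1 API layout, where a retweet carries a `retweeted_status` field and user names sit under `user.screen_name`; the function name `retweet_edges` is hypothetical.

```python
from collections import Counter


def retweet_edges(tweets):
    """Return {(retweeter, retweeted): weight} for a directed retweet network."""
    edges = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original is None:  # not a retweet, so it contributes no link
            continue
        i = tweet["user"]["screen_name"]     # the user who retweets
        j = original["user"]["screen_name"]  # the user being retweeted
        edges[(i, j)] += 1                   # weight = number of retweets i -> j
    return dict(edges)
```

The weighted edge dictionary can then be passed to a network library for community detection and layout.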
Force-directed layout.
The underlying visualization library [22] employs a force-directed layout in which nodes that retweet each other more often are placed closer to each other [23].
Aggregation methods.
One challenge for understanding and visualizing complex interaction networks is to find useful aggregation methods necessary to observe the underlying discursive mechanisms at different levels of granularity. We therefore propose several methods of node aggregation: (1) removing nodes that only retweet one source and do not generate any content, (2) removing nodes that were retweeted fewer than x times and (3) reducing the network to an interaction network of communities (cluster graph).
Privacy.
Remove all accessible metadata of users that have fewer than 5000 followers (no public figures) from the interactive visualization in order to comply with current privacy standards. The nodes are visible and their links are taken into account, but they cannot be personally identified in the interface.
Export options.
Export the networks to common formats like edge list, GML or GraphViz. The framework is therefore compatible with a wide range of existing tools for network analysis [12, 24, 25].
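The three node aggregation methods described under "Aggregation methods" can be illustrated on a weighted edge dictionary of the form {(retweeter, retweeted): weight}. This is a sketch under stated assumptions, not the twitter explorer's implementation: all function names are hypothetical, "does not generate any content" is approximated as "is never retweeted in the data", and links inside a community are dropped when building the cluster graph.

```python
from collections import Counter, defaultdict


def _filter(edges, keep):
    """Keep only links whose two endpoints both survive."""
    return {(i, j): w for (i, j), w in edges.items() if i in keep and j in keep}


def drop_single_source_retweeters(edges):
    """(1) Remove nodes that retweet exactly one source and are never retweeted."""
    sources = defaultdict(set)
    retweeted = set()
    for (i, j) in edges:
        sources[i].add(j)
        retweeted.add(j)
    nodes = set(sources) | retweeted
    drop = {n for n in nodes
            if len(sources.get(n, ())) == 1 and n not in retweeted}
    return _filter(edges, nodes - drop)


def drop_rarely_retweeted(edges, x):
    """(2) Remove nodes that were retweeted fewer than x times."""
    times_retweeted = Counter()
    for (_, j), w in edges.items():
        times_retweeted[j] += w
    keep = {n for n, count in times_retweeted.items() if count >= x}
    return _filter(edges, keep)


def cluster_graph(edges, membership):
    """(3) Collapse nodes into their communities and sum the link weights."""
    collapsed = Counter()
    for (i, j), w in edges.items():
        ci, cj = membership[i], membership[j]
        if ci != cj:  # assumption: intra-community links are not shown
            collapsed[(ci, cj)] += w
    return dict(collapsed)
```

Each function returns a new edge dictionary, so the methods can be chained, for example by thresholding first and collapsing to communities afterwards.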
Semantic networks
While retweet networks make it possible to identify the main proponents of a debate and their interaction patterns, looking at the most retweeted tweets might not be sufficient to get an impression of the content structure of the debate. In order to explore the textual content of the data, we propose hashtag co-occurrence networks. Here, every node is a hashtag, and links are drawn between nodes if they appear in the same tweet. By again laying out the network with a force-directed algorithm, the hashtag network gives an overview of the debate’s vocabulary and can reveal the different subtopics within a debate.
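The construction of such a hashtag co-occurrence network can be sketched in a few lines. The sketch assumes tweet objects in the v1.1 API layout, where hashtags are listed under `entities.hashtags` with a `text` field. Lower-casing the hashtags and weighting each link by the number of tweets containing both hashtags are illustrative choices, not details confirmed by the text above.

```python
from collections import Counter
from itertools import combinations


def hashtag_cooccurrence(tweets):
    """Return {(hashtag_a, hashtag_b): number of tweets containing both}."""
    links = Counter()
    for tweet in tweets:
        # Lower-case and deduplicate so "#Brexit" and "#brexit" share one node.
        tags = sorted({h["text"].lower() for h in tweet["entities"]["hashtags"]})
        # Every unordered pair of hashtags in the same tweet forms one link.
        for pair in combinations(tags, 2):
            links[pair] += 1
    return dict(links)
```

Sorting the hashtags before pairing them guarantees that each undirected link is stored under a single key.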
NETWORK EXPLORATION INTERFACE
Fig. 2. The network exploration interface. The modular command palette (left) can (1) show information about the underlying data, (2) modify the visualization, (3) display network measures and (4) search for and show information about specific users and the content they generated in the dataset. Nodes are colored according to their community. They can be interacted with by clicking or hovering to display the username and relevant metadata in the palette.
The twitter explorer offers an intuitive exploration interface (see Fig. 2). A modular command palette allows for user interaction and provides insight into the underlying metadata of the network:
Network information.
Accesses generic information about the network (keywords used to collect the data, date of collection, first/last tweet of the dataset).
Visualization options.
Supports different node colorings according to their community assignment. The node size can be dynamically changed according to the respective metadata values (in/out-degree, number of followers, number of followed accounts). This facilitates, for instance, the detection of news outlets.
Network measures.
Shows the number of nodes and links in the network. This set will be extended to include a wider range of network indicators in future releases.
User information.
Search users in the given network and find them by zooming or flashing their color. Display the user’s relevant metadata (number of followers, number of followed accounts, number of retweets, number of times retweeted), their tweets in the dataset as well as their current timeline. Note that the interface will only display tweets that are still online at the time of exploration. By doing so, it complies with the Twitter display requirements [26].
INTEGRATION WITH OTHER METHODS
The twitter explorer can be regarded as an all-in-one solution for the exploration of Twitter networks, for which it is easy to develop new modules within the existing components (see Fig. 1). An example would be to include additional community detection algorithms or new node aggregation methods. At the same time, its modular structure (division into collector / visualizer / explorer) and the ability to export the generated data make the tool compatible with a variety of other data analysis tools (see Fig. 3). Therefore, scientists can use the twitter explorer in combination with existing tools from data and network science.
Fig. 3. The twitter explorer in context. Its modular structure makes it easy to develop new features for the twitter explorer, but it also allows it to be used in combination with existing data analysis and network science tools. The dotted arrows depict export paths allowing users to integrate the (transformed) data from the twitter explorer into their desired data analysis environment.
FUTURE DEVELOPMENT
The twitter explorer is currently in an open beta stage on GitHub. Future work will incorporate the dynamic nature of retweet interaction into the visualization paradigms. In order to disseminate the framework and attract new audiences to the field of data-driven research, vignettes (use cases) will be designed to showcase the twitter explorer’s use in social science research. Furthermore, it is planned to add the possibility of exploring recently developed measures such as graph curvatures, which can provide new insights into the analysis of social networks [27].
AVAILABILITY
The twitter explorer can be tested at https://twitterexplorer.org. The source code is available on GitHub, where the current release can be downloaded [28]. The tool is licensed under the GNU GPLv3 license [29].
TECHNICAL DETAILS
The twitter explorer is written partly in Python (data collection and transformation) and partly in JavaScript (interactive network visualization). The frontend for the data collector and the visualizer is made with Streamlit [14], a Python library for the creation and deployment of data-analytic tools. The Twitter objects are stored in the json lines format [30]. The network operations and community detection rely on the Python implementation of igraph [25]. The interactive networks are drawn using D3.js [31], more specifically the force-graph library [22] by Vasco Asturiano.
AUTHOR CONTRIBUTIONS
The idea for the twitter explorer originated from fruitful discussions in the context of the Odycceus project between Armin Pournaki, Felix Gaisbauer, Sven Banisch and Eckehard Olbrich. The tool is designed and developed by Armin Pournaki. All authors wrote the manuscript.
REFERENCES
[1] M. D. Conover, B. Goncalves, J. Ratkiewicz, A. Flammini, and F. Menczer. Predicting the political alignment of Twitter users. In [...], pages 192–199, Oct 2011.
[2] Noé Gaumont, Maziyar Panahi, and David Chavalarias. Reconstruction of the socio-semantic dynamics of political activist Twitter networks: method and application to the 2017 French presidential election. PLOS ONE, 13(9):1–38, 09 2018.
[3] Georgios Paltoglou and Mike Thelwall. Sensing social media: A range of approaches for sentiment analysis. In Cyberemotions, pages 97–117. Springer, 2017.
[4] Tanya Nitins and Jean Burgess. Twitter, brands, and user engagement. In A Bruns, M Mahrt, K Weller, J Burgess, and C Puschmann, editors, Twitter and society [Digital Formations, Volume 89], pages 293–304. Peter Lang Publishing, United States of America, 2014.
[5] Axel Bruns and Jean Burgess. Crisis communication in natural disasters: The Queensland floods and Christchurch earthquakes. In A Bruns, M Mahrt, K Weller, J Burgess, and C Puschmann, editors, Twitter and society [Digital Formations, Volume 89], pages 373–384. Peter Lang Publishing, United States of America, 2014.
[6] Lee Rainie. The six types of Twitter conversations. Pew Research Center, 20, 2014.
[7] Michael D Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Filippo Menczer, and Alessandro Flammini. Political polarization on Twitter. In Fifth international AAAI conference on weblogs and social media, 2011.
[8] Seth A Myers, Aneesh Sharma, Pankaj Gupta, and Jimmy Lin. Information network or social network? The structure of the Twitter follow graph. In Proceedings of the 23rd International Conference on World Wide Web, pages 493–498, 2014.
[9] Jean Burgess and Ariadna Matamoros-Fernández. Mapping sociocultural controversies across digital media platforms: One week of [...]. Communication Research and Practice, 2(1):79–96, 2016.
[10] Erik Borra and Bernhard Rieder. Programmed method: developing a toolset for capturing and analyzing tweets. Aslib Journal of Information Management, 66(3):262–278, May 2014.
[11] DocNow. twarc. https://github.com/DocNow/twarc, 2020. [Online; accessed 06-March-2020].
[12] Mathieu Bastian, Sebastien Heymann, and Mathieu Jacomy. Gephi: An open source software for exploring and manipulating networks. 2009.
[13] Clément Levallois and Matthieu Totet. Twitter Streaming Importer. https://seinecle.github.io/gephi-tutorials/generated-html/twitter-streaming-
[20] [...] Journal of statistical mechanics: theory and experiment, 2008(10):P10008, 2008.
[21] Martin Rosvall and Carl T Bergstrom. Maps of information flow reveal community structure in complex networks. arXiv preprint physics.soc-ph/0707.0609, 2007.
[22] Vasco Asturiano. force-graph. https://github.com/vasturiano/force-graph, 2018. [Online; accessed 06-March-2020].
[23] Andreas Noack. Modularity clustering is force-directed layout. Physical Review E, 79(2):026102, 02 2009.
[24] Tiago P. Peixoto. The graph-tool Python library. figshare, 2014.
[25] Gabor Csardi and Tamas Nepusz. The igraph software package for complex network research.