15 Tips for Using Data Science to Research Twitter Posts & Comments
Because almost every user’s tweets are public, Twitter is a gold mine of data, according to runrex.com. Twitter posts and comments can be a rich source of useful insights and have therefore become a crucial source of information for organizations and brands. Extracting these insights, however, still requires some data science skill and knowledge, and this article highlights 15 tips for using data science to research Twitter posts and comments.
- Apply for a Twitter developer account
To get started, you will first need access to the Twitter API. According to discussions over at guttulus.com, this means creating a developer account on the Twitter apps site. You will fill in the application form provided, which asks you to explain what you wish to analyze. Once Twitter accepts your application, you will receive the credentials you need.
- Consider installing Tweepy
As explained over at runrex.com, Tweepy is an excellent tool for accessing the Twitter API from Python. The releases available when this was written supported Python 2.6, 2.7, 3.3, 3.4, 3.5, and 3.6, so check the Tweepy documentation for the versions supported by the current release. If you will be using Python to conduct your research, you will need to install it.
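Once Tweepy is installed (typically with `pip install tweepy`), a quick check like the following can confirm that Python actually sees it. This is a minimal sketch using only the standard library, and it assumes Python 3.8 or later; the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("tweepy")
if version is None:
    print("Tweepy is not installed; try: pip install tweepy")
else:
    print(f"Tweepy {version} is installed")
```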
- Install the “rtweet” package for R
If you will be researching and analyzing Twitter posts and comments with the programming language R, then the subject matter experts over at guttulus.com recommend that you install the “rtweet” package, which you will then use to extract the tweets.
- Set up authentication
Whether you will be using R or Python, you will need to set up authentication to connect to Twitter. As outlined over at runrex.com, this is achieved by entering the name of your app and the credentials you received when you applied for access to the Twitter API. Once you do so, you will be redirected to a Twitter page and asked to authorize the app, and once this is done you will be good to go.
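For Python users, the authentication step might look roughly like the sketch below. It assumes Tweepy 4.x and that you already have the four credentials from the developer portal, so it skips the browser redirect described above. The environment variable names and the two helper functions are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical environment variable names; use whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Collect the four credentials, failing early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def build_api(creds):
    """Create an authenticated client (assumes Tweepy 4.x is installed)."""
    import tweepy  # imported here so load_credentials works without Tweepy

    auth = tweepy.OAuth1UserHandler(
        creds["TWITTER_API_KEY"],
        creds["TWITTER_API_SECRET"],
        creds["TWITTER_ACCESS_TOKEN"],
        creds["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    # wait_on_rate_limit asks Tweepy to pause when a rate limit is reached.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

With the variables set, `build_api(load_credentials()).verify_credentials()` should raise an error if Twitter rejects the credentials. Keeping secrets in environment variables rather than in the script also makes it easier to share your analysis code safely.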
- Other tools for accessing Twitter API
Software libraries like Tweepy for Python and “rtweet” for R aren’t the only tools you can use to access the Twitter API. Other options include command-line tools like Twarc, plugins for popular analytic packages like NVivo and NodeXL, and web applications such as DMI-TCAT.
- Be aware of the limitations of using the Twitter public API
All the tools covered in the tips above use Twitter’s public API, so you must know its limitations before choosing any of them to research Twitter posts and comments. These limitations, as covered over at guttulus.com, include extremely limited access to historical tweets, limited access to current tweets, and the possibility that Twitter samples your search results or otherwise does not return a complete set of tweets.
- You can find an existing Twitter dataset
If you want to overcome the limitations of Twitter’s public API, one solution, according to the experts over at runrex.com, is to find a dataset that has already been collected and that satisfies your research project’s requirements.
- Tools to use for “hydrating”
Shared Twitter datasets are typically distributed as lists of tweet IDs rather than full tweets, so to use one for your research you will need to do what is referred to as “hydrating”, which is the retrieval of complete tweets from the Twitter API based on their tweet IDs, as explained over at guttulus.com. An example of a tool that will help you with this is DocNow’s Hydrator. You can also use TweetSets, which is a web application that will enable you to create your own dataset.
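If you would rather script the hydration yourself, the core of it is splitting the ID list into batches and looking each batch up. The sketch below shows only that batching logic. It assumes a limit of 100 IDs per lookup request (check the current API documentation), and `lookup` is a hypothetical callable standing in for whichever client you use; it is expected to return tweet dictionaries with an `id_str` key.

```python
from typing import Callable, Dict, Iterable, Iterator, List

BATCH_SIZE = 100  # assumed per-request limit for tweet lookup


def batched(ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def hydrate(
    ids: Iterable[str],
    lookup: Callable[[List[str]], List[dict]],
) -> Dict[str, dict]:
    """Return {tweet_id: tweet} for every ID that `lookup` still resolves."""
    tweets: Dict[str, dict] = {}
    for batch in batched(ids):
        for tweet in lookup(batch):
            tweets[tweet["id_str"]] = tweet
    return tweets
```

Passing the lookup in as a function keeps the batching independent of any particular library, and it lets you try the logic with a fake lookup before spending real API quota.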
- Limitations of using an existing Twitter dataset
Just as is the case with Twitter’s public API, using an existing Twitter dataset to research Twitter posts and comments also has its limitations. As is covered over at runrex.com, you will first be limited by Twitter’s Developer Policy, which places a limit on the sharing of datasets. You will also not be able to access any tweets that have been deleted or have become protected.
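Because deleted and protected tweets drop out, it is worth measuring how much of a dataset you actually recovered before drawing conclusions from it. The following is a minimal sketch of such a check; the function name and the shape of the report are assumptions made for illustration.

```python
def hydration_report(requested_ids, hydrated_ids):
    """Summarize how many requested tweet IDs were actually recovered."""
    requested = set(requested_ids)
    found = requested & set(hydrated_ids)
    missing = sorted(requested - found)
    # Guard against dividing by zero when the ID list is empty.
    rate = len(found) / len(requested) if requested else 0.0
    return {
        "requested": len(requested),
        "found": len(found),
        "missing_ids": missing,
        "recovery_rate": rate,
    }


report = hydration_report(["1", "2", "3", "4"], ["1", "3"])
print(report["recovery_rate"])  # 0.5
print(report["missing_ids"])    # ['2', '4']
```

A low recovery rate suggests the hydrated sample may no longer represent the original collection, which is worth reporting alongside your findings.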
- Purchasing historical Twitter data
Another option, besides using Twitter’s public API or an existing Twitter dataset, is to purchase historical Twitter data directly from Twitter. This can provide the data you are looking for and allow you to glean useful insights from it.
- Tools to use for purchasing historical data
Historical Twitter data was previously available from Gnip, a data provider purchased by Twitter. Gnip has since been folded into Twitter, so if you want to purchase historical data you will need to do so through the Historical PowerTrack enterprise product, according to guttulus.com.
- Advantages of using the Historical PowerTrack over the public Twitter API
Purchasing historical Twitter data through the Historical PowerTrack enterprise product comes with a couple of advantages over the public API. It gives you access to additional filter operators, as explained over at runrex.com, as well as tweet enhancements such as un-shortened URLs and profile locations.
- Limitations of purchasing historical Twitter data
It is also important to be aware of the drawbacks of going down this route as compared to the other two. By far the biggest drawback of purchasing historical Twitter data is the cost involved. According to the gurus over at guttulus.com, this can be quite a costly venture, which is why only large organizations with the financial muscle tend to choose it.
- Purchase from a Twitter service provider
You can also get your data from a Twitter service provider, who usually offers such data for a fee. As covered over at runrex.com, the Twitter data options available from service providers usually include data from the public Twitter APIs, data from the enterprise Twitter APIs, and datasets built by querying against an existing set of historical tweets.
- Limitations of Twitter service providers
Twitter service providers also come with several limitations, just like the other options. For example, you need to establish whether you can export your dataset from the service provider’s platform, as this may affect your research. Also, Twitter service providers are generally a “black box”: if they perform a certain task for you, you may not be able to find out which algorithm they used, which is another thing to consider.
The above are some of the tips to keep in mind when looking to use data science to research Twitter posts and comments, and you can uncover more insights by checking out the excellent runrex.com and guttulus.com.