Abstract:
Microblogging platforms are widely used to share information, feelings, and ideas on virtually any topic. With nearly 320 million users (as of April 2017), Twitter is one of the most popular microblogging platforms, which makes it a lucrative platform for propagating (mis)information through organized activities. Such cases have been observed during election campaigns (2016 United States), disasters (2010 Haiti earthquake), and resistance movements (2011 Occupy Wall Street, 2011 Arab Spring). Social media is also increasingly used to recruit people to illegal organizations and to spread fake news. Recruited users rely on Twitter entities such as hashtags, mentions, and URLs to organize and coordinate their efforts towards a specific goal. Besides recruited users, fake accounts and bots are also frequently used on Twitter. In such cases, users can be manipulated, because they assume that tweets are posted by individuals acting of their own free will, without intent of collusion. This thesis proposes a supervised classification model for distinguishing “organized” tweet sets from “organic” ones. A prototype of this model was implemented, and experiments were conducted on large tweet sets. During the study, numerous features associated with tweets and posting behavior were examined to identify those appropriate for training the model. The analyzed tweets were collected by querying hashtags, since hashtags serve to group tweets. The training data set, which consists of 1000 records with 299 features, was obtained by analyzing more than 200 million tweets. Among the supervised learning algorithms applied, Random Forest gave the best results on all data sets, with an f-measure and accuracy of 0.98.
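For readers unfamiliar with the reported metrics, the following minimal sketch shows how an f-measure and an accuracy are computed from the confusion-matrix counts of a binary classifier, taking “organized” as the positive class. It assumes the f-measure is the F1 score (the harmonic mean of precision and recall). The function name and the counts are hypothetical, chosen only for illustration; they are not the thesis's implementation or its results.

```python
def f_measure_and_accuracy(tp, fp, fn, tn):
    """Return (precision, recall, f1, accuracy) from confusion-matrix counts.

    tp/fp/fn/tn are the true-positive, false-positive, false-negative and
    true-negative counts, with "organized" treated as the positive class.
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    accuracy = (tp + tn) / total
    return precision, recall, f1, accuracy


# Hypothetical counts for 100 tweet sets (illustration only, not thesis data).
precision, recall, f1, accuracy = f_measure_and_accuracy(tp=45, fp=5, fn=15, tn=35)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
```

Reporting both metrics is informative because accuracy alone can look high on imbalanced data, whereas the f-measure also penalizes missed or spurious “organized” labels.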