Enhanced Dataset of Citizen Centric Complaints and Grievances on Twitter
Dataset posted on 18.07.2019 by Swati Agarwal, Nitish Mittal, Ashish Sureka
The dataset "Complaints_Reports_Data.sql" contains public complaint tweets posted to 4 public service accounts of the Indian Government (@RailMinIndia, @IncomeTaxIndia, @DelhiPolice and @dtpTraffic). The file contains records of raw tweets, users, hashtags, user mentions and other contextual metadata of tweets and bloggers. The dataset also includes a sample of tweets pre-processed in 3 steps ("pre1", "pre3" and "pre4"): hashtag expansion, spell error correction, and internet & slang expansion.

Metadata of each table is given below:

Table 1: Annotated: tweet_ID, text, class (complaint or unknown)
Table 2: Hashtags: tweet_ID, hashtag
Table 3: Posts: tweet_ID, text, url_count, image_count, video_count, user_id, timestamp, organization (Indian Govt account), language, latitude, longitude, replied_to_tweet_id, replied_to_user_id, retweet
Table 4, 5, 6: Pre1, Pre3, Pre4: tweet_ID, text, organization
Table 7: User_Mentions: tweet_ID, user_ID
Table 8: Users: user_ID, screen_name, name, verified?, location, created_at
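As a rough illustration of how the tables relate, the sketch below joins Posts to Annotated on tweet_ID and counts annotated complaints per organization. It is not taken from the dataset: it assumes an in-memory SQLite database, guesses TEXT column types, keeps only a few of the listed columns, and fills the tables with made-up placeholder rows. The function names build_demo_db and complaints_per_organization are hypothetical, and the SQL dialect of the actual dump may differ from SQLite's.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a simplified, assumed schema.

    Only a subset of the columns listed in the metadata is reproduced,
    and the column types are guesses, not taken from the real dump.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Posts (
            tweet_ID TEXT PRIMARY KEY,
            text TEXT,
            organization TEXT
        );
        CREATE TABLE Annotated (
            tweet_ID TEXT PRIMARY KEY,
            text TEXT,
            "class" TEXT
        );
        """
    )
    # Placeholder rows invented for this sketch; they are NOT real tweets.
    posts = [
        ("1", "placeholder text a", "@RailMinIndia"),
        ("2", "placeholder text b", "@RailMinIndia"),
        ("3", "placeholder text c", "@DelhiPolice"),
    ]
    annotated = [
        ("1", "placeholder text a", "complaint"),
        ("2", "placeholder text b", "unknown"),
        ("3", "placeholder text c", "complaint"),
    ]
    conn.executemany("INSERT INTO Posts VALUES (?, ?, ?)", posts)
    conn.executemany("INSERT INTO Annotated VALUES (?, ?, ?)", annotated)
    conn.commit()
    return conn


def complaints_per_organization(conn):
    """Count tweets annotated as 'complaint' for each organization."""
    query = """
        SELECT p.organization, COUNT(*)
        FROM Posts AS p
        JOIN Annotated AS a ON a.tweet_ID = p.tweet_ID
        WHERE a."class" = 'complaint'
        GROUP BY p.organization
        ORDER BY p.organization
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for organization, count in complaints_per_organization(demo):
        print(organization, count)
```

The same join pattern should carry over to Hashtags, User_Mentions and the Pre1, Pre3 and Pre4 tables, since the metadata lists tweet_ID as a column in each of them.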