“But tell me how you really feel?”
What does that even mean most of the time? Most of the time this is fodder for an interpersonal argument based in non-confrontational bottling up of emotions for months on end…but I digress.
For this project, which I completed as part of the Flatiron bootcamp, the aim was to better understand what determines the emotional sentiment of a given tweet. We are specifically interested in consumer tweets that discuss tech products made by businesses.
My findings are based on a multiclass classifier with three categories: positive sentiment, negative sentiment, and neutral sentiment.
Promoting launch events seems to drive a significant proportion of product-related tweets: words such as “store”, “today”, and “downtown” appear in the word cloud of the most commonly used words in our dataset of product-related tweets.
Using social media giveaways to encourage re-tweets or links to your product also seems to drive a significant number of product-related tweets. This may seem like digital marketing common sense, but it’s reassuring to know that there’s data to back it up; doubly so considering that giveaway-related words (such as “mention”, “link”, and “share”) are in the top 10 most common words in our tweet dataset, not merely somewhere in the top half. If you’ve tried a giveaway before as one of your digital marketing strategies and not received enough engagement, it’s possible that some other factor, such as your number of followers or the demographics of those followers, is worth looking into.
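For readers curious how a “top 10 words” ranking like this can be produced, here is a minimal sketch using only the Python standard library. The example tweets and the tiny stopword list are hypothetical stand-ins, not the project’s data or its actual preprocessing.

```python
from collections import Counter
import re

# Hypothetical example tweets; the project's real dataset is not reproduced here.
tweets = [
    "Share this link and mention us to win a tablet!",
    "New store opens downtown today, share the link",
    "Mention a friend and share to enter the giveaway",
]

# A tiny, illustrative stopword list (an assumption, not the one used in the project).
STOPWORDS = {"a", "and", "the", "this", "to", "us", "of", "in"}

def top_words(texts, n=10):
    """Return the n most common non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

top = top_words(tweets, n=3)
print(top)
```

The same counts are what a word cloud visualizes: the more often a word appears, the larger it is drawn.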
An aim for future work is to focus PR resources on analyzing negative-sentiment tweets and, from there, deciding which types of negative tweets are worth addressing, since positive tweets far outnumbered negative ones in our dataset. We are all familiar with less-than-valid complaints, and an algorithm that detects and filters out the virtual versions of “Karens” tweeting would be very useful for increasing the efficiency of PR teams.
The data was also unbalanced. Once I added a “neutral” category alongside the positive and negative sentiments, there were far more neutral tweets than tweets in any other category. Positive tweets came in second, and negative tweets were far behind both other categories.
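To put numbers on that imbalance, here is a small sketch that turns the class counts printed later in this post (under “Original class distribution”) into percentages. The dictionary layout and variable names are my own.

```python
# Class counts as reported in the "Original class distribution" output of this post.
# Encoding used in the post: 0 = positive, 1 = negative, 2 = neutral.
counts = {"neutral": 5375, "positive": 2970, "negative": 569}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name:>8}: {counts[name]:>5} tweets ({share:.1%})")

# Ratio of the largest class to the smallest one.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```

Roughly six in ten tweets are neutral, while negative tweets make up only a small single-digit share, which is why some form of rebalancing is needed before modeling.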
We encode each sentiment as an integer: positive is 0, negative is 1, and neutral (2) is the default:
def label(emotion_in):
    emotion_out = 2
    if emotion_in == 'Positive emotion':
        emotion_out = 0
    elif emotion_in == 'Negative emotion':
        emotion_out = 1
    return emotion_out
We make a new column that quantifies the type of sentiment:
df2['emotionquant'] = df2['emotion'].apply(label)
df2.head()
Later on we use SMOTE to adjust for the class imbalances.
# Import SMOTE and print the new class distribution
# (pd, y, tf_idf_train, and y_train come from earlier steps in the notebook)
from imblearn.over_sampling import SMOTE

print('Original class distribution: \n')
print(pd.Series(y).value_counts())

smote = SMOTE()
# fit_resample is the current method name; the older fit_sample was removed
# in newer imbalanced-learn releases
tf_idf_train_resampled, y_train_resampled = smote.fit_resample(tf_idf_train, y_train)

# Preview synthetic sample class distribution
print('-----------------------------------------')
print('Synthetic sample class distribution: \n')
print(pd.Series(y_train_resampled).value_counts())

Output:

Original class distribution:

2    5375
0    2970
1     569
Name: emotionquant, dtype: int64
-----------------------------------------
Synthetic sample class distribution:

2    4019
1    4019
0    4019
Name: emotionquant, dtype: int64
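If you are wondering where those extra minority-class rows come from, the core idea of SMOTE is to interpolate between an existing minority sample and one of its nearest minority-class neighbors. The sketch below is a simplified, standard-library illustration of that idea, not the imbalanced-learn implementation; the function name, the 2-D points, and the value of `k` are all hypothetical.

```python
import math
import random

def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Rank the other minority points by Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbor = rng.choice(neighbors)
    lam = rng.random()  # interpolation factor in [0, 1)
    return [b + lam * (n - b) for b, n in zip(base, neighbor)]

# Hypothetical 2-D feature vectors for an under-represented class.
minority_points = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
rng = random.Random(0)
synthetic = [smote_like_sample(minority_points, k=2, rng=rng) for _ in range(3)]
print(synthetic)
```

Because every synthetic point lies on a line segment between two real minority points, the new rows stay inside the region the minority class already occupies rather than being exact duplicates.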
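We then built and revised a Naive Bayes model, aiming to optimize recall for class 0 (the positive tweets). Recall for a class is the share of its true members that the model finds, so optimizing it means reducing false negatives: positive tweets that the model labels as negative or neutral. (Believing a tweet is positive when it is not is a false positive, which precision measures instead.) The fewer positive tweets that slip into the other two classes, the more of our energy can go to the possibly negative or neutral tweets, which matter more from a PR standpoint than tweets that already carry positive sentiment.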
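To make the metric concrete, here is a minimal sketch of per-class recall computed by hand. The label lists are made up for illustration and use this post’s encoding (0 = positive, 1 = negative, 2 = neutral); they are not the model’s actual predictions.

```python
def recall_for_class(y_true, y_pred, cls):
    """Recall = TP / (TP + FN) for the given class label."""
    # True positives: tweets of this class that the model labeled correctly.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    # False negatives: tweets of this class that the model labeled as something else.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels: 0 = positive, 1 = negative, 2 = neutral.
y_true = [0, 0, 0, 1, 2, 2, 0, 1]
y_pred = [0, 2, 0, 1, 2, 0, 0, 2]

positive_recall = recall_for_class(y_true, y_pred, cls=0)
negative_recall = recall_for_class(y_true, y_pred, cls=1)
print(positive_recall, negative_recall)
```

In this toy example, three of the four truly positive tweets are found, so recall for class 0 is 0.75; the one positive tweet predicted as neutral is the false negative.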
Our recall for class 0 was 0.67, an improvement over several of our prior models, though it needs further fine-tuning. The model is also overfit, though significantly less so than our alternative models, which approached 0.99 training accuracy.
A goal for additional feature engineering is to build a new model based on a list of specific, important words that carry strong sentiment one way or the other (e.g., tweets containing “great”, “awful”, or “terrible”).
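One way such a word-list feature might look is sketched below. Only “great”, “awful”, and “terrible” come from this post; the other words, the positive/negative split, and the feature names are illustrative assumptions rather than a finished design.

```python
import re

# "great", "awful", and "terrible" come from the post; the remaining words
# are illustrative assumptions.
POSITIVE_WORDS = {"great", "love", "awesome"}
NEGATIVE_WORDS = {"awful", "terrible", "hate"}

def lexicon_features(tweet):
    """Count strong-sentiment words in a tweet and return simple numeric features."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return {"pos_words": pos, "neg_words": neg, "net": pos - neg}

features = lexicon_features("Great launch, but the battery life is awful. Terrible!")
print(features)
```

Features like these could be appended to the TF-IDF matrix as extra columns, giving the classifier a direct signal for words that are known to be strongly polarized.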