The Cambridge Dictionary defines sentiment as a thought, opinion, or idea based on a feeling about a situation, or a way of thinking about something. Some examples are: “Nationalist sentiment has increased in the area since the bombing”; “The area has become a hotbed of anti-government sentiment”; “The past few weeks have witnessed an outpouring of patriotic sentiment”; “Analysts and investors said market sentiment, for the time being, appears positive”; and “Business sentiment is showing signs of recovery”.
Sentiment can be highly subjective. As humans, we use tone, context, and language to convey meaning, and how we understand that meaning depends on our own experiences and unconscious biases.
Sentiment Analysis (SA), or Opinion Mining (OM), is the computational study of people’s opinions, attitudes, and emotions toward an entity. The entity can represent individuals, events, or topics. Sentiment analysis can be considered a classification process. A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: determining whether the opinion expressed in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral. Sentiment scoring can be as fine-grained as a specific use case requires, and categories can expand beyond just “positive”, “neutral”, and “negative”.
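To make fine-grained scoring concrete, here is a minimal Python sketch that maps a continuous polarity score onto five categories and then collapses them back to the familiar three. The [-1, 1] score range, the five labels, and the cut-off values are assumptions chosen for illustration; real systems pick their own scale and thresholds.

```python
from bisect import bisect_right

# Hypothetical cut-offs on a [-1, 1] polarity score. Each threshold is the
# exclusive upper bound of the label with the same index in LABELS; the last
# label ("very positive") has no upper bound.
THRESHOLDS = [-0.6, -0.2, 0.2, 0.6]
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def polarity_label(score: float) -> str:
    """Map a polarity score in [-1, 1] to one of five fine-grained labels."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    # bisect_right counts the thresholds <= score, which is the label index.
    return LABELS[bisect_right(THRESHOLDS, score)]


def coarse_label(score: float) -> str:
    """Collapse the fine-grained label back to positive/neutral/negative."""
    fine = polarity_label(score)
    if fine.endswith("positive"):
        return "positive"
    if fine.endswith("negative"):
        return "negative"
    return "neutral"


for s in (-0.9, -0.3, 0.0, 0.45, 0.8):
    print(s, polarity_label(s), coarse_label(s))
```

Keeping the thresholds in one sorted list means a use case that needs more categories only has to extend `THRESHOLDS` and `LABELS`.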
You can also refine the sentiment further into specific emotions. Advanced, “beyond polarity” sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. For example, positive sentiment can be further refined into happy, excited, impressed, trusting and so on.
Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.
The data sets used in SA are an important issue in this field. The main source of data is product reviews. These reviews matter to business owners, who can make business decisions based on the analysis of users’ opinions about their products. The reviews come mainly from review sites. SA is not only applied to product reviews, however; it can also be applied to stock markets, news articles, or political debates.
In political debates, for example, we could figure out people’s opinions on particular election candidates or political parties. Election results can also be predicted from political posts. Social networking and micro-blogging sites are considered a very good source of information because people freely share and discuss their opinions about a given topic there; they are also used as data sources in the SA process.
Classification levels in SA
There are three main classification levels in SA: document-level, sentence-level, and aspect-level SA. Document-level SA aims to classify an opinion document as expressing a positive or negative opinion or sentiment. It treats the whole document as the basic information unit and assumes that the document discusses a single topic. Sentence-level SA aims to classify the sentiment expressed in each sentence. The first step is to identify whether the sentence is subjective or objective. If the sentence is subjective, sentence-level SA then determines whether it expresses a positive or negative opinion. Wilson et al. have pointed out that sentiment expressions are not necessarily subjective in nature. However, there is no fundamental difference between document-level and sentence-level classification, because sentences are just short documents. Aspect-level SA aims to classify the sentiment with respect to specific aspects of entities.
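The sentence-level procedure (first decide whether a sentence is subjective, then classify its polarity) can be sketched in Python, with a majority vote over the subjective sentences standing in for a document-level decision. The tiny word list, the regex sentence splitter, and the rule that a sentence without opinion words is objective are all simplifying assumptions. That last rule also ignores the Wilson et al. caveat, so this is not a published method.

```python
import re
from typing import List, Optional

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"great": 1, "love": 1, "fast": 1, "bad": -1, "slow": -1, "hate": -1}


def split_sentences(document: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def sentence_polarity(sentence: str) -> Optional[str]:
    """Sentence-level SA: None if the sentence looks objective, else a polarity."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if not hits:  # Step 1: no opinion words, so treat the sentence as objective.
        return None
    total = sum(hits)  # Step 2: classify the subjective sentence.
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def document_polarity(document: str) -> str:
    """Document-level label: majority vote over the subjective sentences."""
    labels = [sentence_polarity(s) for s in split_sentences(document)]
    pos, neg = labels.count("positive"), labels.count("negative")
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


review = "The phone arrived on Monday. The screen is great and I love it. Charging is slow."
for sentence in split_sentences(review):
    print(sentence, "->", sentence_polarity(sentence))
print("document ->", document_polarity(review))
```

The first sentence is skipped as objective. The remaining two cancel out, which shows how a mixed review can end up neutral at the document level while its sentences are not.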
Aspect-based sentiment analysis (ABSA)
Aspect-based sentiment analysis can be especially useful for real-time monitoring. Businesses can immediately identify issues that customers are reporting on social media or in reviews. This can help speed up response times and improve their customer experience.
Improving sales and retaining customers are core business goals. According to research by Apex Global Learning, every additional star in an online review leads to a 5-9% revenue bump, and there is an 18% difference in revenue between businesses rated three stars and those rated five stars.
Sentiment analysis can identify how your customers feel about the features and benefits of your products. This can help uncover areas for improvement that you may not have been aware of.
Feature selection in sentiment classification
The sentiment analysis task is usually framed as a sentiment classification (SC) problem. The first step in the SC problem is to extract and select text features. Some commonly used features are:
Term presence and frequency: these features are individual words or word n-grams and their frequency counts. They either give the words binary weights (one if the word appears, zero otherwise) or use term frequency weights to indicate the relative importance of features.
Parts of speech (POS): finding adjectives, which are important indicators of opinions.
Opinion words and phrases: words commonly used to express opinions, such as good or bad and like or hate. Some phrases, on the other hand, express opinions without using opinion words, for example “cost me an arm and a leg”.
Negations: negation words may reverse the opinion orientation; for example, “not good” is roughly equivalent to “bad”.
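A minimal Python sketch of the term and negation features listed above follows. Word n-grams are weighted either by binary presence or by term frequency. A simple negation rule prefixes NOT_ to every word between a negation word and the next punctuation mark. The NOT_ convention, the negation word list, and the tokenizer are illustrative assumptions. POS features are left out because they would need a part-of-speech tagger.

```python
import re
from collections import Counter
from typing import Dict, List

NEGATIONS = {"not", "no", "never"}  # illustrative list; words ending in n't also count
PUNCTUATION = set(".,!?;")


def tokenize(text: str) -> List[str]:
    """Lower-case words (keeping apostrophes) plus single punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def mark_negation(tokens: List[str]) -> List[str]:
    """Prefix NOT_ to each word after a negation, up to the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False  # punctuation ends the negation scope and is dropped
            continue
        if tok in NEGATIONS or tok.endswith("n't"):
            negating = True
            out.append(tok)
            continue
        out.append("NOT_" + tok if negating else tok)
    return out


def extract_features(text: str, n_max: int = 2, binary: bool = False) -> Dict[str, int]:
    """Word 1..n_max-grams with term-frequency weights, or 1/0 presence weights."""
    tokens = mark_negation(tokenize(text))
    counts: Counter = Counter()
    for n in range(1, n_max + 1):
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if binary:
        # Presence weighting: 1 if the n-gram appears; absent n-grams are implicitly 0.
        return {gram: 1 for gram in counts}
    return dict(counts)


print(extract_features("Not good, not good at all."))
print(extract_features("Not good, not good at all.", binary=True))
```

Because "good" after "not" becomes the separate feature "NOT_good", a classifier can learn a negative weight for it without touching the positive weight of plain "good".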
Sentiment classification techniques
Sentiment classification techniques can be roughly divided into the machine learning approach, the lexicon-based approach, and the hybrid approach.
The Machine Learning (ML) approach applies well-known ML algorithms and uses linguistic features. ML text classification methods can be roughly divided into supervised and unsupervised learning methods. Supervised methods make use of a large number of labeled training documents; unsupervised methods are used when it is difficult to find such labeled training documents.
The lexicon-based approach relies on a sentiment lexicon, a collection of known and precompiled sentiment terms. For example, a positive lexicon might include “fast”, “affordable”, and “user-friendly”, while a negative lexicon could include “slow”, “pricey”, and “complicated”.
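A minimal lexicon-based scorer in Python, reusing the example terms above, shows the idea: look each word up, add up the weights, and read the label off the sign of the total. The +1/-1 weights and the sum-then-sign rule are assumptions. Published lexicons usually carry graded scores and extra rules.

```python
import re

# Toy lexicon built from the example terms above; the +1/-1 weights are assumptions.
LEXICON = {
    "fast": 1.0, "affordable": 1.0, "user-friendly": 1.0,
    "slow": -1.0, "pricey": -1.0, "complicated": -1.0,
}


def lexicon_score(text: str) -> float:
    """Sum the lexicon weights of the words in the text (unknown words score 0)."""
    # The pattern keeps hyphenated terms such as "user-friendly" as one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return sum(LEXICON.get(tok, 0.0) for tok in tokens)


def lexicon_label(text: str) -> str:
    """Positive, negative, or neutral depending on the sign of the lexicon score."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_label("Fast, affordable and user-friendly!"))
print(lexicon_label("A bit pricey and complicated to set up."))
```

No training data is needed. The trade-off is that quality depends entirely on how well the lexicon covers the domain's vocabulary.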
It is divided into the dictionary-based approach and the corpus-based approach. The dictionary-based approach starts by finding a set of opinion seed words and then searches a dictionary for their synonyms and antonyms.
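The dictionary-based expansion can be sketched in Python as a breadth-first walk over a thesaurus. Synonyms inherit a word's polarity and antonyms receive the opposite one. `THESAURUS` is a hypothetical toy dictionary; a real system would query a resource such as WordNet.

```python
from collections import deque
from typing import Dict

# Hypothetical toy thesaurus; a real system would consult a resource such as WordNet.
THESAURUS = {
    "good": {"synonyms": ["fine", "excellent"], "antonyms": ["bad"]},
    "excellent": {"synonyms": ["superb"], "antonyms": ["poor"]},
    "bad": {"synonyms": ["awful"], "antonyms": ["good"]},
    "poor": {"synonyms": ["inferior"], "antonyms": ["excellent"]},
}


def expand_seeds(seeds: Dict[str, int]) -> Dict[str, int]:
    """Breadth-first expansion of seed words mapped to +1 (positive) or -1 (negative).

    Synonyms inherit a word's polarity, antonyms get the opposite polarity, and
    the first polarity assigned to a word is kept.
    """
    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        entry = THESAURUS.get(word, {})
        related = [(w, polarity[word]) for w in entry.get("synonyms", [])]
        related += [(w, -polarity[word]) for w in entry.get("antonyms", [])]
        for new_word, value in related:
            if new_word not in polarity:
                polarity[new_word] = value
                queue.append(new_word)
    return polarity


print(expand_seeds({"good": 1}))
```

From the single seed "good", the walk labels seven further words. The weakness is also visible: every word gets one global polarity, regardless of context.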
The corpus-based approach begins with a seed list of opinion words and then finds other opinion words in a large corpus, which helps in finding opinion words with context-specific orientations. This can be done using statistical or semantic methods.
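One statistical way to do this, in the spirit of Turney's SO-PMI, scores a candidate word by its pointwise mutual information (PMI) with positive seed words minus its PMI with negative seed words. Co-occurrence is measured within the same sentence. The toy corpus, the seed lists, and the 0.01 smoothing constant are assumptions for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical toy corpus; each string is one "sentence" (co-occurrence window).
CORPUS = [
    "the battery life is excellent and long",
    "long battery life is great",
    "excellent screen and long lasting battery",
    "the lag is poor and annoying",
    "annoying lag makes it poor",
    "the screen lag is annoying",
]


def so_pmi(corpus: Sequence[str], word: str, pos_seeds: List[str],
           neg_seeds: List[str], smoothing: float = 0.01) -> float:
    """Semantic orientation: PMI with positive seeds minus PMI with negative seeds."""
    sentences = [set(s.lower().split()) for s in corpus]
    n = len(sentences)

    def prob(*terms: str) -> float:
        # Fraction of sentences containing all terms; smoothing avoids log(0).
        hits = sum(1 for s in sentences if all(t in s for t in terms))
        return (hits + smoothing) / n

    def pmi(a: str, b: str) -> float:
        return math.log2(prob(a, b) / (prob(a) * prob(b)))

    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)


for candidate in ("long", "annoying"):
    print(candidate, round(so_pmi(CORPUS, candidate, ["excellent"], ["poor"]), 2))
```

In this toy corpus, "long" picks up a positive orientation because it appears alongside battery life and "excellent". This is the kind of context-specific orientation that a general-purpose dictionary might not capture.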
The hybrid approach combines both approaches. It is very common, and sentiment lexicons play a key role in the majority of hybrid methods.
Classification models commonly use Naive Bayes, Logistic Regression, Support Vector Machines, Linear Regression, and Deep Learning. Let’s explore these algorithms in a bit more detail.
Probabilistic classifiers use mixture models for classification. The mixture model assumes that each class is a component of the mixture.
Naive Bayes: this type of classification is based on Bayes’ Theorem. These are probabilistic algorithms, meaning they calculate the probability of each label for a given text; the text is then assigned the label with the highest probability. “Naive” refers to the fundamental assumption that features are independent of one another given the label, so individual words make independent contributions to the overall outcome. This simplifying assumption can help the algorithm work well even where there is limited or mislabelled data.
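A from-scratch Python sketch of multinomial Naive Bayes shows both ideas. The class prior and the per-word likelihoods are combined in log space, where the independence assumption turns a product of probabilities into a sum, and the highest-scoring label wins. Add-one (Laplace) smoothing and the six-sentence training set are assumptions for illustration. Library implementations such as scikit-learn's `MultinomialNB` are the practical choice.

```python
import math
from collections import Counter, defaultdict
from typing import List


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, words: List[str], label: str) -> float:
        """log P(label) + sum of log P(word | label): the unnormalised log posterior."""
        n_docs = sum(self.class_counts.values())
        total = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / n_docs)
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def predict(self, text: str) -> str:
        """Return the label with the highest (log) probability."""
        words = text.lower().split()
        return max(self.class_counts, key=lambda label: self._log_score(words, label))


texts = ["great phone love it", "fast and great", "love the screen",
         "slow and buggy", "hate the battery", "buggy slow phone"]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("love this great screen"))
print(model.predict("slow buggy battery"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.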
Logistic Regression: a classification algorithm that predicts a binary outcome based on independent variables. It uses the sigmoid function, which outputs a probability between 0 and 1, so words and phrases can be classified as either positive (1) or negative (0). For example, “super slow processing speed” would be classified as 0, or negative.
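Below is a minimal from-scratch sketch of logistic regression in Python. Each text becomes a binary bag-of-words vector, and a weighted sum of its features is passed through the sigmoid. The weights are fitted by stochastic gradient descent on the log-loss. The training sentences, learning rate, and epoch count are hypothetical; only the “super slow processing speed” example comes from the text above.

```python
import math
from typing import List, Tuple

Model = Tuple[List[str], List[float], float]  # (vocabulary, weights, bias)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, output between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def featurize(text: str, vocab: List[str]) -> List[float]:
    """Binary bag-of-words vector over a fixed vocabulary."""
    words = set(text.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def train_logreg(texts: List[str], labels: List[int],
                 epochs: int = 500, lr: float = 0.5) -> Model:
    """Fit weights with stochastic gradient descent on the log-loss (labels 0/1)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    weights, bias = [0.0] * len(vocab), 0.0
    xs = [featurize(t, vocab) for t in texts]
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # derivative of the log-loss with respect to the logit
            bias -= lr * error
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return vocab, weights, bias


def predict_proba(text: str, model: Model) -> float:
    """Probability that the text is positive (label 1)."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


texts = ["super fast processing speed", "great and fast", "love it",
         "super slow processing speed", "slow and buggy", "hate it"]
labels = [1, 1, 1, 0, 0, 0]
model = train_logreg(texts, labels)
print(round(predict_proba("super slow processing speed", model), 3))
```

A probability below 0.5 maps to label 0 (negative). Inspecting the learned weights shows which words push a text towards each class.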
Linear Regression: an algorithm that predicts polarity (the Y output) from words and phrases (the X input). The objective is to learn a linear model, or line, that can be used to predict the sentiment score Y. The model is fitted by reducing the error between predicted and actual scores, typically the sum of squared errors.
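In symbols, let x be a text's feature vector (for example, term frequencies) and y its sentiment score. A linear regression model and its usual least-squares objective can then be written as follows. The weights in the worked line are hypothetical values, chosen only to show the arithmetic.

```latex
\hat{y} = \mathbf{w}^{\top}\mathbf{x} + b,
\qquad
(\mathbf{w}^{*}, b^{*}) = \arg\min_{\mathbf{w},\, b} \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \mathbf{w}^{\top}\mathbf{x}_i - b\right)^{2}

% Worked example for the text "great but slow" with hypothetical weights
% w_great = 0.8, w_slow = -0.9, b = 0.1 (x_great = x_slow = 1; all other weights are zero):
\hat{y} = 0.8 \cdot 1 + (-0.9) \cdot 1 + 0.1 = 0.0
```

A prediction near zero reads as neutral or mixed. Unlike logistic regression, the output is not squashed into a probability, so it can fall outside any fixed range.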
Support Vector Machines: a model that plots labelled data as points in a multi-dimensional space. The hyperplane, or decision boundary, divides the data points; in two dimensions it is simply a line. Anything on one side of the hyperplane is classified as negative, and everything on the other side as positive. The best hyperplane is the one for which the distance to the nearest data point of each class is largest. Support vectors are the data points closest to the hyperplane; they determine its position and orientation, and they are the points used to build the support vector machine.
Text data are well suited to SVM classification because of the sparse nature of text: few features are irrelevant, but features tend to be correlated with one another and are generally organized into linearly separable categories.
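A linear SVM can be trained from scratch with Pegasos-style stochastic sub-gradient descent on the regularised hinge loss. Points inside the margin, or on the wrong side of it, pull the weights towards their own class. At the optimum, only the points near the boundary (the support vectors) determine where it lies. The 2-D points below are hypothetical feature vectors (count of positive words, count of negative words). Folding the bias in as a constant feature is a simplification; library implementations such as scikit-learn's `LinearSVC` are the practical choice.

```python
import random
from typing import List, Sequence


def train_linear_svm(points: Sequence[Sequence[float]], labels: Sequence[int],
                     lam: float = 0.01, epochs: int = 500, seed: int = 0) -> List[float]:
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    Labels must be +1 (positive) or -1 (negative). A constant 1.0 feature is appended
    to every point so the bias is learned (and regularised) as the last weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regulariser gradient
            if margin < 1.0:  # inside the margin or misclassified: hinge loss active
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Which side of the hyperplane x lies on: +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical features: (count of positive words, count of negative words).
points = [(3, 0), (2, 1), (4, 1), (0, 3), (1, 2), (1, 4)]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(points, labels)
print([round(v, 2) for v in w])
print(svm_predict(w, (5, 0)), svm_predict(w, (0, 5)))
```

Here (2, 1) and (1, 2) play the role of support vectors. Moving the far-away points such as (4, 1) or (1, 4) further out would not change the learned boundary.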
Deep Learning: here, an artificial neural network performs multiple layers of processing. Deep learning is a diverse set of algorithms loosely inspired by how the human brain learns through associations and abstractions. Deep learning can have significant advantages over traditional classification algorithms: these neural networks can capture context, and even the mood of the writer.
Software and Algorithms
A sentiment analysis tool is software that analyzes text conversations and evaluates the tone, intent, and emotion behind each message. By digging deeper into these elements, the tool uncovers more context from your conversations and helps your customer service team accurately analyze feedback. This is particularly useful for brands that actively engage with their customers using text-based communication like social media, live chat, and email where it can be difficult to determine the sentiment behind a message.
Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Automated sentiment analysis often relies on machine learning (ML) techniques. In this case, an ML algorithm is trained to classify sentiment based on both the words and their order. The success of this approach depends on the quality of the training data set and of the algorithm.
Sentiment analysis helps brands learn more about customer perception using qualitative feedback. By leveraging an automated system to analyze text-based conversations, businesses can discover how customers genuinely feel about their products, services, marketing campaigns, and more.
References and Resources also include: