Previously, we looked at what sentiment analysis is and two of the general challenges it poses. Today, we’ll cover one final general challenge and start to look at more specific ones. Understanding these is fundamental to choosing the right sentiment analysis solution.
3. Lack of nuance
Typically, sentiment is categorised as positive, negative, or neutral. In the standard model of sentiment analysis, the output for a whole text is just a single number between -1 (negative) and 1 (positive), where 0 means neutral or no sentiment. This view lacks nuance. Understanding the difference between “I don’t like” and “I hate” is important, but in the simplest form of the model, which only distinguishes negative, neutral and positive, both would get the same score of -1.
Precise measurement of sentiment nuances is hard, particularly because human annotators don’t always agree with one another. Fine-grained models with 5 or 7 nuances (e.g., very negative, negative, neutral, positive, very positive) give more opportunities for subjective differences between annotators than models with just 3 nuances. Knowing the sentiment scale and the accuracy of the measurement is critical for judging whether a difference in sentiment is significant or not.
In addition to the number of nuances, a distinction must be made between “neutral” and “no sentiment”. The sentence “The product is ok” contains an emotional judgement, while the sentence “I bought the product” does not. A basic sentiment model that is forced to return a numeric rating returns 0 for both sentences. A more sophisticated model would abstain from rating the second sentence at all.
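One way to make this distinction explicit is in the return type of the scoring function. The following is a minimal sketch rather than a real model: the tiny lexicon and the function name `score_or_abstain` are assumptions made up for illustration. It returns a number (possibly 0.0) when it finds a sentiment-bearing word and `None` when it finds none.

```python
from typing import Optional

# Toy lexicon, invented for this example: word -> polarity in [-1, 1].
LEXICON = {"ok": 0.0, "great": 1.0, "terrible": -1.0}

def score_or_abstain(text: str) -> Optional[float]:
    """Return the mean polarity of the known words, or None if there are none."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    if not hits:
        return None  # no sentiment expressed: abstain instead of returning 0
    return sum(hits) / len(hits)

print(score_or_abstain("The product is ok"))     # 0.0 (neutral judgement)
print(score_or_abstain("I bought the product"))  # None (no sentiment)
```

Downstream code can then exclude the `None` results from averages instead of letting them pull the overall score towards 0.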
Overall sentiment, calculated as the average of sentiment expressions on a product, brand or concept, is an important metric. But it does not offer the depth needed to make improvements. Knowing that some people dislike a product is not actionable. Brands need to know which aspect those people dislike, e.g., the colour of a lipstick, its price, or its packaging.
This means that sentiment should always be analysed in connection with its cause, a task that usually requires a closer look at the material.
Aspect-based sentiment analysis is an approach that can help here, as it extends the opinion-holder-target model by specifying which aspect of the target the opinion holder is discussing. The model becomes: opinion holder → sentiment expression → aspect → target (Liu 2020). There are various implementations, e.g., PyABSA (Yang 2020) as an open-source implementation in Python, Google Cloud’s entity sentiment analysis as a fully managed solution, and the ML platform MonkeyLearn as a no-code solution for training one’s own model (Pascual 2019).
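To see why the aspect matters, it helps to compare an overall average with per-aspect averages. The sketch below does not use any of the tools named above. It assumes the opinions have already been extracted, and the `Opinion` type, the sample data and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple

class Opinion(NamedTuple):
    holder: str
    sentiment: float  # -1 (negative) to 1 (positive)
    aspect: str
    target: str

# Hypothetical, hand-labelled opinions about one product.
OPINIONS = [
    Opinion("user_a", 1.0, "colour", "lipstick"),
    Opinion("user_b", 1.0, "colour", "lipstick"),
    Opinion("user_c", -1.0, "price", "lipstick"),
    Opinion("user_d", -1.0, "packaging", "lipstick"),
]

def overall_sentiment(opinions):
    """Average over all opinions, ignoring the aspect."""
    return mean(o.sentiment for o in opinions)

def sentiment_by_aspect(opinions):
    """Average sentiment per aspect."""
    grouped = defaultdict(list)
    for o in opinions:
        grouped[o.aspect].append(o.sentiment)
    return {aspect: mean(scores) for aspect, scores in grouped.items()}

print(overall_sentiment(OPINIONS))    # 0.0
print(sentiment_by_aspect(OPINIONS))  # {'colour': 1.0, 'price': -1.0, 'packaging': -1.0}
```

The overall score of 0.0 looks neutral, while the per-aspect view shows that people like the colour and dislike the price and the packaging.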
That concludes the general challenges of sentiment analysis. So now, let’s look at the more specific challenges.
4. Not understanding negation
Any sentiment analysis model must be able to understand negation, where a word such as “not” or “never” reverses the meaning of the words it modifies. A naive dictionary approach that counts the number of occurrences of positive and negative words from a dictionary fails this test. It is unable to distinguish between “I like this” and “I don’t like this”.
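The failure is easy to reproduce. Below is a minimal sketch of such a counting approach. The two word lists and the function name `naive_score` are made up for illustration.

```python
# Toy word lists, invented for this example.
POSITIVE = {"like", "love", "good"}
NEGATIVE = {"hate", "bad", "awful"}

def naive_score(text: str) -> int:
    """Count positive words minus negative words, ignoring all context."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(naive_score("I like this"))        # 1
print(naive_score("I don't like this"))  # 1, the negation is ignored
```

Both sentences get the same positive score because “don’t” is not in either word list and word order plays no role.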
The negation challenge has largely been solved through both rules-based approaches (Hutto and Gilbert 2014) and deep learning approaches. Any modern sentiment analysis software should handle negation well. A model that cannot clear this bar is bound to make many errors. However, not all commercially available analysis products fulfil this requirement. Even when a vendor claims their model handles negation, it is worth testing whether the model keeps that promise.
A more advanced form of negation is sarcasm, where authors write the opposite of what they think. Like irony, sarcasm is extremely hard for any algorithm to detect and can be challenging for human readers too.
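A simple way to run such a test is to feed the model minimal pairs that differ only in the negation. The sketch below combines a much simplified negation rule with such a check. It is not the algorithm of Hutto and Gilbert. The word lists, the three-token window and all function names are assumptions chosen for illustration.

```python
# Toy word lists, invented for this example.
POSITIVE = {"like", "love", "good"}
NEGATIVE = {"hate", "bad", "awful"}
NEGATORS = {"not", "no", "never", "don't", "doesn't", "didn't", "isn't", "wasn't"}
WINDOW = 3  # assumption: a negator affects the next three tokens

def tokenize(text: str) -> list:
    # Lowercase, strip punctuation, normalise curly apostrophes.
    return [w.strip(".,!?").lower().replace("\u2019", "'") for w in text.split()]

def negation_aware_score(text: str) -> int:
    """Sum word polarities, flipping a word's sign if a negator precedes it."""
    tokens = tokenize(text)
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # 1, -1 or 0
        if polarity and any(t in NEGATORS for t in tokens[max(0, i - WINDOW):i]):
            polarity = -polarity
        score += polarity
    return score

# Minimal pairs: each negated sentence should score below zero.
NEGATION_PAIRS = [
    ("I like this", "I don't like this"),
    ("This is good", "This is not good"),
]

def passes_negation_check(score_fn) -> bool:
    """True if score_fn rates every plain sentence > 0 and its negation < 0."""
    return all(score_fn(plain) > 0 > score_fn(negated)
               for plain, negated in NEGATION_PAIRS)

print(passes_negation_check(negation_aware_score))  # True
```

The same `passes_negation_check` function can wrap any scoring function, including a call to a vendor's API, as long as it returns a signed number. A handful of pairs is only a smoke test, so a longer list taken from one's own data gives a more reliable picture.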
5. Contextual word meaning
The meaning of a word depends on the context it is used in. As an example, the company names “Apple” and “Target” are words that can either be used in their regular sense or to refer to the companies. The reader must figure out from the context of the surrounding words which meaning applies. Or think about words which may have a positive or negative sentiment content depending on the situation: being “unpredictable” is good for a crime novel but bad for an alarm clock. But while this is an easy task for humans, it is a challenging task for a computer.
Another tough problem: sentiment can be expressed without using explicit judgemental words, for example by pointing out missing features, absent qualities or the presence of negative aspects, e.g., “the device wouldn’t turn on”, or “we needed 3 hours to understand the instructions”.
The transformer architecture developed by Vaswani et al. (2017) rises to these challenges. It uses a mechanism called “attention”, which lets it interpret each word in the context of the surrounding words. Transformers brought a leap in accuracy to natural language processing. Sentiment analysis models that use older architectures are not competitive, as demonstrated in a benchmark by Mishev et al. (2020). Their benchmark focused on financial texts, but later studies in other domains confirmed the general superiority of transformers.
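The core idea of attention can be shown in a few lines. The sketch below implements scaled dot-product self-attention in plain Python under strong simplifying assumptions: queries, keys and values are all the input vectors themselves. A real transformer adds learned projection matrices, multiple heads and positional information. The toy vectors are invented.

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Similarity of this word to every word in the sentence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # New representation: weighted average of all word vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, vectors)) for i in range(d)])
    return outputs, weights

# Three made-up two-dimensional word vectors.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(toy_vectors)
```

Each row of `weights` says how much one word draws on every other word. Each output vector is therefore a blend of the whole sentence rather than a fixed per-word value, which is what allows the same word to be represented differently in different contexts.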
6. Out of vocabulary words
Most models are equipped and trained with a pre-defined vocabulary of words that they understand. That can take the form of a sentiment dictionary or word vectors.
Social media text frequently contains novel words, such as new hashtags or newly coined vocabulary. Slang and ambiguous wording are also frequent. These words can and often do contain critical sentiment information. A new hashtag like #XZYBrandFail or an expression like “this guy is badass” should be detected, but if it is not part of the dictionary, a model will not understand it.
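A quick way to gauge how exposed a model is to this problem is to measure which tokens in one's own data fall outside its vocabulary. The following sketch assumes the vocabulary is available as a plain set of lowercase words. The example vocabulary and the function name `out_of_vocabulary` are hypothetical.

```python
# Toy vocabulary, invented for this example.
VOCABULARY = {"this", "guy", "is", "great", "bad", "the", "launch", "was"}

def out_of_vocabulary(text: str, vocabulary=VOCABULARY) -> list:
    """Return the tokens of text that the vocabulary does not contain."""
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    return [t for t in tokens if t and t not in vocabulary]

print(out_of_vocabulary("This guy is badass"))           # ['badass']
print(out_of_vocabulary("The launch was #XZYBrandFail"))  # ['#xzybrandfail']
```

In both sentences the only unknown token is the one that carries the sentiment, so a dictionary-based model would see nothing to score.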
Another type of new text element is emojis. They often convey sentiment, so they need to be part of a sentiment analysis model. However, their interpretation may vary quite a bit depending on context, culture and even individuality.
Challenges 5 and 6 also imply that the way sentiment is expressed varies according to the communities that post. Sentiment in a community of beer drinkers is expressed very differently than in a community of IT pros or patients discussing their special health problems in a forum. This favours specialised models that understand word meanings in the context of a specific community. Such a model can be built by fine-tuning a more general model.
In part 3, we’ll help you become aware of the dangers posed by aggregation, annotation and coder reliability.