Part 2 – Synergistic Sentiment Analysis:
The Space Between the Lines
Welcome back! Sit down and buckle up for a magical tour of the text mining technology focusing on Sentiment Analysis (SA).
Well, first things first. What is Sentiment Analysis anyway? Rephrasing the Wikipedia definition, Sentiment analysis (sometimes called opinion mining) refers to an area of Natural Language Processing (NLP), which aims to determine the attitude of a writer with respect to some topic. This attitude may be their judgment or evaluation, their emotional state when writing or the intended emotional communication the author wishes to convey.
To keep our discussion as concrete as possible we’ll use real life examples to elucidate the different types of attitudes. Consider the following example:
This year was a setup year for B&N, and 2010 will see its efforts start to pay off [...] In 2010, B&N will rack up significant sales of Nooks and e-books, as some consumers look for an Amazon alternative.
Obviously this excerpt contains an explicit positive evaluation for Barnes and Noble for 2010, but moreover the tone is upbeat, optimistic, and even excited. A good Sentiment Analysis would pick up on this tone and report a highly positive sentiment for B&N and their e-reader Nook, whereas a negative or at least an apprehensive sentiment should be reported for Amazon.
The next example is even more blatant:
Belated Happy New Year and already what a year it’s turning out to be for eReaders! [...] Time’s a fave around here these days, especially considering its December report naming nook one of the Best Travel Gadgets of 2009 as well as rating the device # 2 among the Top Ten Gadgets of the year. While emphasizing nook’s “classy book-lending feature”, the magazine also cited “the powerful, flexible Android operating system that the whole package runs on.”
The exclamation mark, the rhythm, the tone, the profuse use of superlatives and positive adjectives all indicate an extremely positive sentiment for the nook product. It is clear that the author has a favorable opinion of the product, and moreover that he is quite eager to share his enthusiasm with the readers.
Obviously these are not the only attitudes that can be found on the web. Other attitudes may include anticipation, sarcasm, doubt, apprehension, cynicism and even condemnation. It’s our nature to focus on the good, so we’ll spare you examples of the negative attitudes (well, I guess it’s also that we prefer to avoid any unnecessary lawsuits), but the basic idea of what is meant by an underlying attitude should be clear by now.
It’s important to keep in mind that Sentiment Analysis is not severed from the basic meaning of the sentence. Rather, SA picks up on the basic meaning and further capitalizes on the cadence, the tone, the choice of words, and even the absence thereof, to build a complete picture of the message being conveyed. Note that we’ve implicitly drawn a line between some sort of “basic meaning” of a sentence, and the “ultimate intention” of the message to be conveyed. Let’s try and be a bit more precise and explicit about this distinction.
Formal linguistic theory usually recognizes 3 levels of abstraction for natural language comprehension: Syntax, Semantics and Pragmatics (we are excluding phonology, phonetics and morphology which are irrelevant here). Simply stated, Syntax is the study of the grammatical structure of sentences, Semantics deals with how words are interpreted and how their interpretation is combined to yield the meaning of the sentence, and Pragmatics is the study of how extra-linguistic, real-world knowledge, so to speak, interacts with the basic meaning of sentences to yield the ultimate message conveyed.
So for example, syntactic theories may attempt to explain why the English sentence “I gave that to you” is fine whereas “You gave that to I” is ungrammatical. Semantic theories may attempt to explain what the meaning of a word such as “tall” is, and how this meaning can be reconciled with seemingly problematic examples such as “I am tall” vs. “The midget is only 4 feet tall”. Pragmatics goes one step further and attempts to explain how our knowledge of the world, circumstances, etc. interacts with and alters the meaning of the conveyed message. So for example, although strictly speaking the sentence “I have 3 children” does not formally preclude the possibility that I have more than 3, say 5 children, it would generally be considered wrong, or at least odd, for someone who indeed has 5 children to utter the original sentence “I have 3 children”. To see how this judgment may change with circumstances, imagine Mr. Jones is being interviewed by the IRS, when he is notified by the interviewer that tax benefits are available to anyone with 3 or more children. Under these circumstances, we would probably no longer consider it odd for Mr. Jones to say “I have 3 children”, even if in fact he had 10 children.
So where does Sentiment Analysis fit in this 3-headed theoretical framework? If you’re guessing the answer lies somewhere between semantics and pragmatics, perhaps with a bit of a syntactic twist, you’re following this introduction just fine. (If, on the other hand, you thought it was limited to the syntax, you may want to go brew yourself a fresh cup of coffee before you reread the last few paragraphs.)
Mirroring the theoretical picture painted above, Natural Language Processing algorithms consist of syntactic algorithms (most notably parsers and Part-of-Speech (POS) taggers), semantic algorithms (e.g. semantic rulebooks and relation extraction algorithms) and finally pragmatic algorithms (including, for example, contextual disambiguation algorithms and world-knowledge look-up algorithms, as used in automated translators). At Digital Trowel we’ve honed our Sentiment Analysis algorithms to combine the strengths of these 3 disciplines to produce the most reliable and comprehensive understanding of the message being conveyed, reading not only the text itself, but also between the lines, so to speak.
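To make the idea of layering a little more tangible, here is a deliberately tiny sketch in Python. It is not Digital Trowel's algorithm, just a toy under stated assumptions: the word list `LEXICON`, the `NEGATORS` set and the 0.25 emphasis weight are all invented for illustration. A word lexicon stands in for the semantic layer, a "negator right before the word" rule stands in for syntax, and counting exclamation marks is a very crude stand-in for tone.

```python
import re

# Hypothetical toy lexicon (semantic layer): word -> polarity.
# These words and weights are made up for illustration only.
LEXICON = {
    "great": 1.0,
    "classy": 1.0,
    "powerful": 1.0,
    "flexible": 0.5,
    "poor": -1.0,
    "disappointing": -1.0,
}

# Hypothetical negators used by the toy "syntactic" rule below.
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and split it into words and !/? marks."""
    return re.findall(r"[a-z']+|[!?]", text.lower())


def score(text):
    """Return a toy sentiment score: positive > 0, negative < 0."""
    tokens = tokenize(text)
    total = 0.0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0.0)
        # Crude syntactic cue: a negator immediately before a
        # sentiment word flips its polarity ("not great").
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    # Crude tonal cue: each exclamation mark amplifies the score
    # by an arbitrary 25%.
    emphasis = 1.0 + 0.25 * tokens.count("!")
    return total * emphasis


if __name__ == "__main__":
    print(score("The powerful, flexible system is great!"))
    print(score("This is not great!"))
```

Even in this toy, no single layer is enough: the lexicon alone would score "not great" as positive, and ignoring punctuation would miss the extra enthusiasm. A real engine would of course need far richer models at each level.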
The mathematical implementation of these algorithms is beyond the scope of this introduction, but that should by no means prevent us from taking advantage of the knowledge we’ve gained thus far to see how Sentiment Analysis techniques may harness the power of the different types of linguistic algorithms to achieve their goal. In fact the lion’s share of the third part of this survey aims to do just that. For now, suffice it to say that one of the main reasons we at DT believe that our technology is superior has to do with our synergistic approach of integrating syntactic, semantic and pragmatic algorithms. This is why we call it Synergistic Sentiment Analysis (SSA). BTW, for those of you wondering, synergy is the term used to describe a situation where different entities cooperate advantageously for a final outcome (tx, Wikipedia!). There, you now understand yet another word in the titles above.
Before ending this part, let’s focus on the goals of SA, or in other words, what SA is good for. Well, in one sentence, as we already phrased it:
Extracting and discerning the underlying sentiment allows us to transform otherwise inert texts into vibrant business opportunities.
But how does this come about? I think the best way to explain is by using an example:
Every day, millions of business news articles are published on the web. Many of these articles contain facts as well as judgments, predictions, and just plain old sentiment. Obviously, it is impossible for any one human (or even a team of a hundred people) to read all these articles, sift and sort through them, extract the facts and discern the sentiment, let alone do all this in real time to facilitate decision-making. This is where our SA engine comes in.
In a few seconds, our Sentiment Analysis engine can run through thousands and thousands of articles, sorting them by industry, company, product, etc., extracting key facts and events, and discerning the underlying sentiment. Take the stock market for example. In under 10 seconds, our SA engine can scan every article that mentions any NYSE company and was published within a specified time range. Not only are key facts and events compiled into our database, but a sentiment score is calculated for each ticker, yielding a real-time numeric indication of the vibe around every company on the market! Numeric scores can be fed into an array of decision-making procedures, and can help in shaping trading strategies. Now if that isn’t a great business idea, I don’t know what would constitute one!
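For the curious, the "one score per ticker" step can be pictured with a few lines of Python. This is only a sketch under assumptions, not a description of our engine: it assumes each article has already been given a score between -1 and 1, it uses a plain average as the aggregate, and the tickers "AAA" and "BBB", the sample scores and the 0.25 threshold are all made up.

```python
from collections import defaultdict
from statistics import mean


def ticker_scores(article_scores):
    """Average per-article sentiment scores for each ticker.

    article_scores: iterable of (ticker, score) pairs, where score is
    assumed to lie in [-1, 1]. A plain mean is an illustrative choice.
    """
    by_ticker = defaultdict(list)
    for ticker, score in article_scores:
        by_ticker[ticker].append(score)
    return {ticker: mean(scores) for ticker, scores in by_ticker.items()}


def signal(score, threshold=0.25):
    """Map a numeric score to a coarse label (threshold is arbitrary)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Made-up sample data: (hypothetical ticker, per-article score).
SAMPLE = [
    ("AAA", 0.5),
    ("AAA", 1.0),
    ("BBB", -0.5),
    ("BBB", -0.25),
    ("AAA", 0.0),
]

if __name__ == "__main__":
    for ticker, score in sorted(ticker_scores(SAMPLE).items()):
        print(ticker, score, signal(score))
```

A production system would probably weight articles by recency or source rather than averaging them equally, but the basic shape (many article-level scores in, one number per ticker out) is the same.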
There are many other business opportunities for SA technology, including some we’ve already implemented at DT, such as evaluating pharmaceutical forums for clients’ sentiment about drugs, as well as satisfaction with sports products, but I think this is enough hype for now.
The third and final part of this introduction to the field of SA goes a bit deeper into the SA engine itself, and examines the innovative technology unique to Digital Trowel using real examples… Stay tuned, this is where things get really exciting.