Emergence of automated sentiment analysis

The Web Ecology Project released a study last year (“Detecting Sadness in 140 Characters”) which took the outpouring of grief and commentary associated with Michael Jackson’s death as a controlled data set for assessing sentiment analysis tools and protocols. The study is a marvelous unpacking of what goes into text-based opinion mining, and the subtle variations in language which indicate wild variance in sentiment.

In August, the New York Times ran an article on the commercial application of online sentiment analysis (disclosure: the article features both a personal friend and a tool I use regularly). In it, the writer teases out the difference between a search engine’s primarily quantitative assessment of online data and the primarily qualitative assessment of sentiment values, and emphasizes the power given to consumers when they can search for “The best hotel in San Antonio” and get results ordered by a web-wide opinion poll on the topic.

Web folk love to ask and answer the question of “what’s next?”, and it is often (with some paucity of imagination) framed in terms of versions. Recently I was asked what I believed constituted “web 2.0” and what would constitute “web 3.0.” I don’t recall my answer on that day, but I bet it was clumsy. Today, I found my current answer in this revision of the Wikipedia article on sentiment analysis (linked above):

If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all that content that’s getting published.

The largest missing piece in mining web data is the vector of content sentiment. A truly successful implementation would take a substantive step toward the idealized “semantic web.” I hunch that development in this area will kick off renewed interest in the classical study of rhetoric (i.e., the persuasive power of language and argument), and be an incremental step toward effective artificial intelligence (AI).

If this all appears a bit too utopic, I also hunch that we will be faced with an over-reliance on opinion, resulting in issues including:

  • A new strain of social concern regarding tyranny of short-term (and often short-sighted) passions and opinions (an issue interestingly un-packed in the documents surrounding the framing of the US Constitution, including the Federalist Papers and the Anti-Federalist Papers)
  • A relative freezing of the fluidity of language like what occurred to the English language when writing (and to a greater degree, printing) became commonplace [yes, this assertion needs citing, I’ll call it hearsay for now]. If we can reliably pre-assess how our published language’s sentiment will be broadly received, will we venture into experimental, emerging, and/or colloquial language as often?

At present, public discussion of this issue lives in the domain of “the politician (over/under) reliant on polls.” But what happens when we all have an easily accessible opinion poll available at all times?


    5 thoughts on “Emergence of automated sentiment analysis”

    1. Regarding "A relative freezing of the fluidity of language": I feel like we’re in a cultural explosion of willy-nilly, I-make-up-werds-&-tweet-them writing habits. This particular problem is indeed one of the most difficult aspects of producing machine-ranked sentiment on tweets! Very interesting post.

    2. Mars, thanks for stopping by and sharing your comment. I see the difficulty, to be sure. It seems like it is a question of degree. Invent only where, and everywhere, there is a need. [or something] In primarily oral cultures, language is relatively more malleable than it is in literate cultures [see need-for-citing comment above]. The spoken word is freer (yes, as in both "beer" and "speech") than the written, and a certain innovation and spontaneity enters over this lowered threshold. [A cliched example of high language malleability relative to now is the observation that there are around 1500 words/phrases in Shakespeare’s works that cannot be found in any earlier source, raising the prospect that he invented a high proportion of them. And he was the popular entertainment of his day. People were evidently pleased and able to roll with it, fo’ shizzle.] There is something about how available/accessible publishing is on the social web which makes the act of writing for it more like speaking than writing. And it seems a result is more linguistic innovation, giving us your aforementioned willy-nilly. I dunno whether this innovation is valuable or not. To me, a broader vocabulary which adds real usefulness to our human tool set for saying what we mean appears positive. Of course, to know if this is the case, one would want to look at both what is being added and what is being replaced.

    3. Consider how search has impacted writing. Headline writers for news publications that are online – pretty much all of them – must now write straightforward, almost boring headlines because they’re search-friendly. In writing web sites, we assess and strategically include keywords. We’re learning best practices for working the protocols, and as we better understand the semantic web we’ll be thinking how to optimize for it, and that’ll have an impact on our language and our thinking. We’re mutating. It would be interesting to do a futurist assessment of the trends and what they suggest. I’ve actually been thinking about that a lot.
