The Web Ecology Project released a study last year (“Detecting Sadness in 140 Characters”) that took the outpouring of grief and commentary associated with Michael Jackson’s death as a controlled data set for assessing sentiment analysis tools and protocols. The study is a marvelous unpacking of what goes into text-based opinion mining, and of the subtle variations in language that indicate wild variance in sentiment.
In August, the New York Times ran an article on the commercial application of online sentiment analysis (disclosure: the article features both a personal friend and a tool I use regularly). In it, the writer teases out the difference between a search engine’s primarily quantitative assessment of online data and the primarily qualitative assessment of sentiment values, and emphasizes the power given to consumers when they can search for “The best hotel in San Antonio” and get results ordered in terms of a web-wide opinion poll on the topic.
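To make that contrast concrete, here is a minimal sketch in Python of ordering results by aggregate opinion rather than by relevance. Everything in it is an assumption for illustration: the word lists, the `sentiment_score` and `rank_by_sentiment` names, and the hotel reviews are all invented, and counting positive and negative words is far cruder than what the commercial tools presumably do.

```python
import re
from statistics import mean

# Toy word lists, invented for illustration; real tools use far richer models.
POSITIVE = {"great", "love", "clean", "friendly", "best"}
NEGATIVE = {"dirty", "rude", "noisy", "worst", "awful"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / word count, a value in [-1, 1]."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return hits / len(words)


def rank_by_sentiment(reviews_by_item):
    """Order items from most to least favorable mean review score."""
    averages = {
        item: mean(sentiment_score(r) for r in reviews)
        for item, reviews in reviews_by_item.items()
    }
    return sorted(averages, key=averages.get, reverse=True)


# Hypothetical reviews, made up for this example.
reviews = {
    "Hotel A": ["Great staff, clean rooms", "Love the location"],
    "Hotel B": ["Dirty lobby and rude staff", "Noisy but friendly"],
}
print(rank_by_sentiment(reviews))
```

A word-counting scorer like this misses exactly the subtle variations in language that the Web Ecology study unpacks (sarcasm, negation, context), which is part of why the problem is hard.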
Web folk love to ask and answer the question of “what’s next?”, and it is often (with some paucity of imagination) framed in terms of versions. Recently I was asked what I believed constituted “web 2.0” and what would constitute “web 3.0.” I don’t recall my answer on that day, but I bet it was clumsy. Today, I found my current answer in this revision of the Wikipedia article on sentiment analysis (linked above):
If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all that content that’s getting published.
The largest missing piece in mining web data is the vector of content sentiment. A truly successful implementation would take a substantive step toward the idealized “semantic web.” I hunch that development in this area will kick off renewed interest in the classical study of rhetoric (i.e., the persuasive power of language and argument), and be an incremental step toward effective artificial intelligence (AI).
If this all appears a bit too utopian, I also hunch that we will be faced with an over-reliance on opinion, resulting in issues including:
- A new strain of social concern regarding tyranny of short-term (and often short-sighted) passions and opinions (an issue interestingly un-packed in the documents surrounding the framing of the US Constitution, including the Federalist Papers and the Anti-Federalist Papers)
- A relative freezing of the fluidity of language, similar to what happened to the English language when writing (and to a greater degree, printing) became commonplace [yes, this assertion needs citing, I’ll call it hearsay for now]. If we can reliably pre-assess how the sentiment of our published language will be broadly received, will we venture into experimental, emerging, and/or colloquial language as often?
At present, public discussion of this issue lives in the domain of “the politician (over/under) reliant on polls.” But what happens when we all have an easily accessible opinion poll available at all times?