In the first part of our series we wrote a simple web content extractor function that fetches a web page, removes all the markup, and returns a plain text string representing the text on the page. While not a perfect solution (it chokes on the AJAX features found on many modern sites), it serves our purposes for this series so far.
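For anyone who skipped part one, here is a rough sketch of what an extractor of that kind can look like using only the Python standard library. This is not the code from part one; the names `html_to_text` and `extract_text` are made up for illustration, and like the original it will miss anything a page loads with JavaScript after the fact.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a skipped tag.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Strip the markup from an HTML string and return the remaining text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def extract_text(url, timeout=10):
    """Fetch a URL and return its text (hypothetical helper, needs network)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Splitting the parsing (`html_to_text`) from the fetching (`extract_text`) keeps the markup stripping easy to try out on a literal string without hitting the network.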
The plan for this post was to write a service on top of IBM’s new cognitive APIs, using Watson and Bluemix. Watson has APIs in Node.js, Python, Java, iOS, and Unity3d. Since I am going to set up each of the microfeatures as a microservice, I chose Python because of its easy ramp-up, its syntax, and the easy debugging of an interpreted language. Also, I have a PyCharm license, so that makes things easy.
Watson and Bluemix Refuse to Work
As luck would have it, Watson and Bluemix had issues, and I could not create credentials for my Tone Analysis service. After creating the service, I could not delete it either; I got a 404. I submitted a ticket. The appalling state of their production system, at least as it pertains to this particular feature, is really disappointing. I figured Big Blue had a little more pride in their end product. As a result of IBM’s poor software engineering, I am going to use a sentiment analysis library in Python to serve much the same purpose. This will be fine, and it will let us stay within a single cloud provider, since IBM can’t get its shit together.
A Little Research
Since this isn’t going to be as easy as letting IBM tell me whether a web page has a whole bunch of shit hitting the fan on it, I have to do a little research on what sentiment analysis technically is and how it works. Luckily, Chris Potts from Stanford Linguistics gave a symposium on this very topic. Applying sentiment analysis to the news is hardly new, and the results are hardly mind-blowing or unexpected… but it’s something to do. I’m fairly certain there is some mathematics that can parse a bunch of free text and reduce it to a multiplier for how unsettling it is. The set of unsettling words is limited, and I figure the use of one or more of them correlates with my need to build a survival bunker.
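To make the "limited set of unsettling words" idea concrete, here is about the crudest version I can imagine: count how many words in the text appear on a word list and divide by the total word count. The word list and the `unsettling_score` name are made up for illustration, and real sentiment analysis does a lot more than this (negation, context, weighting), so treat it as a sketch of the intuition rather than a method anyone recommends.

```python
import re

# Hypothetical word list, invented for illustration; a real lexicon would be far larger.
UNSETTLING_WORDS = {"war", "crisis", "collapse", "outbreak", "attack", "panic"}


def unsettling_score(text, lexicon=UNSETTLING_WORDS):
    """Return the fraction of words in text that appear in lexicon (0.0 to 1.0)."""
    # Lowercase the text and pull out runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in lexicon)
    return hits / len(words)


if __name__ == "__main__":
    headline = "Panic spreads as the crisis deepens"
    print(round(unsettling_score(headline), 2))
```

Feeding the string returned by the part-one extractor into a function like this would give one number per page, which is roughly the "multiplier" I have in mind.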
So I begin my research on sentiment analysis.
All Roads Lead To Bayes
As much as I would love to spend my entire Saturday reading probability books as they pertain to natural language analysis, I’m pretty sure it all ends up with Bayes looking at me with his “you have no idea what I am talking about” dead eyes. Also, I have a trip to the Grand Canyon. So off to the Gorgeous Gorges of northern Arizona with Bayes in mind.