Part 1: Creating a Web Content Extractor Microservice using Go

Since I fancy myself language agnostic, I decided to give Golang a try. I’m glad I did: it is very simple to develop in. It’s a nice language, concise in its syntax yet powerful, and easy to set up and get started with. JetBrains has also released a preview version of Gogland, an IDE for Go development, and it’s exactly what you would expect from JetBrains: quality.

So I took a first run at my content extractor. It works to some degree, but it’s not perfect: some sites have gotten too fancy for their britches and lean heavily on asynchronous loading via AJAX, so a plain HTTP fetch never sees that content. Ideally this piece will get better, but I’m going to get a nice initial case working using Drudge Report, since it’s nice and basic HTML. Then I’ll work on getting the major news outlets working.

Let’s have a look at the Go code.

As you can see, we have one function that fetches the content and another that removes the HTML tags and scripts and consolidates the whitespace in the return value. I purposely structured it this way so that we can take advantage of goroutines and channels when we build this thing out into a hardcore real-time sentiment analysis machine.
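For reference, here is a minimal sketch of what those two functions could look like using only the standard library, plus a small fan-out helper to show where goroutines and channels would slot in. The names `fetchContent`, `extractText`, `extractAll`, and `result`, the regular expressions, and the 10-second timeout are all assumptions made for illustration, not necessarily the exact code from this project. Regex-based stripping is also deliberately crude; a real HTML parser such as `golang.org/x/net/html` would likely be more robust.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"regexp"
	"strings"
	"sync"
	"time"
)

var (
	// (?is) makes the match case-insensitive and lets "." span newlines.
	scriptRe = regexp.MustCompile(`(?is)<script[^>]*>.*?</script>`)
	styleRe  = regexp.MustCompile(`(?is)<style[^>]*>.*?</style>`)
	tagRe    = regexp.MustCompile(`<[^>]*>`)
	spaceRe  = regexp.MustCompile(`\s+`)

	// Assumed timeout so a slow site cannot hang the extractor forever.
	httpClient = &http.Client{Timeout: 10 * time.Second}
)

// fetchContent downloads the raw HTML at url (requires Go 1.16+ for io.ReadAll).
func fetchContent(url string) (string, error) {
	resp, err := httpClient.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("fetch %s: unexpected status %s", url, resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

// extractText drops script/style blocks and tags, then collapses whitespace.
func extractText(html string) string {
	s := scriptRe.ReplaceAllString(html, " ")
	s = styleRe.ReplaceAllString(s, " ")
	s = tagRe.ReplaceAllString(s, " ")
	s = spaceRe.ReplaceAllString(s, " ")
	return strings.TrimSpace(s)
}

// result pairs a URL with its extracted text, or the error hit on the way.
type result struct {
	URL  string
	Text string
	Err  error
}

// extractAll runs fetch for every URL in its own goroutine and streams the
// results back on a channel that is closed once all of them have finished.
func extractAll(fetch func(string) (string, error), urls []string) <-chan result {
	out := make(chan result, len(urls))
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			html, err := fetch(u)
			if err != nil {
				out <- result{URL: u, Err: err}
				return
			}
			out <- result{URL: u, Text: extractText(html)}
		}(u)
	}
	go func() { wg.Wait(); close(out) }()
	return out
}

func main() {
	if len(os.Args) < 2 {
		// No URL given: just demonstrate the cleanup step on an inline sample.
		fmt.Println(extractText("<h1>Hello</h1> <script>var x = 1;</script> <p>world</p>"))
		return
	}
	for r := range extractAll(fetchContent, os.Args[1:]) {
		if r.Err != nil {
			fmt.Fprintln(os.Stderr, r.URL, "failed:", r.Err)
			continue
		}
		fmt.Println(r.URL, "=>", r.Text)
	}
}
```

Passing the fetch function into `extractAll` as a parameter is a design choice of this sketch: it keeps the concurrent part testable without touching the network, and results arrive in whatever order the fetches complete.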

The plan is to expose this as a microservice. A second service will use IBM Watson to analyze this service’s output, and a third will connect the two and output a doomsday multiplier. A 0 means everything is lovey-dovey, like Etta James singing “At Last”. A 1 means that, after I poop my pants in sheer terror, I must run… I must run away from all population centers but stay close to water, dig a hole, line the outer walls with rocks, add a camouflage roof, and hide there until the value drops to a more suitable 0.7 or 0.6.

Of course, you could use this value for other things, like making a fortune and then losing it, but I prefer it for my own kitty-like sense of security and safety.