Part 6: Creating a Go Web Content Extraction Microservice

In Part 1 of this series we created a simple Go function to extract the content of a web page. In this post, we are going to take that function and use Go's nicely designed net/http package to expose it as a microservice.

Making a Go Program a Webservice

Our first step in creating a microservice from our simple Go function is to reorganize our code and track it in GitHub. We want an installable library and binary, such that a container can pull down the latest version of the source and start the application with no problems. To do this, we create two files.

In a lib subdirectory we put the contents of the Go package, which the web service file will import as well. Take note of the strange, very subtle way Go exposes a package's symbols externally: you CAPITALIZE the first letter of a symbol, in this case a function name, to export it for use outside the package. I'm not sure it's intuitive to someone new to Go, or even a sane syntactic decision on the Go folks' part, but it ends up being pretty nice.

As you can see, we are only exposing a single function, Scrape, to the outside world.

In the parent directory we write the go code for the binary we will include. The binary will be responsible for standing up an http server. This http server will basically transfer the URL query parameter to our backend library and return the extracted content. For this we include the go package “net/http” which is included and simple to use.

The app.go file is as follows:

Now you can see we define two routes, a /status route and a /scrape route. Each route has a handler function which we define, and each handler satisfies a specific contract: it takes an http.ResponseWriter and a pointer to an http.Request.

During the scrape call, we extract the url parameter from the request pointer (r), and then call our library's exposed Scrape function with that value.

That function is brought in by the import declaration at the top of the file.

Finally, in the main function we define the routes and start the server. Also note that the file is in package main, which tells Go this is a binary we want installed, with main as its entry point.

We commit this to Git and we are ready to install it to our Go workspace.

Installing a Go Webservice from Git

This post explains how to set up your workspace, $GOPATH, and all the scaffolding you need to use Go correctly.

Assuming you have that all set up, simply run
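With the classic GOPATH workflow, the command would look like this (the repo path is the one from this post):

```shell
# Fetch, compile, and install the library and binary into your workspace
go get github.com/allengeer/scrape
```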

Go will fetch the latest source from GitHub – in this case the source I have checked into my repo, but you can replace the allengeer part with your own GitHub user name. It compiles the library and places it in your workspace, then compiles the binary and places it on the bin path of your workspace.

Now to start the webservice (assuming $GOPATH/bin is on your $PATH)
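Assuming the installed binary picked up the repo's name, that is just:

```shell
# Start the service listening on port 5000
scrape
```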

Now visit http://localhost:5000/status

You should see RUNNING. Now give this a try http://localhost:5000/scrape?url=http://allengeer.com/part-6-creating-a-go-web-content-extraction-microservice

You should see the text content of this page. How cool is that?!

Containerize a Go Microservice with Docker

The final step in creating our microservice is to package it in a container using Docker. For a Go app that is tracked in GitHub – as we have done here – the Dockerfile is super simple.
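A minimal Dockerfile in this spirit would look something like the sketch below; the exact file is in the repo.

```dockerfile
# Start from the official golang base image
FROM golang
# Fetch and install the library and the scrape binary from GitHub
RUN go get github.com/allengeer/scrape
# The service listens on port 5000
EXPOSE 5000
# Run the installed binary
CMD ["scrape"]
```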

As you can see, we start from the golang image, we fetch the two required Go packages, and then we run the scraper. We expose port 5000. We can build with the standard
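docker build command (the image tag here is an assumption):

```shell
# Build the image from the Dockerfile in the current directory
docker build -t scrape .
```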

and then run our container with
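something along these lines, mapping the container's port 5000 to the host:

```shell
# Run detached, publishing container port 5000 on host port 5000
docker run -d -p 5000:5000 scrape
```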

and then scale it up
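by running more containers. The simplest sketch is additional docker run invocations on different host ports (an orchestrator or compose file would normally handle this):

```shell
# Two more instances of the same image on different host ports
docker run -d -p 5001:5000 scrape
docker run -d -p 5002:5000 scrape
```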

In the next post we are going to look at how to start our microservice ecosystem, combining the microservice we created in Part 5 with our web extractor microservice to begin building a system for extracting web page content and analyzing it with IBM BlueMix.

Source Code

The source code for this post is available at https://github.com/allengeer/scrape