We currently have two different applications, one for extracting content from a web page and one for extracting a tone analysis from a string of text. Now we want to turn these into services and deploy them where we can start using them, possibly at massive scale as we attempt to discern the overall feeling of the planet.
When we get into microservices, and more specifically container-based microservices, a wide-open range of possibilities appears. Deciding on the infrastructure and architecture of your system is neither clear nor easy. There are several PaaS, IaaS, and container clouds to choose from, among them:
- Heroku PaaS
- AWS ECS with API Gateway
- AWS EC2 and CloudFormation
- AWS Lambdas and API Gateway
- Microsoft Azure
- Google Cloud Platform
- OpenShift Kubernetes Clusters
- Joyent Container Clusters
- Bluemix Container Applications
- Or a Roll your Own system
Evaluating which service is best can be a pain in the rear. It's a ton of information to parse through, a lot of technical guides, and it involves a whole bunch of screwing up, dead ends, and failure.
If you decide to Roll Your Own container/pod infrastructure you additionally have to look at products such as
- Zuul / Archaius / Hystrix (the Netflix Cloud model)
- Docker Compose
- Docker Swarm
- Nomad Dispatch
- Jenkins Pipeline
As you can see, the ecosystem of tools in this realm of highly scalable, easily configured, service-based applications gets intense quickly. So if we are to build something on this kind of infrastructure, it helps to have an architectural plan for what we are trying to accomplish.
What are we trying to build?
We want to create a system that churns through websites extracting their emotional tone content such that we can plot this data on a time axis and be able to discern deviations and events in the time series. We will start with a site that changes maybe twice per day – like the Drudge Report.
Twice a day, perhaps at high noon and midnight, we want to
- Put the URL www.drudgereport.com into the system
- Extract the content of www.drudgereport.com
- Feed that content into IBM Watson tone analysis
- Store the 5 emotional probabilities returned by Watson for that site at that time.
It would be very easy to have one machine do this. In fact, it's probably the best solution to have one machine perform this task. We could have a cron job run a Python script twice daily that does this and feeds the results to a Graphite server. Super easy and simple. That, keep in mind, is likely the best solution at this point in time.
But our first problem is that I wrote part of it in Go and part of it in Python, and while I could use some shell scripting know-how to chain these processes together... again, that's not the elegant solution that scales easily.
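A minimal sketch of what such a twice-daily script might look like. The real extraction and Watson calls are not shown in this post, so the three steps are passed in as functions; `run_once`, the stand-in lambdas, the emotion names, and the crontab line in the comment are all illustrative assumptions rather than the actual implementation.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical crontab entry (midnight and noon); the script path is an assumption:
#   0 0,12 * * * /usr/bin/python3 /opt/tone/run_once.py

Reading = Tuple[str, str, float, int]  # (url, emotion, value, unix timestamp)


def run_once(
    url: str,
    extract: Callable[[str], str],
    analyze: Callable[[str], Dict[str, float]],
    record: Callable[[str, str, float, int], None],
    now: Callable[[], float] = time.time,
) -> None:
    """Run the extract -> analyze -> record chain once for a single URL."""
    content = extract(url)
    tones = analyze(content)
    timestamp = int(now())
    for emotion, value in tones.items():
        record(url, emotion, value, timestamp)


if __name__ == "__main__":
    # Stand-ins so the sketch runs without network access.
    stored: List[Reading] = []
    run_once(
        "www.drudgereport.com",
        extract=lambda url: "placeholder page text",
        analyze=lambda text: {"joy": 0.5, "anger": 0.1},
        record=lambda *reading: stored.append(reading),
    )
    print(stored)
```

Passing the steps in as callables keeps the chain itself trivial to test, and it sidesteps the question of which language each step is written in.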
Scale the system to be used as a Service
Now let's say that I want to offer this series of events as a service. You put a URL in the queue, and your URL's tone data is recorded in Graphite. Now let's say I have a large trading portfolio and I want to do tone analysis on the 4,000 companies I currently invest in. Perhaps I want to do tone analysis on the first 10 news results regarding each company that is publicly traded on the NYSE or NASDAQ. At that point I have 40,000 URLs to run through this process. If each analysis takes roughly 30 seconds, and we are doing it twice a day, we are looking at 40,000 minutes of processing time per day. Our one machine running our script isn't going to be able to do this, since a day only has 1,440 minutes. In fact, we would need a cluster of roughly 28 machines running 24/7 to handle this much processing.
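The estimate can be checked with a few lines of arithmetic. This sketch assumes each machine processes one URL at a time, with no parallelism and no idle time, which is a simplification; the function name and constants are illustrative.

```python
import math

SECONDS_PER_ANALYSIS = 30      # rough figure from the text
RUNS_PER_DAY = 2
MINUTES_PER_DAY = 24 * 60      # 1,440


def machines_needed(url_count: int) -> int:
    """Machines required if each one processes a single URL at a time, 24/7."""
    minutes_of_work = url_count * RUNS_PER_DAY * SECONDS_PER_ANALYSIS / 60
    return math.ceil(minutes_of_work / MINUTES_PER_DAY)


print(machines_needed(40_000))  # 40,000 minutes / 1,440 minutes per machine-day -> 28
print(machines_needed(1))       # the single site we start with -> 1
```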
Luckily we are starting off at 1 and we are not planning on getting up to that 4000 company mark anytime soon. But that is our maximum throughput so we want to architect the system such that we are able to handle something on the order of that.
Design your Services
This system requires four basic microservices: a queue service, an extract content service, a tone analysis service, and a record data service.
A first run at the API for this service could be
Alternatively, we can look at HashiCorp’s Nomad Dispatch, but like so many things in DevOps, I haven't had time to review it, so we will hold off on the queue service until I can.
Extract Content Service
Tone Analysis Service
Record Data Service
record(url string, emotion string, value float64)
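Since the values end up in Graphite, here is one way a Python counterpart of this signature might look, using Carbon's plaintext protocol (one `<metric path> <value> <timestamp>` line per reading, conventionally on TCP port 2003). The `tone.<url>.<emotion>` naming scheme, the helper names, and the host and port defaults are assumptions made for illustration.

```python
import re
import socket
import time
from typing import Optional


def metric_path(url: str, emotion: str) -> str:
    """Build a Graphite metric path; dots separate path levels, so flatten the URL."""
    safe_url = re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_")
    return f"tone.{safe_url}.{emotion}"


def format_line(url: str, emotion: str, value: float, timestamp: int) -> str:
    """Format one reading as a Carbon plaintext line: path, value, timestamp, newline."""
    return f"{metric_path(url, emotion)} {value} {timestamp}\n"


def record(url: str, emotion: str, value: float,
           timestamp: Optional[int] = None,
           host: str = "localhost", port: int = 2003) -> None:
    """Send one reading to a Carbon listener (host and port are assumed defaults)."""
    if timestamp is None:
        timestamp = int(time.time())
    line = format_line(url, emotion, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


print(format_line("www.drudgereport.com", "joy", 0.42, 1500000000), end="")
# tone.www_drudgereport_com.joy 0.42 1500000000
```

Keeping the formatting separate from the socket call means the line format can be checked without a running Graphite server.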
Message Based Event Driven Architecture
Here we want to illustrate how we connect all of our services. We decide to do this with an event-driven architecture using a system like Apache Kafka or RabbitMQ. Basically, when a client pushes a value into the queue service, the queue service signals this event, to which the extract content service is subscribed. The extract content service pulls the URL from the published event via pop() and runs the extraction, then publishes another message saying that content is available. The tone analysis service subscribes to this series of messages. It pulls down the content, sends it off for analysis, and when the analysis returns, publishes a final tone-analysis-complete message. The record data service listens for these and, upon receiving one, promptly records the values in Graphite.
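The message flow described above can be sketched with a toy in-process bus. This is only a stand-in for Kafka or RabbitMQ: delivery here is synchronous and in memory, and the `Bus` class, the topic names (`url.queued`, `content.available`, `tone.complete`), and the message fields are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class Bus:
    """A toy in-process message bus standing in for Kafka or RabbitMQ."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver synchronously to every handler subscribed to this topic.
        for handler in self._subscribers[topic]:
            handler(message)


def wire_pipeline(bus: Bus, extract, analyze, record) -> None:
    """Subscribe each service to the event published by the previous one."""

    def on_url_queued(msg: dict) -> None:
        content = extract(msg["url"])
        bus.publish("content.available", {"url": msg["url"], "content": content})

    def on_content_available(msg: dict) -> None:
        tones = analyze(msg["content"])
        bus.publish("tone.complete", {"url": msg["url"], "tones": tones})

    def on_tone_complete(msg: dict) -> None:
        for emotion, value in msg["tones"].items():
            record(msg["url"], emotion, value)

    bus.subscribe("url.queued", on_url_queued)
    bus.subscribe("content.available", on_content_available)
    bus.subscribe("tone.complete", on_tone_complete)


if __name__ == "__main__":
    recorded = []
    bus = Bus()
    wire_pipeline(
        bus,
        extract=lambda url: "placeholder page text",
        analyze=lambda text: {"joy": 0.5},
        record=lambda url, emotion, value: recorded.append((url, emotion, value)),
    )
    bus.publish("url.queued", {"url": "www.drudgereport.com"})
    print(recorded)
```

Note that no handler calls another service directly; each one only knows the topic it listens to and the topic it publishes to.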
Advantages of the Event Driven Architecture
This architecture affords us a lot of advantages over the standard approach of setting up a cluster of cron-controlled scripts that report to a Graphite server.
Loosely Coupled Services
Each service can do its job without any reliance on any other service. This allows us quite a bit of freedom should we need to scale particular parts of the system or change out the infrastructure for a particular part. Nothing changing in one service is going to change another.
As the Los Techies blog points out, an event-driven architecture lends itself to a common problem: duplicate messages, especially in scaled or redundant systems.
In the future, should we, say, want to purchase a stock based on a joyful trend in its news reports, the event-driven microservices architecture would allow us to subscribe a stock purchasing service to tone analysis messages and act when a certain condition is met.
Easy scaling is pretty much a given. With a microservice architecture, we can scale each service both horizontally and vertically.
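One common way to cope with duplicate deliveries is to make consumers idempotent, so that handling the same message twice has no extra effect. The sketch below is an assumption about how that could be done here, not something taken from the Los Techies post: the `deduplicated` wrapper and the choice of URL plus timestamp as the message key are hypothetical.

```python
from typing import Callable, Set


def deduplicated(handler: Callable[[dict], None],
                 key: Callable[[dict], str]) -> Callable[[dict], None]:
    """Wrap a handler so that messages with an already-seen key are skipped."""
    seen: Set[str] = set()

    def wrapper(message: dict) -> None:
        message_key = key(message)
        if message_key in seen:
            return  # duplicate delivery: ignore it
        seen.add(message_key)
        handler(message)

    return wrapper


handled = []
on_tone_complete = deduplicated(handled.append, key=lambda m: f'{m["url"]}@{m["timestamp"]}')

message = {"url": "www.drudgereport.com", "timestamp": 1500000000, "tones": {"joy": 0.5}}
on_tone_complete(message)
on_tone_complete(message)  # redelivered by the broker
print(len(handled))  # 1
```

The in-memory set only works within a single process and grows without bound; with several redundant consumers, the set of seen keys would have to live in shared storage.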
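As a rough illustration of such a subscriber, the handler below fires once the last few joy scores for a company all exceed a threshold. Everything here is hypothetical: the `symbol` field on the message, the window size, the threshold, and the `buy` callback are assumptions, and the rule itself is a placeholder rather than a trading strategy.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


def make_joy_trigger(buy: Callable[[str], None],
                     window: int = 3,
                     threshold: float = 0.7) -> Callable[[dict], None]:
    """Return a handler that calls buy(symbol) when the last `window` joy scores all exceed `threshold`."""
    history: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))

    def on_tone_complete(message: dict) -> None:
        scores = history[message["symbol"]]
        scores.append(message["tones"]["joy"])
        if len(scores) == window and min(scores) > threshold:
            buy(message["symbol"])
            scores.clear()  # start a fresh window so we do not buy on every message

    return on_tone_complete


orders = []
handler = make_joy_trigger(orders.append)
for joy in (0.8, 0.9, 0.75):
    handler({"symbol": "ACME", "tones": {"joy": joy}})
print(orders)  # ['ACME']
```

Because the trigger is just another subscriber, it can be added without touching the extract, analysis, or record services.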
This is a nice survey of some of the design considerations we have made in constructing this system. In the next few posts we will venture into setting up infrastructure for our individual services and development pipelines for each one. After that we will look into messaging infrastructure and how we begin to rig up our services with something like Apache Kafka. After that we will look into setting up our Graphite service. Then we will address issues like scalability and resiliency with our architecture. After that we will look into setting up a configuration infrastructure to allow us to configure our services with a set of “golden images” and configuration-as-code systems.