Part 9: Too Slow – Using Docker-Compose Scaling, HAProxy Load Balancing and Python threading to Increase Speed

Problem – Tone Analysis is Too Slow

Currently we fetch a list of the latest 20 news stories for a stock ticker and then iterate through the list sequentially, scraping each story's content and submitting the scraped text to the IBM Watson tone analysis service. As a result, processing time grows linearly, O(n), with the number of URLs.

This becomes prohibitive once we start hitting tens of URLs: if each scrape-and-analyze step takes 10 seconds, 10 URLs means 100 seconds, over a minute and a half of processing. The most time-consuming piece of the flow is waiting on IBM Watson to return its analysis.
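A minimal sketch of the current sequential flow. The helper names (fetch_story_urls, scrape_article, submit_to_tone_service) are illustrative stand-ins for the real functions in the project:

```python
def analyze_ticker(ticker):
    results = {"anger": [], "joy": [], "sadness": [], "fear": [], "disgust": []}
    for url in fetch_story_urls(ticker):       # latest ~20 stories (stand-in helper)
        text = scrape_article(url)             # network-bound (stand-in helper)
        tones = submit_to_tone_service(text)   # blocks on IBM Watson for seconds
        for emotion, score in tones.items():
            results[emotion].append(score)
    return results
```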

To get metrics, we start by adding elapsed-time instrumentation to our existing script. This is a very simple process.
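A timing sketch, reusing the hypothetical entry point from above:

```python
import time

start = time.time()
results = analyze_ticker("AAPL")   # entry point from the sketch above
elapsed = time.time() - start
print("Processed %d stories in %.2f seconds" % (len(results["joy"]), elapsed))
```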

So now we run this to get a sequential baseline:

So, roughly 3.5 minutes to process one ticker symbol. With the NASDAQ listing over 3,000 tickers, a sequential architecture clearly will not scale: at 3.5 minutes apiece, processing the whole exchange would take on the order of 8 days to complete.

Parallelizing with Python threading

So clearly our sequential approach needs a bit of a revamp. To process the list of links we make use of Python's threading library. Basically, we want to pull the scraping and tone-analysis steps out into their own function, which we will then run as its own thread: for each URL in our input list, we create a thread to process it.

The threads all write to the same dict that collects the return values from IBM Watson. To keep the result lists aligned – such that item 2 in each of the emotional categories refers to the same input content – we make use of a Lock object. Before writing the Watson results, a thread acquires the lock. The first thread takes the lock and completes its updates to the dict; if any other thread attempts to acquire the lock in that time, it blocks until the first thread releases it. This keeps the result lists consistent with one another.

We keep track of each thread we spawn in a list of threads, because we want to proceed to the reporting step only once all threads have completed. To do this, we call the join method on each thread; join blocks until that thread is complete.

The updated script, checked in as-is, is as follows:
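A sketch of the threaded version, using the same stand-in helpers as before (the actual checked-in script will differ in names and details):

```python
import time
import threading

results = {"anger": [], "joy": [], "sadness": [], "fear": [], "disgust": []}
results_lock = threading.Lock()

def process_url(url):
    """Scrape one article and submit it to the tone-analysis microservice."""
    text = scrape_article(url)               # stand-in helper
    tones = submit_to_tone_service(text)     # stand-in helper, blocks on Watson
    # Serialize writes so that index i in every emotion list refers to the
    # same article; other threads block here until the lock is released.
    with results_lock:
        for emotion, score in tones.items():
            results[emotion].append(score)

def analyze_ticker(ticker):
    start = time.time()
    threads = []
    for url in fetch_story_urls(ticker):      # stand-in helper
        t = threading.Thread(target=process_url, args=(url,))
        t.start()
        threads.append(t)                     # track it so we can join later
    # Block until every worker has finished before reporting.
    for t in threads:
        t.join()
    print("Elapsed: %.2f seconds" % (time.time() - start))
    return results
```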

Running the Parallel Application

You will notice we have modified the log messages. Let's go ahead and start up our docker-compose ecosystem and run the multi-threaded application.
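Concretely, something like the following, where the script name and ticker are illustrative:

```bash
docker-compose up -d    # bring up the single-instance ecosystem
python app.py AAPL      # hypothetical script name and ticker
```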


Something terrible clearly happened. As expected, each thread spawns almost instantly, but then the threads finish sequentially and the total elapsed time is unchanged. Additionally, a few threads near the end begin throwing exceptions. Our data is now incomplete, and the run took just as long. What happened?!

When we start our application we send, almost simultaneously, 20 requests to our tone-analysis microservice. That microservice is a Flask app which, run with its default settings, handles one request at a time. So even though our orchestrating application is now multithreaded, our microservice is not, and it cannot handle the load we are putting on it.
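For illustration, a minimal version of such a service; the route and the call_watson_tone_api helper are stand-ins for the project's actual code:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/tone", methods=["POST"])
def tone():
    text = request.get_json()["text"]
    return jsonify(call_watson_tone_api(text))  # stand-in; blocks for seconds

if __name__ == "__main__":
    # Run without threaded=True, the development server handles requests
    # one at a time -- 20 simultaneous callers simply queue up (or time out).
    app.run(host="0.0.0.0", port=5000)
```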

Scaling Microservices using HAProxy and Docker-Compose Scaling

To ensure we have enough containers running the tone-analysis service to handle the incoming load, we are going to take advantage of the dockercloud/haproxy container. When configured properly, this container will load-balance requests across a whole pool of containers.

We first modify our docker-compose.yml file: we remove the port mapping from the tone service definition and add an lb definition that links to the tone service.
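A sketch of the updated file. The build path and the host port mapping are assumptions about the project layout; the dockercloud/haproxy image needs the Docker socket mounted so it can discover the linked tone containers as we scale:

```yaml
version: '2'
services:
  tone:
    build: ./tone              # assumed path; the image exposes port 5000
    # note: no "ports:" mapping -- HAProxy reaches the containers directly
  lb:
    image: dockercloud/haproxy
    links:
      - tone
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "5000:80"              # HAProxy's default frontend port is 80
```

Mapping host port 5000 to HAProxy's frontend is one reasonable choice: the orchestration script can keep pointing at the same address it used before.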

Since the tone service exposes port 5000, HAProxy will route incoming traffic across the cluster of tone containers on port 5000. Let's start up our cluster:
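```bash
docker-compose up -d   # -d keeps the cluster running in the background
```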

Next we instruct docker-compose to scale up the tone service.

Since we are going to handle 20 URLs at a time, we choose 20 copies.
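With the Compose CLI of this era that is a single command (the service name matches the tone definition in docker-compose.yml):

```bash
docker-compose scale tone=20
```

(Newer Compose releases fold this into docker-compose up --scale tone=20.)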

Docker starts up 20 containers…

HAProxy automatically detects the new containers and routes traffic across them. Let's give our multi-threaded application a run against a backend that can handle 20 concurrent requests.


As we can see, we reduced the elapsed time by a whopping 90% with this architectural change. The key components that give us this willy-nilly scaling ability are docker-compose scaling and HAProxy's automatic load balancing. At this rate, we will be able to process the entirety of the NASDAQ tickers in roughly 19 hours instead of 8 days.

With this great power comes a bit of a concern: without thinking much, we can run up our Bluemix bill very quickly. As a result, in the next post we will look at caching our tone analyses using Redis. Afterwards, we will investigate how to remove the Python orchestration script entirely and move to an Event Driven Architecture (EDA) built on a Pub/Sub broker, Apache Kafka.