Part 10: Adding a Redis Cache to our Watson Microservice

The Bill

Developing microservices for pleasure and profit only works when it doesn’t cost anything to do so. After our last month of developing a tone analysis system using IBM Watson on Bluemix, I tallied up quite a few queries and, as a result, got a bill from Bluemix for a whole 9 cents.

Now 9 cents isn’t a lot; in fact, I found 9 cents on the sidewalk today to pay for it, so it’s not concerning at this point. However, in our last post we created a potential problem. The application can now send upwards of 20 requests to Bluemix simultaneously, which means I could run through the entire story space for the entire NASDAQ in less than a day. At $0.0088 a query for 60,000 queries… that’s a whopping $528 for a one-day mistake. And since none of that data is cached… I’ll be broke within two days.

Adding Redis Container

In order to cut down on duplicate queries, I am going to add a Redis cache to the Tone Analysis Microservice. Redis is a simple key-value store.

First I am going to add a Redis sidecar container to our docker-compose.yml. We are going to link the Redis container to a persistent store via the volumes directive. This means the ephemeral container will always use the same files, and our cache will survive even as the container goes up and down.
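A minimal service definition for the sidecar might look like the following (the host path ./redis-data is my own choice; the official redis image writes its data to /data inside the container):

```yaml
redis:
  image: redis
  volumes:
    - ./redis-data:/data   # persist the cache even as the container comes and goes
```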

Then, to make sure the tone-analysis service can see it, I will add a link in the tone service definition.
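In the tone service definition that could look like this sketch (the service name tone is an assumption; use whatever your compose file calls the tone-analysis service):

```yaml
tone:
  # ...existing build/image configuration for the tone-analysis service...
  links:
    - redis   # makes the hostname "redis" resolvable inside this container
```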

Wow, installing Redis is super simple?!

Caching our Tone Analysis results

Next we are going to add caching to the calls made to Bluemix. We pay about a penny every time we send a call off to Watson, so we want to make sure we aren’t sending the same text off to Watson multiple times. To do this, for every piece of text we analyze, we take the SHA-256 hash of that text. This digest will serve as the key. Then we submit the text to Watson, and the response will be the value.

From that point on, if we get the same text in, we will have a cache hit and forgo giving Watson another penny. Pretty simple.
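The key derivation is a one-liner with Python’s standard hashlib module; a minimal sketch:

```python
import hashlib


def cache_key(text):
    # The SHA-256 digest of the story text serves as the Redis key
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

The same text always produces the same 64-character digest, so a repeat story maps straight to its cached Watson response.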

Additionally, to ensure backwards compatibility, we set the Redis hostname through an environment variable. If it is not set, we won’t use caching.

To use Redis in Python, we will also need to install the redis library (pip install redis).

Of course, in the container on Docker Hub, I have already done this for you. The updated Python code for the Redis-enabled tone analysis service is as follows.
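Here is a sketch of the cache-aside logic, assuming a hypothetical call_watson function that performs the actual Bluemix request (the real service’s names and structure may differ):

```python
import hashlib
import json
import os

try:
    import redis  # pip install redis
except ImportError:
    redis = None

# Backwards compatible: only cache when REDIS_HOST is set
REDIS_HOST = os.environ.get("REDIS_HOST")
cache = redis.StrictRedis(host=REDIS_HOST) if (redis and REDIS_HOST) else None


def analyze_tone(text, call_watson, cache=cache):
    """Return Watson's tone analysis for text, consulting the cache first.

    call_watson is a stand-in for the function that actually queries Bluemix.
    """
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if cache is not None:
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)            # cache hit: zero pennies to IBM
    result = call_watson(text)                # cache miss: pay Watson once
    if cache is not None:
        cache.set(key, json.dumps(result))    # store digest -> response
    return result
```

If REDIS_HOST is unset, cache is None and every call goes straight to Watson, exactly as before.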

Now we fire up our microservice ecosystem and give it a run. Make sure to scale the tone service to 20 containers to take advantage of the multithreaded architecture we developed in the last post.

The first run should give 20 pennies to IBM and take roughly 20 seconds. Now let’s run it again – the second run should return almost instantly, since the exact same set of inputs should all result in cache hits and 0 pennies to IBM. But what we see on the second run is that several stories still don’t result in a cache hit the second time through. We are still giving Watson pennies for the same old thing. What’s the problem?

Adding Caching to the Scraper

The problem is that the scraped content has minor changes on each scrape. Live feeds on the page, such as stock price widgets, latest-news feeds, Twitter feeds, and advertisements, make the same URL come back with different content.

Luckily, we can use Redis to cache the scraped content as well. Here we modify our scraper to put the result of scraping in Redis, so any query for that URL in the future will return the cached scrape instead of triggering a new one.
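A sketch of that change, keyed on the MD5 digest of the URL (scrape is a hypothetical stand-in for whatever function actually downloads the page):

```python
import hashlib


def fetch_page(url, scrape, cache=None):
    """Return page content for url, caching scrape results in Redis.

    scrape is a stand-in for the real function that downloads the page.
    """
    key = hashlib.md5(url.encode("utf-8")).hexdigest()   # URL digest as key
    if cache is not None:
        hit = cache.get(key)
        if hit is not None:
            # redis returns bytes; decode back to text
            return hit.decode("utf-8") if isinstance(hit, bytes) else hit
    content = scrape(url)                                # cache miss: scrape once
    if cache is not None:
        cache.set(key, content)
    return content
```

Since the same URL now always yields the same cached content, the downstream SHA-256 key is stable and the tone cache hits every time.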

Now we run our app. WOW, blazing fast! 0.46s!

Reviewing our Redis Cache

Now, there has been a lot of hand-waving here with Redis. What if we want to look at what’s in Redis to verify our cache? Right now our keys are simply hashes and not very descriptive. To view our Redis database, we first install a super useful Node.js application, Redis Commander.
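One convenient way to run it alongside the stack is as another compose service; this sketch assumes the rediscommander/redis-commander Docker image, whose REDIS_HOSTS variable takes label:host:port entries:

```yaml
redis-commander:
  image: rediscommander/redis-commander
  environment:
    - REDIS_HOSTS=local:redis:6379   # label:host:port of our redis container
  ports:
    - "8081:8081"                    # Redis Commander's default web port
  links:
    - redis
```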

Make sure your microservice ecosystem is up, then point your browser to http://localhost:8081. You should see something like this:

On the left-hand side are all our keys, and on the right are the values of those keys. We can see there are two hash types – one MD5 (the hash of the URL) and one SHA-256 (the hash of the scraped content). In theory, if we take the value stored under one of the MD5 keys and digest it through SHA-256, the result should be a key that gets us the Bluemix results.
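To make that two-level key chain concrete, here is a tiny illustration (the URL and content are made up):

```python
import hashlib

url = "http://example.com/some-story"             # hypothetical story URL
content = "<html>the scraped story text</html>"   # hypothetical scrape result

url_key = hashlib.md5(url.encode("utf-8")).hexdigest()          # 32 hex chars
tone_key = hashlib.sha256(content.encode("utf-8")).hexdigest()  # 64 hex chars

# The scrape cache maps url_key -> content, so hashing a value pulled from
# an MD5 key with SHA-256 yields the key of the corresponding tone result.
```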

It would be nice if we could parse this a little better from the GUI. To keep a nice log of what we’ve placed in the cache, we add a line to our code.

This will add TICKER:URL:CONTENT and TICKER:URL:TONE keys so we can easily parse through the results of our processing.
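A sketch of that logging helper (the function name and argument order are my own; the actual line in the service may differ):

```python
def log_readable_keys(cache, ticker, url, content, tone):
    # Mirror the hashed cache entries under human-readable keys
    cache.set("%s:%s:CONTENT" % (ticker, url), content)
    cache.set("%s:%s:TONE" % (ticker, url), tone)
```

Browsing Redis Commander afterwards, each story shows up under its ticker and URL instead of an opaque digest.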

Now, there is some duplication of content in our Redis cache, but it’s not significant enough to care about. We sacrifice a bit of disk space for a bit of logging.