The Ticker News Sentiment Analysis System (TNSAS we’ll call it) has gone through quite a transformation lately. Not only have we taken advantage of our containerization to scale the tone analysis service independently from the content scraper service, but we have also taken advantage of Redis based caching to reduce the total number of calls sent to IBM Watson to an absolute minimum as well as providing a log of all scraped documents and their related analysis.
We report this data via the console and copy and paste to a spreadsheet for tracking over time. This is not ideal and require significant manual data collection over an extended period time to get data at a resolution small enough to enable something approaching a real time system. We need a solution that can stand up and collect data consistently and then report it per ticker.
Setting up a MariaDB Sidecar
Setting up MariaDB to use as an RDBMS that will store our tone results and pricing info is super simple. We simply add another container to our docker-compose.yml file. We ground this container to a physical on host volume store. This means the data stored will persist as we stop and start the service. To add MariaDB we add the following to our docker-compose.yml
1 2 3 4 5 6 7 8 |
mysql: image: mariadb:10.1.22 volumes: - ".mysql:/var/lib/mysql" ports: - "3306:3306" env_file: - env.lst |
With MariaDB you want to ground the version of the server using the specific version tag of the container. Here we use 10.1.22. You can also see we use .mysql subdirectory for storing the data on the host. We expose port 3306 so we can connect to it and provide our environment configuration file with the following configuration parameters:
1 2 |
MYSQL_ROOT_PASSWORD=root MYSQL_DATABASE=stockworks |
Using a MariaDB Container
We start up our container ecosystem with docker-compose up and then use something such as MySQL Workbench, Navicat, or SquirrelSQL. We can see that the stockworks database already exists and that our .mysql subdirectory on the host is populated with MySQL.
We want to create a table where we can record our data.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
USE stockworks; CREATE TABLE tone ( ticker varchar(10), recordtime datetime, price decimal, conscientiousness float, confidence float, anger float, joy float, sadness float, disgust float, emotionalrange float, extraversion float, tentative float, analytical float, agreeable float, openness float, fear float, conscientiousness_std float, confidence_std float, anger_std float, joy_std float, sadness_std float, disgust_std float, emotionalrange_std float, extraversion_std float, tentative_std float, analytical_std float, agreeable_std float, openness_std float, fear_std float, CONSTRAINT pk_tone PRIMARY KEY(ticker, recordtime) ); CREATE INDEX idx_ticker ON tone (ticker) |
We then install
1 |
pip install pymysql |
which is a pure python connector to MySQL. This is a bit easier to use than installing the MySQLDb library for python. We make a connection to the database
1 2 3 4 5 6 |
connection = pymysql.connect(host='localhost', user='root', password='root', db='stockworks', charset='utf8mb4', cursorclass=pymysql.cursors.DictCursor) |
and then when we report the data, we also record it to mysql
1 2 3 4 5 6 7 8 |
with connection.cursor() as cursor: # Create a new record sql = "INSERT INTO `tone` (`ticker`, `recordtime`,`price`,`conscientiousness`,`confidence`,`anger`,`joy`,`sadness`,`disgust`,`emotionalrange`,`extraversion`,`tentative`,`analytical`,`agreeable`,`openness`,`fear`,`conscientiousness_std`,`confidence_std`,`anger_std`,`joy_std`,`sadness_std`,`disgust_std`,`emotionalrange_std`,`extraversion_std`,`tentative_std`,`analytical_std`,`agreeable_std`,`openness_std`,`fear_std`) VALUES (%s, %s, %s, %s,%s, %s, %s, %s,%s, %s, %s, %s,%s, %s, %s, %s,%s,%s, %s, %s, %s,%s, %s, %s, %s,%s, %s, %s, %s)" cursor.execute(sql, (ticker, time.strftime("%Y-%m-%d %H:%M:%S"), float(pricing["price"]), np.mean(emotions["Conscientiousness"]).item(),np.mean(emotions["Confident"]).item(),np.mean(emotions["Anger"]).item(),np.mean(emotions["Joy"]).item(),np.mean(emotions["Sadness"]).item(),np.mean(emotions["Disgust"]).item(),np.mean(emotions["Emotional Range"]).item(),np.mean(emotions["Extraversion"]).item(),np.mean(emotions["Tentative"]).item(),np.mean(emotions["Analytical"]).item(),np.mean(emotions["Agreeableness"]).item(),np.mean(emotions["Openness"]).item(),np.mean(emotions["Fear"]).item(),np.std(emotions["Conscientiousness"]).item(),np.std(emotions["Confident"]).item(),np.std(emotions["Anger"]).item(),np.std(emotions["Joy"]).item(),np.std(emotions["Sadness"]).item(),np.std(emotions["Disgust"]).item(),np.std(emotions["Emotional Range"]).item(),np.std(emotions["Extraversion"]).item(),np.std(emotions["Tentative"]).item(),np.std(emotions["Analytical"]).item(),np.std(emotions["Agreeableness"]).item(),np.std(emotions["Openness"]).item(),np.std(emotions["Fear"]).item())) # connection is not autocommit by default. So you must commit to save # your changes. connection.commit() |
And that’s it. We can now get a report on the data for a particular ticker
1 |
SELECT * FROM stockworks.tone where ticker='AAPL' order by recordtime asc; |
The updates happen automatically when we run the analysis so now we approach a solution that can operate in real-time.