Challenges of Archiving Industry Index ValuesPosted: 2014-08-20
If you have any interest in 3rd party development or databases, this post will be a entertaining as I share my current lackluster architecture for saving some of the new Eve API data.
I have a Raspberry Pi running MySQL that I use as a basic storage location for various databases. One of them is a new database that contains index values from the CREST API endpoint with a timestamp so I can start to develop a archive of the values.
The current solution I have for importing the new Index data isn’t very elegant. I’m using Lockefox’s Crius Toolset sheet to grab a CSV every day, adding two columns for a primary key and a timestamp, and importing into my table to get a table that looks like this:
“transactionID” “transactionDateTime” “solarSystemName” “solarSystemID” “manufacturing”
“103724” “2014-08-19 13:28:00” “Jouvulen” “30011392” “0.0339317910000”
“103725” “2014-08-19 13:28:00” “Urhinichi” “30040141” “0.0236668590000”
“103726” “2014-08-19 13:28:00” “Akiainavas” “30011407” “0.0285709850000”
“103727” “2014-08-19 13:28:00” “Seitam” “30041672” “0.0162879230000”
“103728” “2014-08-19 13:28:00” “BK4-YC” “30003757” “0.0143238350000”
It’s growing and starting to show how under-powered the Raspberry Pi is for data processing. Most of my issue stems from a lack of salable design on my part. I have no table indexes and am joining with the bulky mapDenormalize table.
I have a love-hate relationship with the mapDenormalize table. If you have ever worked with this table, you know that it is a beast: 502,558 rows, 15 columns with five of them being DOUBLE values coming in at 214MB. Normally not a problem for server with a lot of CPU cycles and RAM, but the 700MHz ARM processor on the Raspberry Pi has a hard time with multiple JOIN operations and GROUP BYs.
Here’s a query I was running against my dataset that ran for 15.5 minutes (!).
SELECT systemCosts.solarSystemName, systemCosts.transactionDateTime ,systemCosts.manufacturing, mapRegions.regionName, mapDenormalize.security
ON (systemCosts.solarSystemID = mapDenormalize.solarSystemID)
ON (mapDenormalize.regionID = mapRegions.regionID)
GROUP BY transactionDateTime,solarSystemName
So two full table JOIN and two GROUP operations and a table with no indexes, uf. I sent a screenshot off to James, my development partner.
My first solution was to remove extraneous data from the mapDenomalize table. After removing the groupID, constellationID, orbitID, x, y, z, radius, celestialIndex, and orbitIndex I trimmed the table down even further by deleting all entries that were not star systems. What was left was 7,929 rows coming in at around 1MB.
I’m glad to report that my terrible query is now running in 20 seconds. This was a small step to getting my growing index dataset to a usable state while I write something more permanent.