Challenges of Archiving Industry Index Values
Posted: 2014-08-20 Filed under: eveonline | Tags: api, arm, crest, mysql, raspberry pi 6 CommentsIf you have any interest in 3rd party development or databases, this post will be a entertaining as I share my current lackluster architecture for saving some of the new Eve API data.
I have a Raspberry Pi running MySQL that I use as a basic storage location for various databases. One of them is a new database that contains index values from the CREST API endpoint with a timestamp so I can start to develop a archive of the values.
The current solution I have for importing the new Index data isn’t very elegant. I’m using Lockefox’s Crius Toolset sheet to grab a CSV every day, adding two columns for a primary key and a timestamp, and importing into my table to get a table that looks like this:
“transactionID” “transactionDateTime” “solarSystemName” “solarSystemID” “manufacturing”
“103724” “2014-08-19 13:28:00” “Jouvulen” “30011392” “0.0339317910000”
“103725” “2014-08-19 13:28:00” “Urhinichi” “30040141” “0.0236668590000”
“103726” “2014-08-19 13:28:00” “Akiainavas” “30011407” “0.0285709850000”
“103727” “2014-08-19 13:28:00” “Seitam” “30041672” “0.0162879230000”
“103728” “2014-08-19 13:28:00” “BK4-YC” “30003757” “0.0143238350000”
It’s growing and starting to show how under-powered the Raspberry Pi is for data processing. Most of my issue stems from a lack of salable design on my part. I have no table indexes and am joining with the bulky mapDenormalize table.
I have a love-hate relationship with the mapDenormalize table. If you have ever worked with this table, you know that it is a beast: 502,558 rows, 15 columns with five of them being DOUBLE values coming in at 214MB. Normally not a problem for server with a lot of CPU cycles and RAM, but the 700MHz ARM processor on the Raspberry Pi has a hard time with multiple JOIN operations and GROUP BYs.
Here’s a query I was running against my dataset that ran for 15.5 minutes (!).
SELECT systemCosts.solarSystemName, systemCosts.transactionDateTime ,systemCosts.manufacturing, mapRegions.regionName, mapDenormalize.security
FROM systemCosts
JOIN mapDenormalize
ON (systemCosts.solarSystemID = mapDenormalize.solarSystemID)
JOIN mapRegions
ON (mapDenormalize.regionID = mapRegions.regionID)
GROUP BY transactionDateTime,solarSystemName
So two full table JOIN and two GROUP operations and a table with no indexes, uf. I sent a screenshot off to James, my development partner.
My first solution was to remove extraneous data from the mapDenomalize table. After removing the groupID, constellationID, orbitID, x, y, z, radius, celestialIndex, and orbitIndex I trimmed the table down even further by deleting all entries that were not star systems. What was left was 7,929 rows coming in at around 1MB.
I’m glad to report that my terrible query is now running in 20 seconds. This was a small step to getting my growing index dataset to a usable state while I write something more permanent.
1. It looks like you joining to mapSolarSystems but you not using anything from that table. I think you could remove that extra join.
2. Given that you have no aggregation functions, the group by clause is unnecessary.
3. It might be more efficient to do an inner select creating a result of solarSystemId, regionName, securityLevel. And then
select …
from systemCosts
join ( inner select statement ) systemDetails
on systemCosts.solarSystemId = systemDetails.solarSystemId
I don’t get why all the trouble with mapDenormalize if everything he wants is in the much more compact mapRegions and mapSolarSystems.
Grouping by solarSystemID only would allow for getting only one data row per system, but it is undefined which one, and it defaults to the first and not the latest row in mysql, so that doesn’t make much sense either. Also why no indices?
I use this table:
CREATE TABLE `iveeIndustrySystems` (
`systemID` INT(11) NOT NULL,
`date` DATE NOT NULL,
`manufacturingIndex` FLOAT UNSIGNED NOT NULL,
`teResearchIndex` FLOAT UNSIGNED NOT NULL,
`meResearchIndex` FLOAT UNSIGNED NOT NULL,
`copyIndex` FLOAT UNSIGNED NOT NULL,
`reverseIndex` FLOAT UNSIGNED NOT NULL,
`inventionIndex` FLOAT UNSIGNED NOT NULL,
PRIMARY KEY (`systemID`, `date`),
INDEX `date` (`date`)
)
COLLATE=’ascii_general_ci’
ENGINE=InnoDB;
And the equivalent query runs reasonably fast on my celeron server:
SELECT mr.regionName, mss.solarSystemName, mss.security, iis.* FROM iveeIndustrySystems as iis
JOIN mapSolarSystems as mss ON mss.solarSystemID = iis.systemID
JOIN mapRegions as mr ON mr.regionID = mss.regionID
WHERE iis.date = ‘2014-08-20’;
Somehow the post ate “iis.date” in the WHERE clause…
Why use mapDenormalize? Security is in mapSolarSystem.
MapDenormalize is a terrible table to use, if you’re not needing celestials.
ah good point, good thing this is just a testbed at this point
On top of other suggestions, definitely add indexing and consider adding in horizontal partitioning by date.