Challenges of Archiving Industry Index Values

If you have any interest in 3rd party development or databases, this post should be entertaining as I share my current lackluster architecture for saving some of the new Eve API data.

I have a Raspberry Pi running MySQL that I use as a basic storage location for various databases. One of them is a new database that contains index values from the CREST API endpoint along with a timestamp, so I can start to develop an archive of the values.

The current solution I have for importing the new index data isn’t very elegant. I’m using Lockefox’s Crius Toolset sheet to grab a CSV every day, adding two columns for a primary key and a timestamp, and importing it into a table that looks like this:

"transactionID" "transactionDateTime" "solarSystemName" "solarSystemID" "manufacturing"
"103724" "2014-08-19 13:28:00" "Jouvulen" "30011392" "0.0339317910000"
"103725" "2014-08-19 13:28:00" "Urhinichi" "30040141" "0.0236668590000"
"103726" "2014-08-19 13:28:00" "Akiainavas" "30011407" "0.0285709850000"
"103727" "2014-08-19 13:28:00" "Seitam" "30041672" "0.0162879230000"
"103728" "2014-08-19 13:28:00" "BK4-YC" "30003757" "0.0143238350000"
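The daily import step could be scripted instead of done by hand. Here is a sketch of what that might look like; the file path, CSV layout, and the assumption that transactionID is an AUTO_INCREMENT key are all mine, not from the actual setup:

```sql
-- Hypothetical automated import of the daily CSV export.
-- Assumes transactionID is AUTO_INCREMENT, so only the data columns
-- are listed, and the timestamp is stamped at load time.
LOAD DATA LOCAL INFILE '/home/pi/indexes/crius_indexes.csv'
INTO TABLE systemCosts
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
IGNORE 1 LINES
(solarSystemName, solarSystemID, manufacturing)
SET transactionDateTime = NOW();
```

This would replace the manual add-two-columns-and-import dance with a single statement that cron could run after the CSV lands.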

It’s growing and starting to show how under-powered the Raspberry Pi is for data processing. Most of my issue stems from a lack of scalable design on my part: I have no table indexes and am joining against the bulky mapDenormalize table.
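The missing indexes are the cheapest fix. Something along these lines would cover the join key and the grouping columns; the index names and exact column choices are my guesses, tuned to the query shown below:

```sql
-- Index the join key on both sides, plus the GROUP BY columns.
-- Exact choices depend on the queries run most often.
ALTER TABLE systemCosts
    ADD INDEX idx_solarSystemID (solarSystemID),
    ADD INDEX idx_dateTime_system (transactionDateTime, solarSystemName);

ALTER TABLE mapDenormalize
    ADD INDEX idx_solarSystemID (solarSystemID);
```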

I have a love-hate relationship with the mapDenormalize table. If you have ever worked with this table, you know that it is a beast: 502,558 rows and 15 columns, five of them DOUBLE values, coming in at 214MB. Normally not a problem for a server with a lot of CPU cycles and RAM, but the 700MHz ARM processor on the Raspberry Pi has a hard time with multiple JOIN operations and GROUP BYs.

Here’s a query I was running against my dataset that ran for 15.5 minutes (!).

SELECT systemCosts.solarSystemName, systemCosts.transactionDateTime, systemCosts.manufacturing, mapRegions.regionName, mapDenormalize.security
FROM systemCosts
JOIN mapDenormalize
ON (systemCosts.solarSystemID = mapDenormalize.solarSystemID)
JOIN mapRegions
ON (mapDenormalize.regionID = mapRegions.regionID)
GROUP BY transactionDateTime, solarSystemName

So: two full-table JOINs, a two-column GROUP BY, and a table with no indexes. Oof. I sent a screenshot off to James, my development partner.

[Screenshot: 2014-08-04_terrible_query]

My first solution was to remove extraneous data from the mapDenormalize table. After dropping the groupID, constellationID, orbitID, x, y, z, radius, celestialIndex, and orbitIndex columns, I trimmed the table down even further by deleting all entries that were not star systems. What was left was 7,929 rows coming in at around 1MB.
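A sketch of that trim, assuming the solar-system rows in mapDenormalize carry groupID 5 ("Solar System" in the SDE) — verify that against your own export before deleting anything:

```sql
-- Delete non-system rows first, while groupID still exists to filter on,
-- then drop the columns that aren't needed for index lookups.
DELETE FROM mapDenormalize WHERE groupID <> 5;

ALTER TABLE mapDenormalize
    DROP COLUMN groupID,
    DROP COLUMN constellationID,
    DROP COLUMN orbitID,
    DROP COLUMN x,
    DROP COLUMN y,
    DROP COLUMN z,
    DROP COLUMN radius,
    DROP COLUMN celestialIndex,
    DROP COLUMN orbitIndex;
```

Running the DELETE before the ALTER matters: once groupID is dropped there is no obvious column left to distinguish systems from celestials.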

I’m glad to report that my terrible query is now running in 20 seconds. This was a small step to getting my growing index dataset to a usable state while I write something more permanent.


6 Comments on “Challenges of Archiving Industry Index Values”

  1. Gavin says:

    1. It looks like you’re joining to mapSolarSystems but you’re not using anything from that table. I think you could remove that extra join.
    2. Given that you have no aggregation functions, the group by clause is unnecessary.
    3. It might be more efficient to do an inner select creating a result of solarSystemId, regionName, securityLevel, and then:

    select …
    from systemCosts
    join ( inner select statement ) systemDetails
    on systemCosts.solarSystemId = systemDetails.solarSystemId

  2. Aineko says:

    I don’t get why all the trouble with mapDenormalize if everything he wants is in the much more compact mapRegions and mapSolarSystems.
    Grouping by solarSystemID only would allow for getting only one data row per system, but it is undefined which one, and it defaults to the first and not the latest row in mysql, so that doesn’t make much sense either. Also why no indices?

    I use this table:
    CREATE TABLE `iveeIndustrySystems` (
    `systemID` INT(11) NOT NULL,
    `date` DATE NOT NULL,
    `manufacturingIndex` FLOAT UNSIGNED NOT NULL,
    `teResearchIndex` FLOAT UNSIGNED NOT NULL,
    `meResearchIndex` FLOAT UNSIGNED NOT NULL,
    `copyIndex` FLOAT UNSIGNED NOT NULL,
    `reverseIndex` FLOAT UNSIGNED NOT NULL,
    `inventionIndex` FLOAT UNSIGNED NOT NULL,
    PRIMARY KEY (`systemID`, `date`),
    INDEX `date` (`date`)
    )
    COLLATE='ascii_general_ci'
    ENGINE=InnoDB;

    And the equivalent query runs reasonably fast on my celeron server:
    SELECT mr.regionName, mss.solarSystemName, mss.security, iis.* FROM iveeIndustrySystems as iis
    JOIN mapSolarSystems as mss ON mss.solarSystemID = iis.systemID
    JOIN mapRegions as mr ON mr.regionID = mss.regionID
    WHERE iis.date = '2014-08-20';

  3. fuzzysteve says:

    Why use mapDenormalize? Security is in mapSolarSystems.

    MapDenormalize is a terrible table to use, if you’re not needing celestials.

  4. sales alt says:

    On top of other suggestions, definitely add indexing and consider adding in horizontal partitioning by date.

