Data Warehousing using EVE Market Data Relay (EMDR)
Posted: 2013-02-20 Filed under: eveonline, industry, market | Tags: api, cron, csv, emdr, eve-central, mysql

Overview
Knowing the price and volume of items moving in the Eve market at a specific point in time is a very powerful piece of information that can help further your space empire.
Historically, eve-central was the repository of market data, but new advances in cache scraping, data transport methods, and large archival storage have brought us exciting new capabilities.
tl;dr EMDR
What the heck is EMDR and why should I care?
Original implementations of market aggregation sites had a user click the Market Export button in the Eve client to produce text files in their local Documents\EVE\logs\Marketlogs folder and then use an application to transfer the exported files to a database. This method was tedious, lacked region and item coverage, and was prone to people editing the data before it was sent off.
A renaissance came when developers began to explore the cache in the Eve client. The client cache is a location that serves as temporary storage for information you are currently working with in the client; it is volatile and changes all the time, unlike the static art, formulas, and universe data held in the client .stuff files.
EMDR is a service that takes market orders from your local client's cache and, the moment you view them in the client, sends them to a relay service that anyone can subscribe to. If you click on an item in the market window, that piece of data is immediately sent off to everyone listening. The transfer is quick, the data is not tampered with, and it can easily be relayed to many interested parties. This is pure data, a statistician's dream!
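In practice the feed is just a ZeroMQ subscription where every message is a zlib-compressed JSON document in the Unified Uploader format. A minimal consumer sketch in Python with pyzmq (the relay hostname is only an example; any public EMDR relay works) looks something like this:

import json
import zlib
import zmq

# Connect to a public EMDR relay (example hostname) and subscribe to everything.
context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect("tcp://relay-us-central-1.eve-emdr.com:8050")
subscriber.setsockopt(zmq.SUBSCRIBE, b"")

while True:
    # Each message is zlib-compressed JSON containing order or history rowsets.
    message = json.loads(zlib.decompress(subscriber.recv()))
    print(message.get("resultType"), len(message.get("rowsets", [])))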
Working with Eve-Central
What follows are notes from my partner, Raath, who has been working with the EMDR feed to improve our industrial operation.
When I first began adding price dependencies to the DRK Industry Tracker, it was back in the day before we had EMDR. Eve-central was the de facto out-of-game price lookup service, and with the aid of their API I pulled prices from there.
At the time I don't think it had the option to do a full pull of prices on every item in the game, so I devised a system where the tracker would cache prices locally and refresh them once they were older than an hour, on a need-to-know basis, so that only relevant data was requested.
I did this so that I didn't swamp eve-central with hundreds of requests every hour for information, 95% of which would never be used. It was a system that worked well, apart from the occasional lag caused by prices updating; as I said, prices were only refreshed when they were more than an hour old and a user requested them. But when I started to think about releasing the tracker publicly, I needed something a little more reliable.
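As a rough sketch, the cache worked along these lines; the eve-central marketstat endpoint is the one from that era and the XML layout is hedged from memory, so treat this as illustrative rather than the tracker's actual code:

import time
import urllib.request
import xml.etree.ElementTree as ET

CACHE_TTL = 3600              # refresh prices older than an hour
price_cache = {}              # typeID -> (fetched_at, price)

def sell_price(type_id, region_id=10000002):
    # Serve from the cache while the price is under an hour old.
    cached = price_cache.get(type_id)
    if cached and time.time() - cached[0] < CACHE_TTL:
        return cached[1]
    # Only the item actually being requested is fetched, so eve-central
    # never gets swamped with lookups for prices nobody uses.
    url = ("http://api.eve-central.com/api/marketstat"
           "?typeid=%d&regionlimit=%d" % (type_id, region_id))
    xml_data = urllib.request.urlopen(url).read()
    price = float(ET.fromstring(xml_data).find(".//sell/min").text)
    price_cache[type_id] = (time.time(), price)
    return price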
Transition to EMDR
By this point EMDR was a fully developed solution, so I started to look into how I could use it as my price basis and integrate the data into our industrial workflow. Not really knowing the volume of information that would soon be assaulting our little virtual machine, I made a few mistakes.
Flood of Data
The term people have adopted for the EMDR feed is the “fire hose”, as it is a flood of overwhelming data constantly being sent at you with no regard for your ability to process it. As clients all across the world click around the market window, updates arrive in near real-time. The function name engage_fire_hose only seemed natural.
Statistics
Our industry site currently consumes 8 GB of data, serves 600-700 MB, and handles around 150,000 API requests per day, with the vast majority of our incoming data coming from EMDR updates. In January we received 179 GB and sent 17 GB, an average of 5.8 GB of incoming data per day.
Processing Challenges
With the large amount of incoming price data, we needed to be able to efficiently process and store it while keeping the server responsive.
The first method I devised stored transactionIDs in a hash table. This design soon showed its weakness: we hit a memory limit once the hash table filled up with millions of transactionIDs.
Additionally, my attempts to keep the load on the MySQL server down were in vain, as the information was coming in so quickly that my consumer had trouble keeping up. We soon had problems with locked rows and failing inserts, so I had to completely revise my approach.
What resulted was a consumer process that guzzles around 8 GB of data per day and writes all the information out to CSV files. There is no logical processing done in the consumer now; it just munches data and spits it to file. Another cron job runs every minute, scans the temp directory for CSVs, merges them into one large CSV, and then performs a LOAD DATA INFILE into MySQL, a method I've found is not only lightning fast but also keeps the load down.
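A sketch of that per-minute cron job is below; the directory, table name, and credentials are illustrative stand-ins rather than the tracker's real configuration:

import glob
import os
import MySQLdb  # mysqlclient; any driver that allows LOAD DATA LOCAL INFILE works

TEMP_DIR = "/var/tmp/emdr"                     # where the consumer drops CSVs (example path)
MERGED = os.path.join(TEMP_DIR, "merged.csv")

def import_csvs():
    files = glob.glob(os.path.join(TEMP_DIR, "orders-*.csv"))
    if not files:
        return
    # Concatenate every per-consumer CSV into a single file...
    with open(MERGED, "w") as out:
        for path in files:
            with open(path) as part:
                out.write(part.read())
            os.remove(path)
    # ...and hand it to MySQL as one bulk load instead of millions of INSERTs.
    db = MySQLdb.connect(user="tracker", passwd="secret", db="market", local_infile=1)
    db.cursor().execute(
        "LOAD DATA LOCAL INFILE '%s' INTO TABLE market_orders "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'" % MERGED)
    db.commit()

if __name__ == "__main__":
    import_csvs()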
Not all of the data is useful, as the majority of it is duplicates. When the server has free time, we run a cleanup process that checks flags and removes expired records so that the data we display is as up to date as possible.
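The cleanup itself is nothing fancy; something along these lines, with purely illustrative table and column names, is enough:

import MySQLdb

db = MySQLdb.connect(user="tracker", passwd="secret", db="market")
cur = db.cursor()
# Drop orders already flagged as expired...
cur.execute("DELETE FROM market_orders WHERE expired = 1")
# ...and keep only the freshest copy of each order where duplicates slipped in.
cur.execute(
    "DELETE stale FROM market_orders AS stale "
    "JOIN market_orders AS fresh "
    "  ON fresh.orderID = stale.orderID "
    " AND fresh.generatedAt > stale.generatedAt")
db.commit()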
Future Plans
All this work for clean price data. We have plans to further expand the market data functionality in future projects and start using the cached market history to show the historical price of items and build costs compared to live data.
Nullsec Alliance Data Transformation
Posted: 2011-09-21 Filed under: eveonline | Tags: alliances, csv, data, excel, mysql, nullsec, pivottable, RipardTeg, spreadsheet

Messy numbers to some, a wealth of information to others. Here is a quick rundown of how I processed the data behind the Nullsec Alliance Growth and Decline post.
1. Get raw data from Wollari at Dotlan.
allianceID,date,members
151380924,26.06.2007,622
166439722,26.06.2007,1130
284278305,26.06.2007,252
288377808,26.06.2007,1151
386292982,26.06.2007,412
477769446,26.06.2007,252
628991027,26.06.2007,1062
632866070,26.06.2007,2267
673381830,26.06.2007,416
2. Load Eve Alliance information via API into a table.
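A sketch of that load, using the old EVE XML API's alliance list (the endpoint and the table layout here are assumptions for illustration, not necessarily what was used):

import urllib.request
import xml.etree.ElementTree as ET
import MySQLdb

# Alliance list from the (since retired) EVE XML API.
xml_data = urllib.request.urlopen(
    "https://api.eveonline.com/eve/AllianceList.xml.aspx").read()

db = MySQLdb.connect(user="stats", passwd="secret", db="nullsec")
cur = db.cursor()
# Top-level rows carry allianceID, name, and shortName (the ticker).
for row in ET.fromstring(xml_data).findall("result/rowset/row"):
    cur.execute(
        "REPLACE INTO alliances (allianceID, name, shortName) VALUES (%s, %s, %s)",
        (row.get("allianceID"), row.get("name"), row.get("shortName")))
db.commit()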
3. Import the Dotlan data into a MySQL table and join on allianceID across both tables to generate a more human-friendly data set that includes alliance names and tickers.
SELECT alliances.allianceID, alliances.name, alliances.shortName, date, members FROM allianceHistory, alliances WHERE alliances.allianceID = allianceHistory.allianceID
4. Export data into an Excel readable file.
I spent some time debating whether to work with this data in a MySQL table and a charting engine such as Highcharts, but settled on using an Excel PivotTable. The time I would spend tinkering with JavaScript (not my most proficient programming language) would really hinder progress.
Once in Excel, I had to reformat the date from dd.mm.yyyy to yyyy-mm-dd to allow proper chart generation. First, I used the Text to Columns feature with a comma ‘,’ set as the delimiter. Second, in a new column, I used a formula to combine the date parts in my desired format.
=DATEVALUE(F1&"-"&E1&"-"&D1)
Hide the D, E, and F columns and we've got a good set of data to analyze.
5. Select all 87,543 rows and generate a PivotTable using the Alliance Name as the legend, date as the Axis Fields, and a summation of the Member Count as the Value.
Now the fun part is thinking of WHAT is significant.
Some of the first things I wanted to show were the rise and fall of the NC, the growth of the DRF, and some charting for the larger alliances like Goons and Test. These charts can be seen in the original post linked at the top.
RipardTeg of Jester’s Trek came to me with an idea for a new report that I think will prove rather interesting. I’m going to be working with him to answer the question of ‘whether small sov-holding null-sec alliances are surviving over the last two years, or being folded into the mega-alliances over time’.
This type of Eve Community collaboration with passionate, exemplary individuals is what really keeps me playing. Stay tuned.