Why is get_timeseries much slower than get_data?

moataz.elmasry · May 2018

Hi,

I am running an automated script that pulls real-time data as well as historic data. I've noticed however that when using get_timeseries, it is much slower than get_data (almost instant) even if I set the time period such that only one datapoint is retrieved.

Any advice on how to speed it up? The script I am running needs to finish within a few seconds, but the get_timeseries function is slowing it down significantly.

Regards,

Moataz Elmasry

moataz.elmasry · May 2018

So it turned out that the slowness was caused by a bug in my own code: I was feeding the wrong list to a loop that was only supposed to run twice.

Alex Putkov.1 · May 2018

It is what it is. I guess it's fair to say that the get_data method returns real-time market data faster than the get_timeseries method returns price history. There are reasons for that: real-time market data is closer and more readily available to the Web service that delivers it than timeseries of price history, and the data model for real-time market data is flatter than the one for timeseries. From the client application side there's not much you can do to speed up the retrieval of timeseries data. Pretty much the only variable you have to play with is the number of RICs in the request. It would certainly be faster to retrieve timeseries for several instruments in one request than in a loop one RIC at a time. The optimum number of RICs per request can only be established empirically.
The only other recommendation I can think of is the obvious: limit your data requests to only the data you require.
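To make the "establish it empirically" advice concrete, here is a minimal sketch of how one might time different batch sizes. The helper names chunked and time_batches are hypothetical, and fetch stands in for whatever call actually retrieves the data (for example a wrapper around ek.get_timeseries); none of this comes from the Eikon library itself.

```python
import time
from typing import Callable, Iterable, List


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def time_batches(
    rics: List[str],
    size: int,
    fetch: Callable[[List[str]], object],
) -> float:
    """Call `fetch` once per batch of RICs and return the total elapsed seconds."""
    started = time.perf_counter()
    for batch in chunked(rics, size):
        fetch(batch)
    return time.perf_counter() - started


# Hypothetical usage, assuming `ek` is a configured Eikon session and
# `api_rics` is a flat list of RIC strings:
#
# for size in (10, 50, len(api_rics)):
#     seconds = time_batches(
#         api_rics, size,
#         lambda batch: ek.get_timeseries(batch, fields=["CLOSE"]),
#     )
#     print(size, round(seconds, 2))
```

Passing fetch in as a parameter keeps the timing logic independent of the data source, so the same helper can be tried against a stub before it is pointed at the real service.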

moataz.elmasry · May 2018

I see, that is quite disappointing. The data I am trying to pull is sometimes revised historically at the weekly publication time, so I can't get the fully revised data as soon as it is released: pulling just 2 weeks of historic data takes 3+ minutes.

Alex Putkov.1 · May 2018

If you're saying that it takes you 3 minutes to retrieve 2 weeks' worth of daily closing prices for a single instrument (i.e. 10 data points in total), then it certainly sounds excessive, and I'd like to see an example of the request you're executing. If it's two weeks of daily history for multiple instruments (how many?), or if it's two weeks of tick history, which can be many millions of data points per instrument and which cannot even be retrieved in a single request, then 3 minutes may not be excessive.

moataz.elmasry · May 2018

I am pulling 129 RICs and the data is weekly so I am only pulling 2 data points per RIC.

Alex Putkov.1 · May 2018

This sure doesn't sound right to me. I just tried retrieving the two latest weekly close prices for the constituents of the S&P Toronto Stock Exchange Composite stock index (that's 248 RICs) and the retrieval took about 5 seconds. Would you care to share your code or the requests you're executing?

moataz.elmasry · May 2018

df = ek.get_timeseries(
    rics=[api_rics],
    fields=["CLOSE"],
    start_date=from_date.strftime('%Y-%m-%d'),
    end_date=datetime.datetime.now().strftime('%Y-%m-%d')
)

I can't post all the RICs because that would exceed the character limit.

Alex Putkov.1 · May 2018

This request is for daily timeseries, not weekly. It's possible, though, that the RICs in your request are only updated weekly, in which case the request for daily history may return weekly timeseries. For a weekly history request you'd need to include the interval='weekly' parameter. I tried your request for daily close price history for 248 stock RICs and, if the start date is only a couple of weeks back from today, then on my end it still only takes a few seconds to fulfill the request. Can you post the RICs you use as an attachment (txt or csv file)?
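As an illustration of the weekly request described above, here is a small sketch that builds the keyword arguments for ek.get_timeseries with interval='weekly'. The helper weekly_close_request is hypothetical. It also assumes api_rics is already a flat list of RIC strings; if that is the case, rics=[api_rics] in the posted snippet would wrap the list in a second list, so the sketch passes the list through unchanged.

```python
import datetime
from typing import Dict, List


def weekly_close_request(
    rics: List[str],
    weeks_back: int,
    today: datetime.date,
) -> Dict[str, object]:
    """Build get_timeseries keyword arguments for weekly CLOSE history."""
    start = today - datetime.timedelta(weeks=weeks_back)
    return {
        "rics": rics,  # passed as-is, not wrapped in another list
        "fields": ["CLOSE"],
        "start_date": start.strftime("%Y-%m-%d"),
        "end_date": today.strftime("%Y-%m-%d"),
        "interval": "weekly",  # without this the request is for daily history
    }


# Hypothetical usage, assuming `ek` is a configured Eikon session and
# `api_rics` is a flat list of RIC strings:
#
# df = ek.get_timeseries(**weekly_close_request(api_rics, 2, datetime.date.today()))
```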
