get_timeseries returning incorrect time period from what was declared

pmorlen · July 2019

I have this code:

data = ek.get_timeseries(rics, fields='CLOSE',
                         start_date='2019-01-01',
                         end_date='2019-06-30')

but it returns data starting 5/20/2019 and ignores the start_date declared in the code:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 29 entries, 2019-05-20 to 2019-06-28
Columns: 103 entries, MMM to YUM
dtypes: float64(103)
memory usage: 23.6 KB

Based on other posts, it appears to be related to the 3000 shared row limit.
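
The 29 rows above are consistent with that explanation. A quick check, assuming the limit is 3,000 data points per get_timeseries request, shared across all instruments (the exact figure and how it is counted are assumptions taken from the other posts, not confirmed here):

```python
# Assumption: one get_timeseries request returns at most 3000 data points,
# shared across every instrument in the request.
ROW_LIMIT = 3000
n_instruments = 103

# Largest number of daily rows that fits under the limit for 103 instruments.
max_rows = ROW_LIMIT // n_instruments
print(max_rows)                  # 29
print(max_rows * n_instruments)  # 2987
```

Under that assumption, a 30th row would need 3,090 points, so the response would stop at 29 rows, which matches the DatetimeIndex summary above.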

Here is a snippet of what I'd like returned - with daily closing price dates going back to 1/1/2019 for 103 equity tickers in total:

Close Date     MMM    AFL      T   ABBV    ABT
2019-05-28  163.35  51.30  31.93  78.03  75.71
2019-05-29  161.40  51.43  31.91  78.06  75.67

Here is my current data retrieval code:

'rics' is a list of tickers

data3 = ek.get_data(rics, ['TR.PriceClose', 'TR.PriceCloseDate'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})

But it returns a tuple:

(      Instrument  Price Close                  Date
 0            MMM       190.95  2019-01-02T00:00:00Z
 1            MMM       183.76  2019-01-03T00:00:00Z
 2            MMM       191.32  2019-01-04T00:00:00Z
 3            MMM       190.88  2019-01-07T00:00:00Z
 4            MMM       191.68  2019-01-08T00:00:00Z
 ...          ...          ...                   ...
 12767        YUM       110.66  2019-06-24T00:00:00Z
 12768        YUM       110.31  2019-06-25T00:00:00Z
 12769        YUM       110.12  2019-06-26T00:00:00Z
 12770        YUM       110.56  2019-06-27T00:00:00Z
 12771        YUM       110.67  2019-06-28T00:00:00Z

I'm having trouble setting up 'get_data()' to return a dataframe instead of a tuple. Can you please provide some guidance to correct this? Thank you.

Jirapongse · August 2019

@pmorlen

I have modified the code as shown below.

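import pandas as pd

# get_data returns a (DataFrame, error) tuple; data3[0] is the DataFrame.
data3 = ek.get_data(rics, ['TR.PriceCloseDate', 'TR.PriceClose'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})

# Split the long result into one DataFrame per instrument.
dfs = dict(tuple(data3[0].groupby('Instrument')))
dfarray = []
for ric, data in dfs.items():
    df_tmp = data.dropna()                                # drop rows with missing values
    df_tmp = df_tmp.drop_duplicates()                     # drop repeated rows
    df_tmp = df_tmp.set_index('Date')                     # index by date
    df_tmp = df_tmp.drop(['Instrument'], axis=1)
    df_tmp = df_tmp.rename(columns={"Price Close": ric})  # name the price column after the RIC
    dfarray.append(df_tmp)

# Join the per-instrument frames side by side, aligned on Date.
result = pd.concat(dfarray, axis=1, sort=False)
result.columns.name = 'CLOSE'
result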

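Compared with my earlier script, it uses df.dropna() and df.drop_duplicates() to remove rows with missing values and repeated rows before the frames are concatenated.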
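
The same long-to-wide reshaping can also be sketched with a pandas pivot. This is an alternative under stated assumptions, not the script above: `long_df` is made-up sample data standing in for `data3[0]`, and `to_wide` is a hypothetical helper name. The column names are taken from the output shown in the question.

```python
import pandas as pd

# Made-up sample in the long format shown in the question:
# one row per (instrument, date), including a repeated row and an empty row.
long_df = pd.DataFrame({
    'Instrument': ['MMM', 'MMM', 'MMM', 'YUM', 'YUM', 'YUM'],
    'Price Close': [190.95, 183.76, 183.76, 91.0, 92.0, None],
    'Date': ['2019-01-02T00:00:00Z', '2019-01-03T00:00:00Z', '2019-01-03T00:00:00Z',
             '2019-01-02T00:00:00Z', '2019-01-03T00:00:00Z', None],
})

def to_wide(df):
    """Return one row per date and one column per instrument."""
    # pivot needs unique (Date, Instrument) pairs, so clean the rows first.
    clean = df.dropna(subset=['Date', 'Price Close'])
    clean = clean.drop_duplicates(subset=['Instrument', 'Date'])
    clean = clean.assign(Date=pd.to_datetime(clean['Date']))
    wide = clean.pivot(index='Date', columns='Instrument', values='Price Close')
    wide.columns.name = 'CLOSE'
    return wide.sort_index()

wide = to_wide(long_df)
print(wide)
```

With real data the call would presumably be `to_wide(data3[0])`, provided the returned columns are still named 'Instrument', 'Date' and 'Price Close'.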

chavalit-jintamalit · July 2019

Hi @pmorlen

To get a dataframe into data3, you can use this code:

data3,err = ek.get_data(rics, ['TR.PriceClose', 'TR.PriceCloseDate'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})

or

data3 = ek.get_data(rics, ['TR.PriceClose', 'TR.PriceCloseDate'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})[0]

Either way, data3 will be a dataframe.

For the datapoint limitation, please refer to the last section of this document, "Try to detect and address datapoint limits".

Jirapongse · July 2019

The dataframes returned by get_data and get_timeseries have different formats.

I have implemented a simple script that makes the dataframe from get_data look like the one from get_timeseries.

data3 = ek.get_data(rics,['TR.PriceCloseDate','TR.PriceClose'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'}) 

dfs = dict(tuple(data3[0].groupby('Instrument'))) 
dfarray = [] 
for ric, data in dfs.items(): 
   df_tmp = dfs[ric][pd.notnull(dfs[ric]['Date'])] 
   df_tmp = df_tmp.set_index('Date') 
   df_tmp = df_tmp.drop(['Instrument'], axis=1) 
   df_tmp = df_tmp.rename(columns={"Price Close":ric}) 
   dfarray.append(df_tmp) 

result = pd.concat(dfarray, axis=1, sort=False) 
result.columns.name = 'CLOSE' 
result

pmorlen · July 2019

Thank you. This did return a dataframe; however, it contains data for only 1 of the 103 tickers in the 'rics' list, namely the last ticker in the list.

Jirapongse · July 2019

@pmorlen

Could you please share the rics list used in the code?

It may relate to the usage limit mentioned in the EIKON DATA API USAGE AND LIMITS GUIDELINE.

chavalit-jintamalit · July 2019

I can successfully receive multiple data points for multiple RICs.

pmorlen · July 2019

Hello @chavalit.jintamalit

Again, thank you for your assistance. I should have started this thread with my goal, which is to calculate the correlations for a list of stocks over a certain time period. In my current code, it is a period of 6 months. Your results are close to what I need, however, to calculate the correlations I would like to see each price date represent a row in the dataframe.

pmorlen · July 2019

@jirapongse.phuriphanvichai

Thank you again for your time. I was incorrect in stating that your code returned the correct results. I am attaching 3 images showing 1) the rics list used, 2) the intermediate results of your code, specifically dfarray, and 3) the error I'm receiving at 'result = pd.concat(dfarray, axis=1, sort=False)'. Third image will be in a separate comment.

rics.jpg

dfarray.jpg

pmorlen · July 2019

@jirapongse.phuriphanvichai

Continuing comment above (system would not let me attach a 3rd image).

error.jpg

chavalit-jintamalit · July 2019

@pmorlen

Can you give me an example of the data in the DF that you would like to have?

pmorlen · July 2019

Attached is an example. From this DF, I calculate the log returns, then correlations. What caused me problems was the data limit for the get_timeseries() function, which is why I am trying get_data. Thank you.

df-example.jpg
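
For reference, the calculation described above (log returns, then correlations) can be sketched as follows once the prices are in a frame with one row per date and one column per ticker. The tickers and prices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up closing prices: one row per date, one column per ticker.
prices = pd.DataFrame(
    {'AAA': [100.0, 101.0, 102.0, 101.5],
     'BBB': [50.0, 50.5, 51.0, 50.75],
     'CCC': [20.0, 19.8, 19.6, 19.7]},
    index=pd.to_datetime(['2019-01-02', '2019-01-03', '2019-01-04', '2019-01-07']),
)

# Daily log return: ln(P_t / P_{t-1}); the first row has no prior price and is dropped.
log_returns = np.log(prices / prices.shift(1)).dropna()

# Pairwise Pearson correlation between the tickers' log returns.
correlations = log_returns.corr()
print(correlations)
```

This layout is why each price date needs to be its own row: `corr()` compares columns, so each ticker has to be a column aligned on the same dates.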

chavalit-jintamalit · July 2019

Hi @pmorlen

Just an idea: if you hit the limit, you can split the request into several smaller ones and add a delay between them.

So you can query period1, period2, ..., periodN and combine the results.

See this sample:
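
A minimal sketch of the split-and-combine idea, not the sample originally posted here. `monthly_chunks`, `fetch_in_chunks` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the real request so the sketch runs without an Eikon connection. Monthly chunks are an assumption: for 103 instruments, a month of daily data is roughly 23 x 103 = 2,369 points, which would stay under a 3,000-point limit.

```python
import time
import pandas as pd

def monthly_chunks(start, end):
    """Yield (start_date, end_date) strings, one pair per calendar month."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    for period in pd.period_range(start, end, freq='M'):
        chunk_start = max(period.start_time, start)
        chunk_end = min(period.end_time.normalize(), end)
        yield chunk_start.strftime('%Y-%m-%d'), chunk_end.strftime('%Y-%m-%d')

def fetch_in_chunks(fetch, start, end, pause=1.0):
    """Call fetch(start_date, end_date) once per month and stack the results."""
    frames = []
    for chunk_start, chunk_end in monthly_chunks(start, end):
        frames.append(fetch(chunk_start, chunk_end))
        time.sleep(pause)  # delay between requests
    combined = pd.concat(frames)
    # Drop any date returned by two neighbouring chunks, then order by date.
    return combined[~combined.index.duplicated()].sort_index()

def fake_fetch(start_date, end_date):
    """Stand-in for the real request: one row per business day."""
    dates = pd.bdate_range(start_date, end_date)
    return pd.DataFrame({'AAA': list(range(len(dates)))}, index=dates)

result = fetch_in_chunks(fake_fetch, '2019-01-01', '2019-06-30', pause=0)
print(result.index.min(), result.index.max(), len(result))
```

With Eikon, `fetch` could presumably be `lambda s, e: ek.get_timeseries(rics, fields='CLOSE', start_date=s, end_date=e)`, reusing the call from the question.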

pmorlen · July 2019

@jirapongse.phuriphanvichai

I thought this information may be useful to you:

Attached is an example of the DF I'm trying to create. From this DF, I calculate the log returns, then correlations. What caused me problems was the data limit for the get_timeseries() function, which is why I am trying get_data. Thank you.

df-example.jpg

pmorlen · August 2019

@jirapongse.phuriphanvichai

Excellent! Exactly what I needed. Thank you!
