get_timeseries returns a different time period than the one requested

I have this code:

data = ek.get_timeseries(rics, fields='CLOSE',
                         start_date='2019-01-01',
                         end_date='2019-06-30')

but it returns data starting 5/20/2019 and ignores the start_date declared in the code:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 29 entries, 2019-05-20 to 2019-06-28
Columns: 103 entries, MMM to YUM
dtypes: float64(103)
memory usage: 23.6 KB

Based on other posts, it appears to be related to the 3000 shared row limit.
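
The output above is consistent with that explanation: 103 instruments times 29 rows is 2,987 data points, just under 3,000. A minimal sketch of the arithmetic, assuming the limit is 3,000 rows shared across all instruments in a single request (the variable names are illustrative only):

```python
# Assumption: one get_timeseries request returns at most 3,000 rows,
# shared across every instrument in that request.
ROW_LIMIT = 3000
n_instruments = 103

# Whole rows (trading days) that fit per instrument in a single request.
rows_per_instrument = ROW_LIMIT // n_instruments
total_rows = rows_per_instrument * n_instruments

print(rows_per_instrument)  # 29
print(total_rows)           # 2987
```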

Here is a snippet of what I'd like returned - with daily closing price dates going back to 1/1/2019 for 103 equity tickers in total:

Close Date   MMM     AFL    T      ABBV   ABT
2019-05-28   163.35  51.30  31.93  78.03  75.71
2019-05-29   161.40  51.43  31.91  78.06  75.67

Here is my current data retrieval code:

'rics' is a list of tickers

data3 = ek.get_data(rics, ['TR.PriceClose', 'TR.PriceCloseDate'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})

But it returns a tuple:

(      Instrument  Price Close                  Date
0 MMM 190.95 2019-01-02T00:00:00Z
1 MMM 183.76 2019-01-03T00:00:00Z
2 MMM 191.32 2019-01-04T00:00:00Z
3 MMM 190.88 2019-01-07T00:00:00Z
4 MMM 191.68 2019-01-08T00:00:00Z
... ... ... ...
12767 YUM 110.66 2019-06-24T00:00:00Z
12768 YUM 110.31 2019-06-25T00:00:00Z
12769 YUM 110.12 2019-06-26T00:00:00Z
12770 YUM 110.56 2019-06-27T00:00:00Z
12771 YUM 110.67 2019-06-28T00:00:00Z

I'm having trouble setting up get_data() to return a dataframe instead of a tuple. Can you please provide some guidance to correct this? Thank you.

Best Answer

  • Jirapongse (admin)

    @pmorlen

    I have modified the code as shown below.

    import pandas as pd

    data3 = ek.get_data(rics, ['TR.PriceCloseDate', 'TR.PriceClose'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})

    # get_data returns a tuple; element 0 is the dataframe. Split it by instrument.
    dfs = dict(tuple(data3[0].groupby('Instrument')))
    dfarray = []
    for ric, data in dfs.items():
        df_tmp = data.dropna()             # drop rows with missing values
        df_tmp = df_tmp.drop_duplicates()  # drop repeated rows
        df_tmp = df_tmp.set_index('Date')
        df_tmp = df_tmp.drop(['Instrument'], axis=1)
        df_tmp = df_tmp.rename(columns={"Price Close": ric})
        dfarray.append(df_tmp)

    # Place the per-instrument columns side by side, aligned on Date.
    result = pd.concat(dfarray, axis=1, sort=False)
    result.columns.name = 'CLOSE'
    result

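    It uses df.dropna() and df.drop_duplicates() to remove rows with missing values and repeated rows from each instrument's data before 'Date' is set as the index.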
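
The same long-to-wide reshaping can also be sketched with DataFrame.pivot. This is an alternative to the loop above, not the code from the answer. The input is a synthetic stand-in for data3[0] that reuses the column names shown in the question; the tickers AAA and BBB and all values are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for data3[0]; tickers and prices are invented.
long_df = pd.DataFrame({
    "Instrument": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "Price Close": [10.0, 11.0, 11.0, 20.0, 22.0, None],
    "Date": [
        "2019-01-02T00:00:00Z", "2019-01-03T00:00:00Z", "2019-01-03T00:00:00Z",
        "2019-01-02T00:00:00Z", "2019-01-03T00:00:00Z", None,
    ],
})

# Same cleaning steps as the loop: drop incomplete rows, then repeated rows.
clean = long_df.dropna().drop_duplicates()

# One row per date, one column per instrument.
wide = clean.pivot(index="Date", columns="Instrument", values="Price Close")
wide.index = pd.to_datetime(wide.index)
wide.columns.name = "CLOSE"
print(wide)
```

Note that pivot raises a ValueError if the same instrument still has two different prices for one date after cleaning, so this version fails loudly rather than silently picking a value.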

Answers

  • Hi @pmorlen

    get_data returns a tuple whose first element is the dataframe. To get the dataframe into data3, unpack the tuple:

    data3, err = ek.get_data(rics, ['TR.PriceClose', 'TR.PriceCloseDate'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})

    or

    data3 = ek.get_data(rics, ['TR.PriceClose', 'TR.PriceCloseDate'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})[0]

    Either way, data3 will be a dataframe.

    For the datapoint limits, please refer to the last section of this document, "Try to detect and address datapoint limits".

  • The dataframes returned by get_data and get_timeseries have different formats.

    I have implemented a simple script that makes the dataframe from get_data similar to the one from get_timeseries.

    data3 = ek.get_data(rics, ['TR.PriceCloseDate', 'TR.PriceClose'], {'Sdate':'2019-01-01', 'EDate':'2019-06-30'})

    # Element 0 of the returned tuple is the dataframe. Split it by instrument.
    dfs = dict(tuple(data3[0].groupby('Instrument')))
    dfarray = []
    for ric, data in dfs.items():
        df_tmp = data[pd.notnull(data['Date'])]  # keep only rows that have a date
        df_tmp = df_tmp.set_index('Date')
        df_tmp = df_tmp.drop(['Instrument'], axis=1)
        df_tmp = df_tmp.rename(columns={"Price Close": ric})
        dfarray.append(df_tmp)

    # Place the per-instrument columns side by side, aligned on Date.
    result = pd.concat(dfarray, axis=1, sort=False)
    result.columns.name = 'CLOSE'
    result
  • Thank you - this did return a dataframe, however, it returns the data for only 1 of the 103 tickers in the 'rics' list. It returned the data for the last ticker in the list.

  • @pmorlen

    Could you please share the rics list used in the code?

    It may relate to the usage limit mentioned in the EIKON DATA API USAGE AND LIMITS GUIDELINE.

  • I can successfully receive multiple data points for multiple RICs.

    image

  • Hello @chavalit.jintamalit

    Again, thank you for your assistance. I should have started this thread with my goal, which is to calculate the correlations for a list of stocks over a certain time period. In my current code, it is a period of 6 months. Your results are close to what I need, however, to calculate the correlations I would like to see each price date represent a row in the dataframe.

  • @jirapongse.phuriphanvichai

    Thank you again for your time. I was incorrect in stating that your code returned the correct results. I am attaching 3 images showing 1) the rics list used, 2) the intermediate results of your code, specifically dfarray, and 3) the error I'm receiving at 'result = pd.concat(dfarray, axis=1, sort=False)'. Third image will be in a separate comment.

    rics.jpg

    dfarray.jpg

  • @jirapongse.phuriphanvichai

    Continuing comment above (system would not let me attach a 3rd image).

    error.jpg

  • @pmorlen

    Can you give me an example of the data in the DF that you would like to have?

  • Attached is an example. From this DF, I calculate the log returns, then correlations. What caused me problems was the data limit for the get_timeseries() function, which is why I am trying get_data. Thank you.

    df-example.jpg
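
As a rough illustration of the calculation described here, the following sketch computes log returns and then their correlations from a wide price dataframe (dates as rows, one column per ticker). The tickers AAA, BBB and CCC and all prices are invented for the example; in practice the input would be the wide dataframe of closing prices.

```python
import numpy as np
import pandas as pd

# Invented prices: one row per date, one column per ticker.
prices = pd.DataFrame(
    {
        "AAA": [100.0, 102.0, 101.0, 105.0],
        "BBB": [50.0, 50.5, 51.5, 51.0],
        "CCC": [30.0, 29.0, 31.0, 30.0],
    },
    index=pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04", "2019-01-07"]),
)

# Log return for each day: ln(P_t / P_{t-1}). The first row has no
# previous price, so it is NaN and gets dropped.
log_returns = np.log(prices / prices.shift(1)).dropna()

# Pairwise Pearson correlations between the tickers' log returns.
corr = log_returns.corr()
print(corr)
```

Calling dropna() on the whole frame removes any date on which at least one ticker has no price, so all correlations are computed over the same set of dates.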

  • Hi @pmorlen

    Just an idea: if you hit the limit, you can split the request into several smaller requests and add a delay between them.

    That is, query period1, period2, ..., periodN and combine the results.

    See this sample:

    image
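
The attached sample is not reproduced here. As a separate, hedged sketch of the same idea, the code below splits a date range into monthly windows, calls a fetch function once per window with an optional pause, and concatenates the results. The helper names split_into_months and fetch_in_windows are hypothetical, and fake_fetch is a stand-in that returns dummy data so the example runs without an Eikon connection.

```python
import time
import pandas as pd

def split_into_months(start, end):
    """Return (start_date, end_date) string pairs, one per calendar month."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    windows = []
    for period in pd.period_range(start, end, freq="M"):
        window_start = max(start, period.start_time)
        window_end = min(end, period.end_time.normalize())
        windows.append((window_start.strftime("%Y-%m-%d"),
                        window_end.strftime("%Y-%m-%d")))
    return windows

def fetch_in_windows(fetch, windows, pause_seconds=0.0):
    """Call fetch(start_date, end_date) per window and stack the rows."""
    frames = []
    for i, (start_date, end_date) in enumerate(windows):
        if i:
            time.sleep(pause_seconds)  # delay between consecutive requests
        frames.append(fetch(start_date, end_date))
    combined = pd.concat(frames)
    # Drop any date returned by two windows, then order by date.
    return combined[~combined.index.duplicated(keep="first")].sort_index()

def fake_fetch(start_date, end_date):
    """Stand-in for a real request: dummy prices on business days."""
    days = pd.bdate_range(start_date, end_date)
    return pd.DataFrame({"AAA": [1.0] * len(days)}, index=days)

windows = split_into_months("2019-01-01", "2019-06-30")
prices = fetch_in_windows(fake_fetch, windows, pause_seconds=0.0)
print(len(windows), len(prices))
```

With Eikon, fetch could be something like lambda s, e: ek.get_timeseries(rics, fields='CLOSE', start_date=s, end_date=e). Assuming the 3,000-row shared limit discussed above, a month of at most 23 trading days for 103 instruments is 2,369 rows, which would fit in one request.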

  • @jirapongse.phuriphanvichai

    I thought this information may be useful to you:

    Attached is an example of the DF I'm trying to create. From this DF, I calculate the log returns, then correlations. What caused me problems was the data limit for the get_timeseries() function, which is why I am trying get_data. Thank you.

    df-example.jpg

  • @jirapongse.phuriphanvichai

    Excellent! Exactly what I needed. Thank you!