I am attempting to get data for a set of RICs and am getting an error.

I am attempting to get data on a set of RICs for these variables (['TR.InvestorFullName', 'TR.InvestorFullName.investorid', 'TR.InvestorFullName.investorpermid', 'TR.CategoryOwnershipPct', 'TR.InvestorType', 'TR.InvParentType', 'TR.PctOfSharesOutHeld', 'TR.PctofSharesOutHeld.Date', 'TR.InvAddrCountry', 'TR.OwnTrnverRating', 'TR.OwnTurnover', 'TR.OwnTurnover.date', 'TR.InvInvestmentStyleCode']) across 20 years (suffixes_list = [('_1999','_2000'),('_2001','_2001'),('_2001','_2002'),('_2003','_2003'),('_2003','_2004'),('_2005','_2005'),('_2005','_2006'),('_2007','_2007'),('_2007','_2008'),('_2009','_2009'),('_2009','_2010'),('_2011',"_2011"),('_2011',"_2012"),('_2013',"_2013"),('_2013',"_2014"),('_2015',"_2015"),('_2015',"_2016"),('_2017',"_2017"),('_2017',"_2018"),('_2019',"_2019"),('_2019',"_2020"),('_2021',"_2021"),('_2021',"_2022")]). I run this chunk once per RIC, changing the RIC selected from my list each time.


    # Chunk for RIC at index 238 (Number)
    instruments = sortedrics[238]  # Adjust the index for each RIC
    data_frames = []
    for i in range(-24, -1):
        s_date = str(i)
        e_date = str(i)
        df, err = ek.get_data(instruments, fields, {'SDate': s_date, 'Edate': e_date, 'Frq': "Y"})
        keylist = df.keys()
        for item in keylist:
            df[item] = df[item].astype(str)
        print(s_date, e_date)
        data_frames.append(df)
    merged_df = data_frames[0]
    for i in range(1, len(data_frames)):
        merged_df = pd.merge(merged_df, data_frames[i], on='Investor Full Name', how='outer', suffixes=suffixes_list[i-1])
    merged_df.to_csv(f'{instruments}.csv')


I encounter memory errors and would welcome any suggestions on how to mitigate them. I tried looping this request, but the API allows only 5 minutes per request, so that did not work well for me. Thanks.


The API request itself returns data; the problem occurs at the merge step and is inconsistent. After I get the memory error, I run the chunk again, and sometimes it works fine. The behavior varies from one RIC to another.
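One possible explanation for a memory error at the merge step, offered as an assumption and not confirmed anywhere in this thread: if any value of 'Investor Full Name' repeats within a single year's frame (for example, blank or missing names that `astype(str)` turns into identical strings), each outer merge pairs every left-hand duplicate with every right-hand duplicate, so the row count multiplies with every year merged. The sketch below uses made-up data in place of the `ek.get_data` results; the helper names `make_year_frame` and `tag_columns` are hypothetical. It compares the wide merge with a long layout that stacks the yearly frames and stores the year in its own column.

```python
import pandas as pd

KEY = "Investor Full Name"

def make_year_frame(year: int) -> pd.DataFrame:
    """Synthetic stand-in for one year's ek.get_data() result (made-up data)."""
    return pd.DataFrame({
        KEY: ["Fund A", "Fund B", "nan", "nan"],  # the key "nan" appears twice
        "Pct Of Shares Out Held": ["1.0", "2.0", "0.1", "0.2"],
    })

years = list(range(1999, 2005))  # six hypothetical years
frames = [make_year_frame(y) for y in years]

def tag_columns(frame: pd.DataFrame, year: int) -> pd.DataFrame:
    """Append the year to every non-key column so merged names never collide."""
    return frame.rename(columns={c: f"{c}_{year}" for c in frame.columns if c != KEY})

# Wide layout, as in the question: repeated outer merges on the name column.
wide = tag_columns(frames[0], years[0])
for year, frame in zip(years[1:], frames[1:]):
    wide = pd.merge(wide, tag_columns(frame, year), on=KEY, how="outer")

# Long layout: stack the yearly frames and record the year in its own column.
long_df = pd.concat(
    [frame.assign(Year=year) for year, frame in zip(years, frames)],
    ignore_index=True,
)

print(len(wide))     # 66: the duplicated key alone contributes 2**6 = 64 rows
print(len(long_df))  # 24: 4 rows per year, 6 years
```

With 23 yearly frames instead of six, the same two duplicates would already produce 2**23 rows in the wide layout, while the long layout grows linearly. Checking `df['Investor Full Name'].duplicated().any()` on each yearly frame would show whether this applies to a given RIC.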

Answers

  • @JoanneCamille.Andes

    Please share the runnable code and the error message so that we can run the code and verify the problem.

    If the application requests a lot of data, the server can time out the request. However, the application can catch the exception and re-run that request.

  • @Jirapongse


    Here is the code:


    fields = ['TR.InvestorFullName', 'TR.InvestorFullName.investorid','TR.InvestorFullName.investorpermid','TR.CategoryOwnershipPct','TR.InvestorType','TR.InvParentType','TR.PctOfSharesOutHeld', 'TR.PctofSharesOutHeld.Date', 'TR.InvAddrCountry','TR.OwnTrnverRating', 'TR.OwnTurnover','TR.OwnTurnover.date','TR.InvInvestmentStyleCode']


    suffixes_list = [('_1999','_2000'),('_2001','_2001'),('_2001','_2002'),('_2003','_2003'),('_2003','_2004'),('_2005','_2005'),('_2005','_2006'),('_2007','_2007'),
                     ('_2007','_2008'),('_2009','_2009'),('_2009','_2010'),('_2011',"_2011"),('_2011',"_2012"),('_2013',"_2013"),('_2013',"_2014"),('_2015',"_2015"),
                     ('_2015',"_2016"),('_2017',"_2017"),('_2017',"_2018"),('_2019',"_2019"),('_2019',"_2020"),('_2021',"_2021"),('_2021',"_2022")]

    # Chunk for RIC at index 282
    instruments = sortedrics['BROG.OQ']  # Adjust the index for each RIC
    data_frames = []
    for i in range(-24, -1):
        s_date = str(i)
        e_date = str(i)
        df, err = ek.get_data(instruments, fields, {'SDate': s_date, 'Edate': e_date, 'Frq': "Y"})
        keylist = df.keys()
        for item in keylist:
            df[item] = df[item].astype(str)
        print(s_date, e_date)
        data_frames.append(df)
    merged_df = data_frames[0]
    for i in range(1, len(data_frames)):
        merged_df = pd.merge(merged_df, data_frames[i], on='Investor Full Name', how='outer', suffixes=suffixes_list[i-1])
    merged_df.to_csv(f'{instruments}.csv')

  • @Jirapongse please see the attached file for the full code.

    API Code.txt

    When I tried running this in the Codebook app, it gave me this error:

      File "/tmp/ipykernel_213/1336849579.py", line 16
        = ek.get_timeseries('AAPL.O', # the RIC for Apple, Inc.
        ^
    SyntaxError: invalid syntax

  • The main issue is that the API request returns data, but the merge step behaves inconsistently: after the client gets the memory error, he runs the chunk again, and sometimes it works fine. The behavior varies from one RIC to another.

    May I know whether this is a behavioral issue in the app, or whether it is caused by the data the client wants to retrieve and is therefore related to an API data limitation? The full code was attached to the above comment.

  • @JoanneCamille.Andes

    To replicate this issue, we need a cut-down, runnable version of the code that we can run without any modifications.

    It would also help if you could narrow down the issue by finding the RICs and exact parameters that cause it.

    Moreover, please paste the source code in a code block when sharing it.


  • Hi @Jirapongse, we have attached the file with the full code, as we get the error below when using the <Code> option.

    API Code.txt

    1709646575608.png

    The main issue is that the API request returns data, but the merge step behaves inconsistently: after the client gets the memory error, he runs the chunk again, and sometimes it works fine. The behavior varies from one RIC to another.

  • @Jirapongse may we please follow up on the above?

  • The answer has been provided on this discussion.
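
The suggestion earlier in the thread, to catch the exception and re-run a request that the server timed out, can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `fetch_with_retry` is a hypothetical helper, not part of the Eikon Data API, and because the thread does not say which exception type a timeout raises, the sketch catches `Exception` broadly. The demo uses a stand-in function in place of a real request.

```python
import time

def fetch_with_retry(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns or `attempts` calls have failed.

    Hypothetical helper, not part of the Eikon Data API.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception:  # assumption: narrow this to the timeout error you actually see
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # wait 1s, 2s, 4s, ...

# Demo with a stand-in request that fails twice before succeeding.
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated server timeout")
    return "ok"

# The no-op sleep keeps the demo instant; omit it to get real back-off delays.
result = fetch_with_retry(flaky_request, attempts=3, sleep=lambda seconds: None)
print(result, calls["count"])
```

With the code from the question, the call might look like `df, err = fetch_with_retry(lambda: ek.get_data(instruments, fields, {'SDate': s_date, 'Edate': e_date, 'Frq': 'Y'}))`. Retrying helps only with request timeouts; it would not address a memory error raised at the merge step.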