Extracting data for 100,000 RICs from Refinitiv through Eikon/RDP

Hello,

I have a list of 100,000+ RICs in a .csv file, and I want to write code that opens the file, retrieves each bond's monthly mid yield, mid price, ask price, bid price, ask yield, bid yield, ratings, and monthly transaction volume, and saves the output in a new .csv file. Ideally, the code should read the RICs in chunks and move on to the next chunk automatically (this is where my code fails: it only retrieves data for 1 chunk).

I have read every single thread on this topic, but since I have no technical background, I am struggling to optimise my code.

I would appreciate it if you could edit my Eikon code and provide a sample code for RDP.

#Importing
import refinitiv.dataplatform as rdp
import eikon as ek
import json
import pandas as pd
import os
import time

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# open the session before creating the endpoint that uses it
session = rdp.open_desktop_session('XX')
symbologylookup_endpoint = rdp.Endpoint(session,
    'https://api.refinitiv.com/discovery/symbology/v1/lookup')


#create a list of RICs
import csv
instruments = []
with open("Data.csv") as f:
    for row in csv.reader(f):
        instruments.append(row[0])


#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist = 2000
for i in range(0, len(instruments), chunklist):
    stock = instruments[i:i+chunklist]
    df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
    content_df.append(df)
    content_df = pd.concat(content_df)
content_df


@zoya faberov @jason.ramchandani01 @j.dessain

Best Answer

  • zoya faberov
    Answer ✓

    Hello @georgecambridge001 ,

As I see that you have already been provided with a solution on the other, currently active discussion thread, I will avoid duplicating the same answer and refer you to that discussion instead.

Please note that the RD library is strategic and will continue to actively evolve and improve, while RDP is legacy and, at the time of this writing, remains available as-is. It is recommended to select the RD Library for Python for any new implementation, as suggested and illustrated with examples by @umer.nalla and @nick.zincone.
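    For reference, the chunked retrieval recommended there can be sketched library-agnostically. This is a minimal sketch, assuming a generic `fetch` callable; the RD wiring in the trailing comment (`rd.open_session`, `rd.get_data`) is illustrative only and not run here:

```python
import pandas as pd

def fetch_in_chunks(instruments, fetch, chunk_size=1000):
    """Call `fetch` on successive slices of `instruments` and
    concatenate all partial results into a single DataFrame."""
    frames = []
    for i in range(0, len(instruments), chunk_size):
        frames.append(fetch(instruments[i:i + chunk_size]))
    return pd.concat(frames, ignore_index=True)

# Hypothetical wiring with the strategic RD library (not run here):
#   import refinitiv.data as rd
#   rd.open_session()
#   result = fetch_in_chunks(rics, lambda batch: rd.get_data(batch, fields))
```

    Keeping the chunking loop separate from the data call also makes it easy to swap Eikon, RDP, or RD underneath without touching the loop.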

Answers

  • Hi @georgecambridge001,

I have updated your code; please test it and let us know if it works for you:


    #Importing
    import refinitiv.dataplatform as rdp
    import eikon as ek
    import json
    import pandas as pd
    import os
    import time

    pd.set_option("display.max_columns", None)
    pd.set_option("display.max_rows", None)

    # open the session before creating the endpoint that uses it
    session = rdp.open_desktop_session('XX')
    symbologylookup_endpoint = rdp.Endpoint(session, 'https://api.refinitiv.com/discovery/symbology/v1/lookup')


    #create a list of RICs
    import csv
    instruments = []
    with open("Data.csv") as f:
        for row in csv.reader(f):
            instruments.append(row[0])


    #Loop through RICs
    content_df = []
    Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
    chunklist = 2000
    for i in range(0, len(instruments), chunklist):
        stock = instruments[i:i+chunklist]
        df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
        content_df.append(df)
    final_df = pd.concat(content_df)
    final_df
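    One addition worth considering for a 100,000-RIC run: an uncaught transient API error in any single chunk aborts the whole loop. Below is a minimal retry sketch around a generic `fetch` callable; the `ek.get_data` wiring in the comment is illustrative, not tested here:

```python
import time

def fetch_with_retry(fetch, batch, retries=3, wait=5.0):
    """Retry a flaky fetch call up to `retries` times,
    pausing `wait` seconds between attempts."""
    for attempt in range(retries):
        try:
            return fetch(batch)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(wait)

# Hypothetical use inside the chunk loop (not run here):
#   df, err = fetch_with_retry(
#       lambda b: ek.get_data(b, Bond_fields, params), stock)
```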
  • ezyzip.zip

    Hello,

    No it does not work. I get this error for concat:

    InvalidIndexError: Reindexing only valid with uniquely valued Index objects

    Please note that some parts of the code turn red once I paste them into the cell (see the attached screenshot). When I moved them to the right and ran the code, I got N/A for all the columns. I am also attaching the original .csv file for your reference. I have been struggling to complete this project for the last 3 weeks and would appreciate it if you could test the code in your environment. Please find attached a sample of 10,000 RICs.

    screenshot-2022-06-06-at-100522.png

    Finally, I would also like to request a sample code that performs the identical operation, but with RDP.

    @Shadab Hussain

    @zoya faberov @jason.ramchandani01 @j.dessain

  • Hello @georgecambridge001 ,

    In addition to the solution from @Shadab Hussain, you may wish to watch out for duplicates. If the concatenated DataFrames contain either columns that share the same name (for example "Date") or rows that contain identical values, this may result in an "InvalidIndexError" on concatenation.

    So an approach similar to:

    #Loop through RICs
    content_df = []
    Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
    chunklist = 1000
    for i in range(0, len(instruments), chunklist):
        stock = instruments[i:i+chunklist]
        df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
        print('<<<iter=',str(i),'>>>\nRows=',str(len(df.index)),',Rows duplicates dropped=',str(len(df.drop_duplicates().index)),',Columns=',str(len(df.columns)),'df=\n')#, df,'\n')
        df.columns = ['INSTRUMENT','MIDY','MIDP','ASKP_DATE','ASKP','BIDP','ASKY','BIDY','RATINGDESCR','RATING','RATING_DATE']
        df.drop_duplicates(inplace=True)
        content_df.append(df)
    final_df = pd.concat(content_df,ignore_index=True)
    print(final_df)

    Renaming all columns with unique names and dropping duplicate rows could help you avoid the error. Any approach that avoids duplicates should work.
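    The rename-and-deduplicate fix can be reproduced with plain pandas; the column names below are illustrative, not the full field list:

```python
import pandas as pd

# Two chunk results whose raw columns would otherwise repeat "Date";
# unique names plus drop_duplicates() keep the concatenated index clean.
chunk1 = pd.DataFrame([["AAA", "2021-01-31", 101.2, "2021-01-31"],
                       ["AAA", "2021-01-31", 101.2, "2021-01-31"]])  # duplicate row
chunk2 = pd.DataFrame([["BBB", "2021-01-31", 99.8, "2021-01-31"]])

frames = []
for df in (chunk1, chunk2):
    df.columns = ["INSTRUMENT", "ASKP_DATE", "MIDP", "RATING_DATE"]  # unique names
    frames.append(df.drop_duplicates())

final_df = pd.concat(frames, ignore_index=True)
print(final_df.shape)  # (2, 4): the duplicate row is gone, the index is unique
```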

    The RD library is the strategic solution, the RDP library being its predecessor, and you can easily incorporate it just by replacing:

    import eikon as ek

    with

    import refinitiv.data.eikon as ek

    Hope this helps

  • Hello @zoya faberov and thanks for your reply.

    I have tested your code, but unfortunately, like my previous code, it does not move on to the next chunk. When I specify chunklist=1000, it simply returns bond data for the first 1000 RICs and does not move to the next 1000 RICs automatically. What can be the solution to this issue? I have a list of 100,000 RICs and want to automate the process.

    Additionally, I get the following error

    AttributeError: 'NoneType' object has no attribute 'logger'

    when I replace


    import eikon as ek

    with


    import refinitiv.data.eikon as ek

    @zoya faberov @Shadab Hussain @jason.ramchandani01 @j.dessain

  • Hello @georgecambridge001,

    As this chunking code is working on my side,

    let's try to better understand what is happening on yours.

    1. Are you running outside of CodeBook, i.e. a plain .py script rather than Jupyter Lab/Notebook (.ipynb)?

    2. Try breaking after the first iteration and printing the first result DataFrame and any errors that come up. For example:

    #Loop through RICs
    content_df = []
    Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE", "TR.BIDPRICE", "TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
    chunklist = 1000
    for i in range(0, len(instruments), chunklist):
        stock = instruments[i:i+chunklist]
        df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq': 'M', 'SDate': '2016-01-01','EDate': '2021-12-31'})
        print('<<<iter=',str(i),'>>>\nRows=',str(len(df.index)),',Rows duplicates dropped=',str(len(df.drop_duplicates().index)),',Columns=',str(len(df.columns)),'df=\n')#, df,'\n')
        df.columns = ['INSTRUMENT','MIDY','MIDP','ASKP_DATE','ASKP','BIDP','ASKY','BIDY','RATINGDESCR','RATING','RATING_DATE']
        df.drop_duplicates(inplace=True)
        print(df,'\n')
        print('***iter=',str(i),'***\nerr=',err)
        content_df.append(df)
        break
    final_df = pd.concat(content_df,ignore_index=True)
    print('Len=',str(len(final_df.index)))
    print(final_df)

    My DataFrame output setup prior to this step looks like:

    pd.set_option('display.max_rows', 10)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', None)
    pd.set_option('display.max_colwidth', 16)

    Therefore, my output looks like:

    <<<iter= 0 >>>
    Rows= 46428 ,Rows duplicates dropped= 22809 ,Columns= 11 df=

    INSTRUMENT MIDY MIDP ASKP_DATE ASKP BIDP ASKY BIDY RATINGDESCR RATING RATING_DATE
    0 00003#AB9=RRPS 35.659348 1001.72665 2019-09-30T0... 1003.0972 1000.3561 35.159632 36.159632
    1 00003#AB9=RRPS 80.06577 2309.1513 2019-10-31T0... 2312.2468 2306.0558 78.916549 81.216549
    2 00003#AB9=RRPS 74.969156 2103.19835 2019-11-30T0... 2105.9374 2100.4593 73.91978 76.01978
    3 00003#AB9=RRPS 78.872257 2202.41415 2019-12-31T0... 2205.2 2199.6283 77.772943 79.972943
    4 00003#AB9=RRPS 79.957356 2308.58905 2020-01-31T0... 2311.4224 2305.7557 78.807996 81.107996
    ... ... ... ... ... ... ... ... ... ... ... ...
    46423 004421QQ9=FINR <NA> <NA> <NA> <NA> <NA> <NA>
    46424 004421QR7=FINR <NA> <NA> <NA> <NA> <NA> <NA>
    46425 004421RH8=FINR <NA> <NA> <NA> <NA> <NA> <NA>
    46426 004421TQ6=FINR <NA> <NA> <NA> <NA> <NA> <NA>
    46427 004421UC5=FINR <NA> <NA> <NA> <NA> <NA> <NA>

    [22809 rows x 11 columns]

    and my errors (which are not empty, but not fatal) look like:

    ***iter= 0 ***
    err= [{'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 115}, {'code': 416, 'col': 9,
    ...

    3. My installed eikon library is version 1.1.14.

    4. I cannot reproduce the logger error; I am able to replace the library and run the same code successfully. I have refinitiv-data library version 1.0.0b9 installed rather than the latest 1.0.0b13. This is because some issues were introduced in b13 that library development is looking into, so I am holding out for the next version and staying on b9 for now.

    Let us know how (2) works on your side and what you find.



  • Hello @zoya faberov and @Shadab Hussain , thank you for your support. It works better now.

    However, as I stated earlier in this thread, I want to retrieve the identical information with historical_pricing.summaries.Definition (not RD; only RDP historical summaries, because for some RICs I found that RDP retrieves information while RD does not). Would you be so kind as to have a look at my sample code and optimise it so that it can once again open the attached (updated, de-duplicated) .csv file with RICs, go through each one, and return the identical bond-related information?

    Thank you so much for your help. You are making people's lives so much easier.

    RICs.zip
    New RDP.txt
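    The per-RIC retrieval requested above can be sketched so that each result is appended to the output .csv as it arrives, which means a crash partway through a long run does not lose the results already fetched. The `fetch` callable is injected; the `rdp.get_historical_price_summaries` wiring in the comment is an assumption about the RDP library and is not run here:

```python
import os
import pandas as pd

def collect_to_csv(rics, fetch, out_path):
    """Fetch each RIC individually and append the result to `out_path`,
    writing the CSV header only once, so partial progress is preserved."""
    for ric in rics:
        df = fetch(ric)
        if df is None or df.empty:
            continue                      # some RICs return no data
        df.insert(0, "RIC", ric)          # record which instrument this row is for
        df.to_csv(out_path, mode="a", index=False,
                  header=not os.path.exists(out_path))

# Hypothetical RDP wiring (assumed API, one request per RIC, not run here):
#   import refinitiv.dataplatform as rdp
#   fetch = lambda ric: rdp.get_historical_price_summaries(
#       ric, interval=rdp.Intervals.MONTHLY,
#       start='2016-01-01', end='2021-12-31')
#   collect_to_csv(instruments, fetch, 'bond_output.csv')
```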