Extracting data for 100,000 RICs from Refinitiv through Eikon/RDP
Hello,
I have a list of 100,000+ RICs in a .csv file, and I want to write code that opens the file, retrieves each bond's monthly mid yield, mid price, ask price, bid price, ask yield, bid yield, ratings, and monthly transaction volume, and saves the output to a new .csv file. Ideally, the code should read the RICs in chunks and move on to the next chunk automatically (this is where my code fails: it only retrieves data for one chunk).
I have read every thread on this topic, but since I have no technical background, I am struggling to optimise my code.
I would appreciate it if you could edit my Eikon code and provide a sample code for RDP.
#Importing
import csv
import json
import os
import time

import pandas as pd
from pandas import json_normalize
from pydash.arrays import chunk
import eikon as ek
import refinitiv.dataplatform as rdp

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

#Open the session first; the Endpoint below needs it
session = rdp.open_desktop_session('XX')
symbologylookup_endpoint = rdp.Endpoint(session,
    'https://api.refinitiv.com/discovery/symbology/v1/lookup')

#Create a list of RICs
csvReader = csv.reader(open("Data.csv"))
instruments = []
for row in csvReader:
    instruments.append(row[0])

#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE","TR.BIDPRICE","TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist = 2000
for i in range(0, len(instruments), chunklist):
    stock = instruments[i:i+chunklist]
    df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq':'M','SDate':'2016-01-01','EDate':'2021-12-31'})
    content_df.append(df)
    content_df = pd.concat(content_df)
content_df
Best Answer
Hello @georgecambridge001 ,
As I see that you have been provided with the solution on the other discussion thread currently active, I will avoid duplicating the same answer and refer to that discussion instead.
Please note that the RD library is strategic and will continue to actively evolve and improve going forward, while RDP is legacy and remains available as-is at the time of this writing. It is recommended to select the RD Library for Python for any new requirement implementation, as suggested and illustrated with examples by @umer.nalla and @nick.zincone.
Answers
I have updated your code; please test it and let us know if it works for you:
#Importing
import csv
import json
import os
import time

import pandas as pd
from pandas import json_normalize
from pydash.arrays import chunk
import eikon as ek
import refinitiv.dataplatform as rdp

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

session = rdp.open_desktop_session('XX')
symbologylookup_endpoint = rdp.Endpoint(session, 'https://api.refinitiv.com/discovery/symbology/v1/lookup')

#Create a list of RICs
csvReader = csv.reader(open("Data.csv"))
instruments = []
for row in csvReader:
    instruments.append(row[0])

#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE","TR.BIDPRICE","TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist = 2000
for i in range(0, len(instruments), chunklist):
    stock = instruments[i:i+chunklist]
    df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq':'M','SDate':'2016-01-01','EDate':'2021-12-31'})
    content_df.append(df)

#Concatenate once, after the loop has finished
final_df = pd.concat(content_df)
final_df
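[Editor's note] The question also asks for the combined output to be saved to a new .csv file; a minimal sketch of that last step, where the column names, values, and output path are illustrative assumptions standing in for the real ek.get_data results:

```python
import pandas as pd

# Stand-ins for two retrieved chunks; in the thread these come from
# ek.get_data, and the column names here are illustrative only.
content_df = [
    pd.DataFrame({"Instrument": ["RIC1"], "Mid Yield": [1.5]}),
    pd.DataFrame({"Instrument": ["RIC2"], "Mid Yield": [2.1]}),
]

final_df = pd.concat(content_df, ignore_index=True)
# index=False keeps the pandas row numbers out of the output file.
final_df.to_csv("bond_output.csv", index=False)
print(final_df.shape)  # (2, 2)
```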
Hello,
No, it does not work. I get this error on concat:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Please note that some parts of the code turn red once I paste them into the cell (see attached photo). When I moved them to the right and ran the code, I got N/A for all the columns. I am also attaching the original .csv file for your reference. I have been struggling to complete this project for the last 3 weeks and would appreciate it if you could test the code in your environment. Please find attached a sample of 10,000 RICs.
Finally, I would also like to request a sample code that performs the identical operation but with RDP.
Hello @georgecambridge001 ,
In addition to the solution from @Shadab Hussain, you may wish to watch out for duplicates. If the concatenated DataFrames contain either columns that share the same name (for example, "Date") or rows with identical values, this may result in an InvalidIndexError on concatenation.
So an approach similar to:
#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE","TR.BIDPRICE","TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist = 1000
for i in range(0, len(instruments), chunklist):
    stock = instruments[i:i+chunklist]
    df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq':'M','SDate':'2016-01-01','EDate':'2021-12-31'})
    print('<<<iter=',str(i),'>>>\nRows=',str(len(df.index)),',Rows duplicates dropped=',str(len(df.drop_duplicates().index)),',Columns=',str(len(df.columns)),'df=\n')#, df,'\n')
    df.columns = ['INSTRUMENT','MIDY','MIDP','ASKP_DATE','ASKP','BIDP','ASKY','BIDY','RATINGDESCR','RATING','RATING_DATE']
    df.drop_duplicates(inplace=True)
    content_df.append(df)

final_df = pd.concat(content_df, ignore_index=True)
print(final_df)

Renaming all columns with unique names and dropping duplicate rows should help you avoid the error. Any approach that avoids duplicates should work.
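[Editor's note] The duplicate-column failure mode described above can be reproduced in miniature; the chunk contents below are hypothetical stand-ins for ek.get_data results:

```python
import pandas as pd

# The first chunk has a duplicated "Date" column, so aligning it with the
# second chunk forces a reindex over a non-unique column Index, which is
# exactly what raises InvalidIndexError during pd.concat.
chunk1 = pd.DataFrame([[1, 2, 3]], columns=["Date", "Mid Yield", "Date"])
chunk2 = pd.DataFrame([[4, 5]], columns=["Date", "Mid Yield"])

try:
    pd.concat([chunk1, chunk2])
except pd.errors.InvalidIndexError as exc:
    print(exc)  # Reindexing only valid with uniquely valued Index objects

# Renaming every column to a unique label, as the answer suggests,
# lets the concatenation go through.
chunk1.columns = ["ASKP_DATE", "MIDY", "RATING_DATE"]
combined = pd.concat([chunk1, chunk1], ignore_index=True)
print(combined.shape)  # (2, 3)
```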
The RD library is the strategic solution, the RDP library being its predecessor, and you can easily incorporate it just by replacing:
import eikon as ek
with
import refinitiv.data.eikon as ek
Hope this helps
Hello @zoya faberov and thanks for your reply.
I have tested your code, but unfortunately, like my previous code, it does not move to the next chunk. When I specify chunklist=1000, it simply returns bond data for the first 1000 RICs and does not move on to the next 1000 automatically. What can be the solution to this issue? I have a list of 100,000 RICs and want to automate the process.
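[Editor's note] For reference, the range-based chunking pattern used in the answers does advance through the whole list on its own; a minimal sketch with hypothetical stand-in RICs and no API calls:

```python
# range(0, len(instruments), size) yields 0, size, 2*size, ..., so every
# slice is visited and the last chunk is simply shorter. If only the first
# chunk ever comes back, the failure is inside the per-chunk call (or an
# exception stops the loop), not in the loop itself.
instruments = [f"RIC{n}" for n in range(25)]  # hypothetical RIC list
size = 10

chunks = []
for i in range(0, len(instruments), size):
    chunks.append(instruments[i:i + size])

print([len(c) for c in chunks])  # [10, 10, 5]
```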
Additionally, I get the following error:
AttributeError: 'NoneType' object has no attribute 'logger'
when I replace
import eikon as ek
with
import refinitiv.data.eikon as ek
@zoya faberov @Shadab Hussain @jason.ramchandani01 @j.dessain
Hello @georgecambridge001,
As this chunking code is working on my side,
let's try to better understand what is happening on your side.
1. Are you running outside of CodeBook, i.e. a bare .py script rather than Jupyter Lab/Notebook (.ipynb)?
2. Try breaking after the first iteration and printing the first result DataFrame and any errors that come up. For example:
#Loop through RICs
content_df = []
Bond_fields = ["TR.MIDYIELD","TR.MIDPRICE","TR.ASKPRICE.date","TR.ASKPRICE","TR.BIDPRICE","TR.ASKYIELD","TR.BIDYIELD",'TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).RatingSourceDescription','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY)','TR.GR.Rating(BondRatingSrc=FTC:S&P:MDY).date']
chunklist = 1000
for i in range(0, len(instruments), chunklist):
    stock = instruments[i:i+chunklist]
    df, err = ek.get_data(stock, Bond_fields, {'CALCMETHOD':'SUM','Frq':'M','SDate':'2016-01-01','EDate':'2021-12-31'})
    print('<<<iter=',str(i),'>>>\nRows=',str(len(df.index)),',Rows duplicates dropped=',str(len(df.drop_duplicates().index)),',Columns=',str(len(df.columns)),'df=\n')#, df,'\n')
    df.columns = ['INSTRUMENT','MIDY','MIDP','ASKP_DATE','ASKP','BIDP','ASKY','BIDY','RATINGDESCR','RATING','RATING_DATE']
    df.drop_duplicates(inplace=True)
    print(df,'\n')
    print('***iter=',str(i),'***\nerr=',err)
    content_df.append(df)
    break

final_df = pd.concat(content_df, ignore_index=True)
print('Len=',str(len(final_df.index)))
print(final_df)

My DataFrame output setups prior to this step look like:
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 16)

Therefore, my output looks like:
<<<iter= 0 >>>
Rows= 46428 ,Rows duplicates dropped= 22809 ,Columns= 11 df=
INSTRUMENT MIDY MIDP ASKP_DATE ASKP BIDP ASKY BIDY RATINGDESCR RATING RATING_DATE
0 00003#AB9=RRPS 35.659348 1001.72665 2019-09-30T0... 1003.0972 1000.3561 35.159632 36.159632
1 00003#AB9=RRPS 80.06577 2309.1513 2019-10-31T0... 2312.2468 2306.0558 78.916549 81.216549
2 00003#AB9=RRPS 74.969156 2103.19835 2019-11-30T0... 2105.9374 2100.4593 73.91978 76.01978
3 00003#AB9=RRPS 78.872257 2202.41415 2019-12-31T0... 2205.2 2199.6283 77.772943 79.972943
4 00003#AB9=RRPS 79.957356 2308.58905 2020-01-31T0... 2311.4224 2305.7557 78.807996 81.107996
... ... ... ... ... ... ... ... ... ... ... ...
46423 004421QQ9=FINR <NA> <NA> <NA> <NA> <NA> <NA>
46424 004421QR7=FINR <NA> <NA> <NA> <NA> <NA> <NA>
46425 004421RH8=FINR <NA> <NA> <NA> <NA> <NA> <NA>
46426 004421TQ6=FINR <NA> <NA> <NA> <NA> <NA> <NA>
46427 004421UC5=FINR <NA> <NA> <NA> <NA> <NA> <NA>
[22809 rows x 11 columns]

and my errors (which are not empty, but not fatal) look like:
***iter= 0 ***
err= [{'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 112}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 113}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 9, 'message': "Unable to collect data for the field 'TR.GR.Rating' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 10, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).DATE' and some specific identifier(s).", 'row': 114}, {'code': 416, 'col': 8, 'message': "Unable to collect data for the field 'TR.GR.RATING(BONDRATINGSRC=FTC:S&P:MDY).RatingSourceDescription' and some specific identifier(s).", 'row': 115}, {'code': 416, 'col': 9,
...
3. My installed eikon library is version 1.1.14.
4. I do not reproduce the logger error; I am able to replace the library and run the same code successfully. I have refinitiv-data library version 1.0.0b9 installed, rather than the latest 1.0.0b13. This is because some issues were introduced in version b13 that library development is looking into, so I am holding out for the next version and staying on b9 for now.
Let us know how step (2) works on your side and what you find.
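[Editor's note] The err list that ek.get_data returns alongside the DataFrame, as in the output above, is a plain list of dicts, so it is easy to summarize with pandas. The entries below are abbreviated stand-ins for the real messages:

```python
import pandas as pd

# Abbreviated stand-ins for the err entries shown above (code/col/row
# values copied from the sample output, messages shortened).
err = [
    {"code": 416, "col": 8, "message": "Unable to collect data ...", "row": 112},
    {"code": 416, "col": 9, "message": "Unable to collect data ...", "row": 112},
    {"code": 416, "col": 8, "message": "Unable to collect data ...", "row": 113},
]

err_df = pd.DataFrame(err)
# Count failures per (code, col) pair to see which fields are affected.
summary = err_df.groupby(["code", "col"]).size()
print(summary)
```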
Hello @zoya faberov and @Shadab Hussain , thank you for your support. It works better now.
However, as I stated earlier in this thread, I want to retrieve the identical information with historical_pricing.summaries.Definition (not RD, only RDP historical summaries, because for some RICs I found that RDP retrieves information while RD does not). Would you be so kind as to have a look at my sample code and optimise it so that it can once again open the attached (updated, de-duplicated) .csv file of RICs, go through each, and return the identical bond-related information?
Thank you so much for your help. You are making people's lives so much easier.