Determine the ideal number of fields and assets to download in each query, to avoid null data or emp
The purpose of this case is to determine the ideal number of fields and assets to download in each query, to avoid null data or empty dataframes. I have noticed that when I try to download various metrics and assets, sometimes the data returned is null or the dataframes are empty. However, this issue is variable; I can run the same code again, and in the second iteration, the data might be complete. Additionally, the issue varies with RICs: for some RICs, this problem occurs more frequently than for others.
Moreover, I have observed that the same set of metrics may work fine for a given set of tickers, but if I add another ticker to that list, problems might arise. This inconsistency is also variable, as sometimes it occurs and sometimes it does not.
It is also likely that the problem is not in the code itself. The code returns a dictionary of dictionaries: the outer keys are the iterations, and each inner dictionary maps a RIC to a dataframe containing the metrics. I compare the iterations to identify inconsistencies or issues with the data retrieval process. Do you know why this is happening, or what the most efficient way is to retrieve large lists of fields and RICs?
# Libraries
import pandas as pd
import time
import logging
import refinitiv.data as rd
rd.open_session()
#logging config
logging.basicConfig(level=logging.INFO, format='%(message)s')
# Logger Config
logger = logging.getLogger('myAppLogger')
logger.setLevel(logging.INFO)
formatter = logging.Formatter('%(message)s')
pd.set_option('future.no_silent_downcasting', True)
def rename_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """
    Renames duplicate columns in the dataframe by
    appending '_loc_curr' and '_usd'
    to the first and second occurrence of each
    duplicate column name, respectively.

    Parameters
    ----------
    df : pd.DataFrame
        The dataframe containing columns that need
        to be renamed if duplicates are found.

    Returns
    -------
    df : pd.DataFrame
        The dataframe with renamed columns to
        handle duplicates.
    """
    cols = pd.Series(df.columns)
    for dup in cols[cols.duplicated()].unique():
        cols[cols[cols == dup].index.values.tolist()] = [dup + '_loc_curr', dup + '_usd']
    df.columns = cols
    return df

def is_data_valid(data: pd.DataFrame):
    # Check that the DataFrame obtained is valid (not empty and not all NaN, since that can happen)
    return not data.empty and not data.isnull().all().all()
def retrieve_and_process_fundamental_data(tickers: list[str],
                                          metric_codes: list[str],
                                          ticker_partition_size: int,
                                          metric_partition_size: int,
                                          max_attempts: int,
                                          ):
    metrics_df = {ticker: pd.DataFrame() for ticker in tickers}
    fund_data_error_metrics = []
    fund_tickers_error = []
    for ticker_index in range(0, len(tickers), ticker_partition_size):
        ticker_partition = tickers[ticker_index:ticker_index + ticker_partition_size]
        logger.info(f"Retrieving data for tickers: {ticker_partition}")
        for metric_index in range(0, len(metric_codes), metric_partition_size):
            partitioned_metrics = metric_codes[metric_index:metric_index + metric_partition_size]
            logger.info(f"Retrieving data for tickers: {ticker_partition} and metrics: {partitioned_metrics}")
            time.sleep(0.2)
            logger.info("")
            try:
                for attempt in range(max_attempts):
                    logging.info(f"Attempt {attempt + 1} to fetch data.")
                    metrics_data = rd.get_data(tickers, metric_codes)
                    if is_data_valid(metrics_data):
                        logging.info("Successful query.")
                        break
                    else:
                        logging.error(f"Attempt {attempt + 1} failed. Retrying...")
                for ticker in ticker_partition:
                    ticker_data = metrics_data[metrics_data['Instrument'] == ticker]
                    if metrics_df[ticker].empty:
                        metrics_df[ticker] = ticker_data
                    else:
                        metrics_df[ticker] = pd.concat([metrics_df[ticker], ticker_data], axis=1)
                    # Format
                    metrics_df[ticker] = metrics_df[ticker].rename(columns={'Period End Date': 'Date'})
                    # Removing duplicates for specific columns
                    columns_to_check = ['Date', 'Instrument', 'Income Statement Orig Announce Date']
                    for col in columns_to_check:
                        if col in metrics_df[ticker].columns:
                            first_col = metrics_df[ticker][col].iloc[:, 0] if isinstance(metrics_df[ticker][col], pd.DataFrame) else metrics_df[ticker][col]
                            metrics_df[ticker] = metrics_df[ticker].drop(columns=[c for c in metrics_df[ticker].columns if c == col][1:])
                            metrics_df[ticker][col] = first_col
                    metrics_df[ticker] = rename_duplicate_columns(metrics_df[ticker])
                    metrics_df[ticker] = metrics_df[ticker].infer_objects(copy=False)
                    logger.info(f"Done for {ticker}")
            except Exception as e:
                logger.error(f"Error retrieving data for tickers: {ticker_partition} and metrics: {partitioned_metrics} - {e}")
                fund_data_error_metrics.append(partitioned_metrics)
                fund_tickers_error.append(ticker_partition)
    return metrics_df
ticker_partition_size = 10
metric_partition_size = 15
max_attempts = 3
tickers = [
"MSFT.OQ",
"AAPL.OQ",
"NVDA.OQ",
"GOOGL.OQ",
"AMZN.OQ",
"META.OQ",
"UNH.N",
"BRKa.N",
"LLY.N",
"2330.TW",
"AVGO.OQ",
"NOVOb.CO",
"V.N",
"TSLA.OQ",
"XOM.N",
"WMT.N",
"0700.HK",
"MA.N",
"CSCO.OQ",
"PG.N",
"005930.KS",
]
fields = [
"TR.F.ComStockBuybackNet(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.NetIncAfterTax(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.ShrUsedToCalcDilEPSTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.IncAvailToComShr(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.IncAvailToComShr(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0,Curn=USD)",
"TR.F.DebtInclPrefEqMinIntrTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.CashSTInvstTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.BookValuePerShr(SDate=2023-05-01,EDate=2024-05-30,Period=FQ0,Frq=FQ)",
"TR.F.MinIntr(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.DebtTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.OpProfBefNonRecurIncExpn(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.OpProfBefNonRecurIncExpn(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0,Curn=USD)",
"TR.F.TotAssets(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.NetCashFlowOp(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.DebtLTTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.ShHoldEqCom(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.ShHoldEqCom(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0,Curn=USD)",
"TR.F.EBITDA(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.EBITDA(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0,Curn=USD)",
"TR.CashFromOperatingAct(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0,Curn=USD)",
"TR.CashFromOperatingAct(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.NetIncomeBeforeExtraItems(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.NetIncomeBeforeExtraItems(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0,Curn=USD)",
"TR.AssetTurnover(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.LTDebtToTotalAssetsPct(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.SaleIssuanceOfCommon(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.SaleIssuanceOfCommon(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0,Curn=USD)",
"TR.GrossProfitMarginIndustrialAndUtilityPct(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.F.TotCurrAssets(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.TotCurrLiab(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.LeveredFOCF(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.LeveredFOCF(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0,Curn=USD)",
"TR.F.AvgNumShrOutst(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.ShrUsedToCalcDilEPSTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.ShortInterest(SDate=2014-01-01,EDate=2024-05-30,Frq=FQ)",
"TR.F.TotRevenue(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.TotRevenue(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0,Curn=USD)",
"TR.F.IntrExpnNetOfIntrInc(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.NetDebt(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.ComShrOutsTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.CashDivPaidComStockBuybackNet(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.EPSBasicInclExordItemsComTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.EPSBasicExclExordItemsComTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.EPSDilInclExordItemsComTot(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.BookValuePerShr5YrCAGR(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FY0)",
"TR.F.ReturnAvgComEqPct(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.F.LeveredFOCFPerShr(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,ReportingState=Orig, Period=FQ0)",
"TR.EpsSmartEst(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, Period=FQ0)",
"TR.NetProfitMean(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.NetprofitSmartEst(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.EBITDASmartEst(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.RevenueMean(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.F.LeveredFOCFPerShr(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ,Period=FQ0)",
"TR.EPSSmartEstLastYrGrowth(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ)",
"TR.EBITDASmartEstLastYrGrowth(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ)",
"TR.RevenueSmartEstLastYrGrowth(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ)",
"TR.EPSActSurprise(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, Period=FQ0)",
"TR.RevenueActSurprise(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, Period=FQ0)",
"TR.OperatingMarginPercent(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, Period=FQ0)",
"TR.F.EBITDAMargPctTTM(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, Period=FQ0)",
"TR.F.IncAfterTaxMargPct(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, Period=FQ0)",
"TR.NetProfitMean(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, Period=FQ0)",
"TR.F.NetDebtToEBITDATTM(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ)",
"TR.F.NetDebttoTotEq(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.EBITNetIntrCovRatio(SDate=2014-01-01, EDate=2024-05-30, Frq=FQ, ReportingState=Orig, Period=FQ0)",
"TR.F.IntrExpnNetOfIntrInc(SDate=2014-01-01,EDate=2024-05-30,Period=FQ0,Frq=FQ,Methodology=StubLTM,ReportingState=Orig)",
"TR.F.CurrRatio(SDate=2014-01-01,EDate=2024-05-30,Period=FQ0,Frq=FQ)",
"TR.EpsSmartEst(SDate=2014-01-01,EDate=2024-05-30,Frq=FQ,Period=FQ0)",
"TR.EPSActValue(SDate=2014-01-01,EDate=2024-05-30,Frq=FQ,Period=LTM)"
]
metrics_data_dicts = {}
for i in range(3):
    key_name = f"metrics_dict_{i}"
    logger.info(f"Iteration number {i}")
    metrics_data_dicts[key_name] = retrieve_and_process_fundamental_data(tickers=tickers,
                                                                         metric_codes=fields,
                                                                         ticker_partition_size=ticker_partition_size,
                                                                         metric_partition_size=metric_partition_size,
                                                                         max_attempts=max_attempts)
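The comparison between iterations described in the question could be automated along the following lines. This is only a minimal sketch: `summarize_iteration` and `unstable_rics` are hypothetical helper names, and the only assumption about the input is the structure described above (iteration name, then RIC, then dataframe).

```python
import pandas as pd


def summarize_iteration(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Return one row per RIC with basic completeness statistics."""
    rows = []
    for ric, df in frames.items():
        rows.append({
            "ric": ric,
            "rows": len(df),
            "empty": df.empty,
            # Share of cells that are NaN; treated as 1.0 for an empty frame.
            "nan_share": float(df.isna().to_numpy().mean()) if df.size else 1.0,
        })
    columns = ["ric", "rows", "empty", "nan_share"]
    return pd.DataFrame(rows, columns=columns).set_index("ric")


def unstable_rics(iterations: dict[str, dict[str, pd.DataFrame]]) -> list[str]:
    """List the RICs whose row count differs between iterations."""
    counts = pd.DataFrame(
        {name: summarize_iteration(frames)["rows"] for name, frames in iterations.items()}
    )
    return counts.index[counts.nunique(axis=1) > 1].tolist()
```

Calling `unstable_rics(metrics_data_dicts)` after the loop would then list the RICs that came back with a different number of rows in at least one iteration, which narrows down where the intermittent gaps occur.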
Best Answer
-
Hi @alejandro.gonzalez ,
You're using the basic function rd.get_data(), but you want to check the result with a granularity that this function doesn't provide.
You should replace
metrics_data = rd.get_data(tickers, metric_codes)
with the rd.content.fundamental_and_reference API:
result = rd.content.fundamental_and_reference.Definition(universe=tickers, fields=metric_codes).get_data()
if result.errors:
    # errors is a non-empty list of tuples (code, error_message)
    # => iterate over this list to detect the error causes
    ...
else:
    # retrieve the result as a DataFrame
    metrics_data = result.data.df
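One possible way to iterate over that error list is to group the messages by code, so that a recurring cause stands out. The sketch below assumes only what the comment above states, namely that each entry unpacks into a `(code, error_message)` pair; `group_errors` is a hypothetical helper, and the sample codes and messages are made up for illustration.

```python
from collections import defaultdict


def group_errors(errors):
    """Group (code, message) pairs by code so repeated causes stand out."""
    grouped = defaultdict(list)
    for code, message in errors:
        grouped[code].append(message)
    return dict(grouped)


# Made-up sample entries, used only to show the shape of the result.
sample_errors = [
    (408, "request timed out"),
    (412, "unable to resolve field"),
    (408, "request timed out"),
]
for code, messages in group_errors(sample_errors).items():
    print(code, len(messages), messages[0])
```

In the snippet above, `group_errors(result.errors)` would take the place of the `...` placeholder.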
Answers
-
Thank you for reaching out to us.
As far as I know, it may be related to the server load. The server can cancel a request due to a timeout. You may try to reduce the number of fields or the date range (SDate and EDate) in each request.
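Note that the code in the question builds `ticker_partition` and `partitioned_metrics` but still passes the full `tickers` and `metric_codes` lists to `rd.get_data()`, so every request is as large as the unpartitioned one. A minimal sketch of sending only the current batch, with retries, is shown below. The `fetch` callable, `fetch_in_batches`, and the default batch sizes are assumptions for illustration; in practice `fetch` would wrap whichever retrieval call is used.

```python
import time
from typing import Callable, Iterator, Sequence

import pandas as pd


def chunks(items: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def is_valid(df: pd.DataFrame) -> bool:
    """Same check as in the question: not empty and not entirely NaN."""
    return not df.empty and not df.isnull().all().all()


def fetch_in_batches(
    fetch: Callable[[list[str], list[str]], pd.DataFrame],
    rics: Sequence[str],
    fields: Sequence[str],
    ric_batch: int = 10,
    field_batch: int = 15,
    max_attempts: int = 3,
    pause: float = 0.2,
) -> tuple[dict, list]:
    """Call `fetch` once per (RIC batch, field batch) pair.

    Returns the valid frames keyed by (ric_batch_index, field_batch_index)
    and a list of the batches that never produced valid data.
    """
    results, failed = {}, []
    for i, ric_part in enumerate(chunks(rics, ric_batch)):
        for j, field_part in enumerate(chunks(fields, field_batch)):
            for attempt in range(max_attempts):
                # Only the current batch is sent, never the full lists.
                df = fetch(ric_part, field_part)
                if is_valid(df):
                    results[(i, j)] = df
                    break
                time.sleep(pause * (attempt + 1))  # simple linear back-off
            else:
                # The loop ended without a break: every attempt was invalid.
                failed.append((ric_part, field_part))
    return results, failed
```

Keeping the failed batches in a separate list makes it possible to retry just those later, or to shrink the batch size for them, instead of repeating the whole download.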
You can enable logging in the library to verify what the problem is by using the following code.
config = rd.get_config()
config.set_param("logs.transports.file.enabled", True)
config.set_param("logs.transports.file.name", "refinitiv-data-lib.log")
config.set_param("logs.level", "debug")
rd.open_session()
The refinitiv-data-lib.log file will be created. You can check the log file for the issue.