Merging historical and forecast dataframe yields duplicates

I get duplicate cells when I merge two dataframes: one containing historical data, a second containing forecast data, as follows:

tkrrenew = ['0916.HK', '0958.HK', '0451.HK', '1798.HK']
fdflds = ['CF_NAME', 'TR.EBITDA.rfperiod', 'TR.BasicEpsExclExtraItems.Currency', 'TR.BasicEpsExclExtraItems', 'TR.DpsCommonStock']
param = {'Period': 'FY0', 'SDate': 'FY-2', 'EDate': 'FY0', 'FRQ': 'FY'}
valrenew, err = ek.get_data(tkrrenew, fdflds, param)


frcstfld = ['CF_NAME', 'TR.EPSSmartEst.rfperiod', 'TR.EPSSmartEst', 'TR.DPSMean']
paramfcst = {'Period': 'FY1', 'SDate': '0', 'EDate': '2', 'FRQ': 'FY'}
fcstrenew, err = ek.get_data(tkrrenew, frcstfld, paramfcst)

testrenew = valrenew.merge(fcstrenew, on=['Instrument', 'Name'], how='inner')

The output, after some other commands is as follows:

image

Apart from .drop_duplicate(subset=['Instrument', 'FY']) which leaves me with:

image

What can I do to have both historical and forecast data till FY3 in one dataframe. If .join or .append can place the FY1-FY3 in the same column as FY-2 to FY0, that would ideal.

Pls advise.

Best Answer

  • Is this what you're looking for?
    valrenew.merge(fcstrenew, on=['Instrument', 'Financial Period Relative' ], how='outer')
    This will give you

    image


    You can then sort the resulting dataframe on Instrument and then Financial Period Relative column to get chronological view grouped by the stock. You can remove duplicate CF_NAME column (CF_NAME_x or CF_NAME_y), or you can drop CF_NAME field from one of the requests.

Answers

  • Thanks, Alex & Wasin.

    I've rewritten my code such that it now goes as follows:

    dffrkFY0 = dffrank[dffrank.FY == 'FY0'] #extract FY0 data
    dffrkFY1 = dffrank[dffrank.FY == 'FY-1'] #extract FY-1 data
    dffrkFY1.columns = [['Instrument', 'Name', 'FYE', 'RpCrncy','EstEPS','EstDPS','EstBVPS','PxCrncy','Price',\
    'FXCode', 'FXRate', 'EstPER', 'EstPB', 'EstD_Yld']]

    # dffwd is an earlier DataFrame

    dffmerge = pd.concat([dffwd, dffrkFY0, dffrkFY1], join='inner')
    dffmerge = dffmerge.sort_values(['Instrument', 'FYE'])
    dffmerge = dffmerge.reset_index(drop=True)
    dffmerge[:15]

    image

    Thanks for your effort; much appreciated.

    Later on in the code, I re-jig the <Name> such that it doesn't appear at such an odd position.