How to clean the output of data

Aleniles · November 2022

Hello, I have the code below.

import DatastreamPy as dsws
import pandas as pd

# Set up the connection, reading the credentials from local text files
def connection_DSWS():
    # "with" closes each file automatically; strip() drops a trailing newline
    with open("DSWSUsername.txt", "r") as f:
        username = f.read().strip()
    with open("DSWSPassword.txt", "r") as f:
        password = f.read().strip()

    return dsws.Datastream(username=username, password=password)

# eom, period and format_date are accepted here but not used yet
def return_data(start, end, tickers, eom=False, period=12, format_date="Other"):
    ds = connection_DSWS()
    return ds.get_data(tickers=tickers, start=start, end=end, freq="M")

Tickers = 'USCONPRCE(ES)*DUMMYMTE,USCNFBUSQ(ES)*DUMMYMTE,TRUS10T(RY)*DUMMYMTE'
StartDate = "2020-12-31"  # start date, in "yyyy-mm-dd" format
EndDate = "2022-02-22"  # end date, same format as above
FormatDate = "yyyymmdd"  # date format wanted in the CSV
Period = 3  # period of ROC and MOM
path = "Output.csv"
EndOfMonth = True  # True if you use *DUMMYMTE; it must be applied to every ticker to get EOM data

df = return_data(StartDate, EndDate, Tickers, EndOfMonth, Period, FormatDate)

df

How can I clean the DataFrame so that the output, written to a comma-delimited CSV file, has the ticker names in the header and the dates in yyyymmdd format, exactly like this (see below)?

Date,USCONPRCE,USCNFBUSQ,TRUS10T
20201231,261.564,60.5,0.912
20210129,262.200,59.4,1.094
20210226,263.346,60.9,1.456
20210331,265.028,63.7,1.746

Thanks!

raksina.samasiri · November 2022

Hi @Aleniles ,

Is this what you're looking for?

new_header = ['USCONPRCE','USCNFBUSQ','TRUS10T']
df.to_csv('csv_output.csv', header=new_header)

The output CSV file matches the data in the df.
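For the yyyymmdd dates and the "Date" header asked for in the question, something along these lines might work. It is only a sketch: the helper names `header_names` and `write_clean_csv` are made up for illustration, and it assumes the index of the returned df holds dates that `pd.to_datetime` can parse and that the tickers are a plain comma-separated list (a ticker containing commas, such as a DPL# expression, would need different handling). The sample rows are taken from the expected output in the question.

```python
import io
import re

import pandas as pd


def header_names(tickers):
    """Strip Datastream suffixes such as "(ES)*DUMMYMTE" from each ticker (hypothetical helper)."""
    return [re.split(r"[(*]", t.strip(), maxsplit=1)[0] for t in tickers.split(",")]


def write_clean_csv(df, names, target):
    """Write df with plain ticker headers and yyyymmdd dates (hypothetical helper)."""
    out = df.copy()                        # leave the original df untouched
    out.columns = names                    # replace the headers with the plain ticker names
    out.index = pd.to_datetime(out.index)  # assumption: the index holds parseable dates
    out.to_csv(target, index_label="Date", date_format="%Y%m%d")


tickers = "USCONPRCE(ES)*DUMMYMTE,USCNFBUSQ(ES)*DUMMYMTE,TRUS10T(RY)*DUMMYMTE"

# Stand-in for the df returned by get_data, using values from the question
sample = pd.DataFrame(
    [[261.564, 60.5, 0.912], [262.200, 59.4, 1.094]],
    index=["2020-12-31", "2021-01-29"],
)

buffer = io.StringIO()  # pass a file path such as "Output.csv" instead to write to disk
write_clean_csv(sample, header_names(tickers), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Deriving the headers from the ticker string avoids hardcoding them, so the same call should work for a different ticker list of the same simple form.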

Aleniles · November 2022

Great!

And starting from that last CSV file, how can I change it (and fill in the data) to get this output:

Date,USCONPRCE,USCONPRCE_CHG,USCNFBUSQ,USCNFBUSQ_CHG,TRUS10T,TRUS10T_CHG

So basically it is TICKER1,TICKER1_CHG,TICKER2,TICKER2_CHG, etc.

That is, add the suffix _CHG to each ticker in a new column and fill it with the 1-period actual change, e.g. df["USCONPRCE_CHG"]=(df['USCONPRCE']-df['USCONPRCE'].shift(1))?

This should use a dynamic loop so that it works with any tickers and any number of tickers: take the first ticker column, add a new column ticker_CHG, and fill that new column with the actual change. Then do the same for ticker 2, ticker 3, and so on until the last column.

Aleniles · November 2022

I was able to add the _CHG columns, but how do I fill them with the price change instead of NaN?

df1=df
for count in range(len(df1.columns)):
    df1.insert(count*2+1, df1.columns[count*2]+'_CHG', 'Nan')

df1

Aleniles · November 2022

Solved (for all, hope it helps)

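df1 = df.copy()  # copy, so that the original df is not modified

# After each insert, the next original column sits at position count*2,
# so its _CHG column (1-period percentage change) goes at count*2+1
for count in range(len(df1.columns)):
    col = df1.columns[count*2]
    df1.insert(count*2+1, col+'_CHG', 100*(df1[col]/df1[col].shift(1)-1))

df1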
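An alternative that avoids the position arithmetic is to build the interleaved columns in a new DataFrame. This is a rough sketch, not the only way to do it: `add_change_columns` is a made-up name, and it assumes the columns already carry plain string ticker names. By default it fills the actual 1-period change (the difference asked for earlier in the thread); `percent=True` gives the percentage change used in the loop above.

```python
import pandas as pd


def add_change_columns(df, suffix="_CHG", percent=False):
    """Return a new DataFrame with a change column after each original column."""
    pieces = {}
    for name in df.columns:
        pieces[name] = df[name]
        if percent:
            # 1-period percentage change, as in the loop above
            pieces[f"{name}{suffix}"] = 100 * (df[name] / df[name].shift(1) - 1)
        else:
            # 1-period actual change: value minus previous value
            pieces[f"{name}{suffix}"] = df[name].diff()
    # dict insertion order gives TICKER1, TICKER1_CHG, TICKER2, TICKER2_CHG, ...
    return pd.DataFrame(pieces, index=df.index)


# Stand-in data using values from the expected output in the question
sample = pd.DataFrame(
    {"USCONPRCE": [261.564, 262.200], "TRUS10T": [0.912, 1.094]},
    index=["20201231", "20210129"],
)
result = add_change_columns(sample)
print(list(result.columns))
```

The first row of every _CHG column is NaN, since there is no earlier period to compare against.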

Aleniles · November 2022

As you can see, TRUS10T(RY) returns only 3 decimal places by default. Is there a way to increase that to 4 decimal places when retrieving the data? I do not see any argument for that in the get_data function.

Aleniles · November 2022

OK, the solution is to use 'DPL#(BMUS10Y(RY)-BMUS02Y(RY),4)' as a ticker instead of 'BMUS10Y(RY)-BMUS02Y(RY)'. Unfortunately, it doesn't work for other tickers such as DPL#(TRUS10T(RY),4) or DPL#(USCNFBUSQ(ES)*DUMMYMTE,2). Could you check why?
