How to clean the output of data

Hello, I have the code below.

import DatastreamPy as dsws
import pandas as pd

# set up connection
def connection_DSWS():
DSWS_username = open("DSWSUsername.txt","r")
DSWS_password = open("DSWSPassword.txt","r")

ds = dsws.Datastream(username = str(DSWS_username.read()),
password = str(DSWS_password.read()))

DSWS_username.close()
DSWS_password.close()
return ds

def return_data(start, end, tickers, eom=False, period=12, format_date="Other"):
ds = connection_DSWS()
df = ds.get_data(tickers=tickers, start=start, end=end, freq="M")
df
return df

Tickers = 'USCONPRCE(ES)*DUMMYMTE,USCNFBUSQ(ES)*DUMMYMTE,TRUS10T(RY)*DUMMYMTE'
StartDate = "2020-12-31" #Put start date with the following format "yyyy-mm-dd"
EndDate = "2022-02-22" #same above for end date
FormatDate = "yyyymmdd" #Format in the csv
Period = 3 # Period of ROC and MOM
path = "Output.csv"
EndOfMonth = True #True if you use *DUMMYMTE, you have to use it for all elements to have EOM data

df = return_data(StartDate, EndDate, Tickers, EndOfMonth, Period, FormatDate)

df

How to clean the dataframe so that the output into a csv file, comma delimited, with the ticker names in the headline and the date into the yyyymmdd format, will be exactly like this (see below) ?

Date,USCONPRCE,USCNFBUSQ,TRUS10T

20201231,261.564,60.5,0.912

20210129,262.200,59.4,1.094

20210226,263.346,60.9,1.456

20210331,265.028,63.7,1.746

Thanks!

Best Answer

  • raksina.samasiri
    Answer ✓

    Hi @Aleniles ,

    Is this what you're looking for?

    new_header = ['USCONPRCE','USCNFBUSQ','TRUS10T'] 
    df.to_csv('csv_output.csv', header=new_header

    output CSV file

    1667789038246.png

    compare to the df, the data matches

    1667789059813.png

Answers

  • Great!

    And starting from that last csv file ... how to change (and fill the data) so to have this output:

    Date,USCONPRCE,USCONPRCE _CHG,USCNFBUSQ,USCNFBUSQ _CHG,TRUS10T,TRUS10T_CHG

    So basically is TICKER1,TICKER1_CHG,TICKER2_,TICKER2_CHG etc

    Add the suffix _CHG to all tickers in a new column and fill it with 1 period actual change eg: df["USCONPRCE_CHG"]=(df['USCONPRCE']-df['USCONPRCE'].shift(1)) ?


    This using a dynamic loop so that it will work with whatever ticker / whatever numbers of total tickers we have ... that is ... take first column ticker, add a new colum ticker_chg, add the actual change to that new column. Do the same for ticker 2, ticker 3 until last column.


  • I was able to add the columns with _CHG but how to add the price change as values instead of Nan ?


    df1=df
    for count in range(len(df1.columns)):
    df1.insert(count*2+1, df1.columns[count*2]+'_CHG', 'Nan')

    df1


  • Solved (for all, hope it helps)

    df1=df

    for count in range(len(df1.columns)):

    df1.insert(count*2+1, df1.columns[count*2]+'_CHG', 100*(df1[df1.columns[count*2]]/df1[df1.columns[count*2]].shift(1)-1) )



    df1
  • As you can see the TRUS10T(RY) pulls only 3 decimal numbers by default. Is there a way to increase those decimals to 4 numbers while getting the data? I do not see any arguments for that in the get_data function

  • Ok the solution is to use 'DPL#(BMUS10Y(RY)-BMUS02Y(RY),4)' as a ticker instead of 'BMUS10Y(RY)-BMUS02Y(RY) ' but unfortunatley doesn't work for other tickers like DPL#(TRUS10T(RY),4) or DPL#(USCNFBUSQ(ES)*DUMMYMTE,2), could you check why ?