Eikon 1.1.4 is not backward compatible with 1.1.2

igorg · September 2020

Another upgrade has broken our existing scripts. We hit two problems after upgrading from 1.0.1 to 1.1.2, and now another one after upgrading from 1.1.2 to 1.1.4 (1.1.5).

The release notes page says nothing about the changes in 1.1.2, and there are no release notes at all for 1.1.4 and 1.1.5!

Issues like these make us reluctant to ever upgrade working modules.

Here is the problem I'm talking about:

dataFrame = ek.get_timeseries(rics='AC.MX', fields='*', start_date='2020-09-03T20:00:00', end_date='2020-09-03T20:10:00', interval='tick', normalize=False)

ricName = dataFrame.columns.name

In 1.1.2, ricName holds the RIC name. In 1.1.4 it is nan.

The same request returns data in different formats under the 1.1.2 and 1.1.4 Eikon Python modules.

pierre.faurel · September 2020

ts[ric] works only for a multi-RIC time series

I tested different requests and checked the dataframe structure with the following code:

def get_ts(rics, fields):
  ts = rdp.get_timeseries(rics, fields, 
                          start_date='2020-09-03T20:00:00', 
                          end_date='2020-09-03T20:10:00',
                          interval='minute',
                          normalize=False, count=1)
  if hasattr(ts.columns, "name"):
    print("ts.columns.name: ", ts.columns.name)
  else:
    print("ts.columns.name: <No name>")
  print(ts)

1 RIC, N fields

=> ts.columns.name contains the ric name

get_ts(["MSFT.O"], '*')

Output:

ts.columns.name:  MSFT.O
MSFT.O                 HIGH     LOW    OPEN   CLOSE  COUNT  VOLUME
Date                                                               
2020-09-03 20:07:00  218.70  218.30  218.53  218.45    162   11654

N RIC, 1 field

=> ts.columns.name contains the field name

get_ts(["AAPL.O", "MSFT.O"], 'HIGH')

Output:

ts.columns.name:  HIGH
HIGH                 AAPL.O  MSFT.O 
Date                                
2020-09-03 20:07:00  127.67  218.70

N RIC, N fields

=> ts.columns.name is None; the column levels are labeled "Security" and "Field"

get_ts(["AAPL.O", "MSFT.O"], ["HIGH", "LOW"])

Output:

ts.columns.name:  None 
Security             AAPL.O          MSFT.O         
Field                  HIGH     LOW    HIGH     LOW
Date                                                
2020-09-03 20:07:00  127.67  120.86  218.70  218.30

pierre.faurel · September 2020

Hi,

We apologize for any inconvenience these changes have caused.
We have noted your remark about the release notes and will improve on this point.

Rather than using the dataFrame.columns.name property, which is not standard in pandas (it was added by the eikon lib), we suggest relying on your rics request parameter.
Column names should refer only to the related field names.
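
One way to apply that suggestion is to pass the requested RIC along with the frame and fall back to it whenever columns.name is unusable. This is a minimal sketch for a single-RIC request; `ric_label` is a hypothetical helper, not part of the eikon library, and the frame is hand-built stand-in data rather than a real response:

```python
import numpy as np
import pandas as pd


def ric_label(ts: pd.DataFrame, requested_ric: str) -> str:
    """Return the RIC for a single-RIC time series.

    Hypothetical helper: uses columns.name when it is a non-empty string,
    otherwise falls back to the RIC that was requested.
    """
    name = ts.columns.name
    # columns.name may be None or NaN depending on the library version
    if isinstance(name, str) and name:
        return name
    return requested_ric


# Hand-built stand-in for a response whose columns.name was not set
ts = pd.DataFrame({"VALUE": [np.nan], "VOLUME": [4400.0]})
ts.columns.name = np.nan
print(ric_label(ts, "AC.MX"))
```

Because the caller already knows which RIC was requested, the result does not depend on which library version produced the frame.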

In 1.1.4 and 1.1.5, changes were made to align with standard pandas output:

- numpy.NaN values were replaced by pandas.NA
- all values are converted to the most suitable type
  (in 1.1.2, all numeric values were converted to float64, while other values such as str kept the object type)

As you'll see below, this affects only empty data:

ts = ek.get_timeseries(rics='AC.MX', fields='*', start_date='2020-09-03T20:00:00', end_date='2020-09-03T20:10:00', interval='tick', normalize=False)
print(ts)
print(ts.dtypes)

1.1.2
AC.MX                VALUE   VOLUME
Date                               
2020-09-03 20:00:01    NaN   4400.0
2020-09-03 20:00:01    NaN   3720.0
2020-09-03 20:00:01    NaN   5814.0
2020-09-03 20:00:01    NaN   2494.0
2020-09-03 20:00:01    NaN   2507.0
2020-09-03 20:00:01    NaN    436.0
2020-09-03 20:00:01    NaN    356.0
2020-09-03 20:00:01    NaN  16933.0

AC.MX
VALUE     float64
VOLUME    float64
dtype: object

1.1.4
                     VALUE  VOLUME
Date                              
2020-09-03 20:00:01   <NA>    4400
2020-09-03 20:00:01   <NA>    3720
2020-09-03 20:00:01   <NA>    5814
2020-09-03 20:00:01   <NA>    2494
2020-09-03 20:00:01   <NA>    2507
2020-09-03 20:00:01   <NA>     436
2020-09-03 20:00:01   <NA>     356
2020-09-03 20:00:01   <NA>   16933

VALUE     Int64
VOLUME    Int64
dtype: object

1.1.5
                     VALUE  VOLUME
Date                              
2020-09-03 20:00:01   <NA>    4400
2020-09-03 20:00:01   <NA>    3720
2020-09-03 20:00:01   <NA>    5814
2020-09-03 20:00:01   <NA>    2494
2020-09-03 20:00:01   <NA>    2507
2020-09-03 20:00:01   <NA>     436
2020-09-03 20:00:01   <NA>     356
2020-09-03 20:00:01   <NA>   16933

VALUE     Int64
VOLUME    Int64
dtype: object
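
The dtype difference shown in these outputs can be reproduced with plain pandas. The sketch below uses a hand-built frame (not real Eikon data) and pandas' own `convert_dtypes()`; whether the eikon library calls this method internally is not stated in this thread, so treat it only as an illustration of the NaN/float64 versus pd.NA/Int64 representations:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a 1.1.2-style frame: float64 columns, NaN for gaps
old_style = pd.DataFrame({"VALUE": [np.nan, np.nan], "VOLUME": [4400.0, 3720.0]})

# convert_dtypes() switches to nullable extension dtypes that use pd.NA
new_style = old_style.convert_dtypes()
print(new_style.dtypes)

# isna() recognises both NaN and pd.NA, so this check works on either frame
missing_old = old_style["VALUE"].isna().all()
missing_new = new_style["VALUE"].isna().all()
print(missing_old, missing_new)

# Casting back to float64 restores NaN-based columns for older code paths
restored = new_style.astype("float64")
print(restored.dtypes)
```

Scripts that test for missing values with `isna()` rather than comparing against `numpy.nan` should behave the same under both representations.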

igorg · September 2020

Do you see that 'AC.MX' in 1.1.2 printed results? It's missing in 1.1.4 and 1.1.5

pierre.faurel · September 2020

Yes, I see.
We're releasing a new version, 1.1.6, and will restore it for your convenience.

igorg · September 2020

This sounds great. Do you have an ETA for 1.1.6?

pierre.faurel · September 2020

It still needs to be validated, but I'm confident it will be out next week.

igorg · September 2020

Thanks. I will check it next week

igorg · September 2020

About 'using dataFrame.columns.name': yes, we rely on this field to get the RIC name. It was a workaround for a more general issue:

The response representation is inconsistent. It differs depending on whether:

- multiple RICs are received

- 1 RIC is received

- 1 field is requested and more than 1 RIC is returned

The last two representations are similar, but in the third the name holds a field name instead of the RIC.

I'm not a pro in Python/pandas, but by common logic I don't understand why the data structure of the result depends on the number of requested RICs and fields. I would use the same data structure in every case, the one used for multiple RICs and multiple fields.
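
Until the library returns one shape for every request, a caller can normalise the three shapes on its own side. The sketch below is one possible approach, not library behaviour: `to_uniform` is a hypothetical helper, it assumes the caller passes the same rics and fields lists used in the request, and the frames are hand-built from the outputs quoted earlier in this thread.

```python
import pandas as pd


def to_uniform(ts: pd.DataFrame, rics: list, fields: list) -> pd.DataFrame:
    """Return ts with (Security, Field) MultiIndex columns in every case.

    Hypothetical helper; rics and fields are the values passed to the request.
    """
    if isinstance(ts.columns, pd.MultiIndex):
        return ts  # N RICs, N fields: already in the target shape
    if len(rics) == 1:
        # 1 RIC: the flat columns are field names
        pairs = [(rics[0], field) for field in ts.columns]
    elif len(fields) == 1:
        # N RICs, 1 field: the flat columns are RIC names
        pairs = [(ric, fields[0]) for ric in ts.columns]
    else:
        raise ValueError("flat columns but several RICs and fields requested")
    out = ts.copy()
    out.columns = pd.MultiIndex.from_tuples(pairs, names=["Security", "Field"])
    return out


idx = pd.to_datetime(["2020-09-03 20:07:00"])
one_ric = pd.DataFrame({"HIGH": [218.70], "LOW": [218.30]}, index=idx)
one_ric.columns.name = "MSFT.O"
one_field = pd.DataFrame({"AAPL.O": [127.67], "MSFT.O": [218.70]}, index=idx)
one_field.columns.name = "HIGH"

a = to_uniform(one_ric, ["MSFT.O"], ["HIGH", "LOW"])
b = to_uniform(one_field, ["AAPL.O", "MSFT.O"], ["HIGH"])
print(a["MSFT.O"])  # ts[ric] now works for the single-RIC frame too
print(b["MSFT.O"])
```

After normalisation, `frame[ric]` returns that RIC's fields regardless of how many RICs or fields were requested.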

pierre.faurel · September 2020

Hi,
eikon 1.1.6 has been released on pypi.org.

It contains a fix for the issue you raised, but keep in mind that the dataFrame.columns.name property is not a pandas standard; it is 'added value' from the eikon lib for time series on 1 RIC.

If you call any function that modifies the dataframe, you can lose the name property.
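
As a small illustration of how the name can disappear, the sketch below uses a hand-built frame (not real Eikon data) and one common modification, assigning new column labels. Storing the RIC outside the frame, here as a dictionary key, is one assumed way to avoid depending on the label:

```python
import pandas as pd

ts = pd.DataFrame({"VALUE": [1.0], "VOLUME": [4400.0]})
ts.columns.name = "AC.MX"  # mimics the label the library adds for one RIC

renamed = ts.copy()
renamed.columns = ["value", "volume"]  # replaces the Index, name included
print(renamed.columns.name)  # None: the RIC label is gone

# Keeping the RIC outside the frame avoids depending on that label
frames = {"AC.MX": renamed}
for ric, frame in frames.items():
    print(ric, list(frame.columns))
```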

If you request a time series for a list of RICs, the name property is not set, but you can extract the dataframe for each RIC this way:

ts = ek.get_timeseries(['AC.MX', 'MSFT.O'], ...)
ac_mx = ts['AC.MX']
msft = ts['MSFT.O']

(ts.columns is MultiIndex)
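
To make the MultiIndex selection concrete, here is a minimal sketch on a hand-built frame that copies the "N RIC, N fields" output quoted earlier in this thread. It needs no Eikon connection, and the values are stand-ins only:

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["AAPL.O", "MSFT.O"], ["HIGH", "LOW"]], names=["Security", "Field"]
)
ts = pd.DataFrame(
    [[127.67, 120.86, 218.70, 218.30]],
    index=pd.to_datetime(["2020-09-03 20:07:00"]),
    columns=columns,
)

msft = ts["MSFT.O"]  # all fields for one RIC
highs = ts.xs("HIGH", axis=1, level="Field")  # one field across all RICs

print(list(ts.columns.names))
print(msft)
print(highs)
```

Selecting by the first level returns a frame with plain field columns, while `xs` on the "Field" level returns one column per RIC.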

More generally, to get the column names from a pandas.DataFrame:

names = ts.columns
print([name for name in names])
['VALUE', 'VOLUME']

igorg · September 2020

Hi,

Thanks for the update. You also set that 'added value' when I request only 1 field; in that case it held the field name. Was that removed as well?

Another question: why do you use a different dataframe structure when only 1 RIC is requested?

This does not work:

ts = ek.get_timeseries(['AC.MX'], ...)
ac_mx = ts['AC.MX']
