Issue while flattening the JSON file to CSV in RDP ESG Bulk
Hello there,
I did an ESG Bulk extraction. However, when I try to flatten the JSON to CSV, a few columns still have the JSON tags; the JSON file does not seem to be flattened completely. I would appreciate your support in investigating and helping with this.
For instance, I tried converting the RFT-ESG-Scores-Full-Init-2021-04-25.jsonl.gz file from JSON to CSV using the following code:
import gzip
import json

import pandas as pd

#convert specific json to csv
filedestinationpath = 'C:\\$files\\$ESG\\RDP_BULK\\Results\\'
filename = filedestinationpath + 'RFT-ESG-Scores-Full-Init-2021-04-25' + '.jsonl.gz'
# read the gzipped JSON Lines file and split it into one record per line
with gzip.open(filename, 'rb') as f:
    file_content = f.read()
lines = file_content.splitlines()
df_inter = pd.DataFrame(lines)
df_inter.columns = ['json_element']
# parse each line into a dict, then flatten nested objects into dotted columns
df_resolve = df_inter['json_element'].apply(json.loads)
df_final = pd.json_normalize(df_resolve)
resultspth = filedestinationpath + 'RFT-ESG-Scores-Full-Init-2021-04-25' + '.csv'
df_final.to_csv(resultspth, index=False)
It seems to convert, but not all of the columns. For example, OrganizationName does not flatten out completely; it still carries the JSON tags.
Similarly, when I tried "RFT-ESG-Symbology-SEDOL-Init-2021-04-29.jsonl", a few columns, as shown in the screengrab below, seem to have the same issue.
I would appreciate it if you could review this and help.
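For reference, a slightly more compact way to load the same kind of file, assuming it is UTF-8 JSON Lines (one JSON object per line), is to open the archive in text mode and parse it line by line before normalizing. The helper name `load_jsonl_gz` is hypothetical; it produces the same flattened frame as the code above, including the columns that still hold lists.

```python
import gzip
import json

import pandas as pd


def load_jsonl_gz(path):
    """Read a gzipped JSON Lines file and flatten nested objects with json_normalize."""
    records = []
    # 'rt' opens the gzip stream in text mode, so iterating yields one line at a time
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        for line in f:
            if line.strip():  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    # nested dicts become dotted column names; lists are left as Python objects
    return pd.json_normalize(records)
```

For example, `df_final = load_jsonl_gz(filename)` could replace the reading and normalizing steps, with `to_csv` called as before.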
Best Answer
Hello @Bala Ilango,
Perhaps you may wish to take the same approach further and double-normalize the fields that contain nested objects in RFT-ESG-Scores:
For example:
import gzip
import json

import pandas as pd

#convert specific json to csv
filedestinationpath = '.\\'
filename = filedestinationpath + 'RFT-ESG-Scores-Current-init-2021-05-02' + '.jsonl.gz'
with gzip.open(filename, 'rb') as f:
    file_content = f.read()
lines = file_content.splitlines()
df_inter = pd.DataFrame(lines)
df_inter.columns = ['json_element']
df_resolve = df_inter['json_element'].apply(json.loads)
# first normalize: nested objects become dotted columns; arrays are left as lists
df_final = pd.json_normalize(df_resolve)
# second normalize: take the first element of each OrganizationName list and flatten it
df_final['ESGOrganization.Names.Name.OrganizationName'] = pd.json_normalize(df_final['ESGOrganization.Names.Name.OrganizationName'].str[0])
resultspth = filedestinationpath + 'RFT-ESG-Scores-Current-init-2021-05-02' + '.csv'
df_final.to_csv(resultspth, index=False)
df_final
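Note that `.str[0]` keeps only the first entry of each `OrganizationName` list. If a record can carry several names, one alternative (a sketch, not part of the answer above) is to explode the list so that each name gets its own row, keyed by `ObjectId`. The second record below, with two names, is a hypothetical example.

```python
import pandas as pd

COL = "ESGOrganization.Names.Name.OrganizationName"

records = [
    {"ObjectId": "4295864969;111", "ESGOrganization": {"Names": {"Name": {"OrganizationName": [
        {"OrganizationNormalizedName": "China High Speed Transmission Equipment Group Co Ltd"}]}}}},
    # hypothetical record with two names, to show the one-row-per-name layout
    {"ObjectId": "example;1", "ESGOrganization": {"Names": {"Name": {"OrganizationName": [
        {"OrganizationNormalizedName": "Name A"},
        {"OrganizationNormalizedName": "Name B"}]}}}},
]

df = pd.json_normalize(records)

# one row per list element; empty lists become NaN and are dropped
names = df[["ObjectId", COL]].explode(COL).dropna(subset=[COL]).reset_index(drop=True)

# flatten each name dict and put ObjectId back in front of it
long_names = pd.concat([names[["ObjectId"]], pd.json_normalize(names[COL].tolist())], axis=1)
print(long_names)
```

The resulting table could be written to its own CSV and joined back to the main file on `ObjectId`, so no name is lost.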
Answers
Hi @Bala Ilango,
May I ask how you got the 'RFT-ESG-Scores-Full-Init-2021-04-25.jsonl.gz' data file in question? Was it through Python? If so, would you mind sharing your code (with identifiable text removed)?
@jonathan.legrand Yes, I am using Python. I have shared the code via email. I appreciate your support with this. Thanks.
Hello @Bala Ilango,
The columns that do not convert, and that retain square brackets and curly braces, are nested objects.
In this case, ESGOrganization.Names.Name.OrganizationName is a nested object: an array containing maps, which can potentially hold multiple names, for example:
{"ObjectId":"4295864969;111","StatementDetails":{"OrganizationId":"4295864969","FinancialPeriodEndDate":"2020-12-31T00:00:00.000Z","FinancialPeriodFiscalYear":"2020","FinancialPeriodIsIncomplete":"true"},"ESGOrganization":{"Names":{"Name":{"OrganizationName":[{"OrganizationNormalizedName":"China High Speed Transmission Equipment Group Co Ltd"}]}}},"ESGScores":{"ESGCombinedScore":{"Value":"0.5429413762056495","ValueCalculationDate":"2021-05-01T18:05:32.161Z","ValueScoreGrade":"B-"},"ESGScore":{"Value":"0.5429413762056495","ValueCalculationDate":"2021-05-01T18:05:32.161Z","ValueScoreGrade":"B-"},"EnvironmentPillarScore":{"Value":"0.6736242884250474","ValueCalculationDate":"2021-04-24T17:12:23.608Z","ValueScoreGrade":"B+ "},"ESGResourceUseScore":{"Value":"0.7903225806451613","ValueCalculationDate":"2021-04-03T21:34:07.385Z","ValueScoreGrade":"A-"},"ESGEmissionsScore":{"Value":"0.7258064516129032","ValueCalculationDate":"2021-04-24T17:12:23.608Z","ValueScoreGrade":"B+"},"ESGInnovationScore":{"Value":"0.5","ValueCalculationDate":"2021-04-03T21:34:07.385Z","ValueScoreGrade":"C+"},"SocialPillarScore":{"Value":"0.4414690382081688","ValueCalculationDate":"2021-04-24T17:12:23.608Z","ValueScoreGrade":"C+ "},"ESGWorkforceScore":{"Value":"0.6375","ValueCalculationDate":"2021-04-10T17:37:21.596Z","ValueScoreGrade":"B"},"ESGHumanRightsScore":{"Value":"0.15217391304347827","ValueCalculationDate":"2021-04-03T21:34:07.385Z","ValueScoreGrade":"D"},"ESGCommunityScore":{"Value":"0.4375","ValueCalculationDate":"2021-04-24T17:12:23.608Z","ValueScoreGrade":"C+"},"ESGProductResponsibilityScore":{"Value":"0.3484848484848485","ValueCalculationDate":"2021-04-10T17:37:21.596Z","ValueScoreGrade":"C"},"GovernancePillarScore":{"Value":"0.4760119460883173","ValueCalculationDate":"2021-05-01T18:05:32.161Z","ValueScoreGrade":"C+ "},"ESGManagementScore":{"Value":"0.5246305418719212","ValueCalculationDate":"2021-05-01T18:05:32.161Z","ValueScoreGrade":"B-"},"ESGShareholdersScore":{"Value":"0.46798029556650245","ValueCalculationDate":"2021-04-10T17:37:21.596Z","ValueScoreGrade":"C+"},"ESGCsrStrategyScore":{"Value":"0.24496644295302014","ValueCalculationDate":"2021-04-10T17:37:21.596Z","ValueScoreGrade":"D+"},"ESGCControversiesScore":{"Value":"1.0","ValueCalculationDate":"2021-04-03T21:34:07.385Z","ValueScoreGrade":"A+"}},"DiversityAndInclusionScores":{"ControversiesScore":{"Value":null,"ValueCalculationDate":null},"DiversityScore":{"Value":null,"ValueCalculationDate":null},"InclusionScore":{"Value":null,"ValueCalculationDate":null},"PeopleDevelopmentScore":{"Value":null,"ValueCalculationDate":null},"Score":{"Value":null,"ValueCalculationDate":null}}}
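The difference between nested objects and arrays can be seen on a trimmed copy of the record above (only a few fields kept): `json_normalize` expands every nested object into a dotted column but stops at the array, so `OrganizationName` still holds a Python list of dicts, which `to_csv` then writes out with its brackets and braces.

```python
import pandas as pd

# trimmed copy of the sample record above (most ESGScores fields omitted)
record = {
    "ObjectId": "4295864969;111",
    "StatementDetails": {"OrganizationId": "4295864969", "FinancialPeriodFiscalYear": "2020"},
    "ESGOrganization": {"Names": {"Name": {"OrganizationName": [
        {"OrganizationNormalizedName": "China High Speed Transmission Equipment Group Co Ltd"}
    ]}}},
    "ESGScores": {"ESGScore": {"Value": "0.5429413762056495", "ValueScoreGrade": "B-"}},
}

df = pd.json_normalize([record])

# nested objects (dicts) are flattened into dotted column names ...
print(df["StatementDetails.OrganizationId"][0])    # 4295864969
print(df["ESGScores.ESGScore.ValueScoreGrade"][0])  # B-
# ... but the array is kept as a single cell holding a list of dicts
print(type(df["ESGOrganization.Names.Name.OrganizationName"][0]))  # <class 'list'>
```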
Therefore, it cannot be fully flattened into CSV without losing meaning. If you are OK with representing only the first name, you can do something like:
df = pd.json_normalize(df_resolve)
df['ESGOrganization.Names.Name.OrganizationName'] = df['ESGOrganization.Names.Name.OrganizationName'].str[0]
Similarly, you can select the key and the value and insert them into the dataframe as separate columns, removing the hierarchical column. However, I think you would like to preserve the structure in this case, as it enables you to retain the meaning.
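The "separate columns" option described above could look like the following sketch: take the first element of each `OrganizationName` list, turn its keys (here `OrganizationNormalizedName`) into ordinary columns, and drop the hierarchical column. The helper name `split_first_element` is hypothetical, and the second input record (with an empty list) is a made-up edge case; such rows simply get an empty value.

```python
import pandas as pd


def split_first_element(df, col):
    """Replace a list-of-dicts column with one plain column per key of its first element."""
    # first dict of each list, or {} when the cell is missing, empty, or not a list
    first = df[col].apply(
        lambda v: v[0] if isinstance(v, list) and v and isinstance(v[0], dict) else {}
    )
    # each dict key becomes its own column, aligned on the original row index
    expanded = pd.DataFrame(first.tolist(), index=df.index)
    return df.drop(columns=[col]).join(expanded)


records = [
    {"ObjectId": "4295864969;111", "ESGOrganization": {"Names": {"Name": {"OrganizationName": [
        {"OrganizationNormalizedName": "China High Speed Transmission Equipment Group Co Ltd"}]}}}},
    # hypothetical record with no names
    {"ObjectId": "example;2", "ESGOrganization": {"Names": {"Name": {"OrganizationName": []}}}},
]
df = pd.json_normalize(records)
flat = split_first_element(df, "ESGOrganization.Names.Name.OrganizationName")
print(flat)
```

As the answer notes, this discards any second or later names, so it only fits when the first name is enough.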
@zoya.farberov Could you please also help with the other part of my question, on the best practice for flattening the ESG Bulk Symbology files?
@zoya.farberov When I run the following line of code:
df_final['ESGOrganization.Names.Name.OrganizationName'] = pd.json.normalize(df_final['ESGOrganization.Names.Name.OrganizationName'].str[0])
I am getting the following error:
AttributeError: module 'pandas' has no attribute 'json'
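The `AttributeError` comes from the dotted name: the function is `json_normalize`, exposed at the top level as `pd.json_normalize` (since pandas 1.0), as in the accepted answer's code, not `pd.json.normalize`. A minimal check on a toy column, with hypothetical data standing in for `df_final`:

```python
import pandas as pd

col = "ESGOrganization.Names.Name.OrganizationName"
# toy frame standing in for df_final (hypothetical data)
df_final = pd.DataFrame({col: [[{"OrganizationNormalizedName": "Name A"}],
                               [{"OrganizationNormalizedName": "Name B"}]]})

# underscore, not a dot: pd.json_normalize rather than pd.json.normalize
df_final[col] = pd.json_normalize(df_final[col].str[0])
print(df_final[col].tolist())  # ['Name A', 'Name B']
```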
Hello @Bala Ilango,
To save ESG Bulk Symbology files in CSV format, you may also find the ESGBulkToCSV example on GitHub useful.
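As a generic fallback for any of the bulk files (including Symbology, whose field layout is not shown in this thread), one option is to normalize first and then serialize any cell that still holds a list or dict to JSON text, so the CSV contains valid JSON instead of Python reprs. This is a hypothetical helper (`flatten_for_csv`) with a made-up `Names` field in the sample input, not the approach used in ESGBulkToCSV.

```python
import json

import pandas as pd


def flatten_for_csv(records):
    """Flatten nested objects, then store any remaining lists/dicts as JSON strings."""
    df = pd.json_normalize(records)
    for col in df.columns:
        # only touch columns that still contain container values after normalizing
        if df[col].map(lambda v: isinstance(v, (list, dict))).any():
            df[col] = df[col].map(
                lambda v: json.dumps(v) if isinstance(v, (list, dict)) else v
            )
    return df


# hypothetical record: "Names" stands in for any array field in a bulk file
records = [{"ObjectId": "1", "Names": [{"Name": "Name A"}, {"Name": "Name B"}]}]
out = flatten_for_csv(records)
print(out.to_csv(index=False))
```

Keeping the array as JSON text preserves every element, and the cell can be parsed back later with `json.loads`.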
Update on this post:
Flat CSV files are now available for ESG Bulk files.