How do I import JSON metadata files for Research reports?
My customer, who has contracted for both the Research API and the Research bulk feed (historical data of real-time research reports), is trying to import a Research bulk feed metadata file into Python with the script below, but the import fails.
Could you please advise on how to import the metadata file into Python successfully?
Script:
import pandas as pd
df = pd.read_json("ctb4614_050_metadata.json", lines=True)
Return:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-87067cc29fae> in <module>
1 # df = pd.read_json("ctb4614_050_metadata.json", orient="split")
----> 2 df = pd.read_json("ctb4614_050_metadata.json", lines=True)
3 # df = pd.read_json("ctb4614_050_metadata.json", lines=True, orient="values")
4 # df = pd.read_json("ctb4614_050_metadata.zip", compression='infer')
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
590 return json_reader
591
--> 592 result = json_reader.read()
593 if should_close:
594 try:
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in read(self)
713 elif self.lines:
714 data = ensure_str(self.data)
--> 715 obj = self._get_object_parser(self._combine_lines(data.split("\n")))
716 else:
717 obj = self._get_object_parser(self.data)
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
737 obj = None
738 if typ == "frame":
--> 739 obj = FrameParser(json, **kwargs).parse()
740
741 if typ == "series" or obj is None:
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in parse(self)
847
848 else:
--> 849 self._parse_no_numpy()
850
851 if self.obj is None:
/opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1091 if orient == "columns":
1092 self.obj = DataFrame(
--> 1093 loads(json, precise_float=self.precise_float), dtype=None
1094 )
1095 elif orient == "split":
ValueError: Expected object or value
Best Answer
Hello @hiroko.yamaguchi
The ctb4614_050_metadata.json file is not in a valid JSON format. It contains many "docID" objects that are not enclosed in an array element, for example: {"docID": 1}, {"docID": 2}, {"docID": 3}
The application needs to add "[" and "]" so that all the "docID" objects become elements of a single array:
[{"docID": 1}, {"docID": 2}, {"docID": 3}]
I added "[" at the beginning of the file and "]" at the end, then loaded it with pd.read_json("ctb4614_050_metadata.json"), and it worked fine.
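The same fix can be reproduced in memory without editing the file. This is a minimal sketch, assuming (as described above) that the file's content is a comma-separated series of objects with no enclosing array; the `raw` string stands in for the file content:

```python
import io
import pandas as pd

# Stand-in for the metadata file's content: objects with no enclosing array
raw = '{"docID": 1}, {"docID": 2}, {"docID": 3}'

# Wrap the content in "[" and "]" so it becomes a valid JSON array,
# then hand it to pandas as a file-like buffer
df = pd.read_json(io.StringIO("[" + raw + "]"))
```

After wrapping, pandas parses the array into one row per object, with a `docID` column.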
Answers
I could replicate it, thank you!
On the other hand, the customer receives many such metadata files, one for each broker they are entitled to. Is there any way to add the "[" and "]" programmatically?
Hi @hiroko.yamaguchi
I think the first step is to verify whether each metadata JSON file is valid JSON, and then fix the file or its content according to the specific error or invalid format. A Python application can use the json.load() function to validate a JSON file, but the file needs to be fixed manually if it is in an invalid format.
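The validate-then-fix step can also be automated for this particular defect. Below is a sketch of a hypothetical helper (`load_metadata` is not part of any Refinitiv library) that assumes an invalid file contains comma-separated objects missing only the enclosing brackets:

```python
import json
import pandas as pd

def load_metadata(path):
    """Load a metadata file into a DataFrame. If the content is not
    valid JSON on its own, retry with "[" and "]" wrapped around it.
    Hypothetical helper; assumes the only defect is the missing array
    brackets described in this thread."""
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    try:
        records = json.loads(text)               # file is already valid JSON
    except ValueError:
        records = json.loads("[" + text + "]")   # add the brackets programmatically
    return pd.DataFrame(records)
```

This way each broker's file can be loaded in a loop without hand-editing; files that are already valid JSON pass through the first `json.loads` call unchanged.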