Is there some header we need to set? We might have triggered some kind of recompression during download
Hi all.
We are consuming the DataScope Select REST API (RestApi/v1) at https://selectapi.datascope.refinitiv.com.
The methods I am calling are:
- [1] StandardExtractions/UserPackages
- [2] StandardExtractions/UserPackageDeliveries
- [3] StandardExtractions/UserPackageDeliveryGetUserPackageDeliveriesByPackageId
The process works fine: we get the file list and can download the files. The issue I am facing is that the MD5 checksum and file size received from the API [1] do not match the MD5 checksum or size of the file when I download it.
For example, this is what we receive from API:
file: AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz
package: 0x08e342ecb3b9c483
md5: f805ce4d58892c7c097a6486c002452a
file_size: 1104824
But the downloaded file has an MD5 of b31566ff09092d14480389c276a2729b and a size of 1028387 bytes.
If I decompress the downloaded archive, the MD5 of the result is 1c4bf6cfe0207081e84082761ef68f09.
My only suspicion is that there is some header we need to set, as we might have triggered some kind of recompression during download?
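Any transformation of the bytes between the server and the disk (for example, a decompression step done by the HTTP client) changes both the size and the MD5, so the published values stop matching even though the data itself may be intact. A small self-contained illustration with synthetic data (not the actual DSS file; the CSV content below is made up):

```python
import gzip
import hashlib

# Hypothetical sample data standing in for the CSV inside a published .gz file
csv_bytes = b"RIC,SEDOL\nAEX.AS,1234567\n" * 1000

# The "published" artifact: gzip with a fixed mtime so its bytes are reproducible
published = gzip.compress(csv_bytes, mtime=0)

# What ends up on disk if something removes a gzip layer on the way
decoded = gzip.decompress(published)

for label, data in [("published", published), ("decoded", decoded)]:
    print(f"{label:10} size={len(data):6}  md5={hashlib.md5(data).hexdigest()}")
```

Only a byte-for-byte copy of the published file can reproduce the published size and MD5.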
If it matters, I am using the Python programming language and the module I use to query / download is requests.
I checked the TRTH REST API guide (https://developers.lseg.com/content/dam/devportal/api-families/thomson-reuters-tick-history-trth/thomson-reuters-tick-history-trth-rest-api/documentation/tick_hist_rest_api-guide_november2019.pdf) but did not find the answer there.
Here is a minimal code example (username and password omitted):
import hashlib
import json

import requests

URL = "https://selectapi.datascope.refinitiv.com"
API = "RestApi/v1"

# username and password are defined elsewhere (omitted here)


def uri(path):
    # Join URL parts with "/" (os.path.join would use "\" on Windows)
    return "/".join([URL, API, path])


def download_file(path, output_stream):
    payload = {"Credentials": {"Username": username, "Password": password}}
    headers = {"Content-type": "application/json"}
    token = requests.post(uri("Authentication/RequestToken"), headers=headers, data=json.dumps(payload)).json()["value"]
    auth_header = {"Authorization": f"Token {token}"}
    md5 = hashlib.md5()
    file_size = 0
    for chunk in requests.get(uri(path), stream=True, headers=auth_header).iter_content(chunk_size=8192):
        output_stream.write(chunk)
        md5.update(chunk)
        file_size += len(chunk)
    return (md5.hexdigest(), file_size)


with open("/tmp/foo", "wb") as f:
    path = "StandardExtractions/UserPackageDeliveries('0x08e342ecb3b9c483')/$value?fn=AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz"
    ret = download_file(path, f)
    print(ret)
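One way to test the "recompression" suspicion is to look at the response's Content-Encoding header before writing anything: requests (via urllib3) transparently decodes bodies with a gzip or deflate Content-Encoding when you use iter_content(). The helper below is a hypothetical sketch, not part of requests; it deliberately ignores brotli/zstd, which urllib3 only decodes when optional packages are installed.

```python
def will_auto_decode(headers):
    """Return True if the Content-Encoding header names gzip or deflate.

    headers: any mapping of header names to values; the lookup is done
    case-insensitively so a plain dict works as well as resp.headers.
    """
    encoding = ""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            encoding = value.lower()
    # A header may list several codings, e.g. "gzip, identity"
    codings = [c.strip() for c in encoding.split(",") if c.strip()]
    return any(c in ("gzip", "deflate") for c in codings)


# Hypothetical usage with the question's code:
#   resp = requests.get(uri(path), stream=True, headers=auth_header)
#   print(resp.headers.get("Content-Encoding"), will_auto_decode(resp.headers))
```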
Best Answer
The Python requests module will try to decompress the file on the fly, so the bytes written to disk are no longer the published file (and with large files the decompression can also fail). The advice here is to download the raw file and then perform actions such as the MD5 check or unzipping on it. Here is the code that I used, along with the results for your file:
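# url, hdrs (including the Authorization header) and fileName are defined beforehand
dResp = requests.get(url, headers=hdrs, stream=True)
# do not auto-decompress the data
dResp.raw.decode_content = False
chunkSize = 1024*1024
with open(fileName, 'wb') as f:
    # read from raw.stream(): iter_content() always asks urllib3 to decode the body
    for chunk in dResp.raw.stream(chunkSize, decode_content=False):
        if chunk:
            f.write(chunk)
The result: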
> ls -la *.gz
-rwx------+ 1 xxxxx None 1104824 Apr 10 10:02 AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz
> certutil -hashfile AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz MD5
MD5 hash of AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz:
f805ce4d58892c7c097a6486c002452a
The file size and the MD5 match the originally published values.
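The `ls -la` / `certutil -hashfile ... MD5` check above can also be done in Python, so the comparison with the published metadata is part of the download script. `file_md5_and_size` and `matches_published` are hypothetical helper names; the expected values in the usage comment are the ones published for the example file.

```python
import hashlib


def file_md5_and_size(path, chunk_size=1024 * 1024):
    """Return (md5 hex digest, size in bytes) of a file, reading it in chunks."""
    md5 = hashlib.md5()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            size += len(chunk)
    return md5.hexdigest(), size


def matches_published(path, expected_md5, expected_size):
    """Compare a downloaded file with the MD5 and size returned by the API."""
    md5_hex, size = file_md5_and_size(path)
    return md5_hex == expected_md5.lower() and size == expected_size


# Usage:
#   matches_published("AEX-2024-04-09-Instruments-SEDOL-1-of-1.csv.gz",
#                     "f805ce4d58892c7c097a6486c002452a", 1104824)
```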
Answers
In addition, you can download the file directly from AWS for faster downloads. This should not affect the file size or MD5 hash. Add this to your request headers:
'X-Direct-Download': 'true'
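A minimal sketch of how the extra header could be combined with the `Token` authorization used in the question's code. `build_download_headers` is a hypothetical helper; only the header name and value come from the answer above.

```python
def build_download_headers(token, direct_download=False):
    """Build request headers for a DSS file download (hypothetical helper)."""
    headers = {"Authorization": f"Token {token}"}
    if direct_download:
        # Ask for the file to be served directly from AWS
        headers["X-Direct-Download"] = "true"
    return headers


# Usage, combined with the raw (non-decoding) download shown above:
#   resp = requests.get(url, headers=build_download_headers(token, True), stream=True)
```

Assuming the server answers with a redirect to AWS, requests follows it automatically and, by default, drops the Authorization header when the redirect points to a different host, so the token is not sent to AWS.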
Hi @Gurpreet, many thanks for your guidance. Much appreciated.
The client managed to make it work by using:
resp.raw.stream(1024*1024, decode_content=False):
instead of:
resp.iter_content(chunk_size=1024*1024):
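Applied to the question's original goal (computing the MD5 and size while downloading), the fix might look like the sketch below. `write_and_hash` is a hypothetical helper that consumes any iterable of byte chunks, so it can be fed `resp.raw.stream(..., decode_content=False)` and tested without network access.

```python
import hashlib


def write_and_hash(chunks, output_stream):
    """Write byte chunks to output_stream; return (md5 hex digest, total bytes)."""
    md5 = hashlib.md5()
    size = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks
        output_stream.write(chunk)
        md5.update(chunk)
        size += len(chunk)
    return md5.hexdigest(), size


# Usage inside the question's download_file(), replacing the iter_content() loop:
#   resp = requests.get(uri(path), stream=True, headers=auth_header)
#   resp.raise_for_status()
#   return write_and_hash(resp.raw.stream(1024 * 1024, decode_content=False), output_stream)
```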