How to check that the downloaded Zip file has complete data?
I am downloading 18 years of data from Tick History Raw using the REST API, but sometimes when I unzip the file with ZipGenius, it cannot extract the complete data into the CSV file, apparently due to a data overflow in the CSV.
I want to check whether the zipped file contains the complete data and this is just a limit on how many rows the CSV can hold. If so, is there another way to handle this situation?
Best Answer
-
Thanks for the response. I used the suggested code to count the lines in the Gzip file. The count is more than 1.3 million, but when I extract the file using ZipGenius, the CSV contains only about 1 million rows (precisely 1,048,576). I think this is due to a CSV limit on the number of rows. Is that right?
If so, is there another way to get the complete data into Excel?
Answers
-
@pj4, I see you are using a modified extract of our TRTH_OnDemand_IntradayBars Python sample. That code has been tested on downloads of varying sizes (from under 100 kB to over 100 MB) without issues.
Since you are downloading 18 years of data, the data set might be fairly large, though you do not mention how many instruments are in the request or the size of your gzip file.
Most Tick History reports deliver output as a gzip file. If a report is large, it delivers its output as several smaller gzip files concatenated into a single large gzip file. For more info on this topic, see this advisory.
I do not know the limits of ZipGenius, but maybe it cannot handle concatenated gzips correctly, or maybe it just has a limit on the file size?
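To illustrate what a concatenated gzip looks like to a reader, here is a minimal sketch using only Python's standard gzip module, which reads all members of a multi-member gzip as one continuous stream. The helper name count_lines_in_gzip_bytes and the tiny sample rows are made up for illustration; they are not part of the Tick History sample code.

```python
import gzip
import io


def count_lines_in_gzip_bytes(data: bytes) -> int:
    """Count the lines across all gzip members contained in `data`."""
    with gzip.open(io.BytesIO(data), "rb") as fd:
        return sum(1 for _ in fd)


# Build a small concatenated gzip: two independent members joined end to end.
member_1 = gzip.compress(b"row1\nrow2\n")
member_2 = gzip.compress(b"row3\n")
concatenated = member_1 + member_2

# A reader that handles concatenated gzips sees the rows of both members.
print(count_lines_in_gzip_bytes(concatenated))
```

A tool that stops after the first member would report only the rows of member_1, which is one way an extraction can come out shorter than the line count obtained in Python.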
I suggest you try a different tool to open the Gzip and extract the CSV.
You could also use code to print out the first lines, and count all the lines as a sanity check, after downloading the file:
import gzip

count = 0
maxPrintLines = 10
# fileName is the path to the downloaded gzip file
with gzip.open(fileName, 'rb') as fd:
    for line in fd:
        # Do something with the data:
        count += 1
        if count <= maxPrintLines:
            dataLine = line.decode("utf-8")
            print(dataLine)
# The with block closes the file; print the total as the sanity check
print(count)
Instead of counting them, you could also save them to a CSV file.
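A minimal sketch of saving the lines to a CSV file while counting them, so that ZipGenius is not needed for the extraction. The function name gunzip_to_csv and both path arguments are hypothetical; the lines are copied as raw bytes, so the output is byte-for-byte what the gzip contains.

```python
import gzip


def gunzip_to_csv(gzip_path, csv_path):
    """Write every line of the gzip file to csv_path and return the line count."""
    count = 0
    with gzip.open(gzip_path, "rb") as src, open(csv_path, "wb") as dst:
        for line in src:
            dst.write(line)  # copy the raw bytes unchanged
            count += 1
    return count
```

Calling it with the downloaded file's path and a target name, then comparing the returned count with the number of rows a viewer shows, should indicate whether rows are lost at extraction or only at display time.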
-
A CSV file has no limit on the number of rows it can contain.
Excel, however, cannot hold more than about 1 million rows, so importing a CSV file with more rows truncates it. The precise limit for Excel (versions >= 2007) is 2 to the power of 20 = 1,048,576 rows, exactly what you observe, so ZipGenius might have the same limit.
I suggest you try to extract the data using a different utility.
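If the data must end up in Excel, one possible workaround is to split the output into several CSV files that each stay within the 1,048,576-row limit. The sketch below is only an illustration: the function name split_gzip_csv and the output naming scheme are hypothetical, and it assumes that the first line of the file is a header and that no field contains an embedded line break.

```python
import gzip

EXCEL_MAX_ROWS = 2 ** 20  # 1,048,576 rows per worksheet, header included


def split_gzip_csv(gzip_path, out_prefix, max_rows=EXCEL_MAX_ROWS):
    """Split a gzipped CSV into numbered CSV parts of at most max_rows lines.

    Assumes the first line is a header; it is repeated at the top of each part.
    Returns the list of part file names that were written.
    """
    if max_rows < 2:
        raise ValueError("max_rows must allow a header plus one data row")
    parts = []
    out = None
    rows_in_part = 0
    try:
        with gzip.open(gzip_path, "rb") as src:
            header = src.readline()
            for line in src:
                # Start a new part when none is open or the current one is full.
                if out is None or rows_in_part >= max_rows:
                    if out is not None:
                        out.close()
                    name = f"{out_prefix}_{len(parts) + 1}.csv"
                    out = open(name, "wb")
                    out.write(header)
                    rows_in_part = 1
                    parts.append(name)
                out.write(line)
                rows_in_part += 1
    finally:
        if out is not None:
            out.close()
    return parts
```

Each resulting part can then be opened in Excel on its own. For analysis across all 18 years, reading the gzip directly in code is likely more practical than a spreadsheet.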