Invalid EndOfStream with C# gzip

If I try to read a gzip file with GZipStream in C# like this:

string filePath = "....gz";
int count = 0;

using (FileStream reader = File.OpenRead(filePath))
using (var zip = new GZipStream(reader, CompressionMode.Decompress))
using (StreamReader unzip = new StreamReader(zip))
{
    while (!unzip.EndOfStream)
    {
        var data = unzip.ReadLine();
        count++;
    }
}

Console.WriteLine(count);

I get fewer rows than when I read the decompressed CSV file (decompressed with the Windows shell):

filePath = "...csv";
count = 0;

using (FileStream reader = File.OpenRead(filePath))
using (StreamReader unzip = new StreamReader(reader))
{
    while (!unzip.EndOfStream)
    {
        var data = unzip.ReadLine();
        count++;
    }
}

Console.WriteLine(count);

The samples are in https://developers.thomsonreuters.com/elektron-data-solutions/datascope-select-rest-api/downloads

Any ideas? Also, the Size and Packed Size shown for the .gz archive are strange: the Packed Size is bigger than the decompressed Size (in the WinRAR UI).

Answers

  • M.ROSSIGNOLI, exactly which sample did you use? Please note that the samples under the URL you posted are for DSS. For TRTH the samples are in https://developers.thomsonreuters.com/thomson-reuters-tick-history-trth/thomson-reuters-tick-history-trth-rest-api/downloads.

  • M.ROSSIGNOLI, what you observe reminds me of this issue. It was in Java, but the symptoms were similar: counting the number of lines in the file did not give the same result when decompressing the data stream from the server as when decompressing a file saved on disk. Small amounts of data worked fine, but with larger ones the end of the file was dropped. The issue was intermittent, so we got varying numbers of lines for what should have been a constant number of lines.

    We found out that it was due to an issue with decompressing data on the fly: the popular public libraries we were using were not reliable enough to decompress large amounts of data flowing in through an input stream. After a long investigation we found other libraries that were more reliable. We also found a workaround, which was to first save the file to disk (without decompressing), and then read it back from disk and decompress it at that point. That worked fine, without dropping data; a sketch of that approach is below.
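
    A minimal sketch of that save-then-decompress approach in C#. The URL and local path are hypothetical placeholders, and HttpClient simply stands in for whatever mechanism delivers the compressed stream:

    // Requires System, System.IO, System.IO.Compression and System.Net.Http.
    // Step 1: save the compressed response to disk as-is, without decompressing.
    string gzPath = @"C:\temp\report.csv.gz"; // hypothetical local path
    using (var http = new System.Net.Http.HttpClient())
    using (Stream download = http.GetStreamAsync("https://example.com/report.csv.gz").Result) // placeholder URL
    using (FileStream file = File.Create(gzPath))
    {
        download.CopyTo(file);
    }

    // Step 2: read the saved file back and decompress it locally.
    int count = 0;
    using (FileStream reader = File.OpenRead(gzPath))
    using (var zip = new GZipStream(reader, CompressionMode.Decompress))
    using (StreamReader unzip = new StreamReader(zip))
    {
        while (!unzip.EndOfStream)
        {
            unzip.ReadLine();
            count++;
        }
    }
    Console.WriteLine(count);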

  • The sample above reads from a file on disk. I logged in to DataScope, downloaded the .gz file with a browser (Chrome), and then ran the code. It's a very weird behaviour.

  • I mean the "C# Example Application", the Dss.Api.Examples.sln .NET solution.

  • @M.ROSSIGNOLI

    I have found a similar issue. It seems that either .NET GZipStream or DeflateStream somehow cannot completely decompress large gzip files generated by TRTH.

    I use SevenZipSharp to decompress the file as a workaround. It requires both the SevenZipSharp.dll and the 7-Zip 9.15 DLL files. To use it, you need to add SevenZipSharp.dll as a Reference and modify the path in the code to point to the location of the 7z.dll file. Below is the sample code.

    //using (var zip = new GZipStream(reader, CompressionMode.Decompress))
    SevenZip.SevenZipExtractor.SetLibraryPath(@"<your local path>\7z.dll");

    int count = 0;
    using (var extractor = new SevenZip.SevenZipExtractor(filePath))
    using (MemoryStream ms = new MemoryStream())
    {
        // The .gz archive holds a single entry; get its index (First() needs System.Linq).
        int indexZip = extractor.ArchiveFileData.First().Index;
        // Decompress the entry into the memory stream.
        extractor.ExtractFile(indexZip, ms);
        ms.Position = 0;
        using (StreamReader unzip = new StreamReader(ms))
        {
            while (!unzip.EndOfStream)
            {
                var data = unzip.ReadLine();
                count++;
            }
        }
    }
    Console.WriteLine(count);

    Hope this helps.

  • Ah yes, ok. That one is the same for DSS and TRTH.