TRTH: Retrieving range data using Time and Sales Data

Unknown · July 2017

Hi there,

I have a problem extracting data from Tick History.

I specified the range in the report request, but I couldn't retrieve all of the data. How can I retrieve all of the data requested in the code below? Any help would be appreciated.

Thank you,

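import json
import requests

# header2 (defined elsewhere) holds the request headers, including the authentication token.
body_data = json.dumps({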
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.TickHistoryTimeAndSalesExtractionRequest",
        "ContentFieldNames":[
            "Quote - Bid Price",
            "Quote - Bid Size",
            "Quote - Ask Price",
            "Quote - Ask Size"

        ],
        "IdentifierList": {
            "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",
            "InstrumentIdentifiers": [ { "Identifier": "JNIc1", "IdentifierType": "Ric" } ],
            "ValidationOptions": None,
            "UseUserPreferencesForValidationOptions": False
        },
        "Condition": {
            "MessageTimeStampIn": "",
            "ReportDateRangeType": "Range",
            "QueryStartDate":"2017-01-03T23:45:00.000Z",
            "QueryEndDate": "2017-01-06T20:30:00.000Z",
            "DisplaySourceRIC": True
        }
    }
})

responseGet = requests.post( "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractRaw",
                              data = body_data,
                              headers = header2)

# Assumes the extraction finished before the server replied, so JobId is already in the response.
res_json = responseGet.json()
job_id = res_json['JobId']
response_obj = requests.get( "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/RawExtractionResults('{0}')/$value".format(job_id),
                            headers = header2, stream=True)

# With decode_content=True the body is decompressed on the fly, so this file holds plain CSV.
gzip_file = "jnic1.csv"


with open(gzip_file, 'wb') as f:
    for data in response_obj.raw.stream(decode_content=True):
        f.write(data)
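The code above reads `JobId` straight from the ExtractRaw response, which only works if the extraction finishes before the server replies. A minimal sketch of a more defensive version follows, assuming the service answers with HTTP 200 when the result is ready and with HTTP 202 plus a `Location` header to poll when it is not (check this against the DSS REST API documentation). The function `wait_for_extraction` and its parameters are hypothetical; `get` is expected to behave like `requests.get`.

```python
import time
from typing import Callable, Dict


def wait_for_extraction(post_response, get: Callable, headers: Dict[str, str],
                        poll_seconds: float = 30.0, max_polls: int = 100) -> dict:
    """Return the JSON body of a completed ExtractRaw call.

    Assumption (hypothetical helper): HTTP 200 means the result is ready,
    HTTP 202 means "still running" and carries a Location header to poll.
    """
    response = post_response
    location = None
    for _ in range(max_polls):
        if response.status_code == 200:
            return response.json()  # finished: the body contains JobId
        if response.status_code != 202:
            raise RuntimeError(f"Unexpected HTTP status {response.status_code}")
        if location is None:
            # Remember the monitor URL from the first 202 response.
            location = response.headers["Location"]
        time.sleep(poll_seconds)
        response = get(location, headers=headers)
    raise TimeoutError("Extraction did not complete within max_polls attempts")
```

In the question's code this would sit between the POST and the download, for example `res_json = wait_for_extraction(responseGet, requests.get, header2)` followed by `job_id = res_json['JobId']`.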

Christiaan Meihsl · July 2017

YK, am I right in guessing that you are only receiving the first part of the expected data? If so, run the query several times (at least 10): is the number of lines received always the same, or does it vary? If it varies, this might be related to a similar issue we saw in Java, where libraries that were not robust enough dropped the stream when decoding data on the fly.

I see you set decode_content=True. If I am not mistaken, that means the data is decompressed on the fly before it is saved to disk. Can you try setting it to False?
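A minimal sketch of that suggestion, assuming `response_obj.raw` is the urllib3 response that `requests` exposes when `stream=True`: save the compressed bytes untouched, then decompress locally with Python's `gzip` module, which reads every member of a multi-member gzip file. The helper names `save_raw_stream` and `gunzip_file` and the `.csv.gz` file name are hypothetical.

```python
import gzip
import shutil


def save_raw_stream(raw, gz_path: str, chunk_size: int = 1 << 16) -> int:
    """Write the still-compressed response body to disk; return bytes written."""
    written = 0
    with open(gz_path, "wb") as f:
        # decode_content=False keeps the gzip bytes exactly as the server sent them.
        for chunk in raw.stream(chunk_size, decode_content=False):
            f.write(chunk)
            written += len(chunk)
    return written


def gunzip_file(gz_path: str, csv_path: str) -> None:
    """Decompress locally; gzip.open continues through all concatenated members."""
    with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

For example, `save_raw_stream(response_obj.raw, "jnic1.csv.gz")` followed by `gunzip_file("jnic1.csv.gz", "jnic1.csv")` would replace the `with open(...)` loop in the question.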

veerapath.rungruengrayubkul · July 2017

@YK

With that code, the extracted result starts at 2017-01-04T08:45:00.077934619+09 and ends at 2017-01-04T08:45:00.077934619+09. Do you receive the same result?

Also, how did you determine that not all of the data was received? Could you please elaborate?
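One concrete way to check is to count the rows and look at the first and last timestamps in the output file, then compare them with QueryStartDate and QueryEndDate. A minimal sketch follows; it assumes the timestamp column is named `Date-Time` (pass a different `time_column` if your file's header differs), and `summarize_extraction` is a hypothetical helper.

```python
import csv
import gzip
from typing import Optional, Tuple


def summarize_extraction(path: str, time_column: str = "Date-Time") -> Tuple[int, Optional[str], Optional[str]]:
    """Return (row_count, first_timestamp, last_timestamp) for a .csv or .csv.gz file."""
    # Open gzip files transparently; plain CSV files are opened normally.
    opener = gzip.open if path.endswith(".gz") else open
    count, first, last = 0, None, None
    with opener(path, "rt", newline="") as f:
        for row in csv.DictReader(f):
            stamp = row[time_column]
            if first is None:
                first = stamp  # first data row after the header
            last = stamp       # overwritten until the final row
            count += 1
    return count, first, last
```

If the last timestamp falls well short of QueryEndDate, or the row count changes from run to run, the download is being truncated.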

Unknown · July 2017

@veerapath.rungruengrayubkul

Thanks for your help.

I get the same result with your code.

I know I did not get all of the data because the output does not include the data within the range I specified in the condition.

Unknown · July 2017

Christiaan Meihsl

After setting decode_content to False, I get all of the data in a gzip file.

I still can't figure out why I cannot get all of the data with decode_content=True. Does it simply exceed the capacity of the API, or is there some other reason?

But it's OK; the issue is resolved.

Thank you!

Christiaan Meihsl · July 2017

YK, in the similar issue I mentioned with the Java libraries, we observed that when the data set was small the decoding worked fine. But it started failing when the data set was larger. I guess many such libraries were tested on fairly small data sets, which correspond to the common use cases. With TRTH we are often handling large data sets, which is somewhat atypical, and it seems some libraries were just not built for that.

Glad I helped solve the issue.
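The thread does not establish the exact cause. Purely as an illustration of how on-the-fly decoding can lose data without raising an error, the sketch below assumes (this is not confirmed here) that the compressed stream is made of several concatenated gzip members and that a decoder stops after the first one. It contrasts such a decoder with one that starts a fresh `zlib` decompressor whenever a member ends. Both function names are hypothetical.

```python
import zlib
from typing import Iterable, Iterator

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tells zlib to expect a gzip header and trailer


def decode_first_member_only(chunks: Iterable[bytes]) -> bytes:
    """Naive streaming decoder: output ends with the first gzip member."""
    d = zlib.decompressobj(GZIP_WBITS)
    out = []
    for chunk in chunks:
        if d.eof:
            break  # everything after the first member is silently dropped
        out.append(d.decompress(chunk))
    return b"".join(out)


def decode_all_members(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Streaming decoder that starts a new decompressor for each gzip member."""
    d = zlib.decompressobj(GZIP_WBITS)
    for chunk in chunks:
        while chunk:
            yield d.decompress(chunk)
            if d.eof:
                # Bytes past the end of this member belong to the next one.
                chunk = d.unused_data
                d = zlib.decompressobj(GZIP_WBITS)
            else:
                chunk = b""
    yield d.flush()
```

Saving the raw bytes with decode_content=False and decompressing afterwards with Python's `gzip` module avoids this class of problem, since that module reads all members of a file.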

xi.yang · March 2018

@Christiaan Meihsl I found an example that retrieves the latest schedule files or venue files with decode_content=True. Why does it work there, and why should decode_content be handled differently for on-demand requests?
