x-direct-download aws access key error

sgan208 · October 2017

Hi,

I'm trying to download the latest extracted file for a daily schedule. I am using the X-Direct-Download header to download the files directly from the AWS server, but I get an AWS access key error. I assume the key is supplied by your API when it redirects. If we are supposed to provide it ourselves, how do we get it? I get the following error message when trying the direct download:

<Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><AWSAccessKeyId>AKIAJVAI4XORJURKYMEA</AWSAccessKeyId><StringToSign>GET

text/plain
1508866448
x-amz-request-payer:requester
/tickhistory.query.production.hdc-results/569288DA08B4447C911246BAFDFF15E2/data/merged/merged.csv.gz?response-content-disposition=attachment; filename=latestdownload.csv.gz</StringToSign><SignatureProvided>jSLffd/FgvE9+OJxV4MJesQmru8=</SignatureProvided><StringToSignBytes>47 45 54 0a 0a 74 65 78 74 2f 70 6c 61 69 6e 0a 31 35 30 38 38 36 36 34 34 38 0a 78 2d 61 6d 7a 2d 72 65 71 75 65 73 74 2d 70 61 79 65 72 3a 72 65 71 75 65 73 74 65 72 0a 2f 74 69 63 6b 68 69 73 74 6f 72 79 2e 71 75 65 72 79 2e 70 72 6f 64 75 63 74 69 6f 6e 2e 68 64 63 2d 72 65 73 75 6c 74 73 2f 35 36 39 32 38 38 44 41 30 38 42 34 34 34 37 43 39 31 31 32 34 36 42 41 46 44 46 46 31 35 45 32 2f 64 61 74 61 2f 6d 65 72 67 65 64 2f 6d 65 72 67 65 64 2e 63 73 76 2e 67 7a 3f 72 65 73 70 6f 6e 73 65 2d 63 6f 6e 74 65 6e 74 2d 64 69 73 70 6f 73 69 74 69 6f 6e 3d 61 74 74 61 63 68 6d 65 6e 74 3b 20 66 69 6c 65 6e 61 6d 65 3d 6c 61 74 65 73 74 64 6f 77 6e 6c 6f 61 64 2e 63 73 76 2e 67 7a</StringToSignBytes><RequestId>6D0CF27BCFA8F1B8</RequestId><HostId>SsWFsDplls+pBOt4kIJ6oaHmPd2u5pajDLqEAgUzxOJYC3V8ihWBw14IyJAUV7wQLlEjvUTeDQg=</HostId></Error>

sgan208 · October 2017

Figured out the problem was with the requests library. Upgrading it made everything work perfectly.

Christiaan Meihsl · October 2017

sgan208,

You do not have to manage the AWS key; it should be transparent to you. Here is the mechanism:

When you send the call to retrieve your data with the X-Direct-Download: true header, you receive a response with HTTP status 302 (redirect). The response headers contain a redirection URI in the Location item. It has this format:

https://s3.amazonaws.com/tickhistory.query.production.hdc-results/C15C8A6BBD824C5FB5BE39AD36BD9B16/data/merged/merged.csv.gz?AWSAccessKeyId=AKIAJVAI4XORJURKYMEA&Expires=1508181359&response-content-disposition=attachment%3B%20filename%3D_OnD_0x05e996379ebb3036.csv.gz&Signature=6TRD6XTbeesQ7165A3KF%2BDofgPs%3D&x-amz-request-payer=requester

This is a pre-signed URI: the AWS Access Key Id, the expiry time and the signature are all included directly in the URI.
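
To make the parts of such a URI easier to see, here is a minimal Python 3 sketch that splits the example Location value above into its query parameters and checks the Expires timestamp. The function names describe_presigned_url and is_expired are hypothetical helpers written for this illustration; they are not part of any DSS sample.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# The example Location value quoted above, split over several lines for readability.
EXAMPLE_LOCATION = (
    "https://s3.amazonaws.com/tickhistory.query.production.hdc-results/"
    "C15C8A6BBD824C5FB5BE39AD36BD9B16/data/merged/merged.csv.gz"
    "?AWSAccessKeyId=AKIAJVAI4XORJURKYMEA&Expires=1508181359"
    "&response-content-disposition=attachment%3B%20filename%3D_OnD_0x05e996379ebb3036.csv.gz"
    "&Signature=6TRD6XTbeesQ7165A3KF%2BDofgPs%3D&x-amz-request-payer=requester"
)


def describe_presigned_url(location):
    """Return the host, path and signing parameters of a pre-signed URL."""
    parts = urlsplit(location)
    query = parse_qs(parts.query)  # parse_qs also percent-decodes the values
    return {
        "host": parts.netloc,
        "path": parts.path,
        "access_key_id": query["AWSAccessKeyId"][0],
        "expires_utc": datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc),
        "signature": query["Signature"][0],
    }


def is_expired(location, now=None):
    """True once the Expires timestamp embedded in the URL has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= describe_presigned_url(location)["expires_utc"]


if __name__ == "__main__":
    for name, value in describe_presigned_url(EXAMPLE_LOCATION).items():
        print(name, "=", value)
```

Because everything needed to authenticate is in the query string, the client only has to request the URI unchanged before it expires.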

Most HTTP clients automatically follow the redirection, which means you have nothing to do. A call is made in the background to this URI, and the data is retrieved. You can actually check if the redirection works by sniffing the traffic, with a tool like Fiddler or equivalent.

Some HTTP clients support and follow the redirection, but fail to connect to AWS because they include the Authorization header in the request redirected to AWS, which then returns a Bad Request status (400) with the error:

“Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified”.

That is described in an advisory.
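
For a client that follows redirects by hand, the rule can be sketched as below in Python 3: keep the headers when the redirect stays on the same host, and drop Authorization when it goes elsewhere. The helper name headers_for_redirect is hypothetical, and dropping only Authorization is an assumption made for this sketch; a client may choose to drop more.

```python
from urllib.parse import urlsplit


def headers_for_redirect(original_url, redirect_url, headers):
    """Return the headers to send to redirect_url.

    The Authorization header is removed when the redirect leaves the
    original host, so the pre-signed URL is the only auth mechanism used.
    """
    same_host = (
        urlsplit(original_url).netloc.lower() == urlsplit(redirect_url).netloc.lower()
    )
    if same_host:
        return dict(headers)
    # Header names are case-insensitive, so compare in lower case.
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}


if __name__ == "__main__":
    sent = {"Authorization": "token <session token>", "Accept-Encoding": "gzip"}
    print(
        headers_for_redirect(
            "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractedFiles",
            "https://s3.amazonaws.com/tickhistory.query.production.hdc-results/xxx",
            sent,
        )
    )
```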

But the error you encountered is different.
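
One way to inspect a SignatureDoesNotMatch response is to decode its StringToSignBytes field and look at what S3 actually signed. The Python 3 sketch below does that for the first bytes of the error quoted at the top of this thread. The field names follow the AWS Signature Version 2 layout (verb, Content-MD5, Content-Type, Expires, then the x-amz headers and the resource); the helper names are hypothetical. The third field decodes to text/plain, which matches the Content-Type the script sends, but whether that header caused the mismatch is an assumption, not something confirmed here.

```python
def decode_string_to_sign(hex_bytes):
    """Turn the space-separated hex of <StringToSignBytes> back into text."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


def signed_fields(string_to_sign):
    """Label the leading fields of a Signature Version 2 string to sign."""
    parts = string_to_sign.split("\n")
    return {
        "verb": parts[0],
        "content_md5": parts[1],
        "content_type": parts[2],
        "expires": parts[3],
        "remainder": parts[4:],  # x-amz headers and the resource path, if present
    }


if __name__ == "__main__":
    # First 26 bytes of the StringToSignBytes value from the error message.
    sample = "47 45 54 0a 0a 74 65 78 74 2f 70 6c 61 69 6e 0a 31 35 30 38 38 36 36 34 34 38"
    print(signed_fields(decode_string_to_sign(sample)))
```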

If this generic description of the mechanism is not sufficient to solve your issue:

You might want to look at some of our samples, which were recently enhanced to use AWS. Under the downloads tab there are the .NET SDK Tutorials code (2, 4 and 5 use AWS), the Java samples (2 for immediate schedules and 2 for On Demand extractions) and a Python sample.
Alternatively, if you'd like us to help you further, we'd need to know exactly what workflow you followed, from the call to retrieve the data until the occurrence of the error. The code you use would also help.

sgan208 · October 2017

Hi,

Here's the full code:

#!/bin/python
# coding: utf-8

import requests
import json
import shutil
import time
import sys


base_url = "https://hosted.datascopeapi.reuters.com/RestApi/v1/"
auth_req_url = base_url+"Authentication/RequestToken"
instrument_url = base_url + "Extractions/InstrumentLists"
get_scheduled_url = base_url + "Extractions/Schedules"
get_report_status_url = base_url + "Extractions/ReportExtractions"
download_extracted_url = base_url + "Extractions/ExtractedFiles"

requestHeaders={ "Prefer":"respond-async", "Content-Type":"application/json" }
credData = { 'Credentials' : { 'Username' : CENSORED, 'Password' : CENSORED} }

def getAuthToken( data = credData):
    r = requests.post(url = auth_req_url, json = data, headers = requestHeaders)
    if(r.status_code == 200):
        jsonResponse = json.loads(r.text.encode('ascii', 'ignore'))
        token = jsonResponse["value"]
        return token

def addToInstrumentList( token, listName, identifierList ):
    requestHeaders["Authorization"] = "token " + token
    url = instrument_url+"('" +listName + "')/ThomsonReuters.Dss.Api.Extractions.InstrumentListAppendIdentifiers"
    data = {"Identifiers": identifierList, "KeepDuplicates":False}
    print data
    r =requests.post(url, json=data, headers=requestHeaders )
    # if(r.status_code == 200):
    print r.status_code, r.text

def getLatestData( token, scheduleId ):
    requestHeaders["Authorization"] = "token " + token
    url = get_scheduled_url+ "('" + scheduleId + "')/LastExtraction"
    # print url
    r = requests.get(url, headers=requestHeaders)
    if(r.status_code==200):
        jsonResponse = json.loads(r.text.encode('ascii', 'ignore'))
        return jsonResponse["ReportExtractionId"]
    else:
        print "Can't get report id. Here's some debug info: ",r.status_code, r.text

def getReportFiles( token, extractionId ):
    requestHeaders["Authorization"] = "token " + token
    url = get_report_status_url + "('" + extractionId + "')/Files"
    r = requests.get( url, headers=requestHeaders )
    if(r.status_code == 200):
        jsonResponse = json.loads(r.text.encode('ascii', 'ignore'))["value"]
        fileTuple = {}
        fileTuple["notes"] = jsonResponse[0]["ExtractedFileId"]
        fileTuple["file"] = jsonResponse[1]["ExtractedFileId"]
        return fileTuple
    else:
        print "Can't get extracted files. Here's some debug info: ",r.status_code, r.text


def downloadReportFiles( token, fileId, outfile ):
    requestHeaders["Authorization"] = "token " + token
    requestHeaders["Accept-Encoding"] = "gzip"
    requestHeaders["Content-Type"] = "text/plain"
    requestHeaders["X-Direct-Download"] = "true"
    url = download_extracted_url + "('" + fileId + "')/$value"
    # print url, requestHeaders
    r = requests.get( url, headers=requestHeaders, stream=True )
    if(r.status_code == 302):
        print r
    r.raw.decode_content = False
    print r.status_code, r.headers["Content-Type"]#, r.headers["Content-Encoding"], r.headers["Content-Length"]
    fileName = outfile
    chunk_size = 1024
    rr = r.raw
    with open(fileName, 'wb') as fd:
        shutil.copyfileobj(rr, fd, chunk_size)

if __name__ == "__main__":
    token = getAuthToken()
    print "Token is: ",token
    scheduleId = CENSORED
    reportId = getLatestData(token, scheduleId)
    extractedFiles = getReportFiles(token, reportId)
    timestr = time.strftime("%Y%m%d-%H%M%S")
    outFileName = "download_"+timestr+".csv.gz"
    downloadReportFiles(token, extractedFiles["file"], outFileName)
    # notesFileName = "notes_"+timestr+".csv.gz"
    # downloadReportFiles(token, extractedFiles["notes"], notesFileName)

Also, the problem seems machine-specific: it works on some machines, but on others I get this 403 Forbidden. Any idea why?

Christiaan Meihsl · October 2017

sgan208, your code is ok, I have just tested it successfully.

Your last sentence makes me wonder: are you by chance behind a firewall or proxy?

sgan208 · October 2017

No. There's no firewall. The Python libraries may differ. I'll check and get back.

Jirapongse · October 2017

You can try to disable redirection in requests. For example:

requests.get(url, headers=requestHeaders, allow_redirects=False)

After that, the application will get the HTTP response with status code 302.

HTTP/1.1 302 Found
Cache-Control: no-cache
Date: Sun, 03 Sep 2017 10:34:17 GMT
Expires: -1
Location: https://s3.amazonaws.com/tickhistory.query.production.hdc-results/xxx/data/merged/merged.csv.gz?AWSAccessKeyId=xxx&Expires=1504456458&response-content-disposition=attachment; filename=_OnD_0x05dbb5f5a62b3016.csv.gz&Signature=xxx&x-amz-request-payer=requester

The Location header in the response contains the AWS URL for the download. The application then needs to send another GET request to this AWS URL, without any of the original headers, to download the file.
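
The two requests described above could be combined roughly as follows. This is a Python 3 sketch, not a tested replacement for the script earlier in the thread: download_via_redirect is a hypothetical helper, and http_get stands for any callable with the same signature as requests.get, so the flow can be tried without network access.

```python
import shutil


def download_via_redirect(http_get, url, headers, out_path):
    """Request url without following redirects, then fetch the Location target.

    http_get is expected to behave like requests.get. Returns the HTTP status
    of the response whose body was written to out_path.
    """
    first = http_get(url, headers=headers, allow_redirects=False, stream=True)
    if first.status_code == 302:
        # Second request goes to AWS with none of the original headers.
        response = http_get(first.headers["Location"], stream=True)
    else:
        # No redirect was issued; use the first response as it is.
        response = first
    if response.status_code != 200:
        raise RuntimeError("download failed with HTTP %d" % response.status_code)
    response.raw.decode_content = False  # write the gzip bytes unchanged
    with open(out_path, "wb") as fd:
        shutil.copyfileobj(response.raw, fd)
    return response.status_code
```

With the script above it would be called as download_via_redirect(requests.get, url, requestHeaders, outfile).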

Christiaan Meihsl · October 2017

sgan208, thank you for letting us know :-)
