How to sort RDP tick history files by size
Below is a sample response from pulling a file-set via the RDP CFS API. Can you advise how to use 'jq' to sort the response by the value of 'size'? Thanks.
{
"value": [
{
"id": "4a8e-1abb-6fd8c40c-94f4-352733f33ebc",
"name": "salesforce-1-XXX_normalised_89568BBAF27C4E2EAC292CC28D80E875",
"bucketName": "TICKHISTORY_CUSTOM_ARCHIVE",
"packageId": "4a13-95e9-c37783d6-bba0-4851594758d0",
"attributes": [
{
"name": "fileName",
"value": "hmh tas 2013"
},
{
"name": "size",
"value": "1858351473"
},
{
"name": "view",
"value": "normalised"
},
{
"name": "message_id",
"value": "e9d7e3cb-e0ff-4f4c-8150-d327a0f9e51b"
},
{
"name": "md5",
"value": "1de22361eebb6993217e5da10e9a4798"
},
{
"name": "releaseTime",
"value": "2022-08-17T10:45:28Z"
}
],
"files": [
"4478-d443-7f611230-bdf4-86c64a1deb05",
"45d2-4891-60c79c56-a3bd-71ddb0ba2425",
"4761-13f8-a64f6b32-ae82-250cc77c6b99"
],
"numFiles": 3,
"availableFrom": "2022-08-17T12:07:02Z",
"availableTo": "2023-01-13T18:26:30Z",
"status": "READY",
"created": "2022-08-17T12:07:03Z",
"modified": "2022-08-17T12:07:04Z"
},
{
"id": "4aa5-3b50-d2924fcb-8633-032e977e0041",
"name": "salesforce-1-XXX_normalised_37316F8A011E432AAA540E5868132330",
"bucketName": "TICKHISTORY_CUSTOM_ARCHIVE",
"packageId": "4a13-95e9-c37783d6-bba0-4851594758d0",
"attributes": [
{
"name": "md5",
"value": "e55e6731f5b008d28e1c070436bc5ebb"
},
{
"name": "view",
"value": "normalised"
},
{
"name": "message_id",
"value": "b6a715b3-e1f5-4b1f-bdb4-3aded9173939"
},
{
"name": "releaseTime",
"value": "2022-08-17T10:45:26Z"
},
{
"name": "fileName",
"value": "XXX index tas 2007"
},
{
"name": "size",
"value": "1197364640"
}
],
"files": [
"41c3-80e1-f008c970-b117-72dfa0d856c3",
"4565-eed3-1caf7c98-9872-5dd0fe328f9f",
"45be-0607-b8b66964-8f94-2369ce109094"
],
"numFiles": 3,
"availableFrom": "2022-08-17T11:03:06Z",
"availableTo": "2023-01-13T16:34:39Z",
"status": "READY",
"created": "2022-08-17T11:03:08Z",
"modified": "2022-08-17T11:03:10Z"
},
{
"id": "4ab0-7d5e-fefa8fd5-98e2-6a34b035cfff",
"name": "salesforce-1-XXX_normalised_BB8B299D6C634AF580CE9DAC7C5F28A8",
"bucketName": "TICKHISTORY_CUSTOM_ARCHIVE",
"packageId": "4a13-95e9-c37783d6-bba0-4851594758d0",
"attributes": [
{
"name": "md5",
"value": "9552ea275e7eb1987ee1627f845a3a4b"
},
{
"name": "releaseTime",
"value": "2022-08-17T10:45:56Z"
},
{
"name": "message_id",
"value": "9e719be7-0dde-4a01-b954-7f4ab9348d16"
},
{
"name": "size",
"value": "2850978035"
},
{
"name": "view",
"value": "normalised"
},
{
"name": "fileName",
"value": "hcus tas 2021"
}
],
"files": [
"4708-70f8-41d7d449-b522-2c3694b62e79",
"4c5c-d187-3e002a0b-a6a5-6c957685a0a8",
"4f18-c3cb-7d8daf29-acff-1d1b01b18030"
],
"numFiles": 3,
"availableFrom": "2022-08-17T12:09:12Z",
"availableTo": "2023-01-18T09:48:47Z",
"status": "READY",
"created": "2022-08-17T12:09:13Z",
"modified": "2022-08-17T12:09:14Z"
}
]
}
Best Answer
Please try this query:
jq-win64.exe ".value |= sort_by(.attributes | from_entries | .size | tonumber)"
To reverse the order, please use this query.
jq-win64.exe ".value |= sort_by(.attributes | from_entries | .size | tonumber) | .value |= reverse"
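For readers newer to jq: `from_entries` turns the `[{"name": ..., "value": ...}]` attribute array into a plain object, and `tonumber` makes the comparison numeric rather than lexicographic (string sorting would put "9" after "1000000"). The same logic, sketched in Python (the function name is illustrative):

```python
def sort_filesets_by_size(response, reverse=False):
    """Mirror of: .value |= sort_by(.attributes | from_entries | .size | tonumber).

    Sorts response['value'] by the numeric 'size' attribute.
    """
    def numeric_size(fileset):
        # from_entries equivalent: collapse [{"name": ..., "value": ...}] to a dict.
        attrs = {a['name']: a['value'] for a in fileset['attributes']}
        return int(attrs['size'])  # tonumber equivalent

    response['value'] = sorted(response['value'], key=numeric_size, reverse=reverse)
    return response
```

Pass `reverse=True` for largest-first, matching the `| .value |= reverse` variant of the query.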
Answers
Hello @Tony.Zhu_R ,
I think you are requesting FileSets:
https://{{RDP_HOST}}/file-store/{{RDP_VERSION}}/file-sets?bucket={{RDP_TH_Bucket}}&attributes=venue:{{CFS_VBD_VENUE}},view:normalised
which makes sorting on file size challenging.
Try requesting File details by Fileset id:
https://{{RDP_HOST}}/file-store/{{RDP_VERSION}}/files?filesetId={{fileSetId}}
The output should look like:
{
"value": [
{
"id": "4709-44ae-ae48bed4-8651-6ca30681bec8",
"filename": "LSE-2018-06-01-NORMALIZEDMP-Report-1-of-1.csv.gz",
"filesetId": "4000-ee57-9e402927-8e3e-a967e20d26f9",
"fileType": "File",
"description": "Merged data file",
"storageLocation": {
"url": "https://s3-us-east-1.amazonaws.com/a205143-use1-prod-results-vbd/normalised/LSE/2018-06-01/data/merged/LSE-2018-06-01-NORMALIZEDMP-Report-1-of-1.csv.gz",
"rolearn": "arn:aws:iam::259431915815:role/a205143-prod-push-mechanism-EdsCfsS3Access",
"@type": "s3"
},
"created": "2022-06-25T02:33:09Z",
"modified": "2022-06-25T02:33:09Z",
"href": "https://api.refinitiv.com/file-store/v1/files/4709-44ae-ae48bed4-8651-6ca30681bec8/stream",
"fileSizeInBytes": 904974,
"md5": "1d9d418dd96e7659c055b857c2bfc0a9"
},
{
"id": "497f-2e71-5796a24e-8162-917fef100c75",
"filename": "LSE-2018-06-01-NORMALIZEDMP-Data-1-of-1.csv.gz",
"filesetId": "4000-ee57-9e402927-8e3e-a967e20d26f9",
"fileType": "File",
"description": "Merged data file",
"storageLocation": {
"url": "https://s3-us-east-1.amazonaws.com/a205143-use1-prod-results-vbd/normalised/LSE/2018-06-01/data/merged/LSE-2018-06-01-NORMALIZEDMP-Data-1-of-1.csv.gz",
"rolearn": "arn:aws:iam::259431915815:role/a205143-prod-push-mechanism-EdsCfsS3Access",
"@type": "s3"
},
"created": "2022-06-25T02:33:09Z",
"modified": "2022-06-25T02:33:09Z",
"href": "https://api.refinitiv.com/file-store/v1/files/497f-2e71-5796a24e-8162-917fef100c75/stream",
"fileSizeInBytes": 1067179413,
"md5": "591c5c30619f26e046cfc8ea1857fa5a"
}
]
}
This output enables you to sort on fileSizeInBytes.
Will this work for your use case, or are you looking for something different?
The JSON I posted above was retrieved from
https://api.refinitiv.com/file-store/v1/file-sets?bucket=TICKHISTORY_CUSTOM_ARCHIVE
Normally it returns 25 filesetIds per response, but up to 100 when pageSize=100 is specified.
What I asked is how to use jq to sort those filesetIds by their 'size' attribute.
{
"name": "size",
"value": "2850978035"
},
For instance, I need to filter out the filesetIds with sizes of more than 100 GBytes. What filter should I use with jq if the total number of filesetIds runs into the hundreds?
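A hedged aside: the same from_entries/tonumber pipeline combined with jq's select should handle the 100 GB cut-off directly, e.g. ".value | map(select((.attributes | from_entries | .size | tonumber) > 100000000000))" (untested against the live service here). The equivalent logic, sketched in Python with an illustrative function name:

```python
def filesets_larger_than(response, threshold_bytes):
    """Return the filesets whose 'size' attribute exceeds threshold_bytes."""
    def numeric_size(fileset):
        # Collapse the [{"name": ..., "value": ...}] attribute list to a dict.
        attrs = {a['name']: a['value'] for a in fileset['attributes']}
        return int(attrs['size'])

    return [fs for fs in response['value'] if numeric_size(fs) > threshold_bytes]
```

This works the same way whether the response holds three filesets or several hundred, since the filter runs per element.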
Hello @Tony.Zhu_R ,
Let me try to explain what I meant, hoping the explanation will help clarify. I believe that the size you obtain from this call is not the size of a file, but rather the size of a fileset. For example:
https://{{RDP_HOST}}/file-store/{{RDP_VERSION}}/file-sets?bucket={{RDP_TH_Bucket}}&attributes=venue:{{CFS_VBD_VENUE}},view:normalised
Results in:
{
"value": [
{
"id": "4000-0fef-8c14bb0e-ac4a-9d9356699d2f",
"name": "LSE_normalised_2007-05-28",
"bucketName": "TICKHISTORY_VBD_UNLIMITED",
"packageId": "40eb-1240-f887f89f-b6e2-ff08ec2e2603",
"attributes": [
{
"name": "message_id",
"value": "f546fa27-7d1f-477a-89a6-ba2f3ca325e2"
},
{
"name": "view",
"value": "normalised"
},
{
"name": "depth",
"value": "UNLIMITED"
},
{
"name": "size",
"value": "23001"
},
{
"name": "releaseTime",
"value": "2007-05-29T04:00:00.000000000Z"
},
{
"name": "venue",
"value": "LSE"
},
{
"name": "md5",
"value": "565390e25c255a712e3d42ef790e07f0"
}
],
"files": [
"4414-fdfd-35d0e262-be9a-664b4ea09787",
"44d8-267c-5a5af2c9-adbe-9abe81c1c060"
],
"numFiles": 2,
Whereas:
https://{{RDP_HOST}}/file-store/{{RDP_VERSION}}/files?filesetId=4000-0fef-8c14bb0e-ac4a-9d9356699d2f
Results in:
"value": [
{
"id": "4414-fdfd-35d0e262-be9a-664b4ea09787",
"filename": "LSE-2007-05-28-NORMALIZEDMP-Report-1-of-1.csv.gz",
"filesetId": "4000-0fef-8c14bb0e-ac4a-9d9356699d2f",
"fileType": "File",
"description": "Merged data file",
"storageLocation": {
"url": "https://s3-us-east-1.amazonaws.com/a205143-use1-prod-results-vbd/normalised/LSE/2007-05-28/data/merged/LSE-2007-05-28-NORMALIZEDMP-Report-1-of-1.csv.gz",
"rolearn": "arn:aws:iam::259431915815:role/a205143-prod-push-mechanism-EdsCfsS3Access",
"@type": "s3"
},
"created": "2021-04-22T15:33:18Z",
"modified": "2021-04-22T15:33:18Z",
"href": "https://api.refinitiv.com/file-store/v1/files/4414-fdfd-35d0e262-be9a-664b4ea09787/stream",
"fileSizeInBytes": 22560,
"md5": "fda504b3a6efa56a34db998e2128b7f0"
},
{
"id": "44d8-267c-5a5af2c9-adbe-9abe81c1c060",
"filename": "LSE-2007-05-28-NORMALIZEDMP-Data-1-of-1.csv.gz",
"filesetId": "4000-0fef-8c14bb0e-ac4a-9d9356699d2f",
"fileType": "File",
"description": "Merged data file",
"storageLocation": {
"url": "https://s3-us-east-1.amazonaws.com/a205143-use1-prod-results-vbd/normalised/LSE/2007-05-28/data/merged/LSE-2007-05-28-NORMALIZEDMP-Data-1-of-1.csv.gz",
"rolearn": "arn:aws:iam::259431915815:role/a205143-prod-push-mechanism-EdsCfsS3Access",
"@type": "s3"
},
"created": "2021-04-22T15:33:18Z",
"modified": "2021-04-22T15:33:18Z",
"href": "https://api.refinitiv.com/file-store/v1/files/44d8-267c-5a5af2c9-adbe-9abe81c1c060/stream",
"fileSizeInBytes": 441,
"md5": "4da4c501efd0f042bb7d1c1b5130306c"
}
]
Note that 23001 = 22560 + 441.
Before starting to parse this: are you looking for sizes of files, or sizes of FileSets, which can potentially contain multiple files?
If you are looking for sizes of FileSets, you can paginate over your results, concatenate them into a dataframe, and derive fileset sizes.
If you are looking for sizes of files, you will need an extra step: paginate through the FileSets results from the bucket, then request File details either by FilesetID or by FileID. Both will contain the sizes of the discrete files. You could concatenate these results into a dataframe and obtain all file sizes from it.
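To make the relationship concrete: the fileset-level 'size' attribute should equal the sum of fileSizeInBytes over the fileset's files, as the 23001 = 22560 + 441 example above shows. A minimal sketch (the helper name is illustrative; the input mirrors the /files response format):

```python
def fileset_size_from_files(files_response):
    """Sum fileSizeInBytes across the file records returned for one fileset
    by /file-store/v1/files?filesetId=..."""
    return sum(f['fileSizeInBytes'] for f in files_response['value'])
```

For the LSE 2007-05-28 fileset shown earlier, the two files (22560 and 441 bytes) sum to the fileset's reported size of 23001.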
At this time, I am not seeing any files in TICKHISTORY_CUSTOM_ARCHIVE, but this is likely because I lack the permissions to access this bucket.
Does this explanation make sense, and is this what you observe on your side?
Hi @Tony.Zhu_R ,
I would like to mention another approach that you may find of interest: the RD Library for Python.
Please see example:
Section: Browse Tick History buckets.
I believe this will fulfill your requirement, except for pagination.
Additionally see example:
Section: Get FileSets with pagination
Thanks for sharing the above, @zoya faberov.
It's not what I asked for, though.
Given that the number of filesets in the RTH HMC bucket varies, I'll take 3 as an example with the following response.
{
"value": [
{
"id": "4030-fd1e-db11637d-b277-19bbd95c73ac",
"name": "salesforce-1-XXX_normalised_3C4C0EC0D2A643188BB5E9C1F8C329C0",
"bucketName": "TICKHISTORY_CUSTOM_ARCHIVE",
"packageId": "4a13-95e9-c37783d6-bba0-4851594758d0",
"attributes": [
{
"name": "size",
"value": "3634152545"
},
{
"name": "md5",
"value": "612d8ee7b77605bc342f148c16db2929"
},
{
"name": "message_id",
"value": "a5c136ba-4821-4898-a511-18bfa607c86f"
},
{
"name": "releaseTime",
"value": "2022-08-17T10:45:56Z"
},
{
"name": "fileName",
"value": "hmh tas 2015"
},
{
"name": "view",
"value": "normalised"
}
],
"files": [
"4745-b059-944935e3-a2c1-1f47875d39a1",
"4b21-037a-58a832d2-b486-ff831a2e223e",
"4bb1-dc27-81b6d485-8cbc-9854df8c651b"
],
"numFiles": 3,
"availableFrom": "2022-08-17T12:07:46Z",
"availableTo": "2023-01-18T08:13:10Z",
"status": "READY",
"created": "2022-08-17T12:07:46Z",
"modified": "2022-08-17T12:07:47Z"
},
{
"id": "4039-98c8-91ff43cc-95c8-646c7f51f25c",
"name": "salesforce-1-XXX_normalised_1D80CBA8BE694370B21F08080015732C",
"bucketName": "TICKHISTORY_CUSTOM_ARCHIVE",
"packageId": "4a13-95e9-c37783d6-bba0-4851594758d0",
"attributes": [
{
"name": "message_id",
"value": "94344e73-7d97-4c8a-a773-5246edfeaac3"
},
{
"name": "md5",
"value": "51c1b3242c42307a6b51383e7d9e2f65"
},
{
"name": "releaseTime",
"value": "2022-08-17T10:45:27Z"
},
{
"name": "fileName",
"value": "hsi tas 2017"
},
{
"name": "view",
"value": "normalised"
},
{
"name": "size",
"value": "11812238421"
}
],
"files": [
"42b7-420b-b2e9cb81-9340-15f12d2e1b52",
"43d9-8883-8c8649e2-a96c-eaab9b70bed3",
"4e8e-c1a1-60ce42ad-bcf2-9cfa55b8578a"
],
"numFiles": 3,
"availableFrom": "2022-08-17T12:05:46Z",
"availableTo": "2023-01-13T17:37:03Z",
"status": "READY",
"created": "2022-08-17T12:05:46Z",
"modified": "2022-08-17T12:05:47Z"
},
{
"id": "4061-5f65-2a3a0753-aab7-9f2abaec0016",
"name": "salesforce-1-XXX_normalised_88D37A2541B44CEEBE715A1300C4F175",
"bucketName": "TICKHISTORY_CUSTOM_ARCHIVE",
"packageId": "4a13-95e9-c37783d6-bba0-4851594758d0",
"attributes": [
{
"name": "releaseTime",
"value": "2022-08-17T10:45:56Z"
},
{
"name": "message_id",
"value": "b94f669b-a925-4cca-ba7c-911d2811223a"
},
{
"name": "size",
"value": "17227956"
},
{
"name": "view",
"value": "normalised"
},
{
"name": "md5",
"value": "c8db431e72af8ae0599bf9eead6a2a48"
},
{
"name": "fileName",
"value": "hmce tas 2005"
}
],
"files": [
"4842-db68-deb53105-ad1b-48c4126d38d4",
"4cd5-e0e2-543e2fd3-b8c8-0738f180eb24",
"4ed5-1054-3b985648-8301-682f0c61e3ba"
],
"numFiles": 3,
"availableFrom": "2022-08-17T12:08:20Z",
"availableTo": "2023-01-18T08:33:00Z",
"status": "READY",
"created": "2022-08-17T12:08:21Z",
"modified": "2022-08-17T12:08:21Z"
}
]
}
It has nothing to do with exact file sizes, whether of the fileset or of its constituent files. The question is what kind of filter to use with 'jq' to sort the above response by the attribute below.
{
"name": "size",
"value": "17227956"
},
For instance, I'd like to see all the filesets in order of size, largest to smallest, and vice versa.
So, any idea what kind of 'jq' filter can achieve it?
Hi @Tony.Zhu_R ,
A quick disclaimer: I am not permissioned for the same content in the custom archive bucket. I am testing with TICKHISTORY_VBD_UNLIMITED, which I believe is structured the same as the custom bucket, but if you see any discrepancies in the format on your side, please let me know.
What I would like to discuss is not the only approach to sorting this result, but possibly the simplest: putting the result into a dataframe. An added benefit is that any further transformations you need can then be done on the dataframe as well, which keeps them simple.
df = pd.json_normalize(jsonFullResp['value'])
Next, I create a new column 'size' containing the size value, converted to an integer so that the column sorts numerically rather than as strings.
for index, row in df.iterrows():
    size = [sub['value'] for sub in df['attributes'][index] if sub['name'] == 'size'][0]
    df.loc[index, 'size'] = int(size)
My result looks like this (screenshot omitted):
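A hedged aside: the same column can be built without the row-by-row loop by using `apply` on the 'attributes' column. A sketch, with illustrative inline data standing in for the real response:

```python
import pandas as pd

# Two illustrative filesets in the same shape as the API response.
data = [
    {"id": "a", "attributes": [{"name": "size", "value": "3634152545"}]},
    {"id": "b", "attributes": [{"name": "size", "value": "17227956"}]},
]
df = pd.json_normalize(data)

def size_attr(attrs):
    # Pull the 'size' entry out of the name/value attribute list.
    return int(next(a['value'] for a in attrs if a['name'] == 'size'))

df['size'] = df['attributes'].apply(size_attr)
df2 = df.sort_values(by=['size'])
print(df2[['id', 'size']])
```

The result is the same sorted frame; `apply` simply avoids mutating the frame while iterating it.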
And now I can sort it:
df2 = df.sort_values(by=['size'])
df2
Getting the sorted dataframe (screenshot omitted).
Please let us know if this is what you are looking to do?
Hi @Tony.Zhu_R ,
I cannot test with your exact result at the moment, but I would expect larger result sets not to be available in a single request, instead requiring iterative pagination via a skip token.
If that is the case, we can concatenate each new page of results to the total results dataframe, and sort by integer size only once the complete result has been obtained.
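To sketch that loop: accumulate each page's 'value' entries until no continuation token remains, then sort once at the end. `fetch_page` below is a hypothetical stand-in for the real RDP request, since the exact skip-token mechanics depend on the API response:

```python
def collect_all_filesets(fetch_page):
    """Gather 'value' entries across pages.

    fetch_page(token) is a hypothetical stand-in for the real RDP call;
    it must return (page_dict, next_token), with next_token None on the
    final page.
    """
    filesets, token = [], None
    while True:
        page, token = fetch_page(token)
        filesets.extend(page['value'])
        if token is None:
            return filesets
```

Once complete, the combined list can be sorted a single time using a numeric-size key.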
Let us know if this is something that you observe to be required, and would like to discuss more?
As raised initially, I'm looking for a filter to run with jq to sort the filesets by 'size'.
Can you use the above sample of 3 filesets and sort them by size with jq instead of pandas?