Offsets in the response are not valid

I parsed the JSON file and found that it is quite hard to align the offset given for an entity with its position in the raw text input. There are several reasons for this:

1. Each document has its own additional offset (metadata with a hash and other info), which makes the initial offset number invalid.

2. Newlines and any characters that are not encoded properly (e.g., "company\u2019s") shift the offset so much that the index we need cannot be recovered.
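
To illustrate the second reason, here is a minimal sketch of how the same word ends up at three different offsets depending on how the text is counted. The field name `text` and the sample sentence are made up for illustration and are not taken from the actual response; which of these conventions the service uses is exactly what I do not know.

```python
import json

# Hypothetical raw JSON fragment (assumption: the real file looks different).
# The apostrophe is stored as the escape \u2019 and the newline as \n.
raw = '{"text": "The company\\u2019s report\\nwas filed."}'

# Decoding turns each escape sequence into a single character.
decoded = json.loads(raw)["text"]

word = "report"

# 1) Offset counted in decoded characters (Unicode code points).
decoded_offset = decoded.index(word)

# 2) Offset counted in the raw file, relative to the start of the string value,
#    where \u2019 occupies six characters instead of one.
value_start = raw.index('"', raw.index(":")) + 1
raw_offset = raw.index(word) - value_start

# 3) Offset counted in UTF-8 bytes of the decoded text,
#    where the right single quotation mark occupies three bytes.
byte_offset = len(decoded[:decoded_offset].encode("utf-8"))

print(decoded_offset, raw_offset, byte_offset)
```

The three numbers differ even for this short sentence, and the gap grows with every further escape sequence or non-ASCII character in the document.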

Could you please help me figure out the simplest way to process offsets?
