Looking for tricky markup in Novus documents

Is there a way to search Novus (via Easel, the Novus API, etc.) for documents that contain the markup `&amp;gt;`? For example, document "ID67A23CA17E311E2B11EA85D0B248D27" contains the markup: &amp;gt;&amp;gt;&amp;gt;NOTICE REGARDING PRO HAC VICE MOTION. REGARDING DOCUMENT NO. ... Specifically, I'm looking to locate all of the documents in a given collection that have this `&amp;gt;`.

Best Answer

  • Hi Scott, I'm a Novus Consultant, and I'll take a stab at this. Novus resolves the &amp; to just & when the content is indexed for searching, so the resulting term is &gt;. From that, Novus strips the & and the ; as connectors/punctuation and simply indexes the term gt. You can search the collection (N_DFEDDIST00 in this case) for gt, but that returns many documents. #gt cuts it down (exact-match syntax, which eliminates some variants), but you are still left with 8K+ documents. Looking at some of them in Easel, where I've been playing with this, many contain legitimate gt terms, such as initials or company names with GT set off by spaces, which are obvious false hits. Your example had a couple of gt terms back to back, so I ran a #gt +1 #gt search (find the term gt one word ahead of another gt), and that narrowed it down to under 1,000. You'd have to look at the results to see whether this gets you close enough. Note that it won't help with content where gt appears by itself (not within one word of another gt). But that's the best I could come up with.
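
To make the reasoning above concrete, here is a rough Python sketch of the tokenization as described, plus a local post-filter you could run over retrieved document text to weed out the false hits. This is not Novus's actual indexer or API: the function names (`index_terms`, `has_adjacent_gt`, `contains_escaped_gt`) are made up for illustration, and the sample string assumes the raw markup in the document is the double-escaped `&amp;gt;`.

```python
import html
import re
from typing import List


def index_terms(raw: str) -> List[str]:
    """Approximate the described indexing (an assumption, not Novus code):
    resolve entities one level, then drop punctuation such as & and ;."""
    decoded = html.unescape(raw)  # "&amp;gt;" becomes "&gt;"
    return re.findall(r"[a-z0-9]+", decoded.lower())


def has_adjacent_gt(terms: List[str]) -> bool:
    """Mimic the intent of '#gt +1 #gt': a gt term directly followed by gt."""
    return any(a == "gt" and b == "gt" for a, b in zip(terms, terms[1:]))


def contains_escaped_gt(raw: str) -> bool:
    """Post-filter: keep only text that holds the literal escaped markup."""
    return "&amp;gt;" in raw


# Sample based on the example in the question (raw form is assumed).
sample = "&amp;gt;&amp;gt;&amp;gt;NOTICE REGARDING PRO HAC VICE MOTION."
distractor = "Filed on behalf of GT Holdings"

for text in (sample, distractor):
    terms = index_terms(text)
    print(terms[:4], has_adjacent_gt(terms), contains_escaped_gt(text))
```

Both strings produce a gt term, which is why a plain gt search is noisy. Only the first passes the adjacency check, and the literal-string filter would also catch a lone `&amp;gt;` that the proximity search misses, provided you can get at the raw document text.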
