Output Label Options
Applying heuristics to tweak the output
Sometimes the output you get from Aryn DocParse isn’t exactly what you want. You can mitigate this by specifying the output_label_options option, which applies simple heuristics to correct the output. Currently the only supported heuristic is promote_title, which checks whether the first page of the document lacks a title and, if so, chooses one of the other elements on the first page to promote to title.
Example
Let’s look at the example below:
You’ll notice that “Aviation Investigation Final Report” is incorrectly detected as a “Caption” here. To fix this when using the aryn-sdk, you can call partition_file with the output_label_options parameter:
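As a minimal sketch of such a call: the option names promote_title and title_candidate_elements come from this page, but the value True, the candidate list ["Section-header", "Caption"], and the helper names build_output_label_options and partition_with_promoted_title are assumptions made for illustration. The aryn-sdk import is deferred so the option-building part can be tried without the package installed.

```python
def build_output_label_options(candidates=("Section-header", "Caption")):
    """Build the options dict; the values here are illustrative assumptions."""
    return {
        "promote_title": True,  # assumed: a boolean switch enables the heuristic
        "title_candidate_elements": list(candidates),  # assumed candidate types
    }


def partition_with_promoted_title(path, aryn_api_key):
    """Hypothetical wrapper: partition a file with title promotion turned on."""
    # Imported lazily; requires `pip install aryn-sdk` and a valid API key.
    from aryn_sdk.partition import partition_file

    with open(path, "rb") as f:
        return partition_file(
            f,
            aryn_api_key=aryn_api_key,
            output_label_options=build_output_label_options(),
        )
```

Keeping the options in a small helper makes it easy to reuse the same settings across several documents.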
This will return the following output:
Among the elements on the first page whose type is in the title_candidate_elements list, the heuristic promotes the one with the largest font size.
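The selection rule described above can be illustrated with a short, self-contained sketch. This is not Aryn's implementation: the element layout (a "type" key plus a "properties" dict with "page_number") loosely mirrors DocParse output, while the "font_size" property and the default candidate list are assumptions made for this example.

```python
def promote_title(elements, title_candidate_elements=("Section-header", "Caption")):
    """Return copies of the elements, relabeling one first-page candidate as Title."""
    first_page = [e for e in elements if e["properties"]["page_number"] == 1]

    # If the first page already has a title, leave everything unchanged.
    if any(e["type"] == "Title" for e in first_page):
        return [dict(e) for e in elements]

    candidates = [e for e in first_page if e["type"] in title_candidate_elements]
    if not candidates:
        return [dict(e) for e in elements]

    # Pick the candidate with the largest (assumed) font_size property.
    best = max(candidates, key=lambda e: e["properties"]["font_size"])
    return [dict(e, type="Title") if e is best else dict(e) for e in elements]
```

The function returns shallow copies instead of mutating its input, so the original labels stay available for comparison.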
Specify Output Label Options using curl
This is how you can use curl to specify these options:
Specify Output Label Options through Sycamore
This is how you can specify these options through Sycamore: