Partition Document
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
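As a minimal sketch, the bearer header described above can be built like this. The token value is a placeholder; only the "Bearer <token>" shape comes from the docs.

```python
# Build the Authorization header for this API.
# "YOUR_AUTH_TOKEN" is a placeholder -- substitute your real auth token.
token = "YOUR_AUTH_TOKEN"
headers = {"Authorization": f"Bearer {token}"}
print(headers["Authorization"])  # Bearer YOUR_AUTH_TOKEN
```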
Headers
Body
An array containing single integers (e.g., 1) or arrays with exactly two integers representing a range (e.g., [1, 10]).
A boolean value indicating whether to extract images from the document.
A boolean value indicating whether to extract table structure from the document.
A boolean value indicating whether to apply OCR to the document.
A number between 0 and 1 indicating the threshold for document segmentation. Defaults to auto, which uses an automatic threshold.
The options for chunking the document. If not specified, then chunking will not be performed.
The strategy to use for merging chunks. Available options: context_rich, mixed_multi_column, maximize_within_limit. Defaults to context_rich.
The tokenizer to use for chunking. Available options: openai_tokenizer, character_tokenizer, huggingface_tokenizer. Defaults to openai_tokenizer.
The options for the tokenizer. See the full documentation here.
The maximum number of tokens per chunk. Defaults to 512.
A boolean value indicating whether to merge chunks across pages. Defaults to false. Not supported for the 'mixed_multi_column' strategy.
The format of the output. Available options: json, markdown. Defaults to json.
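The body parameters above can be assembled into a request payload along these lines. This is a sketch only: the field names are inferred from the parameter descriptions and may differ from the real API's schema.

```python
import json

# Hypothetical request body mirroring the documented parameters.
# Field names are assumptions based on the descriptions above.
body = {
    "pages": [1, [3, 5]],              # page 1, plus pages 3 through 5 as a range
    "extract_images": True,            # whether to extract images
    "extract_table_structure": True,   # whether to extract table structure
    "use_ocr": False,                  # whether to apply OCR
    "segmentation_threshold": "auto",  # or a number between 0 and 1
    "chunking": {                      # omit entirely to skip chunking
        "strategy": "context_rich",    # context_rich | mixed_multi_column | maximize_within_limit
        "tokenizer": "openai_tokenizer",  # openai_tokenizer | character_tokenizer | huggingface_tokenizer
        "max_tokens_per_chunk": 512,   # defaults to 512
        "merge_across_pages": False,   # not supported for mixed_multi_column
    },
    "output_format": "json",           # json | markdown
}
print(json.dumps(body, indent=2))
```

Note that each entry in the pages array is either a single integer or a two-integer range, matching the description above.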
Response
The type of the element.
The bounding box of the element.
The properties of the element.
The text representation of the element.
The binary representation of the element.
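The response fields above (type, bounding box, properties, text, binary) can be consumed roughly as follows. The sample data here is invented for illustration; only the field list comes from the docs.

```python
# Walk the elements of a partition response and collect their text.
# The structure below is an assumption based on the documented fields;
# the sample values are fabricated.
response = {
    "elements": [
        {"type": "Title", "bbox": [0, 0, 612, 40], "properties": {},
         "text": "Quarterly Report", "binary": None},
        {"type": "Table", "bbox": [0, 60, 612, 300], "properties": {"rows": 2},
         "text": "a | b", "binary": None},
    ]
}

# Keep only elements that carry a text representation.
texts = [el["text"] for el in response["elements"] if el.get("text")]
print(texts)
```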