Storage and search for your parsed documents.
There is a default DocSet (docparse_storage) to use. Documents are automatically added to it for Free Trial customers, unless you create and specify a different DocSet. Think of a DocSet as a folder for your processed Docs, optimized to store and index the elements and metadata from each Doc.
By default, DocParse will add your processed Doc to the default DocSet named docparse_storage. You can create a new DocSet using the Aryn UI or the create-docset API, and specify it using the add_to_docset_id parameter when partitioning a document.
You can list your DocSets using the list-docsets API.
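As a rough illustration of choosing a DocSet at partition time, the sketch below builds the options for a partition request. Only the add_to_docset_id parameter name comes from this page. The helper build_partition_options and the placeholder DocSet ID are hypothetical, and how the options are actually sent depends on the client you use.

```python
from typing import Optional


def build_partition_options(docset_id: Optional[str] = None) -> dict:
    """Build options for a partition request (hypothetical helper).

    When no DocSet ID is given, the parameter is left out so that the
    default storage behavior described above applies.
    """
    options: dict = {}
    if docset_id is not None:
        # Parameter name taken from this page; the value is a placeholder.
        options["add_to_docset_id"] = docset_id
    return options


# No DocSet specified: rely on the default DocSet.
default_options = build_partition_options()

# Target a DocSet you created (placeholder ID, not a real one).
custom_options = build_partition_options("my-docset-id")

print(default_options)
print(custom_options)
```

Keeping the parameter out of the request unless a DocSet ID is supplied keeps the default case and the custom case on the same code path.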
For Pay As You Go (PAYG) customers, you can opt out of storing your documents in two ways:
... add_to_docset_id parameter in the Partition API.
... Partition API calls.
... (properties) extracted from your document.
You can also use the get-doc API to retrieve the parsed Doc, or get-doc-binary to get the original document.
You can extract properties (properties) from documents in Aryn DocSets using GenAI. Properties are stored as part of your document as key:value pairs (property_name:property_value), and are extracted using an LLM from all the documents in your DocSet. You can use the Extract Properties feature on the DocSet page or the extract-properties API.
From the DocSets tab in the left nav, click on your DocSet to open it. Then, click on the Extract button, and then select Add Property. You can add up to 15 properties in the UI, and hundreds when using the API directly. Next, add the information to guide the GenAI model to properly extract your property:
Name: The name of the property. This is the key in the key:value pair.
Type: The type of value to extract. Choose String, Number, or Boolean.
Description: The description of the property being extracted.
Default Value: If the LLM does not find a value to extract, this is what will be placed as the value for the property.
Examples: Comma-separated example property values. The LLM will use these as examples of what a value might be for a specific property.
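The fields above can be pictured as a small record per property. The sketch below is illustrative only: PropertySpec, parse_examples, and resolve_value are hypothetical names, not part of the Aryn SDK, and the actual request format of the extract-properties API may differ. It shows the type restriction, the comma-separated examples, and the default-value fallback described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

Value = Union[str, float, bool]
ALLOWED_TYPES = ("String", "Number", "Boolean")  # types listed above


def parse_examples(raw: str) -> List[str]:
    """Split a comma-separated examples string, dropping blank entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


@dataclass
class PropertySpec:
    """Hypothetical record mirroring the fields in the Add Property form."""
    name: str                      # key in the key:value pair
    type: str                      # one of ALLOWED_TYPES
    description: str               # guidance for the model
    default: Optional[Value] = None
    examples: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"type must be one of {ALLOWED_TYPES}")


def resolve_value(spec: PropertySpec, extracted: Optional[Value]) -> Optional[Value]:
    """Use the default value when nothing was extracted."""
    return spec.default if extracted is None else extracted


# Example property (made-up name and values, for illustration only).
spec = PropertySpec(
    name="invoice_currency",
    type="String",
    description="Currency used on the invoice.",
    default="unknown",
    examples=parse_examples("USD, EUR, GBP"),
)

# Nothing extracted, so the stored key:value pair uses the default.
properties = {spec.name: resolve_value(spec, None)}
print(properties)
```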
After providing this information, click Add Property. Then, click Extract. Aryn will run a job to extract the properties specified, and share a Task ID so you can monitor the task's progress. Tasks disappear from the Tasks page once they complete.
To see your newly extracted properties, open a document in the DocSet and select the Properties tab in the Document viewer.