This page documents how to call the Storage APIs using the Aryn SDK. All parameters are optional unless marked as required.

DocSet Functions

Functions for managing document sets (DocSets) which are collections of documents.

Create DocSet

Create a new DocSet to store documents.
name
string
required
String name for the DocSet
properties
object
Optional dictionary of additional properties
schema
Schema
Optional Schema object defining document properties
prompts
object
Optional dictionary of prompts for the DocSet
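from aryn_sdk.client.client import Client

# Assumes your Aryn API key is configured, e.g. via the ARYN_API_KEY environment variable.
client = Client()

docset = client.create_docset(name="My DocSet")
docset_id = docset.docset_id  # keep this ID for later calls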
A DocSetMetadata object containing:
docset_id
string
Unique identifier for the DocSet
name
string
Name of the DocSet
created
timestamp
Creation timestamp
readonly
boolean
Boolean indicating if DocSet is read-only
properties
object
Dictionary of custom properties
size
number
Size of DocSet in bytes
schema
Schema
Schema object defining document properties
prompts
object
Dictionary of prompts for the DocSet
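Create DocSet does not say whether DocSet names must be unique, so a script that runs more than once may create duplicates. The sketch below is a hypothetical get-or-create helper, get_or_create_docset, built only from calls shown on this page: list_docsets with the name_eq filter, get_all(), and create_docset. It assumes get_all() returns an iterable of DocSetMetadata objects, as in the List DocSets example.

```python
from typing import Any


def get_or_create_docset(client: Any, name: str) -> str:
    """Return the docset_id of a DocSet named `name`, creating it if needed.

    Hypothetical helper; `client` is assumed to behave like the Aryn SDK
    Client documented on this page.
    """
    # name_eq filters on an exact name match instead of scanning every DocSet.
    existing = list(client.list_docsets(name_eq=name).get_all())
    if existing:
        # Names are not assumed to be unique; reuse the first match.
        return existing[0].docset_id
    return client.create_docset(name=name).docset_id
```

Usage: `docset_id = get_or_create_docset(client, "My DocSet")`. Two processes running this at the same moment could still both create a DocSet, so it is a convenience, not a lock.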

Get DocSet

Retrieve metadata for a DocSet.
docset_id
string
required
The unique identifier of the DocSet to retrieve
docset = client.get_docset(docset_id="your-docset-id")
A DocSetMetadata object containing:
docset_id
string
Unique identifier for the DocSet
name
string
Name of the DocSet
created
timestamp
Creation timestamp
readonly
boolean
Boolean indicating if DocSet is read-only
properties
object
Dictionary of custom properties
size
number
Size of DocSet in bytes
schema
Schema
Schema object defining document properties
prompts
object
Dictionary of prompts for the DocSet
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error
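A common need is to treat "DocSet not found" (404) as an ordinary outcome while still surfacing API key problems (403) and server errors (5xx). The sketch below assumes the raised HTTPError exposes `response.status_code`, as the HTTPError classes of requests and httpx do; which exception class the SDK actually raises is not stated here, so the check is duck-typed. get_docset_or_none and status_code_of are hypothetical helper names.

```python
from typing import Any, Optional


def status_code_of(err: BaseException) -> Optional[int]:
    """Best-effort HTTP status code of an exception, or None if it has none.

    Assumes the error carries `response.status_code`, as requests'
    and httpx's HTTP error classes do.
    """
    response = getattr(err, "response", None)
    return getattr(response, "status_code", None)


def get_docset_or_none(client: Any, docset_id: str) -> Optional[Any]:
    """Return the DocSet metadata, or None if the service answers 404."""
    try:
        return client.get_docset(docset_id=docset_id)
    except Exception as err:
        if status_code_of(err) == 404:
            return None
        raise  # 403 (API key problems) and 5xx errors still propagate
```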

List DocSets

List all DocSets in the account.
page_size
integer
default:"100"
Number of items per page
page_token
string
Token for pagination
name_eq
string
Filter DocSets by exact name match
docsets = client.list_docsets().get_all()
for docset in docsets:
    print(f"DocSet: {docset.name}")
A paginated list of DocSetMetadata objects, each containing:
docset_id
string
Unique identifier for the DocSet
name
string
Name of the DocSet
created
timestamp
Creation timestamp
readonly
boolean
Boolean indicating if DocSet is read-only
properties
object
Dictionary of custom properties
size
number
Size of DocSet in bytes
schema
Schema
Schema object defining document properties
prompts
object
Dictionary of prompts for the DocSet
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 5xx: Internal Server Error
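The example above uses get_all(), which appears to gather results across every page. To process results one page at a time with page_size and page_token, the token-following loop looks roughly like the sketch below. How the SDK's response exposes the next page token is not documented on this page, so the loop takes a hypothetical fetch_page callable that you would adapt to wrap client.list_docsets(page_size=..., page_token=token).

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher signature: page_token (None for the first page) ->
# (items on that page, token for the next page, or None when finished).
FetchPage = Callable[[Optional[str]], Tuple[List[Any], Optional[str]]]


def iterate_pages(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield every item across all pages by following page tokens."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            return
```

Because it is a generator, only one page is held in memory at a time, which matters for accounts with many DocSets.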

Delete DocSet

Delete a DocSet and all its documents.
docset_id
string
required
The unique identifier of the DocSet to delete
client.delete_docset(docset_id="your-docset-id")
The metadata of the deleted DocSet
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error

Document Functions

Functions for managing individual documents within DocSets.

Add Document

Add a single document to the Aryn platform. This API calls DocParse to partition the document and automatically extracts any properties registered as part of the DocSet schema.
file
BinaryIO | str | PathLike
required
A file opened in binary mode, a path given as a str or PathLike instance, or an HTTP URL indicating the document to add. A path can be either a local path or an Amazon S3 URL starting with s3://; in the latter case, you must have boto3 installed and AWS credentials set up in your environment. An HTTP URL can start with either https:// or http://.
docset_id
str
required
The id of the DocSet into which to add the document.
options
str
DocParse options to use for partitioning the specified document. See the DocParse documentation for details on the specific options.
doc = client.add_doc(file="/path/to/myfile.pdf", docset_id="your-docset-id")
A DocumentMetadata object containing:
account_id
string
Account identifier
doc_id
string
Document identifier
docset_id
string
Document set identifier
name
string
Document name
created_at
timestamp
Creation timestamp
size
number
Document size in bytes
content_type
string
MIME type of document
properties
object
Custom document properties
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error
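The file parameter accepts several kinds of input, and only s3:// paths need boto3 and AWS credentials. The sketch below restates those rules as a hypothetical helper, describe_file_source, that classifies a file argument before you call add_doc. It is not the SDK's own validation logic; it only mirrors the description of file given above.

```python
import importlib.util
import os
from typing import BinaryIO, Union

FileArg = Union[BinaryIO, str, os.PathLike]


def describe_file_source(file: FileArg) -> str:
    """Return "binary", "s3", "url", or "local" for an add_doc `file` argument."""
    if hasattr(file, "read"):
        return "binary"  # an already-open file object
    path = os.fspath(file)  # accepts str and PathLike alike
    if path.startswith("s3://"):
        # S3 paths are read with boto3 using your AWS credentials.
        if importlib.util.find_spec("boto3") is None:
            raise RuntimeError("s3:// paths require boto3 to be installed")
        return "s3"
    if path.startswith(("http://", "https://")):
        return "url"
    return "local"
```

Checking up front gives a clearer error than a failed upload when boto3 is missing.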

List Documents

List all documents in a DocSet.
docset_id
string
required
ID of the DocSet containing the documents
page_size
integer
default:"100"
Number of items per page
page_token
string
Token for pagination
docs = client.list_docs(docset_id="your-docset-id")
for doc in docs:
    print(f"Document: {doc.name}")
A paginated list of DocumentMetadata objects, each containing:
account_id
string
Account identifier
doc_id
string
Document identifier
docset_id
string
Document set identifier
name
string
Document name
created_at
timestamp
Creation timestamp
size
number
Document size in bytes
content_type
string
MIME type of document
properties
object
Custom document properties
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 400: “Invalid filter parameters”
  • HTTPError 5xx: Internal Server Error
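The DocumentMetadata fields returned by List Documents are enough for simple inventory reports without downloading any content. As an illustration, the hypothetical helper below sums the size field per content_type; it works on any iterable of objects with those two attributes, such as the result of client.list_docs(docset_id=...).

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def bytes_by_content_type(docs: Iterable[Any]) -> Dict[str, int]:
    """Sum document sizes (in bytes) per MIME type."""
    totals: Dict[str, int] = defaultdict(int)
    for doc in docs:
        totals[doc.content_type] += doc.size
    return dict(totals)
```

Usage: `bytes_by_content_type(client.list_docs(docset_id="your-docset-id"))`.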

Get Document

Get a document by ID.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to retrieve
include_elements
boolean
default:"true"
Boolean to include document elements
include_binary
boolean
default:"false"
Boolean to include binary data
doc = client.get_doc(docset_id="your-docset-id", doc_id="your-doc-id")
A Document object containing:
id
string
Document identifier
elements
array
List of document elements
properties
object
Document properties
binary_data
binary
Optional binary content of the document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error
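With include_elements left at its default of true, the returned Document carries the partitioned elements, which makes quick structural summaries possible. The sketch below counts elements by type (for example, how many tables a document has). The shape of an element is not described on this page, so the hypothetical helper element_type_counts assumes each element exposes its type either as a "type" key or a `type` attribute, and counts anything else as "unknown".

```python
from collections import Counter
from typing import Any, Iterable


def element_type_counts(elements: Iterable[Any]) -> Counter:
    """Count document elements by type; elements without a type count as "unknown"."""
    counts: Counter = Counter()
    for el in elements:
        # Assumption: the type is a dict key or an attribute named "type".
        kind = el.get("type") if isinstance(el, dict) else getattr(el, "type", None)
        counts[kind or "unknown"] += 1
    return counts
```

Usage: `element_type_counts(client.get_doc(docset_id="your-docset-id", doc_id="your-doc-id").elements)`. Leaving include_binary at false keeps the response small when only the elements are needed.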

Delete Document

Delete a document by ID.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to delete
client.delete_doc(docset_id="your-docset-id", doc_id="your-doc-id")
The metadata of the deleted document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error
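Delete Document removes one document per call. To clean up many documents at once, it can be combined with List Documents, as in the sketch below. delete_docs_with_prefix is a hypothetical helper; it defaults to a dry run that only reports which doc_ids would be deleted, since deletions cannot be undone through the calls on this page.

```python
from typing import Any, List


def delete_docs_with_prefix(client: Any, docset_id: str, prefix: str,
                            dry_run: bool = True) -> List[str]:
    """Delete every document whose name starts with `prefix`; return their doc_ids.

    With dry_run=True (the default) nothing is deleted.
    """
    # Collect targets first so nothing is deleted while results are being listed.
    targets = [doc.doc_id for doc in client.list_docs(docset_id=docset_id)
               if doc.name.startswith(prefix)]
    if not dry_run:
        for doc_id in targets:
            client.delete_doc(docset_id=docset_id, doc_id=doc_id)
    return targets
```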

Get Document Binary

Get the binary content of a document.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to retrieve
file
string
required
The destination to write the binary content to, such as an output file path
output = "output.pdf"
client.get_doc_binary(docset_id="your-docset-id", doc_id="your-doc-id", file=output)
The binary content of the document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error
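To back up the original files of a whole DocSet, Get Document Binary can be called for each document returned by List Documents. The sketch below assumes, as in the example above, that get_doc_binary accepts a file path string for file. download_docset is a hypothetical helper; it names each output file after the doc_id (with ":" replaced, since it is not valid in Windows file names) plus the extension of the document name, so documents with the same name do not overwrite each other.

```python
from pathlib import Path
from typing import Any, List


def download_docset(client: Any, docset_id: str, out_dir: str) -> List[Path]:
    """Write the binary of every document in a DocSet into `out_dir`."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for doc in client.list_docs(docset_id=docset_id):
        safe_id = doc.doc_id.replace(":", "_")
        path = target / f"{safe_id}{Path(doc.name).suffix}"
        client.get_doc_binary(docset_id=docset_id, doc_id=doc.doc_id, file=str(path))
        written.append(path)
    return written
```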

Asynchronous Document Functions

add_doc_async

Submit a document to be added asynchronously and get the task_id of the resulting task. The task's results remain available in the system for 48 hours. Use with get_async_result. Note: submitting multiple asynchronous add_doc tasks at the same time does not guarantee that they will run simultaneously.
file
BinaryIO | str | PathLike
required
A file opened in binary mode, a path given as a str or PathLike instance, or an HTTP URL indicating the document to add. A path can be either a local path or an Amazon S3 URL starting with s3://; in the latter case, you must have boto3 installed and AWS credentials set up in your environment. An HTTP URL can start with either https:// or http://.
docset_id
str
required
The id of the DocSet into which to add the document.
options
str
DocParse options to use for partitioning the specified document. See the DocParse documentation for details on the specific options.
with open("my-favorite-pdf.pdf", "rb") as f:
    res = client.add_doc_async(file=f, docset_id="your-docset-id")
    print(f"Task ID: {res.task_id}")
    doc_metadata = res.result().value
    print(f"Document ID: {doc_metadata.doc_id}")
A dict containing the task_id of the submitted request.
{
    "task_id": "aryn:t-47gpd3604e5tz79z1jro5fc"
}
User errors:
  • HTTPError: Error:status_code 403. Reason: "This async action requires you to upgrade your account plan"
  • Fix: Please upgrade your account plan.
  • HTTPError: Error:status_code 403. Reason: "No Aryn API Key provided"
  • Fix: Please provide an API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Invalid Aryn API key"
  • Fix: Please provide a valid API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Expired Aryn API key"
  • Fix: Please get a new API key. You can get one for free at aryn.ai/get-started.
  • HTTPError: Error:status_code 429. Reason: "Too many requests"
  • Fix: Please try again after some time. Each account is allowed 1000 tasks to run at a time.
Other errors:
  • HTTPError: Error:status_code 5xx. Reason: Internal Server Error
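Each account can run at most 1000 tasks at a time, and exceeding that returns 429 "Too many requests". When submitting a large batch, one option is to retry a rejected submission with exponential backoff, as in the sketch below. It assumes the 429 surfaces as an exception exposing `response.status_code` (true for the HTTPError classes of requests and httpx, but not confirmed for the SDK); submit_all and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Iterable, List


def submit_all(client: Any, paths: Iterable[str], docset_id: str,
               max_retries: int = 5, base_delay: float = 2.0,
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Submit each file with add_doc_async and return the task_ids in order.

    On a 429 response, waits base_delay * 2**attempt seconds and retries;
    any other error, or a 429 after max_retries retries, is re-raised.
    """
    task_ids: List[str] = []
    for path in paths:
        for attempt in range(max_retries + 1):
            try:
                with open(path, "rb") as f:
                    task_ids.append(client.add_doc_async(file=f, docset_id=docset_id).task_id)
                break
            except Exception as err:
                status = getattr(getattr(err, "response", None), "status_code", None)
                if status != 429 or attempt == max_retries:
                    raise
                sleep(base_delay * 2 ** attempt)
    return task_ids
```

The sleep parameter exists so the backoff can be tested without waiting; in normal use, leave it at its default.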

get_async_result

Gets the results of an asynchronous add_doc task by task_id. Meant to be used with add_doc_async.
  • task_id: Required. A string of the task id to poll and attempt to get the result for.
  • aryn_api_key: An Aryn API key, provided as a string. You can get one for free at aryn.ai/get-started. Default is None (if not provided, the SDK checks the environment variable ARYN_API_KEY or looks in aryn_config, as described below).
  • region: A string that specifies the region to use for the DocParse server. Valid values are US and None. Default is None, which uses the US region. Via the API, you can specify the region by modifying the base URL of the DocParse server.
  • aryn_config: An ArynConfig object (defined in aryn_sdk/config.py), used for finding an API key. If aryn_api_key is set, it overrides this. The default ArynConfig looks in the env var ARYN_API_KEY and then in the file ~/.aryn/config.yaml. Default is None (aryn-sdk looks in the aryn_api_key parameter, then in your environment variables, and then in ~/.aryn/config.yaml).
  • ssl_verify: A bool that controls whether the client verifies the SSL certificate of the chosen DocParse server. ssl_verify is True by default, enforcing SSL verification.
import time

with open("/path/to/pdf", "rb") as f:
    res = client.add_doc_async(file=f, docset_id=docset_id)
    task_id = res.task_id

while True:
    result = client.get_async_result(task_id)
    if result.status_code == 202:
        print("Task still pending...")
    else:
        print(f"Task completed with status code: {result.status_code}")
        print(f"Result: {result.value}")
        break
    time.sleep(1)
A dict like the one in the example below, containing “task_status”, which can be “done” or “pending”. When “task_status” is “done”, the returned dict also contains “result”, which holds what add_doc would have returned had it been called directly. If there is an error ingesting the file itself, “task_status” is still “done”, but “result” contains an “error” field indicating what went wrong.
{
    "task_status":"done",
    "result": ...
}
User errors:
  • HTTPError: Error:status_code 403. Reason: "This async action requires you to upgrade your account plan"
  • Fix: Please upgrade your account plan.
  • HTTPError: Error:status_code 403. Reason: "No Aryn API Key provided"
  • Fix: Please provide an API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Invalid Aryn API key"
  • Fix: Please provide a valid API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Expired Aryn API key"
  • Fix: Please get a new API key. You can get one for free at aryn.ai/get-started.
  • aryn_sdk.partition.partition.PartitionTaskNotFoundError. Reason: "No such task"
  • Fix: Check to make sure the task_id specified is correct.
Other errors:
  • HTTPError: Error:status_code 5xx. Reason: Internal Server Error
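The polling loop in the example above checks every second and never gives up. The sketch below keeps its convention that status code 202 means "still pending" but doubles the delay between polls up to a cap and raises TimeoutError after a deadline. wait_for_result and its parameters are hypothetical; the sleep and clock arguments only exist to make the function testable.

```python
import time
from typing import Any, Callable


def wait_for_result(client: Any, task_id: str, timeout: float = 600.0,
                    initial_delay: float = 1.0, max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Any:
    """Poll get_async_result with exponential backoff until the task is not pending."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        result = client.get_async_result(task_id)
        if result.status_code != 202:  # 202 = still pending, as in the example above
            return result
        if clock() + delay > deadline:
            raise TimeoutError(f"task {task_id} still pending after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

A returned result is not necessarily a success: as described above, a finished task can still carry an “error” field in its result, so inspect result.value before using it.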

cancel_async_task

Cancels the task associated with the task_id specified.
  • task_id: Required. A string of the task id to cancel.
  • aryn_api_key: An Aryn API key, provided as a string. You can get one for free at aryn.ai/get-started. Default is None (if not provided, the SDK checks the environment variable ARYN_API_KEY or looks in aryn_config, as described below).
  • region: A string that specifies the region to use for the DocParse server. Valid values are US and None. Default is None, which uses the US region. Via the API, you can specify the region by modifying the base URL of the DocParse server.
  • aryn_config: An ArynConfig object (defined in aryn_sdk/config.py), used for finding an API key. If aryn_api_key is set, it overrides this. The default ArynConfig looks in the env var ARYN_API_KEY and then in the file ~/.aryn/config.yaml. Default is None (aryn-sdk looks in the aryn_api_key parameter, then in your environment variables, and then in ~/.aryn/config.yaml).
  • ssl_verify: A bool that controls whether the client verifies the SSL certificate of the chosen DocParse server. ssl_verify is True by default, enforcing SSL verification.
with open("/path/to/pdf", "rb") as f:
    task = client.add_doc_async(file=f, docset_id=docset_id)
    task_id = task.task_id
    print(f"Cancelling task {task_id}")
    res = client.cancel_async_task(task_id)
    if res.status_code == 200:
        print("Cancel successful.")
No return value. An asynchronous task can be successfully cancelled only once. After a task has been cancelled, any get_async_result call using that task’s id raises an exception.
User errors:
  • HTTPError: Error:status_code 403. Reason: "This async action requires you to upgrade your account plan"
  • Fix: Please upgrade your account plan.
  • HTTPError: Error:status_code 403. Reason: "No Aryn API Key provided"
  • Fix: Please provide an API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Invalid Aryn API key"
  • Fix: Please provide a valid API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Expired Aryn API key"
  • Fix: Please get a new API key. You can get one for free at aryn.ai/get-started.
  • aryn_sdk.partition.partition.PartitionTaskNotFoundError. Reason: "No such task"
  • Fix: Check to make sure the task_id specified is correct.
Other errors:
  • HTTPError: Error:status_code 5xx. Reason: Internal Server Error
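To stop a whole batch, cancel_async_task can be applied to every pending task reported by list_async_tasks. The sketch below takes the task_id to details mapping in the shape shown in the list_async_tasks example and skips tasks that are not pending. cancel_pending is a hypothetical helper; a task may finish between listing and cancelling, in which case cancel_async_task may raise one of the errors listed above.

```python
from typing import Any, List, Mapping


def cancel_pending(client: Any, tasks: Mapping[str, Mapping[str, Any]]) -> List[str]:
    """Cancel every task whose task_status is "pending"; return the cancelled ids."""
    cancelled: List[str] = []
    for task_id, details in tasks.items():
        if details.get("task_status") == "pending":
            client.cancel_async_task(task_id)
            cancelled.append(task_id)
    return cancelled
```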

list_async_tasks

Lists all the add_doc tasks still running in your account.
  • aryn_api_key: An Aryn API key, provided as a string. You can get one for free at aryn.ai/get-started. Default is None (if not provided, the SDK checks the environment variable ARYN_API_KEY or looks in aryn_config, as described below).
  • region: A string that specifies the region to use for the DocParse server. Valid values are US and None. Default is None, which uses the US region. Via the API, you can specify the region by modifying the base URL of the DocParse server.
  • aryn_config: An ArynConfig object (defined in aryn_sdk/config.py), used for finding an API key. If aryn_api_key is set, it overrides this. The default ArynConfig looks in the env var ARYN_API_KEY and then in the file ~/.aryn/config.yaml. Default is None (aryn-sdk looks in the aryn_api_key parameter, then in your environment variables, and then in ~/.aryn/config.yaml).
  • ssl_verify: A bool that controls whether the client verifies the SSL certificate of the chosen DocParse server. ssl_verify is True by default, enforcing SSL verification.
tasks = client.list_async_tasks().value
print(tasks.tasks)
A dict like the one below which maps task_ids to a dict containing details of the respective task.
{
    "aryn:t-sc0v0lglkauo774pioflp4l": {
        "task_status": "pending"
    },
    "aryn:t-b9xp7ny0eejvqvbazjhg8rn": {
        "task_status": "pending"
    }
}
User errors:
  • HTTPError: Error:status_code 403. Reason: "This async action requires you to upgrade your account plan"
  • Fix: Please upgrade your account plan.
  • HTTPError: Error:status_code 403. Reason: "No Aryn API Key provided"
  • Fix: Please provide an API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Invalid Aryn API key"
  • Fix: Please provide a valid API key either as a parameter or specify it in the environment variable ARYN_API_KEY.
  • HTTPError: Error:status_code 403. Reason: "Expired Aryn API key"
  • Fix: Please get a new API key. You can get one for free at aryn.ai/get-started.
Other errors:
  • HTTPError: Error:status_code 5xx. Reason: Internal Server Error
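The mapping returned by list_async_tasks can be summarized before submitting more work. The sketch below tallies tasks by task_status and estimates how many more tasks fit under the 1000-task limit stated under add_doc_async, assuming every listed task counts toward that limit. count_by_status, remaining_capacity, and TASK_LIMIT are hypothetical names.

```python
from collections import Counter
from typing import Any, Dict, Mapping

TASK_LIMIT = 1000  # per-account concurrent task limit stated under add_doc_async


def count_by_status(tasks: Mapping[str, Mapping[str, Any]]) -> Dict[str, int]:
    """Tally tasks per task_status from a list_async_tasks-style mapping."""
    return dict(Counter(d.get("task_status", "unknown") for d in tasks.values()))


def remaining_capacity(tasks: Mapping[str, Mapping[str, Any]]) -> int:
    """Tasks that can still be submitted, assuming every listed task counts."""
    return max(TASK_LIMIT - len(tasks), 0)
```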

Properties Functions

Functions for managing document properties.

Update Document Properties

Update properties of a document.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to update
operations
array
required
List of ReplaceOperation objects defining property updates
from aryn_sdk.types import ReplaceOperation

updates = [
    ReplaceOperation(
        path="/properties/status",
        value="reviewed"
    )
]
client.update_doc_properties(
    docset_id="your-docset-id",
    doc_id="your-doc-id",
    operations=updates
)
The updated Document object containing:
id
string
Document identifier
elements
array
List of document elements
properties
object
Document properties
binary_data
binary
Optional binary content of the document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error
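The ReplaceOperation path in the example, "/properties/status", looks like a JSON Pointer. Assuming paths follow JSON Pointer syntax (RFC 6901), a property name containing "~" or "/" must be escaped as "~0" and "~1" respectively. The hypothetical helpers below build such paths from a plain dict of updates; the result is a list of path/value pairs to pass to ReplaceOperation, not ReplaceOperation objects themselves, so the block runs without aryn_sdk installed.

```python
from typing import Any, Dict, List


def property_path(name: str) -> str:
    """Build "/properties/<name>", escaping "~" before "/" as RFC 6901 requires."""
    return "/properties/" + name.replace("~", "~0").replace("/", "~1")


def replace_specs(updates: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"status": "reviewed"} into [{"path": ..., "value": ...}] pairs."""
    return [{"path": property_path(k), "value": v} for k, v in updates.items()]
```

Usage: `operations = [ReplaceOperation(**spec) for spec in replace_specs({"status": "reviewed"})]`, then pass operations to update_doc_properties as in the example above.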

Extract Properties

Extract properties from the documents in a DocSet.
docset_id
string
required
The unique identifier of the DocSet containing the documents
schema
object
required
Schema object defining properties to extract
from aryn_sdk.types.schema import Schema, SchemaField

schema = Schema(fields=[
    SchemaField(name="category", field_type="string")
])
client.extract_properties(docset_id="your-docset-id", schema=schema)
A job status object containing:
  • exit_status: The exit status of the job
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error

Delete Properties

Delete properties from the documents in a DocSet.
docset_id
string
required
The unique identifier of the DocSet containing the documents
schema
object
required
Schema object defining properties to delete
client.delete_properties(docset_id="your-docset-id", schema=schema)
A job status object
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error

Client Options

  • region: A string that specifies the region to use for the DocParse server. Valid values are US and None. Default is None, which uses the US region.
  • timeout: A float that specifies the timeout in seconds for the client. Default is 240.0.
from aryn_sdk.client import Client
client = Client(region="US", timeout=240.0)