This page documents how to call the Storage APIs using the Aryn SDK. All parameters are optional unless specified otherwise.

DocSet Functions

Functions for managing document sets (DocSets), which are collections of documents.

Create DocSet

Create a new DocSet to store documents.
name
string
required
String name for the DocSet
properties
object
Optional dictionary of additional properties
schema
Schema
Optional Schema object defining document properties
prompts
object
Optional dictionary of prompts for the DocSet
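from aryn_sdk.client.client import Client

# Assumes the Aryn API key is already configured for the client (for example, in the environment).
client = Client()
docset = client.create_docset(name="My DocSet")
docset_id = docset.docset_id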
A DocSetMetadata object containing:
docset_id
string
Unique identifier for the DocSet
name
string
Name of the DocSet
created
timestamp
Creation timestamp
readonly
boolean
Boolean indicating if DocSet is read-only
properties
object
Dictionary of custom properties
size
number
Size of DocSet in bytes
schema
Schema
Schema object defining document properties
prompts
object
Dictionary of prompts for the DocSet

Get DocSet

Retrieve metadata for a DocSet.
docset_id
string
required
The unique identifier of the DocSet to retrieve
docset = client.get_docset(docset_id="your-docset-id")
A DocSetMetadata object containing:
docset_id
string
Unique identifier for the DocSet
name
string
Name of the DocSet
created
timestamp
Creation timestamp
readonly
boolean
Boolean indicating if DocSet is read-only
properties
object
Dictionary of custom properties
size
number
Size of DocSet in bytes
schema
Schema
Schema object defining document properties
prompts
object
Dictionary of prompts for the DocSet
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error
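The error list above suggests a common pattern: treat a 404 as "the DocSet does not exist" and let every other failure propagate. The following is a minimal sketch of that pattern, not part of the SDK. The helper names `status_code_of` and `get_docset_or_none` are hypothetical, and the sketch assumes the raised HTTPError carries a `response` attribute with a `status_code`, as HTTP error classes in common Python HTTP libraries do.

```python
from typing import Any, Optional


def status_code_of(exc: Exception) -> Optional[int]:
    """Return the HTTP status code attached to an exception, if any.

    Assumption: the exception exposes `response.status_code`.
    """
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None)


def get_docset_or_none(client: Any, docset_id: str) -> Optional[Any]:
    """Return the DocSet metadata, or None if the DocSet does not exist."""
    try:
        return client.get_docset(docset_id=docset_id)
    except Exception as exc:
        if status_code_of(exc) == 404:
            return None  # "DocSet not found"
        raise  # 403, 5xx, and anything else propagate unchanged
```

Called as `get_docset_or_none(client, "your-docset-id")`, this returns `None` only for a missing DocSet, so API key problems and server errors still surface as exceptions.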

List DocSets

List all DocSets in the account.
page_size
integer
default:"100"
Number of items per page
page_token
string
Token for pagination
name_eq
string
Filter DocSets by exact name match
docsets = client.list_docsets().get_all()
for docset in docsets:
    print(f"DocSet: {docset.name}")
A paginated list of DocSetMetadata objects, each containing:
docset_id
string
Unique identifier for the DocSet
name
string
Name of the DocSet
created
timestamp
Creation timestamp
readonly
boolean
Boolean indicating if DocSet is read-only
properties
object
Dictionary of custom properties
size
number
Size of DocSet in bytes
schema
Schema
Schema object defining document properties
prompts
object
Dictionary of prompts for the DocSet
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 5xx: Internal Server Error
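Combining the `name_eq` filter with Create DocSet gives a "get or create" lookup by name. The sketch below is illustrative only. `get_or_create_docset` is a hypothetical helper, and it assumes that `get_all()` returns an iterable of DocSetMetadata objects and that DocSet names are not guaranteed to be unique.

```python
from typing import Any


def get_or_create_docset(client: Any, name: str) -> str:
    """Return the docset_id of the DocSet called `name`, creating it if absent."""
    # name_eq filters by exact name; get_all() collects every page of results.
    matches = list(client.list_docsets(name_eq=name).get_all())
    if matches:
        # Names are not assumed to be unique, so take the first match.
        return matches[0].docset_id
    return client.create_docset(name=name).docset_id
```

Usage would look like `docset_id = get_or_create_docset(client, "My DocSet")`.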

Delete DocSet

Delete a DocSet and all its documents.
docset_id
string
required
The unique identifier of the DocSet to delete
client.delete_docset(docset_id="your-docset-id")
The metadata of the deleted DocSet
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error

Document Functions

Functions for managing individual documents within DocSets.

Add Document

Add a single document to the Aryn platform. This API calls DocParse to partition the document, and automatically extracts any properties registered as part of the DocSet schema.
file
BinaryIO | str | PathLike
required
The document to add: a file opened in binary mode, a path given as a str or PathLike instance, or an HTTP URL. A path can be either a local path or an Amazon S3 URL starting with s3://. For S3 URLs, you must have boto3 installed and AWS credentials set up in your environment. An HTTP URL can start with either https:// or http://.
docset_id
str
required
The id of the DocSet into which to add the document.
options
str
DocParse options to use for partitioning the specified document. See the DocParse documentation for the available options.
doc = client.add_doc(file="/path/to/myfile.pdf", docset_id="your-docset-id")
A DocumentMetadata object containing:
account_id
string
Account identifier
doc_id
string
Document identifier
docset_id
string
Document set identifier
name
string
Document name
created_at
timestamp
Creation timestamp
size
number
Document size in bytes
content_type
string
MIME type of document
properties
object
Custom document properties
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error
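Because the `file` parameter accepts several kinds of input, it can help to check which kind a value is before calling `add_doc`, particularly since S3 URLs need boto3. The sketch below is a local pre-flight check under those stated rules. `source_kind` and `boto3_available` are hypothetical helpers and are not part of the SDK.

```python
import importlib.util
import os
from typing import Any


def source_kind(file: Any) -> str:
    """Classify a value destined for add_doc(file=...)."""
    if not isinstance(file, (str, os.PathLike)):
        return "file object"  # e.g. the result of open(path, "rb")
    text = os.fspath(file)
    if isinstance(text, bytes):
        text = os.fsdecode(text)
    if text.startswith("s3://"):
        return "s3"
    if text.startswith(("http://", "https://")):
        return "url"
    return "local path"


def boto3_available() -> bool:
    """Report whether boto3 can be imported, which S3 URLs require."""
    return importlib.util.find_spec("boto3") is not None
```

A caller could, for instance, refuse early when `source_kind(file) == "s3"` and `boto3_available()` is false, rather than waiting for the upload to fail.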

List Documents

List all documents in a DocSet.
docset_id
string
required
ID of the DocSet containing the documents
page_size
integer
default:"100"
Number of items per page
page_token
string
Token for pagination
docs = client.list_docs(docset_id="your-docset-id")
for doc in docs:
    print(f"Document: {doc.name}")
A paginated list of DocumentMetadata objects, each containing:
account_id
string
Account identifier
doc_id
string
Document identifier
docset_id
string
Document set identifier
name
string
Document name
created_at
timestamp
Creation timestamp
size
number
Document size in bytes
content_type
string
MIME type of document
properties
object
Custom document properties
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 400: “Invalid filter parameters”
  • HTTPError 5xx: Internal Server Error
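The DocumentMetadata fields returned here are enough to summarize a DocSet without fetching any document content. The following sketch totals `size` and counts documents per `content_type`. `summarize_docset` is a hypothetical helper, and it assumes the result of `list_docs` can be iterated directly, as in the example above.

```python
from collections import Counter
from typing import Any, Dict, Tuple


def summarize_docset(client: Any, docset_id: str) -> Tuple[int, Dict[str, int]]:
    """Return (total size in bytes, document count per content_type)."""
    total_bytes = 0
    by_type: Counter = Counter()
    for doc in client.list_docs(docset_id=docset_id):
        total_bytes += doc.size or 0  # tolerate a missing size
        by_type[doc.content_type] += 1
    return total_bytes, dict(by_type)
```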

Get Document

Get a document by ID.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to retrieve
include_elements
boolean
default:"true"
Boolean to include document elements
include_binary
boolean
default:"false"
Boolean to include binary data
doc = client.get_doc(docset_id="your-docset-id", doc_id="your-doc-id")
A Document object containing:
id
string
Document identifier
elements
array
List of document elements
properties
object
Document properties
binary_data
binary
Optional binary content of the document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error
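When only the properties are needed, the `include_elements` and `include_binary` flags can keep the response small. A minimal sketch, assuming the returned Document exposes `properties` as described above. `get_doc_properties` is a hypothetical helper name.

```python
from typing import Any, Dict


def get_doc_properties(client: Any, docset_id: str, doc_id: str) -> Dict[str, Any]:
    """Fetch a document without elements or binary data and return its properties."""
    doc = client.get_doc(
        docset_id=docset_id,
        doc_id=doc_id,
        include_elements=False,  # skip the element list
        include_binary=False,    # skip the original file content
    )
    return dict(doc.properties or {})
```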

Delete Document

Delete a document by ID.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to delete
client.delete_doc(docset_id="your-docset-id", doc_id="your-doc-id")
The metadata of the deleted document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error

Get Document Binary

Get the binary content of a document.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to retrieve
file
string
required
The file object to write the binary content to
output = "output.pdf"
client.get_doc_binary(docset_id="your-docset-id", doc_id="your-doc-id", file=output)
The binary content of the document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error

Properties Functions

Functions for managing document properties.

Update Document Properties

Update properties of a document.
docset_id
string
required
The unique identifier of the DocSet containing the document
doc_id
string
required
The unique identifier of the document to update
updates
array
required
List of ReplaceOperation objects defining property updates
from aryn_sdk.types import ReplaceOperation

updates = [
    ReplaceOperation(
        path="/properties/status",
        value="reviewed"
    )
]
client.update_doc_properties(
    docset_id="your-docset-id",
    doc_id="your-doc-id",
    updates=updates
)
The updated Document object containing:
id
string
Document identifier
elements
array
List of document elements
properties
object
Document properties
binary_data
binary
Optional binary content of the document
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “Document not found”
  • HTTPError 5xx: Internal Server Error
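The `path` in the example above looks like a JSON Pointer rooted at `/properties`. Assuming that is the case (this page does not state it), a small helper can turn a plain dictionary of changes into a list of operations. `property_path` and `replace_ops` are hypothetical names. The escaping of `~` and `/` follows RFC 6901 and matters only if a property name contains those characters.

```python
from typing import Any, Callable, Dict, List


def property_path(name: str) -> str:
    """Build a '/properties/<name>' path, escaping per JSON Pointer (assumed)."""
    escaped = name.replace("~", "~0").replace("/", "~1")
    return f"/properties/{escaped}"


def replace_ops(changes: Dict[str, Any], make_op: Callable[..., Any]) -> List[Any]:
    """Create one operation per changed property.

    `make_op` is any callable accepting `path` and `value` keyword arguments,
    for example the ReplaceOperation class.
    """
    return [make_op(path=property_path(key), value=value) for key, value in changes.items()]
```

With the SDK installed, `replace_ops({"status": "reviewed"}, ReplaceOperation)` would build the same single-item list as the example above.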

Extract Properties

Extract properties from the documents in a DocSet, as defined by a schema.
docset_id
string
required
The unique identifier of the DocSet containing the documents
schema
object
required
Schema object defining properties to extract
from aryn_sdk.types.schema import Schema, SchemaField

schema = Schema(fields=[
    SchemaField(name="category", field_type="string")
])
client.extract_properties(docset_id="your-docset-id", schema=schema)
A job status object containing:
  • exit_status: The exit status of the job
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error

Delete Properties

Delete properties from the documents in a DocSet, as defined by a schema.
docset_id
string
required
The unique identifier of the DocSet containing the documents
schema
object
required
Schema object defining properties to delete
# schema is a Schema object listing the properties to delete.
client.delete_properties(docset_id="your-docset-id", schema=schema)
A job status object
  • HTTPError 403: “No Aryn API Key provided”
  • HTTPError 403: “Invalid Aryn API key”
  • HTTPError 403: “Expired Aryn API key”
  • HTTPError 404: “DocSet not found”
  • HTTPError 5xx: Internal Server Error