Getting Started with Aryn DocParse
You can use Aryn DocParse through the Aryn UI, the aryn-sdk client, or curl. For building multi-stage document ETL pipelines that use DocParse for parsing, visit the Sycamore documentation.
For this Quickstart, you will need an Aryn account. You can get one and use it for free here.
To process longer documents, use the aryn-sdk (the UI is limited to 25 pages per document).
Once the document is processed, you will see a visualization of the document segmentation with labeled bounding boxes. You can download the structured JSON output, which is the output of DocParse, as well as the visualization of the segmented PDF. If you prefer markdown output, use the aryn-sdk.
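To make the JSON output more concrete, the following Python sketch tallies the segment labels in a downloaded result. It assumes the file holds a top-level elements list whose entries carry a type label; these field names, the helper count_labels, and the sample data are assumptions for illustration, so check them against your own output.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down stand-in for a downloaded DocParse result.
# The "elements" and "type" field names are assumptions; verify them
# against your own JSON file.
SAMPLE_JSON = """
{
  "elements": [
    {"type": "Title", "text_representation": "Quarterly Report"},
    {"type": "Text", "text_representation": "Revenue grew this quarter."},
    {"type": "table", "text_representation": "Region Revenue"},
    {"type": "Text", "text_representation": "See the table above."}
  ]
}
"""


def count_labels(result: dict) -> Counter:
    """Count how many elements carry each segment label."""
    elements = result.get("elements", [])
    return Counter(element.get("type", "unknown") for element in elements)


label_counts = count_labels(json.loads(SAMPLE_JSON))
for label, count in label_counts.most_common():
    print(f"{label}: {count}")
```

To run this against a real download, replace SAMPLE_JSON with the contents of the JSON file you saved from the UI.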
Now that you have seen how DocParse can segment complex documents, extract tables, and more, you can use the aryn-sdk to leverage DocParse in your application, or the Sycamore document ETL library to load the output into vector databases.
For additional questions on getting started, please join the Slack community here or email us.
aryn-sdk
The aryn-sdk client is a thin Python library that calls Aryn DocParse and provides a few utility methods around it. It is the easiest way to add Aryn DocParse to your applications or custom data processing pipelines. You can view an example in this notebook.
To use the Aryn SDK, you will need the API key for your account. To retrieve it, click the API key tab in the left navigation of the Aryn UI.
For more information, see the Aryn SDK documentation or API reference.
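A minimal sketch of calling DocParse from Python might look like the following. It assumes the package is installed with pip install aryn-sdk, that partition_file from aryn_sdk.partition accepts an open binary file and an aryn_api_key keyword, and that the API key is stored in an ARYN_API_KEY environment variable. The helpers get_api_key and parse_document are hypothetical names, so confirm the details against the Aryn SDK documentation.

```python
import os


def get_api_key(env_var: str = "ARYN_API_KEY") -> str:
    """Read the API key from an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to the API key from the Aryn UI.")
    return key


def parse_document(path: str) -> dict:
    """Send one document to DocParse and return the parsed result."""
    # Imported lazily so get_api_key works even without aryn-sdk installed.
    # Assumed call shape: partition_file(file, aryn_api_key=...).
    from aryn_sdk.partition import partition_file

    with open(path, "rb") as f:
        return partition_file(f, aryn_api_key=get_api_key())
```

With the key exported in your shell, calling parse_document("document.pdf") sends that file to DocParse. Reading the key from the environment keeps it out of source control.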
curl
The easiest way to use Aryn DocParse is the aryn-sdk, but you can also use curl to access Aryn DocParse directly.
Use curl to download an example document to use with DocParse, if you do not have one already.
In the curl command, change PUT API KEY HERE to your Aryn API key. If you have a different document, change @document.pdf to @/path/to/your/document.pdf.
The output is saved to document.json.
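Once the response is saved, a short Python sketch such as the one below can pull the text back out of it. It assumes the JSON has a top-level elements list whose entries include text_representation and a properties.page_number field. These field names, the helpers load_result and text_by_page, and the sample data are assumptions for illustration.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_result(path: str = "document.json") -> dict:
    """Load a saved DocParse response from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def text_by_page(result: dict) -> dict[int, str]:
    """Group element text by page number, preserving element order."""
    pages = defaultdict(list)
    for element in result.get("elements", []):
        text = element.get("text_representation")
        if not text:
            continue  # skip elements that carry no text, such as images
        page = element.get("properties", {}).get("page_number", 0)
        pages[page].append(text.strip())
    return {page: "\n".join(chunks) for page, chunks in sorted(pages.items())}


# Hypothetical sample standing in for the contents of document.json.
sample = {
    "elements": [
        {"type": "Title", "properties": {"page_number": 1},
         "text_representation": "Quarterly Report\n"},
        {"type": "Text", "properties": {"page_number": 1},
         "text_representation": "Revenue grew this quarter."},
        {"type": "Image", "properties": {"page_number": 1}},
        {"type": "Text", "properties": {"page_number": 2},
         "text_representation": "Appendix"},
    ]
}

pages = text_by_page(sample)
for page, text in pages.items():
    print(f"--- page {page} ---")
    print(text)
```

To use your own output, replace sample with load_result("document.json").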