# Quickstart

> Getting Started with Aryn DocParse

You can use Aryn DocParse to easily chunk and extract data from complex documents, and return structured output in JSON, Markdown or HTML. It can extract properties from documents and even suggest a schema of properties. DocParse can process 30+ document formats, including PDF, Microsoft Word (.docx and .doc), Microsoft PowerPoint (.pptx and .ppt) and [more](docparse/formats_supported).

This Quickstart shows how to get started with DocParse through the DocParse UI, the Python `aryn-sdk` client, or `curl`. For building multi-stage document ETL pipelines that use DocParse for parsing, visit the [Sycamore documentation](https://sycamore.readthedocs.io/en/stable/). For tutorials on suggesting schemas and extracting properties, visit [Suggest Properties](docparse/tutorials/suggestion_tutorial) and [Property Extraction](docparse/tutorials/extraction_tutorial).

For this Quickstart, you will need an Aryn account. You can sign up and use it for free [here](https://console.aryn.ai/signup/).

## Using the DocParse UI

After you sign up, go to the [Aryn UI](http://app.aryn.ai/) and click the DocParse tab in the left nav to open the DocParse UI.

Next, select a document to parse and choose the options for DocParse (e.g. OCR). Click "Chunk document," and DocParse will process the first 25 pages of your PDF. The UI is limited to 25 pages per document, so use the `aryn-sdk` for larger documents.

Once the document is processed, you will see a visualization of the document segmentation with labeled bounding boxes. You can download DocParse's structured JSON output, as well as the visualization of the segmented PDF. If you prefer Markdown or HTML output, use the `aryn-sdk`.

Now that you have seen how DocParse can segment complex documents, extract tables, and more, you can use the `aryn-sdk` to leverage DocParse in your application or the [Sycamore document ETL library](https://github.com/aryn-ai/sycamore) to load the output into vector databases.

For additional questions on getting started, please join the Slack community [here](https://join.slack.com/t/aryn-community/shared_invite/zt-36vhennsx-mN3UsqD6PT2vxVZxpqdHsw) or [email us](mailto:info@aryn.ai).

## Using the DocParse `aryn-sdk`

The DocParse `aryn-sdk` client is a thin Python library that calls Aryn DocParse and provides a few utility methods around it. It is the easiest way to add Aryn DocParse to your applications or custom data processing pipelines. You can view an example in [this notebook](https://github.com/aryn-ai/sycamore/blob/main/notebooks/ArynPartitionerPython.ipynb).

You will need to use the API key for your account with the Aryn SDK. To retrieve it, click on the API key tab in the left nav in the Aryn UI.

For more information, see the [Aryn SDK documentation](/docparse/aryn_sdk) or [API reference](/api-reference/endpoint/docparse/partition).
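As a minimal sketch of the call pattern, the snippet below wraps the SDK's `partition_file` function in a small helper. It assumes `aryn-sdk` is installed (`pip install aryn-sdk`), that your key is available in the `ARYN_API_KEY` environment variable (the same variable used in the `curl` examples), and that the file is passed opened in binary mode. The helper name `parse_document` is hypothetical and not part of the SDK.

```python
import os


def parse_document(path, api_key=None):
    """Send a local file to DocParse and return the parsed response.

    parse_document is a hypothetical helper for illustration;
    partition_file and its aryn_api_key argument come from aryn-sdk.
    """
    # Resolve the key first so a missing key fails fast with a clear message.
    api_key = api_key or os.environ.get("ARYN_API_KEY")
    if not api_key:
        raise ValueError("Set ARYN_API_KEY or pass api_key explicitly.")

    # Imported lazily so this module still loads where aryn-sdk is absent.
    from aryn_sdk.partition import partition_file

    # The file is opened in binary mode and handed to the SDK.
    with open(path, "rb") as f:
        return partition_file(f, aryn_api_key=api_key)
```

Calling `parse_document("document.pdf")` should then return the parsed result for that file; see the Aryn SDK documentation linked above for the options that `partition_file` accepts.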

## Using `curl`

We recommend using the `aryn-sdk`, but you can also use `curl` to access Aryn DocParse directly.

If you do not already have a document, download an example with `curl`.

```bash theme={null}
# -L follows redirects, so the PDF is saved rather than a redirect response
curl -L http://arxiv.org/pdf/1706.03762 -o document.pdf
```

Change `PUT API KEY HERE` below to your Aryn API key. If you have a different document, change `@document.pdf` to `@/path/to/your/document.pdf` below.

```bash theme={null}
export ARYN_API_KEY="PUT API KEY HERE"
curl -s -N -D headers "https://api.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file=@document.pdf" | tee document.json
```

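The `tee` command prints the results and saves them to `document.json`. The `-D headers` option writes the response headers to a file named `headers`.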

```bash theme={null}
cat document.json
```
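To inspect the output programmatically rather than with `cat`, a short Python sketch can tally the elements by type. It assumes the response is a JSON object with an `elements` list whose entries carry a `type` label. The `count_element_types` helper and the sample data below are illustrative stand-ins, not real DocParse output.

```python
import json
from collections import Counter


def count_element_types(response):
    """Count elements per type label in a DocParse-style response dict."""
    # Assumed shape: {"elements": [{"type": "...", ...}, ...]}
    elements = response.get("elements", [])
    return Counter(el.get("type", "unknown") for el in elements)


# Illustrative stand-in for the contents of document.json.
sample = json.loads("""
{"elements": [
  {"type": "Title", "text_representation": "Example title"},
  {"type": "Text", "text_representation": "First paragraph."},
  {"type": "Text", "text_representation": "Second paragraph."},
  {"type": "Table"}
]}
""")

counts = count_element_types(sample)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

To run this against your own results, replace `sample` with the parsed file, for example `json.load(open("document.json"))`.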

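### Different File Formats

The same request works for other supported formats; only the file passed to `-F` changes. Each command below writes to `document.json`, so change the output filename if you want to keep every result.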

```bash theme={null}
export ARYN_API_KEY="PUT API KEY HERE"
curl -s -N -D headers "https://api.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file=@document.pdf" | tee document.json
curl -s -N -D headers "https://api.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file=@document.docx" | tee document.json
curl -s -N -D headers "https://api.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file=@document.doc" | tee document.json
curl -s -N -D headers "https://api.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file=@document.pptx" | tee document.json
curl -s -N -D headers "https://api.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file=@document.ppt" | tee document.json
```

### Remote Document URLs

To parse a document hosted at a URL, pass a `file_url` field instead of `file`.

```bash theme={null}
export ARYN_API_KEY="PUT API KEY HERE"
curl -s -N -D headers "https://api.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file_url=https://www.example.com/document.pdf" | tee document.json
```

## Switching Regions

To send requests to a DocParse server in a different region, change the base URL of the request. The example below uses `api.eu.aryn.cloud` in place of `api.aryn.cloud`.

```bash theme={null}
export ARYN_API_KEY="PUT API KEY HERE"
curl -s -N -D headers "https://api.eu.aryn.cloud/v1/document/partition" -H "Authorization: Bearer $ARYN_API_KEY" -F "file=@document.pdf" | tee document.json
```

Via the SDK, you can specify the region through a `region` parameter.

```python theme={null}
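from aryn_sdk.partition import partition_file
# region="EU" is an assumed value; confirm the accepted values in the Aryn SDK documentation.
data = partition_file("https://www.example.com/document.pdf", region="EU")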
```

Via the UI, you can switch the region by clicking on the `Region` dropdown in the settings menu.
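If you build request URLs yourself, the two base URLs from the `curl` examples can be kept in one place. The sketch below is only an illustration: the region labels `"default"` and `"eu"` and the helper `partition_endpoint` are our own names, not part of any Aryn API.

```python
# Base URLs copied from the curl examples; the dictionary keys are our own labels.
BASE_URLS = {
    "default": "https://api.aryn.cloud",
    "eu": "https://api.eu.aryn.cloud",
}
PARTITION_PATH = "/v1/document/partition"


def partition_endpoint(region="default"):
    """Return the partition endpoint URL for a region label."""
    key = region.lower()
    if key not in BASE_URLS:
        raise ValueError(
            f"Unknown region {region!r}; expected one of {sorted(BASE_URLS)}"
        )
    return BASE_URLS[key] + PARTITION_PATH


print(partition_endpoint("eu"))
```

Keeping the mapping in a single dictionary means switching regions is a one-argument change rather than an edit to every request.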

## Next steps

* To load your parsed documents into a vector database, use [Sycamore](https://github.com/aryn-ai/sycamore) to create a document ETL pipeline in Python for additional processing and loading. Sycamore is a scalable, open source document ETL library that integrates with DocParse. You can check out an example notebook [here](https://github.com/aryn-ai/sycamore/blob/main/notebooks/pinecone-writer.ipynb).

* To use DocParse with LangChain, check out [this example notebook](https://github.com/aryn-ai/sycamore/blob/main/notebooks/ArynPartitionerWithLangchain.ipynb).

* To extract tables from your documents and run analytics on them, see the [tables tutorial](docparse/tutorials/tables).

* To extract images from your documents and process them directly, see the [images tutorial](docparse/tutorials/images).

* To extract properties from your documents, see the [Property Extraction](docparse/tutorials/extraction_tutorial) and [Suggest Properties](docparse/tutorials/suggestion_tutorial) tutorials.

* If you're interested in building an end-to-end system to query and search your documents, [learn more about the Aryn Platform](aryn-platform/core_concepts). It uses DocParse under the hood for parsing and processing your documents when ingesting them.
