- Return the structured output of each document in JSON or Markdown, and provide labeled bounding boxes for titles, tables, table rows and columns, images, and regular text.
- High-quality AI models for complex table extraction, optical character recognition (OCR), image summarization, and more.
- Process over 30 types of document formats, including PDFs, Microsoft Word, Microsoft PowerPoint, text, and more.
- Store and index processed documents, extract metadata using GenAI, and search your documents at scale with vector (semantic) or keyword search.
- Optional integration with Python document ETL pipelines using the open-source Sycamore document ETL library. Customize your pipeline with additional data transforms, LLM-based entity extraction, data enrichment, data cleaning, and loading into vector databases and search engines.
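
To make the labeled bounding boxes in the JSON output more concrete, here is a minimal Python sketch of how such output might be consumed. The sample data and the field names (`elements`, `type`, `bbox`, `text_representation`) are assumptions for illustration, as are the helper functions `group_by_label` and `bbox_area`; check the API reference for the actual response schema.

```python
from collections import defaultdict

# Hypothetical sample of DocParse-style JSON output.
# The field names and the [x1, y1, x2, y2] bbox layout are assumptions.
sample_output = {
    "elements": [
        {"type": "Title", "bbox": [0.10, 0.05, 0.90, 0.10],
         "text_representation": "Quarterly Report"},
        {"type": "Text", "bbox": [0.10, 0.12, 0.90, 0.30],
         "text_representation": "Revenue grew in the third quarter."},
        {"type": "Table", "bbox": [0.10, 0.35, 0.90, 0.60],
         "text_representation": "Region | Revenue"},
        {"type": "Text", "bbox": [0.10, 0.62, 0.90, 0.80],
         "text_representation": "See the table above for details."},
    ]
}


def group_by_label(output):
    """Group elements by their label (the assumed `type` field)."""
    groups = defaultdict(list)
    for element in output.get("elements", []):
        groups[element["type"]].append(element)
    return dict(groups)


def bbox_area(bbox):
    """Area of an [x1, y1, x2, y2] box; zero if the box is degenerate."""
    x1, y1, x2, y2 = bbox
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


groups = group_by_label(sample_output)
for label, elements in groups.items():
    print(label, len(elements))
```

Grouping by label first makes it straightforward to route tables, images, and regular text to different downstream steps.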
Getting started
Sign up here for free to get started with DocParse.
Quickstart
Get Started with Aryn DocParse
Use the Aryn-SDK
Using the Aryn-SDK to call DocParse
Use DocParse UI
Access the DocParse UI to visualize how your documents will be partitioned
Slack Community
Join the Slack community for any questions
API Reference
Aryn DocParse API Reference
Aryn DocParse SDK Reference
Aryn DocParse Python SDK Reference