You can use Aryn to ingest, enrich, store, and query your complex documents at scale using Deep Analytics and search. Aryn supports 30+ document formats, including PDF, Microsoft Word (.docx and .doc), Microsoft PowerPoint (.pptx and .ppt), and more. If you just want to get started with DocParse, visit the DocParse Quickstart.

This guide shows you how to get started with Aryn through the Aryn UI. For more information on using the Aryn Python SDK, click here.

You will need an Aryn account, which you can get for free here. You can also sign up and log in with your Google or GitHub credentials.

Creating and loading a DocSet

After you sign up for Aryn, go to the Aryn UI.

First, you will create a DocSet and load your documents into it. A DocSet is a collection of Documents, each with its chunks (elements) and metadata (properties). You can think of a DocSet like a table in a data warehouse, but for unstructured documents. When you add documents to your DocSet, Aryn uses a composite set of AI models to parse, extract, and process them, including OCR and more (it uses DocParse under the hood).
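To make the table analogy concrete, here is a minimal, hypothetical Python model of a DocSet. The class and field names (`Element`, `Document`, `DocSet`, `properties`, `as_rows`) are illustrative assumptions, not the Aryn SDK's API, and the sample document is made up. Each Document behaves like a row, its properties act like columns, and its elements hold the parsed chunks.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical data model for illustration only; this is NOT the Aryn SDK API.
@dataclass
class Element:
    """A parsed chunk of a document, such as a title, paragraph, or table."""
    type: str
    text: str


@dataclass
class Document:
    """One source file after parsing: its chunks plus extracted metadata."""
    doc_id: str
    elements: list[Element] = field(default_factory=list)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class DocSet:
    """A named collection of Documents, analogous to a table of rows."""
    name: str
    documents: list[Document] = field(default_factory=list)

    def add(self, doc: Document) -> None:
        self.documents.append(doc)

    def as_rows(self) -> list[dict[str, Any]]:
        # Flatten each Document into a table-like row: one column per property.
        return [{"doc_id": d.doc_id, **d.properties} for d in self.documents]


docset = DocSet(name="example-docset")
docset.add(Document(
    doc_id="report-1.pdf",
    elements=[Element(type="Title", text="Example title"),
              Element(type="Text", text="Example paragraph.")],
    properties={"company": "ExampleCo", "quarter": "Q1"},
))
print(docset.as_rows())
# [{'doc_id': 'report-1.pdf', 'company': 'ExampleCo', 'quarter': 'Q1'}]
```

Keeping elements and properties separate mirrors the two tabs you will see in the UI: elements show how a document was parsed, while properties are the structured fields that queries can filter on.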

Open the DocSets page from the left nav, and then click the “New DocSet” button. In the popup, enter a name for your DocSet and an optional description, and then click “Create DocSet”.

Next, click your new DocSet in the DocSet list to go to its DocSet page. Click the blue “+” icon at the bottom of the Documents panel to add documents to your DocSet. In the popup, you can drag and drop files, or click the drop area to open a file browser. You can optionally change the Processing Options to include OCR and more. Click “Upload” to add and process your documents. To see the progress of your documents, go to the Tasks page in the left nav.

Once your documents are added, you can view them in the Documents panel on the DocSet page. Click the “Elements” tab in the right nav to see how Aryn parsed a document, and the “Properties” tab to view its extracted properties (metadata). We haven’t specified any properties to extract yet, so the Properties tab should be empty. Aryn’s agentic query engine uses these properties when creating query plans, so it’s important to enrich your documents as you prepare your DocSet for complex query workloads.

Querying your DocSet in a Workspace

Now that you’ve added documents, let’s query them using the Workspaces UI. From the DocSets page, click the “Open in Workspaces” button. This takes you to a new Workspace configured to query your DocSet. You can rename the Workspace using the “Rename” button.

Enter your query in natural language in the Query Box. After you submit it, Aryn generates and displays a query plan showing the steps it will take to execute your query. You can then run the query, cancel it, or edit it by describing in natural language what to change.

After you start running the query, you’ll see the status of each step in the query plan. Each step is an operator (Aryn has both database-style and LLM-powered operators); Aryn executes the plan and returns a result. You can bookmark the result using the Bookmark icon at the top of the Query cell. A Bookmark is a saved result set from a query that can be used within a Workspace, and you can run subsequent queries on that Bookmark.
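As a rough mental model, a query plan can be pictured as an ordered list of operators, where each step consumes the previous step's output. The sketch below is hypothetical: the operator names (`filter_by_property`, `summarize`) and the `run_plan` helper are assumptions for illustration, not Aryn's planner or operators. The LLM-powered step is stubbed with a plain function so the example runs without a model, and the "bookmark" is simply a saved list of documents. It also shows why extracted properties matter: a database-style filter can only match documents that actually have the property.

```python
from typing import Any, Callable

# Hypothetical sketch only: a query plan as an ordered list of operators.
Doc = dict[str, Any]
Operator = Callable[[Any], Any]


def filter_by_property(key: str, value: Any) -> Operator:
    """Database-style operator: keep documents whose property equals value."""
    def op(docs: list[Doc]) -> list[Doc]:
        return [d for d in docs if d.get("properties", {}).get(key) == value]
    return op


def summarize(docs: list[Doc]) -> str:
    """Stand-in for an LLM-powered operator; a real one would call a model."""
    return f"{len(docs)} matching document(s): " + ", ".join(d["doc_id"] for d in docs)


def run_plan(plan: list[Operator], docs: list[Doc]) -> Any:
    """Execute each step in order, feeding one step's output into the next."""
    result: Any = docs
    for op in plan:
        result = op(result)
    return result


docs = [
    {"doc_id": "a.pdf", "properties": {"company": "ExampleCo", "quarter": "Q1"}},
    {"doc_id": "b.pdf", "properties": {"company": "ExampleCo", "quarter": "Q2"}},
    {"doc_id": "c.pdf", "properties": {}},  # no extracted properties: no filter can match it
]

# First query: keep (bookmark) the result set of a database-style filter.
bookmark = run_plan([filter_by_property("company", "ExampleCo")], docs)

# Subsequent query runs on the bookmark instead of the whole DocSet.
answer = run_plan([filter_by_property("quarter", "Q2"), summarize], bookmark)
print(answer)  # 1 matching document(s): b.pdf
```

Running the second query against the bookmark rather than the full collection is the point of the design: each follow-up question starts from a smaller, already-relevant result set.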

Congrats! You’ve created a DocSet, added and processed your documents, and run a query. We call the process of navigating through and querying your documents Deep Analytics.

Next steps

So, now what? Here are some next steps to move beyond Aryn 101 and prepare your DocSet for deeper analytics:

For additional questions on getting started, please join our Slack community here or email us.