> ## Documentation Index
> Fetch the complete documentation index at: https://docs.aryn.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Output Structure

> The output format of Aryn DocParse

## Overall Format

### JSON Format

The default output format of Aryn DocParse is JSON.

```text theme={null}
{
  "status": in-progress updates,
  "error": any errors encountered,
  "elements": a list of elements,
  "properties": a list of extracted properties
}
```

### Markdown Format

If the request to Aryn DocParse has the `output_format` option set to `markdown`, a successful response will look like this:

```text theme={null}
{ "status": ...,
  "markdown": "# Title\ndolorem ipsum, quia dolor sit amet consectetur..." }
```

If `group_by_page` is set to `True` in `markdown_options`, the markdown output is returned as an array of strings, one per page:

```text theme={null}
{
  "status": ...,
  "markdown": ["# Title\ndolorem ipsum", "quia dolor sit", ..., "amet consectetur"]
}
```
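Because the `markdown` field can be either a single string or a list of per-page strings, client code may want to normalize it. The following is a minimal sketch, assuming the response has already been parsed into a Python dictionary; the helper name `markdown_pages` and the sample responses are hypothetical, and the `status` field is omitted from the samples.

```python
from typing import Any


def markdown_pages(response: dict[str, Any]) -> list[str]:
    """Return the markdown output as a list of page strings.

    Without group_by_page the whole document is one string, so it is
    wrapped in a one-item list to give callers a uniform type.
    """
    markdown = response.get("markdown")
    if markdown is None:
        return []
    if isinstance(markdown, str):
        return [markdown]
    return list(markdown)


# Hypothetical responses shaped like the two examples above.
single = {"markdown": "# Title\ndolorem ipsum"}
paged = {"markdown": ["# Title\ndolorem ipsum", "quia dolor sit"]}

print(len(markdown_pages(single)))
print(len(markdown_pages(paged)))
```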

### HTML Format

If the request to Aryn DocParse has the `output_format` option set to `html`, a successful response will look like this:

```text theme={null}
{ "status": ...,
  "html": "<h1>Title</h1><p>dolorem ipsum, quia dolor sit amet consectetur...</p>" }
```

## Element Format

### General Structure

It is often useful to process different parts of a document separately. For example, you might want to process tables differently than text paragraphs, and typically small chunks of text are embedded separately for vector search. In Aryn DocParse, these chunks are called elements. The Aryn Platform also uses the same format for Documents and Elements.

Elements have the following format:

```text theme={null}
{"type": type of element (str),
"bbox": coordinates of the bounding box around the element (list of 4 floats),
"properties": { "score": confidence score (float),
                "page_number": page number element occurs on (int)},
"text_representation": for elements with associated text (str),
"binary_representation": for Image elements when extract_images is enabled (bytes)}
```

Each element has a `type`, `bbox`, and `properties` field; elements with associated text also have a `text_representation` field. The `type` field indicates the type of the element (e.g., `Text`, `Image`). The `bbox` field contains the coordinates of the bounding box around the element. The `properties` field contains additional information about the element, such as its confidence score and page number: `score` is the confidence in the label assignment, and `page_number` is the 1-indexed page number the element occurs on. The `text_representation` field contains the text content of the element.

An example element is given below:

```json theme={null}
{
    "type": "Text",
    "bbox": [
      0.10383546717026654,
      0.31373721036044033,
      0.8960905187270221,
      0.39873851429332385
    ],
    "properties": {
      "score": 0.9369918704032898,
      "page_number": 1
    },
    "text_representation": "It is often useful to process different parts of a document separately. For example, you\nmight want to process tables differently than text paragraphs, and typically small chunks\nof text are embedded separately for vector search. In Aryn DocParse, these\nchunks are called elements.\n"
}
```
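Since every element carries a `type` and a `page_number`, a common first step is to bucket elements by type or to collect the text of one page. The following is a minimal sketch, assuming the `elements` list has been parsed into Python dictionaries; the helper names and the sample elements are hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_by_type(elements: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket elements by their `type` field, preserving document order."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for element in elements:
        groups[element["type"]].append(element)
    return dict(groups)


def text_on_page(elements: list[dict[str, Any]], page_number: int) -> str:
    """Concatenate the text of all elements on one (1-indexed) page."""
    return "".join(
        element["text_representation"]
        for element in elements
        if element["properties"]["page_number"] == page_number
        and element.get("text_representation")
    )


# Hypothetical elements in the format described above.
elements = [
    {"type": "Title", "bbox": [0.1, 0.05, 0.9, 0.1],
     "properties": {"score": 0.98, "page_number": 1},
     "text_representation": "Output Structure\n"},
    {"type": "Text", "bbox": [0.1, 0.3, 0.9, 0.4],
     "properties": {"score": 0.94, "page_number": 1},
     "text_representation": "Some body text.\n"},
    {"type": "Image", "bbox": [0.2, 0.5, 0.8, 0.9],
     "properties": {"score": 0.91, "page_number": 2}},
]

groups = group_by_type(elements)
print({name: len(items) for name, items in groups.items()})
print(text_on_page(elements, 1))
```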

#### Element Type

```text theme={null}
"type": one of the element types below
```

|           Type | Description                                                                                                                                                                                                                                       |
| -------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|          Title | Large Text                                                                                                                                                                                                                                        |
|           Text | Regular Text                                                                                                                                                                                                                                      |
|        Caption | Description of an image or table                                                                                                                                                                                                                  |
|       Footnote | Small text found near the bottom of the page                                                                                                                                                                                                      |
|        Formula | LaTeX or similar mathematical expression                                                                                                                                                                                                          |
|      List-item | Part of a list                                                                                                                                                                                                                                    |
|    Page-footer | Small text at bottom of page                                                                                                                                                                                                                      |
|    Page-header | Small text at top of page                                                                                                                                                                                                                         |
|          Image | A picture or diagram. When `extract_images` is set to `true`, this element includes a `binary_representation` tag which contains a base64 encoded ppm image file. When `extract_images` is false, the bounding box of the Image is still returned |
| Section-header | Medium-sized text marking a section                                                                                                                                                                                                               |
|          table | A grid of text. See the `extract_table_structure` option to extract information from the table rather than just detecting its presence                                                                                                            |

#### Bounding Box

```text theme={null}
"bbox": coordinates of the bounding box around the element contents
```

The bounding box takes the format `[x1, y1, x2, y2]`, where each coordinate is given as a proportion of the page: x values measure how far across the page the element is, and y values measure how far down. For instance, an element that is 100 pixels from the left border of a document 400 pixels wide would have an x1 coordinate of 0.25.
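To draw or crop a bounding box, the proportional coordinates need to be scaled by the size of the rendered page. The following is a minimal sketch of that arithmetic; the helper name `bbox_to_pixels` is hypothetical, and it assumes the origin is the top-left corner of the page and that the page size in pixels is known to the caller.

```python
def bbox_to_pixels(
    bbox: list[float], page_width: int, page_height: int
) -> tuple[int, int, int, int]:
    """Scale a proportional [x1, y1, x2, y2] box to integer pixel coordinates.

    Assumes x is measured across the page width and y down the page height,
    both from the top-left corner.
    """
    x1, y1, x2, y2 = bbox
    return (
        round(x1 * page_width),
        round(y1 * page_height),
        round(x2 * page_width),
        round(y2 * page_height),
    )


# On a page rendered at 400 x 800 pixels, x1 = 0.25 lands 100 pixels from the left.
print(bbox_to_pixels([0.25, 0.1, 0.75, 0.5], page_width=400, page_height=800))
```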

#### Properties

```text theme={null}
"properties":
    { "score": confidence between 0 and 1, with 1 being the most confident 
                that this element type and bounding box coordinates are correct (float),
      "page_number": 1-indexed page number the element occurs on (int) }
```

The `score` is the model's "confidence" in its prediction for that particular bounding box. By default, we automatically select bounding boxes to achieve good coverage with high prediction accuracy, but the user can control this by using the `threshold` parameter (defaults to "auto"). If the user specifies a numeric value between 0 and 1, only Elements with a confidence score higher than the specified threshold value will be kept.
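The same kind of cutoff can also be applied on the client after the response arrives, for example to experiment with stricter values without re-parsing the document. This is a minimal sketch of such a post-filter, not the service's own `threshold` logic; the helper name and sample elements are hypothetical.

```python
from typing import Any


def filter_by_score(
    elements: list[dict[str, Any]], min_score: float
) -> list[dict[str, Any]]:
    """Keep only elements whose confidence score is strictly above min_score."""
    return [
        element
        for element in elements
        if element["properties"]["score"] > min_score
    ]


# Hypothetical elements with different confidence scores.
scored_elements = [
    {"type": "Text", "properties": {"score": 0.94, "page_number": 1}},
    {"type": "Caption", "properties": {"score": 0.41, "page_number": 1}},
    {"type": "table", "properties": {"score": 0.77, "page_number": 2}},
]

kept = filter_by_score(scored_elements, min_score=0.5)
print([element["type"] for element in kept])
```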

#### Text Representation

```text theme={null}
"text_representation": text associated with this element (str)
```

Text elements contain `\n` wherever the text includes a line break.

#### Binary Representation

When `extract_images` is set to `true`, Image elements include a `binary_representation` tag, which contains a base64-encoded PPM image file of the PDF cropped to the bounds of the detected image. When `extract_images` is `false`, the bounding box of the Image is still returned.

```text theme={null}
"binary_representation": base64 encoded ppm image file of the pdf cropped to the image (bytes)
```

For a tutorial on how to use the output of Aryn DocParse, see the [output walkthrough](./tutorials/output_walkthrough) page.

### Table Structure

Tables are represented as a list of cells, where each cell is a dictionary with the following attributes:

```json theme={null}
{
  "type": "table",
  "bbox": [float, float, float, float],
  "properties": {
    "score": float,
    "title": str,
    "columns": int,
    "rows": int,
    "page_number": int
  },
  "text_representation": str,
  "table": {
    "cells": [
      {
        "content": str,
        "rows": list[int],
        "cols": list[int],
        "is_header": bool,
        "bbox": [float, float, float, float],
        "properties": dict
      }
    ]
  }
}
```

A table element also contains a `table` tag holding the individual `cells`. Each cell has six attributes: `content`, `rows`, `cols`, `is_header`, `bbox`, and `properties`. `content` is the text content of the cell. `rows` and `cols` list the row and column indices the cell occupies. `is_header` is optional and indicates whether the cell is a header cell. `bbox` is the bounding box of the cell, and `properties` contains additional properties of the cell.

The `text_representation` tag, when present, contains the CSV representation of the table. In the `properties` dictionary, `title` is the table title as a string, `columns` is the number of columns in the table, and `rows` is the number of rows. Any of these three properties may be omitted.
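Because each cell records the rows and columns it occupies, the cell list can be expanded into a two-dimensional grid. The following is a minimal sketch, assuming that `rows` and `cols` hold 0-based indices (the indexing base is not stated here) and that a cell listing several indices should have its content repeated in each position; the helper name `cells_to_grid` and the sample table are hypothetical.

```python
from typing import Any


def cells_to_grid(table_element: dict[str, Any]) -> list[list[str]]:
    """Expand a table element's cells into a row-major grid of strings.

    Assumes 0-based `rows` / `cols` indices. Grid size is derived from the
    cells themselves, since the `rows` and `columns` properties may be absent.
    """
    cells = table_element.get("table", {}).get("cells", [])
    if not cells:
        return []
    n_rows = 1 + max(r for cell in cells for r in cell["rows"])
    n_cols = 1 + max(c for cell in cells for c in cell["cols"])
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for cell in cells:
        for r in cell["rows"]:
            for c in cell["cols"]:
                grid[r][c] = cell["content"]
    return grid


# Hypothetical 2 x 2 table with a header row.
table_element = {
    "type": "table",
    "table": {
        "cells": [
            {"content": "Item", "rows": [0], "cols": [0], "is_header": True},
            {"content": "Qty", "rows": [0], "cols": [1], "is_header": True},
            {"content": "Widget", "rows": [1], "cols": [0]},
            {"content": "2", "rows": [1], "cols": [1]},
        ]
    },
}

for row in cells_to_grid(table_element):
    print(row)
```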

### Image Structure

```json theme={null}
{
  "type": "Image",
  "bbox": [float, float, float, float],
  "properties": {
    "score": float,
    "image_size": [int, int],
    "image_mode": str,
    "page_number": int
  },
  "binary_representation": bytes
}
```

If `image_extraction_options: {'associate_captions': True}` is passed in the request, the `properties` dictionary of the Image will also include a `caption` entry:

```text theme={null}
"caption": {
  "type": "Caption",
  "bbox": [float, float, float, float],
  "properties": {
    "score": float,
    "_element_index": int,
    "font_size": float
  },
  "text_representation": str,
  "binary_representation": bytes
}
```

For an image, the `binary_representation` tag contains a base64-encoded PPM image file of the PDF cropped to the bounds of the detected image. The properties contain two extra attributes: `image_size`, which gives the `[width, height]` of the image, and `image_mode`, which gives the color mode of the image.
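Recovering the image bytes from an Image element comes down to base64-decoding `binary_representation`. The following is a minimal sketch using only the standard library; the helper name `decode_image` is hypothetical, and the sample element is built around a synthetic 1 x 1 binary PPM with an assumed `image_mode` value of `"RGB"`, purely for illustration.

```python
import base64
from typing import Any, Optional


def decode_image(element: dict[str, Any]) -> Optional[bytes]:
    """Return the decoded image bytes, or None if no image data was extracted."""
    encoded = element.get("binary_representation")
    if encoded is None:
        return None
    return base64.b64decode(encoded)


# Synthetic 1 x 1 red image in binary PPM form, used only to build a sample.
ppm = b"P6\n1 1\n255\n" + bytes([255, 0, 0])
image_element = {
    "type": "Image",
    "bbox": [0.1, 0.1, 0.2, 0.2],
    "properties": {
        "score": 0.9,
        "image_size": [1, 1],
        "image_mode": "RGB",  # assumed value, for illustration
        "page_number": 1,
    },
    "binary_representation": base64.b64encode(ppm).decode("ascii"),
}

raw = decode_image(image_element)
width, height = image_element["properties"]["image_size"]
print(len(raw), width, height)
```

The decoded bytes can then be written to a file or handed to an imaging library of your choice.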

## Properties Format

If `property_extraction_options: {'schema': [{...}]}` is passed in the request, the output will contain a top-level `properties` dictionary. Each key is the name of a property in the schema, and its value is the result of extraction from the document.

### Example schema

```json theme={null}
{
  "schema": [
    {
      "name": "invoice_number",
      "type": {
        "type": "string",
        "description": "Unique invoice identifier",
        "examples": ["INV-2024-001", "00123", "A-4567"],
        "validators": [
          {
            "type": "boolean_exp",
            "expression": "len() > 0"
          }
        ]
      }
    },
    {
      "name": "invoice_date",
      "type": {
        "type": "date",
        "description": "Date the invoice was issued, if not provided, return null",
        "examples": ["2024-01-15", "2024-03-22"]
      }
    },
    {
      "name": "subtotal",
      "type": {
        "type": "float",
        "description": "Subtotal amount before taxes and fees",
        "examples": [1250.00, 89.99, 2500.50],
        "validators": [
          {
            "type": "boolean_exp",
            "expression": ">= 0"
          }
        ]
      }
    },
    {
      "name": "line_items",
      "type": {
        "type": "array",
        "description": "Array of individual line items on the invoice",
        "item_type": {
          "type": "object",
          "properties": [
            {
              "name": "description",
              "type": {
                "type": "string",
                "description": "Description of the product or service",
                "examples": ["Software Development - 40 hours", "Office Supplies", "Consulting Services"],
                "validators": [
                  {
                    "type": "boolean_exp",
                    "expression": "len() > 0"
                  }
                ]
              }
            },
            {
              "name": "quantity",
              "type": {
                "type": "float",
                "description": "Quantity of items or hours",
                "examples": [1.0, 40.0, 2.5],
                "validators": [
                  {
                    "type": "boolean_exp",
                    "expression": "> 0"
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}
```

### Example output

```json theme={null}
{
  "properties": {
      "invoice_number": {
        "name": "invoice_number",
        "type": "string",
        "value": "2631",
        "pages": [
          1
        ]
      },
      "invoice_date": {
        "name": "invoice_date",
        "type": "string",
        "value": "2023-03-02",
        "pages": [
          1
        ]
      },
      "subtotal": {
        "name": "subtotal",
        "type": "float",
        "value": 1398.00,
        "pages": [
          1
        ]
      },
      "line_items": {
        "name": "line_items",
        "type": "array",
        "value": [
          {
              "name": null,
              "type": "object",
              "value": {
                "description": {
                  "name": "description",
                  "type": "string",
                  "value": "Appliance:XRS18GGABB CROSLEY 18 TM REF GLASS BL Textured Door",
                  "pages": [
                    1
                  ]
                },
                "quantity": {
                  "name": "quantity",
                  "type": "float",
                  "value": 2.0,
                  "pages": [
                    1
                  ]
                }
              }
          }
        ]
      }
  }
}

```
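Each extracted property is wrapped in an object carrying `name`, `type`, `value`, and `pages`, with arrays and objects nesting further wrappers. When only the plain values are needed, the wrappers can be stripped recursively. The following is a minimal sketch based on the example output above; the helper names `plain_value` and `flatten_properties` are hypothetical, and the sample is a trimmed copy of that example.

```python
from typing import Any


def plain_value(prop: dict[str, Any]) -> Any:
    """Strip the name/type/pages wrapper from one extracted property."""
    value = prop.get("value")
    if value is None:
        return None
    if prop.get("type") == "array":
        return [plain_value(item) for item in value]
    if prop.get("type") == "object":
        return {key: plain_value(child) for key, child in value.items()}
    return value


def flatten_properties(response: dict[str, Any]) -> dict[str, Any]:
    """Map each top-level property name to its plain Python value."""
    return {
        name: plain_value(prop)
        for name, prop in response.get("properties", {}).items()
    }


# Trimmed copy of the example output above.
response = {
    "properties": {
        "invoice_number": {"name": "invoice_number", "type": "string",
                           "value": "2631", "pages": [1]},
        "line_items": {
            "name": "line_items", "type": "array",
            "value": [{
                "name": None, "type": "object",
                "value": {
                    "quantity": {"name": "quantity", "type": "float",
                                 "value": 2.0, "pages": [1]},
                },
            }],
        },
    }
}

print(flatten_properties(response))
```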

## Property Metadata

When extracting properties, additional metadata about the extraction will be present under the `property_metadata` key. This object is a dictionary mapping property names to metadata structures like the following:

```json theme={null}
{
  "name": "named_insured",
  "type": "string",
  "value": "Global Logistics Solutions Inc.",
  "attribution": {
    "element_indices": [
      4
    ],
    "page": 1
  },
  "additional_guesses": [
    "Global Logistics Solutions Inc.",
    "Global Logistics Solutions Inc."
  ]
}
```

The `name` and `type` match what is in the schema, and the `value` matches what is present in `properties`. The `attribution` field contains information about where the extracted value was found: `page` is the page number in the document, and `element_indices` is a list of the indices, in the `elements` array, of the elements containing the property. Note that attribution is best effort, so `attribution` may be `null` even when a property was extracted.

For properties of type `array`, the `value` field will be an array of metadata objects. The `attribution` field for all arrays will be `null`, but the individual array elements will have attribution.

If `voting` is set to `true` in `property_extraction_options`, then the `additional_guesses` field will contain the values returned by the other two LLMs used for voting.
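The metadata can be used to trace a value back to its source elements and to check whether the voting models agreed. The following is a minimal sketch, assuming that `element_indices` are 0-based positions in the `elements` array (the indexing base is not stated here); the helper names and the placeholder elements are hypothetical.

```python
from typing import Any


def attributed_elements(
    metadata: dict[str, Any], elements: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return the elements a property was attributed to.

    Attribution is best effort, so a null attribution yields an empty list.
    Assumes `element_indices` are 0-based positions in `elements`.
    """
    attribution = metadata.get("attribution")
    if not attribution:
        return []
    indices = attribution.get("element_indices") or []
    return [elements[i] for i in indices if 0 <= i < len(elements)]


def guesses_agree(metadata: dict[str, Any]) -> bool:
    """True when every additional guess equals the extracted value."""
    guesses = metadata.get("additional_guesses") or []
    return all(guess == metadata.get("value") for guess in guesses)


# Metadata entry copied from the example above, plus placeholder elements.
metadata = {
    "name": "named_insured",
    "type": "string",
    "value": "Global Logistics Solutions Inc.",
    "attribution": {"element_indices": [4], "page": 1},
    "additional_guesses": [
        "Global Logistics Solutions Inc.",
        "Global Logistics Solutions Inc.",
    ],
}
elements = [{"type": "Text", "text_representation": f"element {i}"} for i in range(6)]

sources = attributed_elements(metadata, elements)
print([source["text_representation"] for source in sources])
print(guesses_agree(metadata))
```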