Input Formats

Aryn DocParse supports the following input formats:

  • .pdf
  • .docx
  • .doc
  • .pptx
  • .ppt
  • .csv
  • .jpg (.jpeg)
  • .png
  • .bmp
  • .tiff
  • .html
  • .odt
  • .rtf
  • .txt
  • .xls
  • .xlsx
  • .xml
  • .svg
  • .webp
  • .wmf
  • .emf
  • .mml
  • .ods
  • .xhtml
  • .odp
  • .odg
  • .odf
  • .ots
  • .xltx
  • .fods
  • .xlt
  • .slk
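
As a convenience, the list above can be turned into a quick client-side check before a file is submitted. This is a minimal sketch, not part of Aryn DocParse: the names `SUPPORTED_INPUT_EXTENSIONS` and `is_supported_input` are hypothetical, and the check looks only at the file extension, not at the file's contents.

```python
from pathlib import Path

# Extensions copied from the list above; ".jpeg" is the alternate
# spelling noted next to ".jpg". The set name is hypothetical.
SUPPORTED_INPUT_EXTENSIONS = frozenset({
    ".pdf", ".docx", ".doc", ".pptx", ".ppt", ".csv", ".jpg", ".jpeg",
    ".png", ".bmp", ".tiff", ".html", ".odt", ".rtf", ".txt", ".xls",
    ".xlsx", ".xml", ".svg", ".webp", ".wmf", ".emf", ".mml", ".ods",
    ".xhtml", ".odp", ".odg", ".odf", ".ots", ".xltx", ".fods", ".xlt",
    ".slk",
})


def is_supported_input(path):
    """Return True if the file's final extension is in the list above.

    The comparison is case-insensitive and exact, so an extension that
    is not listed (for example ".tif") is reported as unsupported.
    """
    return Path(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.PDF", "scan.jpeg", "notes.tif", "README"):
        print(name, is_supported_input(name))
```

Because the match is exact, this sketch may reject files that the service would accept under an unlisted spelling; treat it as a pre-filter rather than an authoritative check.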

Output Formats

Aryn DocParse supports the following output formats:

  • .json
  • .md

OCR Languages

Aryn DocParse supports the following OCR languages, listed as language name followed by its value. The default is english:

  • Abaza: abaza
  • Adyghe: adyghe
  • Afrikaans: afrikaans
  • Albanian: albanian
  • Angika: angika
  • Arabic: arabic
  • Avar: avar
  • Azerbaijani: azerbaijani
  • Belarusian: belarusian
  • Bhojpuri: bhojpuri
  • Bihari: bihari
  • Bosnian: bosnian
  • Bulgarian: bulgarian
  • Chinese: chinese
  • Chinese (Traditional): chinese_traditional
  • Croatian: croatian
  • Czech: czech
  • Danish: danish
  • Dargwa: dargwa
  • Dutch: dutch
  • English: english
  • Estonian: estonian
  • French: french
  • German: german
  • Hindi: hindi
  • Hungarian: hungarian
  • Icelandic: icelandic
  • Indonesian: indonesian
  • Ingush: ingush
  • Irish: irish
  • Italian: italian
  • Japanese: japanese
  • Kabardian: kabardian
  • Konkani: konkani
  • Korean: korean
  • Kurdish: kurdish
  • Lak: lak
  • Latvian: latvian
  • Lezghian: lezghian
  • Lithuanian: lithuanian
  • Magahi: magahi
  • Maithili: maithili
  • Malay: malay
  • Maltese: maltese
  • Maori: maori
  • Marathi: marathi
  • Mongolian: mongolian
  • Nagpuri: nagpuri
  • Nepali: nepali
  • Newari: newari
  • Norwegian: norwegian
  • Occitan: occitan
  • Persian: persian
  • Polish: polish
  • Portuguese: portuguese
  • Romanian: romanian
  • Russian: russian
  • Serbian (Cyrillic): serbian_cyrillic
  • Serbian (Latin): serbian_latin
  • Slovak: slovak
  • Slovenian: slovenian
  • Spanish: spanish
  • Swahili: swahili
  • Swedish: swedish
  • Tabassaran: tabassaran
  • Tagalog: tagalog
  • Tamil: tamil
  • Telugu: telugu
  • Turkish: turkish
  • Ukrainian: ukrainian
  • Urdu: urdu
  • Uyghur: uyghur
  • Uzbek: uzbek
  • Vietnamese: vietnamese
  • Welsh: welsh
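
Each entry above pairs a language name with a lowercase value, and names with a parenthesized qualifier map to an underscore-separated value (for example, Chinese (Traditional) becomes chinese_traditional). The sketch below applies that pattern to validate a language choice on the client side. It assumes the lowercase values are what a request should carry; the names `OCR_LANGUAGE_CODES`, `DEFAULT_OCR_LANGUAGE`, and `resolve_ocr_language` are hypothetical and not part of Aryn DocParse, and how the value is passed to the service is not covered here.

```python
import re

# Default stated in the text above.
DEFAULT_OCR_LANGUAGE = "english"

# Values copied from the list above (hypothetical constant name).
OCR_LANGUAGE_CODES = frozenset("""
    abaza adyghe afrikaans albanian angika arabic avar azerbaijani
    belarusian bhojpuri bihari bosnian bulgarian chinese
    chinese_traditional croatian czech danish dargwa dutch english
    estonian french german hindi hungarian icelandic indonesian ingush
    irish italian japanese kabardian konkani korean kurdish lak latvian
    lezghian lithuanian magahi maithili malay maltese maori marathi
    mongolian nagpuri nepali newari norwegian occitan persian polish
    portuguese romanian russian serbian_cyrillic serbian_latin slovak
    slovenian spanish swahili swedish tabassaran tagalog tamil telugu
    turkish ukrainian urdu uyghur uzbek vietnamese welsh
""".split())


def resolve_ocr_language(name=None):
    """Map a language name or value from the list to its lowercase value.

    Returns the default when no name is given and raises ValueError for
    a language that is not in the list.
    """
    if name is None or not name.strip():
        return DEFAULT_OCR_LANGUAGE
    # Lowercase, then collapse spaces and parentheses into underscores.
    code = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    if code not in OCR_LANGUAGE_CODES:
        raise ValueError(f"unsupported OCR language: {name!r}")
    return code


if __name__ == "__main__":
    print(resolve_ocr_language())
    print(resolve_ocr_language("Chinese (Traditional)"))
    print(resolve_ocr_language("serbian_latin"))
```

Failing early on an unlisted language keeps a typo from silently falling back to the default.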

For more information, refer to the output structure page. For a walkthrough with a document, see the output walkthrough page.