Supported Formats and Languages
This page lists the input formats, output formats, and OCR languages that Aryn DocParse supports.
Input Formats
Aryn DocParse supports the following input formats:
.pdf
.docx
.doc
.pptx
.ppt
.csv
.jpg (.jpeg)
.png
.bmp
.tiff
.html
.odt
.rtf
.txt
.xls
.xlsx
.xml
.svg
.webp
.wmf
.emf
.mml
.ods
.xhtml
.odp
.odg
.odf
.ots
.xltx
.fods
.xlt
.slk
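As a minimal sketch, the list above can be turned into a local check that runs before a file is submitted for parsing. The names `SUPPORTED_INPUT_EXTENSIONS` and `is_supported_input` are hypothetical helpers for illustration, not part of any Aryn library, and the check looks only at the file extension, not at the file's contents.

```python
from pathlib import Path

# Extensions copied from the input-format list above.
SUPPORTED_INPUT_EXTENSIONS = frozenset({
    ".pdf", ".docx", ".doc", ".pptx", ".ppt", ".csv", ".jpg", ".jpeg",
    ".png", ".bmp", ".tiff", ".html", ".odt", ".rtf", ".txt", ".xls",
    ".xlsx", ".xml", ".svg", ".webp", ".wmf", ".emf", ".mml", ".ods",
    ".xhtml", ".odp", ".odg", ".odf", ".ots", ".xltx", ".fods", ".xlt",
    ".slk",
})


def is_supported_input(path: str) -> bool:
    """Return True if the file extension appears in the supported list.

    The comparison is case-insensitive, so "REPORT.PDF" is accepted.
    Files without an extension are rejected.
    """
    return Path(path).suffix.lower() in SUPPORTED_INPUT_EXTENSIONS
```

A caller could use this to filter a directory listing, for example `[p for p in paths if is_supported_input(p)]`, and report the skipped files instead of sending them for parsing.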
Output Formats
Aryn DocParse supports the following output formats:
.json
.md
OCR Languages
Aryn DocParse supports the following OCR languages. The default is english:
- Abaza: abaza
- Adyghe: adyghe
- Afrikaans: afrikaans
- Albanian: albanian
- Angika: angika
- Arabic: arabic
- Avar: avar
- Azerbaijani: azerbaijani
- Belarusian: belarusian
- Bhojpuri: bhojpuri
- Bihari: bihari
- Bosnian: bosnian
- Bulgarian: bulgarian
- Chinese: chinese
- Chinese (Traditional): chinese_traditional
- Croatian: croatian
- Czech: czech
- Danish: danish
- Dargwa: dargwa
- Dutch: dutch
- English: english
- Estonian: estonian
- French: french
- German: german
- Hindi: hindi
- Hungarian: hungarian
- Icelandic: icelandic
- Indonesian: indonesian
- Ingush: ingush
- Irish: irish
- Italian: italian
- Japanese: japanese
- Kabardian: kabardian
- Konkani: konkani
- Korean: korean
- Kurdish: kurdish
- Lak: lak
- Latvian: latvian
- Lezghian: lezghian
- Lithuanian: lithuanian
- Magahi: magahi
- Maithili: maithili
- Malay: malay
- Maltese: maltese
- Maori: maori
- Marathi: marathi
- Mongolian: mongolian
- Nagpuri: nagpuri
- Nepali: nepali
- Newari: newari
- Norwegian: norwegian
- Occitan: occitan
- Persian: persian
- Polish: polish
- Portuguese: portuguese
- Romanian: romanian
- Russian: russian
- Serbian (Cyrillic): serbian_cyrillic
- Serbian (Latin): serbian_latin
- Slovak: slovak
- Slovenian: slovenian
- Spanish: spanish
- Swahili: swahili
- Swedish: swedish
- Tabassaran: tabassaran
- Tagalog: tagalog
- Tamil: tamil
- Telugu: telugu
- Turkish: turkish
- Ukrainian: ukrainian
- Urdu: urdu
- Uyghur: uyghur
- Uzbek: uzbek
- Vietnamese: vietnamese
- Welsh: welsh
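The identifiers in the list above appear to follow one pattern: the display name in lowercase, with spaces and parentheses collapsed into underscores. The sketch below assumes that pattern and validates a language name locally, falling back to the default when none is given. `OCR_LANGUAGES`, `DEFAULT_OCR_LANGUAGE`, and `ocr_language_id` are hypothetical names for illustration and are not part of any Aryn library; how the identifier is passed to DocParse is not shown here.

```python
import re

# Identifiers copied from the OCR language list above.
OCR_LANGUAGES = frozenset("""
abaza adyghe afrikaans albanian angika arabic avar azerbaijani belarusian
bhojpuri bihari bosnian bulgarian chinese chinese_traditional croatian czech
danish dargwa dutch english estonian french german hindi hungarian icelandic
indonesian ingush irish italian japanese kabardian konkani korean kurdish lak
latvian lezghian lithuanian magahi maithili malay maltese maori marathi
mongolian nagpuri nepali newari norwegian occitan persian polish portuguese
romanian russian serbian_cyrillic serbian_latin slovak slovenian spanish
swahili swedish tabassaran tagalog tamil telugu turkish ukrainian urdu uyghur
uzbek vietnamese welsh
""".split())

# The page states that english is the default.
DEFAULT_OCR_LANGUAGE = "english"


def ocr_language_id(name=None):
    """Turn a display name or identifier into a validated identifier.

    Assumed rule: lowercase the name and replace every run of
    non-letter characters with a single underscore.
    """
    if name is None:
        return DEFAULT_OCR_LANGUAGE
    ident = re.sub(r"[^a-z]+", "_", name.strip().lower()).strip("_")
    if ident not in OCR_LANGUAGES:
        raise ValueError(f"unsupported OCR language: {name!r}")
    return ident
```

With this helper, `ocr_language_id("Serbian (Latin)")` and `ocr_language_id("serbian_latin")` resolve to the same identifier, and an unlisted language raises `ValueError` before any request is made.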
For more information about the output, refer to the output structure page. For a walkthrough with a document, see the output walkthrough page.