As of 2020, the best available open source OCR software is Tesseract 4 with its new LSTM neural network OCR model. Its OCR performance is much better than that of the previous OCR model used in version 3. Support for many languages/scripts is available in the form of downloadable trained data sets.

List installed languages:
$ tesseract --list-langs

Print the recognized text to stdout:
$ tesseract --oem 1 -l deu page-0001.png stdout

Example (produce a PDF file output.pdf with a text layer for a scanned German document; --oem 1 selects the LSTM engine, and the list file names one image per line):
$ printf '%s\n' page-*.png > input.list
$ tesseract --oem 1 -l deu input.list output pdf
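The PDF example above can be wrapped into a small reusable script. This is a minimal sketch, assuming Tesseract 4 or later is on PATH and that the scanned pages follow the page-*.png naming used above; the function names build_page_list, has_lang and ocr_to_pdf are hypothetical helpers, not part of Tesseract. It checks that the requested language model is installed before starting a long OCR run.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers around the tesseract CLI (Tesseract 4+ assumed).

# Write one image path per line into the list file that tesseract reads.
# Note: if a glob such as page-*.png matches nothing, the shell passes it literally.
build_page_list() {
    list_file=$1
    shift
    printf '%s\n' "$@" > "$list_file"
}

# Succeed only if a trained data set for the language (e.g. deu) is installed.
# Both streams are captured because some versions print the list to stderr.
has_lang() {
    tesseract --list-langs 2>&1 | grep -qx "$1"
}

# Run the LSTM engine (--oem 1) on every image in the list file and
# write a searchable PDF named <out_base>.pdf.
ocr_to_pdf() {
    lang=$1
    list_file=$2
    out_base=$3
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found on PATH" >&2
        return 1
    fi
    if ! has_lang "$lang"; then
        echo "language model '$lang' is not installed" >&2
        return 1
    fi
    tesseract --oem 1 -l "$lang" "$list_file" "$out_base" pdf
}
```

Usage, after sourcing the script: `build_page_list input.list page-*.png && ocr_to_pdf deu input.list output`, which should produce output.pdf just like the two commands above.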