Techie Weblog for Geeks

Tuesday, June 1, 2010

Extract Text from Images & Scanned PDF Manuals Online


If you are on a budget, the built-in OCR engine of Google Search is an almost perfect option for converting scanned PDFs to text: just put all your scanned PDF images on a public website and wait for Google's spiders to convert them into editable digital text.

There are two obvious drawbacks to that idea. First, the PDF conversion does not happen in real time; you have to wait until Google crawls the files. Second, you need access to a public web server where you can upload the PDF images so that Google's bots can find them.

If you aren’t willing to wait that long and need instant OCR without downloading any software, try OCR Terminal. It’s an online Optical Character Recognition service where you can upload scanned images, multi-page PDF documents or even screenshots and convert them into searchable text documents.



The conversion results, as you can see in the screenshot above, are pretty accurate, and the service also preserves the document's formatting and layout. You may download the extracted text as an RTF file or a Word document. The output is also available as a PDF image, though I didn’t find that option very useful.

OCR Terminal is a free service, but you may convert only up to 30 scanned pages per day, and it extracts text only from English-language documents. The developers are working on a desktop client that will let users convert scanned PDFs or TIFF images and get them back as formatted Word files without a web browser.
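If you have more than 30 pages to convert, it helps to plan the uploads ahead of time. Here is a rough Python sketch that splits a pile of scanned documents into daily batches that stay within the limit. It does not talk to OCR Terminal at all; the function name `plan_batches`, the file names and the page counts are made up for illustration, and it assumes the limit is counted in pages per calendar day and that you split long PDFs into page ranges yourself before uploading.

```python
DAILY_PAGE_LIMIT = 30  # free-tier limit mentioned in the post


def plan_batches(documents, daily_limit=DAILY_PAGE_LIMIT):
    """Group documents into per-day batches that respect the page limit.

    documents: list of (name, page_count) tuples (hypothetical input).
    Returns a list of days; each day is a list of
    (name, first_page, last_page) ranges to upload on that day.
    A document longer than the remaining quota is split across days.
    """
    days = [[]]
    remaining = daily_limit
    for name, pages in documents:
        first = 1
        while pages > 0:
            if remaining == 0:
                # Today's quota is used up, so start a new day.
                days.append([])
                remaining = daily_limit
            take = min(pages, remaining)
            days[-1].append((name, first, first + take - 1))
            first += take
            pages -= take
            remaining -= take
    # Drop the empty placeholder day if there was nothing to schedule.
    return [day for day in days if day]


if __name__ == "__main__":
    plan = plan_batches([("manual.pdf", 45), ("notes.pdf", 20)])
    for number, day in enumerate(plan, start=1):
        print(f"Day {number}: {day}")
```

With the sample input, a 45-page manual and a 20-page set of notes need three days: the first 30 pages of the manual on day one, the rest of the manual plus 15 pages of notes on day two, and the last 5 pages of notes on day three.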

