How to count words in a Scan

Have you ever wondered how to count the words in a scan, or how to extract text from a scanned document or image? With a regular PDF, counting words is usually not a problem: simply copy the text and paste it into a Word document, and Word's built-in word count will give you the number of words.

However, once a page has been scanned into a PDF, the words lose their text characteristics and are stored as an image, so they can no longer be selected or copied. This is no reason to despair, as you can use an Optical Character Recognition (OCR) program. OCR reads each line of the scanned document to determine which characters (letters, numbers, punctuation etc.) the pixels represent. If you have already paid for a program such as Adobe Acrobat Professional or ABBYY FineReader, it has OCR functionality built in.

Count Words in a Scan with Free Software

There are, however, perfectly suitable free alternatives. One of them is the free OCR service at free-ocr.com. Follow these steps to get the word count from your scanned document:

  • Go to free-ocr.com
  • Upload your file using the upload button
  • Select the language the text is in
  • The extracted text will appear in the text box
  • Copy the text and paste it into an MS Word document
  • Word's built-in word counter will give you a good indication of the word count
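
If you would rather not paste the text into Word, the counting step itself is easy to reproduce. The following is a minimal Python sketch, assuming you have copied the extracted text into a string. The function name count_words is our own, and it simply counts chunks separated by whitespace, so the result may differ slightly from the figure Word reports.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated chunks in a piece of extracted text."""
    # str.split() with no arguments splits on any run of spaces,
    # tabs or line breaks and ignores leading/trailing whitespace.
    return len(text.split())


# Example text, standing in for what you copied from the OCR result box.
sample = "The quick brown fox\njumps over  the lazy dog."
print(count_words(sample))  # prints 9
```

Splitting on whitespace is the simplest possible definition of a word. It treats "dog." as one word, which is usually what you want for a rough count.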

Although quick and easy, there is one concern with this approach. It would require you to upload your content onto the site. This may not be an appropriate option if you are dealing with confidential information.

A more secure alternative is to download the free version of paperfile.net’s OCR program and run it on your own computer. Once you have downloaded and installed the program, take the following steps to extract the text:

  • Open the program. You should be presented with an example extraction, along with a few instructions on how to improve the quality of your extraction
  • Select the PDF file that you would like to extract text from
  • Click the OCR button to OCR the current page
  • If the page was recognized correctly, the extracted text should appear in the right-hand box
  • Click the “Word” button between the two sheets to export the text to Word
  • Open the file in MS Word and obtain your word count

The free program has two disadvantages, though:

  • It does not allow you to choose a language other than English
  • It processes one page at a time, so you have to switch between pages and obtain the word count page by page
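
Adding up page-by-page counts by hand is tedious and error-prone. As a rough sketch, assuming you have saved the extracted text of each page as a separate string, a few lines of Python can produce the per-page counts and the total. The names count_words and total_word_count are our own and are not part of any OCR program.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated chunks in a piece of extracted text."""
    return len(text.split())


def total_word_count(pages: list[str]) -> tuple[list[int], int]:
    """Return the word count of each page and the overall total."""
    per_page = [count_words(page) for page in pages]
    return per_page, sum(per_page)


# Example pages, standing in for the text extracted from each scanned page.
pages = ["Invoice number 12345", "Total due: 100 euro", ""]
per_page, total = total_word_count(pages)
print(per_page)  # prints [3, 4, 0]
print(total)     # prints 7
```

An empty page simply counts as zero words, so blank scans do not need special handling.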

Count Words in a Scan with Premium Products

There are other programs, such as AnyCount 7.0 and Solid Documents, which are specifically designed for counting words, characters and lines. Although these programs require a paid license, they offer more functionality than the free options listed above.

Both also have free trial versions, which you can download from their respective sites.

Finally, there is a mobile solution for counting the words in your scanned document: the TextExtractor Scanner iPhone app, available to download from the Apple App Store. You simply snap a picture of the document and the app will attempt to extract the text automatically into either a Word document or a PDF. Extracting the text into a Word document will allow you to easily count the words. Moreover, the app supports extraction from a number of different languages, including character-based languages such as Mandarin and Japanese.

It is important to note that, with all OCR programs and methods, the quality of the scan affects the accuracy of the extraction, and therefore of the word count.
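
To illustrate how a poor scan can distort the count, here is a small Python sketch. It is a heuristic of our own, not something any of the programs above is known to do. It assumes two common kinds of OCR noise: words split by a hyphen at a line break, and stray marks such as "|" or "~" that contain no letters or digits. The function names are hypothetical.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Rejoin words that were split by a hyphen at the end of a line."""
    # "docu-\nment" becomes "document". A genuine hyphenated compound
    # split at a line break is also merged, which still counts as one word.
    return re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)


def count_real_words(text: str) -> int:
    """Count chunks that contain at least one letter or digit."""
    tokens = clean_ocr_text(text).split()
    return sum(1 for token in tokens if any(ch.isalnum() for ch in token))


# Example of noisy OCR output (made up for illustration).
noisy = "This docu-\nment has | five ~ words ."
print(len(noisy.split()))       # naive count, prints 9
print(count_real_words(noisy))  # cleaned count, prints 5
```

Here the naive count is almost double the real one, which is why it pays to skim the extracted text before trusting any word count taken from a low-quality scan.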

Pascal Evertz
p.evertz@buyersunited.nl