Google is now indexing scanned documents. In the past, scanned documents were simply pictures to Google's spiders: the crawlers could not read the contents of these files and relied on file names and tags to index them. The same was true of scanned documents saved in Adobe's popular Portable Document Format (PDF). Recently, however, PDF became an ISO standard, and more and more documents are stored and uploaded to the internet this way.
Google has now announced that it is able to index these documents:
"We are now able to perform OCR on any scanned documents that we find stored in Adobe's PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words -- words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world's information accessible and useful."
Indeed, this feat is important. Although OCR is neither rocket science nor new technology, Google had considerable difficulty teaching its robots to read scanned images. Scanned documents are often riddled with imperfections such as coffee stains and other marks. Google is now able to read through these imperfections and properly index scanned documents.
Comments
maxi, 2008-11-03 05:04:03
SEO now gets a little more interesting....