Optical character recognition (OCR) is the electronic conversion of a scanned picture into text. It is widely used to convert books and documents into electronic files, to computerize an office record-keeping system, and to publish text on a website. Early OCR applications had to be calibrated to read a specific font: they were programmed with images of each character and worked on one font at a time. Once a picture has been converted to text, users can edit it, search it for a word or phrase, and apply techniques such as text-to-speech, machine translation and text mining to it.
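The idea of programming a system with images of each character can be illustrated with a small template-matching sketch. It is only an illustration under stated assumptions: the 3x3 bitmaps, the TEMPLATES table and the function names pixel_matches and recognize are hypothetical, and real systems used far larger images and more robust comparisons.

```python
# Hypothetical 3x3 bitmaps standing in for the stored image of each character
# in one font ("1" = ink, "0" = paper).
TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}


def pixel_matches(glyph, template):
    """Count the pixels on which a scanned glyph and a template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the character whose template agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda ch: pixel_matches(glyph, TEMPLATES[ch]))


# A "T" with one pixel flipped by scanning noise still matches "T" best.
noisy_t = ("111",
           "010",
           "011")
print(recognize(noisy_t))  # prints "T"
```

Because every template belongs to a single font, a glyph printed in a different typeface would agree with none of them well, which is why such a system could handle only one font at a time.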
In 1929 Gustav Tauschek obtained the first patent on OCR, in Germany. In 1933 an American, Paul W. Handel, obtained a US patent on OCR. Two years later Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329). His machine used templates and a photodetector.
Later, in 1949, RCA engineers worked on the first primitive computer-type OCR to assist visually impaired individuals for the US Veterans Administration. Rather than stopping at machine language, their device converted printed characters to machine language and then read the letters aloud. Although the device was a commendable technological demonstration, it proved too expensive and was not pursued after testing.
Finally, in 1950, David H. Shepard, an American cryptanalyst at the Armed Forces Security Agency, tackled the problem of converting printed messages into machine language for computer processing by building a machine that solved it. From then on, the first commercial OCR system was only a few years away.
Desktop and Server OCR Software
OCR software is an analytical artificial intelligence system that considers chains of characters rather than complete words or phrases. Based on its analysis of sequential lines and curves, an OCR application makes 'best character estimates'. It does this by using database look-up tables to closely associate or match the chains of characters that form words.
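The look-up step can be sketched as follows. This is a minimal illustration, not how any particular OCR product works: the LEXICON list stands in for a database look-up table, best_guess is a hypothetical helper name, and the similarity measure is the one provided by Python's standard difflib module.

```python
import difflib

# Hypothetical look-up table of known words; a real system would query a much
# larger dictionary held in a database.
LEXICON = ["character", "optical", "recognition", "scanner", "document"]


def best_guess(raw_chain, lexicon=LEXICON, cutoff=0.6):
    """Return the lexicon word closest to a recognized chain of characters.

    The chain is returned unchanged when no word is similar enough.
    """
    matches = difflib.get_close_matches(raw_chain.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_chain


print(best_guess("rec0gniti0n"))  # zero read in place of "o" -> "recognition"
print(best_guess("docurnent"))    # "rn" read in place of "m"  -> "document"
print(best_guess("xyzzy"))        # nothing close in the table -> "xyzzy"
```

The cutoff value is an assumption chosen for the example. Raising it makes the correction more conservative, while lowering it risks replacing a correctly read but unlisted word with a dictionary entry.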
WebOCR and OnlineOCR
WebOCR, or OnlineOCR, is a newer development intended to serve a larger and broader group of users. Internet and broadband technologies have made OnlineOCR easily accessible to both individual and corporate users. Since 2000, some of the most popular OCR vendors have offered WebOCR and online picture-to-text software.
As optical character recognition systems became widely used for image conversion, they ran into a large variety of problems caused by the original condition of the documents, for example: complicated backgrounds, degraded images, heavy noise, paper skew, picture distortion, low resolution, interference from grids and lines, and text images containing special fonts, symbols or glossary words.
All these factors reduced the stability of the OCR application’s recognition accuracy. In response, major OCR technology providers began to develop dedicated OCR systems, each for a specific type of image. To improve recognition accuracy they combined various optimization techniques tailored to that type of image, such as business rules, standard expressions, glossaries or dictionaries, and the rich information contained in color images.
This strategy is called “Application-Oriented OCR” or “Customized OCR”. It is used on a large scale in fields such as Business-card OCR, Invoice OCR, Screenshot OCR, ID card OCR, Driver-license OCR and Auto plant OCR.
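As a rough illustration of how a business rule can raise accuracy for one document type, the sketch below post-processes the amount field of an invoice. Everything in it is an assumption made for the example: the list of look-alike characters, the rule that an amount consists of digits with exactly two decimals, and the helper name correct_amount.

```python
import re

# Characters that are easily mistaken for digits (illustrative list only).
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Assumed business rule: an invoice amount is digits followed by two decimals.
AMOUNT_PATTERN = re.compile(r"\d+\.\d{2}")


def correct_amount(raw_field):
    """Apply digit substitutions to an amount field and validate the result.

    Returns the corrected string, or None when the field still breaks the
    rule and should be passed on for manual review.
    """
    candidate = raw_field.strip().translate(DIGIT_LOOKALIKES)
    return candidate if AMOUNT_PATTERN.fullmatch(candidate) else None


print(correct_amount("1O4.5O"))  # prints "104.50"
print(correct_amount("l2B.00"))  # prints "128.00"
print(correct_amount("N/A"))     # prints "None"
```

The substitutions are safe only because the field is known in advance to be numeric. Applied to free text, the same table would corrupt ordinary words, which is why such rules belong to a system dedicated to one type of image.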
Optical character recognition technology today can read and recognize an array of languages and convert files into a number of formats. Even though it was developed decades ago, it continues to be refined and improved. An OCR application is an apt tool for picture-to-text conversion and can be used by different people in a variety of environments. It continues to be supported by a wide range of products and systems, from top-of-the-line machinery to compact, easy-to-use solutions.