Scene To Text Detection and Pronunciation

Project Details

Scene text carries rich semantic information that many vision-based applications can use, so detecting and recognizing text in photographs has received increasing attention in recent years, although it remains a challenging problem. Recent technological advances focus on developing smart systems that improve quality of life, and machine learning and artificial intelligence have become basic tools for building such systems.
In this context, an effective approach is suggested for automated text detection and recognition in natural scenes. Most such systems have two key components, (i) text detection in images and (ii) character recognition, and many recent methods propose better feature representations and models for both. We apply methods recently developed in machine learning, specifically large-scale algorithms that learn features automatically from unlabeled data, and show that they allow us to construct highly effective classifiers for both detection and recognition, for use in a high-accuracy end-to-end system. The incoming image is first enhanced with Contrast Limited Adaptive Histogram Equalization (CLAHE). The text regions of the enhanced image are then detected with the Stroke Width Transform (SWT) feature detector. Non-text SWT regions are removed with appropriate filters, and the remaining regions are grouped into words.
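The enhancement, filtering, and grouping steps can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `Component` structure, the function names, and every threshold are hypothetical, and the grouping assumes all components lie on a single text line. Only `cv2.createCLAHE` and its `apply` method are real OpenCV calls; the stroke-width values would come from an SWT stage that is not shown here.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


def enhance_contrast(gray_image, clip_limit=2.0, tile_grid=(8, 8)):
    """Apply CLAHE to an 8-bit grayscale image (requires opencv-python)."""
    import cv2  # imported lazily so the rest of the sketch runs without OpenCV

    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return clahe.apply(gray_image)


@dataclass
class Component:
    """A candidate region from the SWT stage (hypothetical structure)."""
    x: int               # left edge of the bounding box
    y: int               # top edge of the bounding box
    w: int               # bounding-box width
    h: int               # bounding-box height
    stroke_widths: list  # stroke widths sampled inside the region


def looks_like_text(c, max_variance_ratio=0.5, min_aspect=0.1, max_aspect=10.0):
    """Heuristic filter: keep regions with near-constant stroke width
    and a moderate aspect ratio. Thresholds are illustrative only."""
    if c.h <= 0 or not c.stroke_widths:
        return False
    avg = mean(c.stroke_widths)
    if avg <= 0:
        return False
    variance_ratio = pvariance(c.stroke_widths) / (avg * avg)
    aspect = c.w / c.h
    return variance_ratio <= max_variance_ratio and min_aspect <= aspect <= max_aspect


def group_into_words(components, max_gap_ratio=0.5):
    """Scan left to right; start a new word when the horizontal gap
    exceeds max_gap_ratio times the taller neighbor's height."""
    words = []
    for c in sorted(components, key=lambda comp: comp.x):
        if words:
            prev = words[-1][-1]
            gap = c.x - (prev.x + prev.w)
            if gap <= max_gap_ratio * max(prev.h, c.h):
                words[-1].append(c)
                continue
        words.append([c])
    return words


# Usage with made-up candidates; the last one has an erratic stroke width.
candidates = [
    Component(0, 0, 10, 20, [3, 3, 4, 3]),
    Component(12, 0, 10, 20, [3, 4, 3, 3]),
    Component(40, 0, 10, 20, [3, 3, 3, 4]),
    Component(60, 0, 30, 20, [1, 9, 2, 15]),
]
text_like = [c for c in candidates if looks_like_text(c)]
words = group_into_words(text_like)
```

With these sample values the erratic region is filtered out and the remaining three regions form two words. A real detector would also need to handle multiple text lines and would tune the thresholds on data.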
Text recognition is performed with an Optical Character Recognition (OCR) function, and the extracted text is pronounced by a suitable speech synthesizer. A prototype of the proposed system was built, and its functionality was verified with an experimental setup. The results demonstrate the concept and working principle of the devised system and show the potential of the suggested method for developing modern devices for visually impaired people.
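The recognition and pronunciation stage might be wired together along the lines below. This is a sketch under assumptions: the source names neither the OCR engine nor the speech client, so `pytesseract` and `gTTS` are stand-ins, and the function names `recognize_text`, `synthesize_speech`, `clean_ocr_text`, and `read_aloud` are hypothetical. Both back ends are passed in as parameters so they can be swapped or stubbed.

```python
def recognize_text(image):
    """Run OCR on an image. Assumes pytesseract and the Tesseract
    binary are installed; the source does not name its OCR engine."""
    import pytesseract

    return pytesseract.image_to_string(image)


def synthesize_speech(text, out_path="speech.mp3", lang="en"):
    """Save the spoken text as an MP3 file. Assumes the gTTS package
    as the Google text-to-speech client; it needs network access."""
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


def clean_ocr_text(raw):
    """Collapse the line breaks and repeated spaces common in OCR output."""
    return " ".join(raw.split())


def read_aloud(image, ocr=recognize_text, tts=synthesize_speech):
    """Recognize text in `image` and pronounce it.

    Returns (text, audio_path); audio_path is None when no text is found.
    """
    text = clean_ocr_text(ocr(image))
    if not text:
        return "", None  # nothing to pronounce
    return text, tts(text)
```

Calling `read_aloud(image)` with the defaults requires both packages to be installed. Passing stub callables for `ocr` and `tts` exercises the control flow without them, which is convenient when checking the pipeline on a machine without network access.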

Libraries

NumPy
Optical Character Recognition
TensorFlow
Keras
OpenCV
Google Text-to-Speech synthesizer

  • Date

    11 Aug, 2020
  • Categories

  • Academy Project

    Final project of Engineering