Researchers at the MIT Media Lab have built a prototype finger-reading device that converts printed text into spoken text for the visually impaired. The FingerReader works with a camera and also provides tactile or audible feedback.
The latter is especially important for staying on the correct line rather than accidentally reading from another one. According to one of the makers on the MIT news site, the device must convert the text for the visually impaired user quickly and give immediate feedback as soon as things start to go wrong, partly so as not to break the illusion of a story being read aloud.
The FingerReader’s main innovation is an algorithm that can scan and read sequential text very locally, as single lines or as blocks of text, and that keeps working when the reader skims through the text. The paper describing the algorithm will be presented in April at the Association for Computing Machinery’s Computer-Human Interaction (CHI) conference.
In addition to the algorithm, the paper describes the different variants of the FingerReader the researchers tested. One version has two haptic motors, one on top of the finger and one underneath. Their vibration signals whether the finger needs to move slightly up or down to stay on the correct line. Another version uses sound as feedback, with a tone that grows louder as the finger drifts off the correct path. For now, there is no consensus on which approach works better.
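The two feedback variants described above can be sketched as a single mapping from the finger's vertical offset to a signal. This is an illustrative sketch, not the paper's implementation: the thresholds, scaling, and motor names are invented for the example.

```python
def feedback(offset_px, mode="haptic"):
    """Map the finger's vertical offset from the tracked line
    (positive = finger too low) to a feedback signal.
    All thresholds and gains are illustrative guesses."""
    if mode == "haptic":
        # One motor on top of the finger, one underneath: vibrate the
        # motor on the side the finger should move toward.
        if offset_px > 5:
            return ("top_motor", min(1.0, offset_px / 20))      # nudge finger up
        if offset_px < -5:
            return ("bottom_motor", min(1.0, -offset_px / 20))  # nudge finger down
        return ("none", 0.0)
    # Audio variant: tone volume grows with distance from the line.
    return ("tone", min(1.0, abs(offset_px) / 20))
```

For example, a finger 10 pixels below the line would trigger the top motor at half intensity in haptic mode, or a half-volume tone in audio mode.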
The heart of the system is its ability to decode the camera images in real time. Each time the user places a finger at the beginning of a new line, the algorithm estimates the baseline of the letters. Because many letters have descenders extending below the baseline, and because a misaligned finger can bring neighbouring lines into view, the candidate estimates will differ. The algorithm therefore takes the median of the densest cluster of candidates. That value then constrains the estimates for each new frame of video as the user’s finger moves to the right, which keeps the required computational power low.
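The baseline-selection step described above can be sketched in a few lines. This is a minimal stand-in, not the published algorithm: the clustering is a simple one-dimensional greedy split, and the `window` and `gap` parameters are assumed values for illustration.

```python
from statistics import median

def estimate_baseline(candidates, prev=None, window=15, gap=8):
    """Pick a baseline y-coordinate from noisy per-frame candidates.
    Descenders and neighbouring lines produce outliers, so we keep the
    densest cluster and return its median. If a previous estimate
    exists, only nearby candidates are considered, which is what keeps
    the per-frame work small as the finger moves right."""
    if prev is not None:
        candidates = [c for c in candidates if abs(c - prev) <= window]
    if not candidates:
        return prev  # no usable candidates this frame; keep the old estimate
    candidates = sorted(candidates)
    # Greedy 1-D clustering: split where consecutive values are far apart.
    clusters, current = [], [candidates[0]]
    for c in candidates[1:]:
        if c - current[-1] <= gap:
            current.append(c)
        else:
            clusters.append(current)
            current = [c]
    clusters.append(current)
    densest = max(clusters, key=len)
    return median(densest)
```

Given candidates at y = 99, 100, 101 (the true baseline) plus 140 and 141 (a neighbouring line), the densest cluster wins and the estimate is 100.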
The algorithm also tracks the individual words that pass in front of the camera. If it recognizes a word at the center of the camera image, only that word is cropped out of the image. The crop can then be aligned with the rest of the sentence, compensating for odd angles. The text is then recognized by open-source software and converted into synthesized speech.
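The alignment step above amounts to levelling a tilted word crop before recognition. A minimal geometric sketch, assuming the tracked baseline gives two points to measure the tilt from (the function names and the two-point interface are invented for the example; the actual OCR and speech stages are handled by existing open-source software):

```python
import math

def deskew_angle(x0, y0, x1, y1):
    """Angle (in degrees) of the tracked baseline through two points.
    Rotating the word crop by the negative of this angle levels the
    text before it is handed to the OCR stage."""
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) about (cx, cy) by angle_deg. Applied to the
    corners of the crop (or each pixel) to undo the measured tilt."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))
```

A baseline rising 10 pixels over 10 pixels of width gives a 45-degree tilt; rotating the crop by -45 degrees would level it for recognition.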
In the prototype, the finger scanner was linked to a laptop while reading, but a more mobile version is in the pipeline.