© Peter B.L. Meijer 2010. All rights reserved.

The vOICe MIDlet

For information about installing the software, please first visit the web page

In reading this manual, it is assumed that you have already succeeded in installing[1] and running[2][3] The vOICe seeing-with-sound MIDlet. The purpose of this manual is mostly to discuss the available key commands in more detail. You may wish to use a headset in order not to annoy other nearby people with the rather unusual and hence attention-drawing sounds from your phone. Moreover, a stereo headset is advised for use with phones that offer stereo sound capabilities[4]. The vOICe’s options menu - called up with the phone’s Options key - contains a “Channels” entry that allows you to select mono (default), stereo or 3D audio channels. The stereo and 3D audio options ease the perception of The vOICe’s visual left-to-right scanning, but they may give severely distorted mono sound on phones that lack stereo capabilities. Just check what works for your phone. Also, some phones contain built-in stereo speakers on their side[5], and the “View rotation” entry in the options menu can be used to correspondingly adapt the camera view when holding the phone with its left and right speakers on the left and right, respectively.

Note that while using a screen reader it may be necessary to first mute The vOICe (with key "0") to hear the screen reader speak all menu and submenu items under the phone’s Options key. After changing settings, you can then unmute The vOICe again with key "0".

Once started, The vOICe MIDlet continuously grabs and sounds live snapshots from your phone camera. There are no connection costs while using it, because The vOICe MIDlet runs off-line. Each camera snapshot is sounded via a left-to-right scan through the view, while associating height with pitch and brightness with loudness. By default, a black-and-white camera view is sounded in just one second. For example, a bright rising line on a dark background sounds as a rising pitch sweep, a small bright spot sounds as a short beep, and a bright filled rectangle sounds as a noise burst. The vOICe’s simplest application is as a light probe, but it is actually far more powerful because its changing polyphonic visual sounds or “soundscapes” now also track position and shape of objects, even with multiple objects within your camera view. Thus it allows you to locate light sources, recognize basic image patterns such as stripes and various textures, find borders, identify shapes, and so on. In addition, The vOICe MIDlet offers a number of color detection features, and includes a talking color identifier.

Many settings are accessible through the application’s Options menu if you have a phone screen reader, but there is also built-in speech support for the main features. For compatibility with phone screen readers, The vOICe supports two menu styles: the "Textual" style for the submenus is only advised for use with the old Talks 1.40.1 (to avoid crashes), while the "Normal" style is advised with Mobile Speak as well as Talks 2.0 and later. In addition, a number of keyboard shortcuts exist for direct access to various features, and we begin with a brief overview of the main key commands. At the end of this document a table overview is given.

The "0" key toggles the muted state. Pressing this key twice in rapid succession, like “00”, toggles a muted paused state that minimizes CPU load while releasing the camera resource for best responsiveness when accessing the menus with a phone screen reader. The "1" key toggles the negative video mode, which can help to see/find small or thin dark items on a bright background. The "3" key toggles the built-in speech off and on. The "7" key toggles a mode that helps prevent visual sound stuttering and buzzing on devices that cannot handle simultaneous visual sound rendering and playing. You can try it to find out what works best with your phone. The "9" key cycles over different contrast enhancement modes. The "*" (star, asterisk) key toggles the talking color identifier on and off. The "#" (pound, hash) key cycles over different sound volume levels when not muted. Other settings are controlled with the joystick. The default audio sample rate is 16 kHz, but lower sample rates can be selected by using the "DOWN" key (joystick down), and higher sample rates can be selected by using the "UP" key (joystick up). Available sample rates are 8 kHz, 11 kHz, 16 kHz and 22 kHz, but phones need not support all of these sample rates. Lower sample rates give lower sound quality, but may make the phone more responsive. The "RIGHT" key doubles the visual sound duration to at most two seconds, while the "LEFT" key halves the visual sound duration to at least half a second. Note that on some phones the "UP", "DOWN", "LEFT" and "RIGHT" keys may be mapped through the "2", "8", "4" and "6" numeric keys, respectively. Many of the program settings persist across multiple runs of The vOICe MIDlet.
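For readers who like to see the joystick behaviour spelled out, here is a minimal sketch of how the sample rate and visual sound duration respond to the four joystick keys. It only models the limits stated above (8, 11, 16 and 22 kHz; half a second to two seconds). The function names are hypothetical and this is not the MIDlet's own code.

```python
# Illustrative model only; names are hypothetical, limits come from the manual.
SAMPLE_RATES_KHZ = [8, 11, 16, 22]
MIN_DURATION_S = 0.5
MAX_DURATION_S = 2.0


def step_sample_rate(current_khz, key):
    """Return the sample rate after an UP or DOWN press, staying within the list."""
    i = SAMPLE_RATES_KHZ.index(current_khz)
    if key == "UP":
        i = min(i + 1, len(SAMPLE_RATES_KHZ) - 1)
    elif key == "DOWN":
        i = max(i - 1, 0)
    return SAMPLE_RATES_KHZ[i]


def step_duration(current_s, key):
    """Return the visual sound duration after a LEFT or RIGHT press."""
    if key == "RIGHT":
        return min(current_s * 2, MAX_DURATION_S)
    if key == "LEFT":
        return max(current_s / 2, MIN_DURATION_S)
    return current_s


if __name__ == "__main__":
    print(step_sample_rate(16, "DOWN"))  # one step below the 16 kHz default
    print(step_duration(1.0, "RIGHT"))   # doubling the 1 second default
```

Pressing a key at the end of its range simply leaves the setting where it is in this sketch; a real phone may additionally refuse sample rates that it does not support.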

Now more about the color detection features. As was stated above, the "*" (star) key toggles the talking color identifier on and off. This mobile color recognizer speaks the color of whatever shows at the center of your camera view, while alternating with the visual sound of the camera view that tells you about the shape and brightness of items in your view. If you prefer to only hear the talking color identifier, simply press the "*" (star) key twice in rapid succession, much like a double-click, and you will then only get to hear the color names. So “*” toggles color identification alternating with visual sounds, while “**” toggles color identification without the visual sounds. Pressing the joystick “Fire” button will speak the color name once, even if The vOICe was muted, and on suitable phones[6] it will use the built-in flash. In any case, recognized colors include (dark, normal, and light) red, green, blue, cyan, yellow, orange and magenta, as well as combination colors such as red-orange. Black, grey and white are also identified, bringing the total number of identified colors and shades to 47. Beware that the choice of color names can be culturally biased: cyan is a color in between green and blue, while magenta is basically the same as the color purple. Also, light-magenta and light-red make for the color pink or very similar colors, while dark-red-orange, dark-orange and dark-orange-yellow appear as various shades of brown. Dark yellow-green makes for olive-green.

Results of color recognition inevitably depend on ambient light and camera quality. Try to use good lighting whenever possible, preferably broad daylight. Still, under relatively low light conditions, better results may be obtained by first calibrating The vOICe for the given visual environment. To do this, point the camera to a known white surface (such as a white sheet of paper) near the object of which you want to identify the color, and apply the “Calibrate white” entry in The vOICe’s options menu[7], which will basically tell The vOICe that this surface really is white or light grey rather than its actual grey or dark grey appearance. In fact it will also correct for the yellowish colors from incandescent lighting and many other sources of color bias. Next you can point the camera to other items of interest to identify their colors. Apply the calibration option with care: only apply it when you are certain that the full camera view is indeed white and relatively bright, or else you may get very poor color identification results due to a badly skewed color calibration! Calibration settings do not persist across runs to avoid unintended continued use of a calibration that would no longer match changing ambient light conditions. The vOICe does not normally need calibration in broad daylight conditions, but if applied with care, it can yield significantly more accurate color recognition results under relatively low light conditions. The calibration process takes only about a second and applies for the duration of the run unless you recalibrate or reset The vOICe via its menus.
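To give a feel for what a white calibration can do, here is a minimal sketch of a common white-balance technique: measure the average color of the white reference surface, derive one gain per color channel, and apply those gains to later readings. This is an assumption about the general approach, not a description of the algorithm that The vOICe actually uses, and the function names are hypothetical.

```python
# Generic per-channel white balance; NOT taken from The vOICe's implementation.

def calibrate_white(reference_rgb, target=255.0):
    """Compute per-channel gains that map the measured white surface to white."""
    r, g, b = reference_rgb
    if min(r, g, b) <= 0:
        raise ValueError("reference surface is too dark to calibrate against")
    return (target / r, target / g, target / b)


def apply_calibration(rgb, gains):
    """Scale a measured color by the gains, clamping each channel at 255."""
    return tuple(min(255, round(c * k)) for c, k in zip(rgb, gains))


if __name__ == "__main__":
    # A white sheet seen in dim, yellowish light (hypothetical reading).
    gains = calibrate_white((120, 100, 80))
    print(apply_calibration((120, 100, 80), gains))  # the sheet itself now reads as white
```

The sketch also shows why careless calibration is risky: if the reference surface is not really white, every later reading is scaled by the wrong gains.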

The color identifier tells you the color at the center of the camera view, but sometimes you may wish to know where items of a given color are. Rather than pointing the camera around until the color identifier finally “hits” the object with the color of interest, you can tell The vOICe to sound the entire camera view but only sound items of the color that you specified. This is done either via the color filter options in the menus or by keying the first letter of the supported color name, being “r” for red, “g” for green, “b” for blue, “c” for cyan, “y” for yellow, “o” for orange and “m” for magenta. Now you need to know how to enter these letters, unless your phone includes a QWERTY keyboard[8]. As you may know, letters are associated with keys 2 through 9 on your phone. In particular, key “2” holds the associated letters “a”, “b” and “c”, or “abc” for short. If you press key “2” once in The vOICe, you specify the digit “2”, but if you press key “2” multiple times in rapid succession, you get to the letters “a”, “b” and “c”. Pressing key “2” twice means “a”, pressing key “2” three times means “b” (which toggles the blue-only color filter), and pressing key “2” four times means “c” (which toggles the cyan-only color filter). The same principle applies to the other numeric keys. Key “3” holds “def”, key “4” holds “ghi”, key “5” holds “jkl”, key “6” holds “mno”, key “7” holds “pqrs”, key “8” holds “tuv”, and key “9” holds “wxyz”. Therefore, if you want to see and find only green items in your view, you press key “4” twice to specify “g” for green, or if you want to see and find only red items in your view, you press key “7” four times to specify “r” for red. These functions act like a toggle, so applying the same one another time turns the color filter off to return to the normal mode of operation. (Alternatively, you may also press key “9” twice to apply “w” for white which is equivalent to having no color filter.)
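The key-press counting described above can be summarized in a short sketch. It follows the rule given in the text: one press gives the digit, and each further press steps through the letters on that key. The function names are hypothetical, and what happens when you press a key more often than it has letters is not stated in this manual, so the sketch simply returns nothing in that case.

```python
# Letters per key as listed in the manual.
KEY_LETTERS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def decode_presses(key, presses):
    """Return the digit for one press, or the letter for two or more presses."""
    if presses < 1:
        raise ValueError("at least one press is needed")
    if presses == 1:
        return key
    letters = KEY_LETTERS.get(key, "")
    index = presses - 2  # two presses select the first letter
    if index >= len(letters):
        return None  # behaviour beyond the last letter is an assumption
    return letters[index]


def presses_for(letter):
    """Return (key, number of presses) needed to enter a letter."""
    for key, letters in KEY_LETTERS.items():
        if letter in letters:
            return key, letters.index(letter) + 2
    raise ValueError("no key holds the letter " + repr(letter))


if __name__ == "__main__":
    for color in ["red", "green", "blue", "cyan", "yellow", "orange", "magenta"]:
        key, count = presses_for(color[0])
        print(color, "-> press key", key, count, "times")
```

Note that the double presses of "0", "1", "*" and "#" are separate commands and are not covered by this letter table.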

If you want to run a more complete analysis of what items of what shape and of what color show where in your camera view, you can press key “2” twice to toggle “a” for “Analyze”, which will then cycle over all available color filters for finding any objects and shapes that are red, green, blue, cyan, yellow, orange or magenta.

The combination of color filters with the visual sound bitmaps implies that over 4000 (namely 64×64 = 4096) different locations for colored items can be represented, while at the same time including shape, shading and texture information - in just one or two seconds of sound. The general image-to-sound mapping means that top left gives high pitch early in the visual sound, bottom left gives low pitch early in the visual sound, top right gives high pitch late in the visual sound, and bottom right gives low pitch late in the visual sound, with other positions giving intermediate positions in pitch and time.
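The position-to-sound relationship can be illustrated with a small sketch that maps a location in a 64×64 view to a moment in the scan and a pitch. The horizontal position sets the time and the vertical position sets the pitch, as described above. The frequency range (500 Hz to 5000 Hz) and the exponential spacing of pitches are assumptions chosen purely for illustration; this manual does not state the actual values, and the function name is hypothetical.

```python
GRID = 64  # the view is treated as 64 x 64 locations, as stated in the manual


def position_to_sound(row, col, duration_s=1.0, f_low=500.0, f_high=5000.0):
    """Map a location (row 0 = top, col 0 = left) to (onset in seconds, pitch in Hz).

    f_low, f_high and the exponential pitch spacing are illustrative assumptions.
    """
    if not (0 <= row < GRID and 0 <= col < GRID):
        raise ValueError("row and col must lie between 0 and 63")
    onset = duration_s * col / GRID            # further right sounds later
    height = (GRID - 1 - row) / (GRID - 1)     # 0.0 at the bottom, 1.0 at the top
    pitch = f_low * (f_high / f_low) ** height  # higher in the view sounds higher
    return onset, pitch


if __name__ == "__main__":
    corners = {"top left": (0, 0), "bottom left": (63, 0),
               "top right": (0, 63), "bottom right": (63, 63)}
    for name, (row, col) in corners.items():
        onset, pitch = position_to_sound(row, col)
        print(name, "-> onset", round(onset, 3), "s, pitch", round(pitch), "Hz")
```

Brightness, which the manual associates with loudness, is left out of this sketch to keep the position mapping easy to follow.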

Let’s consider an example where the general scanning of the visual sounds is combined with color filters to solve a practical problem. Suppose you want to know the color of something small or thin, say a thin electrical wire. Then it is extremely difficult to orient the camera such that the center of the camera view points exactly at this item of interest to get the color identification right. However, by using the visual sounds of the full view along with the "Analyze" submenu option for filtering colors (keyboard shortcut "a"), The vOICe will filter for each color in turn, such that at some point it only sounds any red items in the view along with saying the color name "red", and any red wire will appear as a single tone going up or down in pitch depending on its visual orientation. Of course such advanced uses may require some practice depending on the exact nature of what you are trying to accomplish.

On suitable phones[9], pressing key “p” will save a snapshot picture to the memory card. The resulting JPEG[10] format image file contains a numeric timestamp[11] in the filename, e.g., as in "vOICe_1155477325843.jpg". The timestamp ensures that each snapshot automatically receives a unique filename. You may have to give several permissions while saving, depending on the phone’s security limitations. The saved image file may subsequently be used for many purposes, such as OCR (optical character recognition), or for sharing with friends. The file location is either the Images folder or the root of the memory card, depending on the type of phone. After saving the image, which may take several seconds, normal operation resumes.
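If you later process saved snapshots on a computer, a sketch like the following can build or decode such filenames. It assumes that the numeric timestamp counts milliseconds since 1 January 1970 (UTC), which matches the 13-digit example above but should be treated as an assumption. The function names are hypothetical.

```python
import time
from datetime import datetime, timezone

PREFIX = "vOICe_"
SUFFIX = ".jpg"


def snapshot_filename(millis=None):
    """Build a snapshot filename; assumes a millisecond timestamp."""
    if millis is None:
        millis = int(time.time() * 1000)
    return PREFIX + str(millis) + SUFFIX


def snapshot_time(filename):
    """Decode the timestamp in a snapshot filename into a UTC date and time."""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        raise ValueError("not a snapshot filename: " + filename)
    millis = int(filename[len(PREFIX):-len(SUFFIX)])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


if __name__ == "__main__":
    print(snapshot_time("vOICe_1155477325843.jpg"))
```

Under the millisecond assumption, the example filename from this manual decodes to 13 August 2006.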

When using the phone camera outside in the sunshine, color readings can be badly affected by glare if there is direct sunlight on the phone. In such situations, try to cup one hand over the phone without blocking the camera view, such that your hand acts like a sunshade - much like a hat can keep your face out of direct sunlight.

Finally, there is support for an additional very special color: skin. Pressing key “1” twice, or “11”, will toggle the skin-only color filter. This will in principle only sound any exposed skin in your view, such as faces and hands, which might be useful, for instance, for determining how many people are nearby or for locating empty chairs in a conference room. The skin color filter also takes into account typical racial differences. However, certain materials such as wood can have a color that is very similar to skin, in which case you also need to take into account apparent shape and size in the visual sounds to determine for yourself whether the results of the skin-only filter really show only skin. The best way to start and learn is to experiment. There should be a difference with and without clothes on.

In all uses, please stay aware that pointing the phone’s camera at people who do not know you or The vOICe, in public places or elsewhere, might trigger hostile reactions, for instance because people may think that you are taking their photograph without their permission or otherwise invading their privacy. Similar issues may apply when pointing the camera at certain properties.

Have fun!

Key / Action / Default
0 / Toggles muted state / Off
1 / Toggles negative video / Off
3 / Toggles speech feedback / On
7 / Toggles "anti-stutter/buzzing" mode / Off
9 / Cycles contrast enhancement / 100%
* / Toggles talking color identifier / Off
# / Cycles sound volume levels / 50%
UP / Higher sample rate, up to 22 kHz / 16 kHz
DOWN / Lower sample rate, down to 8 kHz / 16 kHz
LEFT / 0.5 or 1 second visual sound / 1 second
RIGHT / 1 or 2 second visual sound / 1 second
FIRE / [Flash and] say color / Off
00 / Mute and pause (low CPU load) / Off
** / Color identifier, no visual sounds / Off
## / Toggle blinders (narrow view) / Off
r / Red-only color filter / Off
g / Green-only color filter / Off
b / Blue-only color filter / Off
c / Cyan-only color filter / Off
y / Yellow-only color filter / Off
o / Orange-only color filter / Off
m / Magenta-only color filter / Off
11 or s / Skin-only color filter / Off
a / Analyze colors by cycling filters / Off
p / Save snapshot picture to memory card

Overview of The vOICe key commands

Quirks mode

You can also experiment with a “bat call” quirks mode toggled by a long-press of the FIRE button: this gives you two loud but very brief high-pitched chirps in rapid succession, much like an audible version of the clicks or sound flashes emitted by bats during echolocation. The double sound flashes may thus be used with the phone’s built-in speaker to detect nearby obstacles from any echoes that you hear. The sound flash patterns are repeated with the same interval used for the visual sounds, every second by default. If you prefer, you can toggle use of single sound flashes by pressing the “1” key while in the bat call mode. You can also independently cycle the audio volume of the bat calls by pressing the "#" (pound, hash) key while in the bat call mode. If necessary, use your hand to form a cone over the phone’s speaker for improved directionality of the sound flashes, and hold the phone in a position that is consistently aligned with your ears, for instance in front of your face.