AROSOL

(Autonomous Roving Sound Locator)

Sean Lyons

4/21/14

University of Florida

Department of Electrical and Computer Engineering

EEL 4665C – IMDL – Formal Report

Instructors: Dr. A. Antonio Arroyo, Dr. Eric M. Schwartz

TAs: Andy Gray, Josh Weaver, Nick Cox, and Daniel Frank

Contents

Abstract

Executive Summary

Introduction

Integrated System

Mobile Platform

Actuation

Sensors

Behaviors

Experimental Layout and Results

Conclusion

Documentation

Appendices

Abstract

Two senses that dominate the human experience are sight and sound. For a robot to interact naturally with humans, it must be able to interact via sight and sound. There is a rich suite of applications for robots that can perform sound localization, including defense, speech recognition, virtual conferencing, and other humanoid robotics [1], [2].

Executive Summary

AROSOL was born out of the dream of autonomously tracking sound using a simple microcontroller. Vision was a secondary interest, added to aid in localizing the sound source using color detection.

Low-level sensing is provided by three front-facing short-range IR sensors, which allow AROSOL to avoid objects that cross its path. For objects missed by the IR sensors, there are front and rear bump sensors implemented with four push-button switches.

Actuation is carried out via two Pololu motor driver carriers, controlled by the microcontroller. The motor drivers drive two 34:1 gearmotors. The motors come with Hall-effect shaft encoders that provide feedback to the microcontroller, allowing it to deliver proportional as well as full PID control.

Audio sensing is achieved by a custom-built band-pass filter/amplifier circuit. The filter/amp combo isolates a narrow high-frequency band (centered near 8.7 kHz), then amplifies and level-shifts the signal to make it usable by the microcontroller.

Power is provided by a 7.4 V two-cell Li-Po battery and distributed through a custom power PCB. Battery eliminator circuits power the microcontroller and the high-level ARM processor.

AROSOL operates in four states: Sound, Obstacle, Camera, and Mission Complete. It begins in the Sound state, where it tracks a sound, then transitions into the Camera state, which uses color detection to home in on the sound source. The Obstacle state can interrupt any other state in the event that an obstacle is in the way. Mission Complete is the state in which the robot shows off that it has completed its mission.

Introduction

AROSOL (Autonomous Roving Sound Locator) is a high-frequency pitch-following robot. Similar to a dog, the bot has "hearing" that is sensitive to high-frequency pitches and can locate the source of a sound, much like a dog following a dog whistle. The problem investigated is machine intelligence via sound. Sound offers new challenges and plenty of room to grow because audio signals naturally carry interference and noise. Sound is an important application to study because it bridges the gap between humans and robots: it lends itself to humanoid robotics (specifically speech processing), but also to the very different field of defense. AROSOL synthesizes the digital world of microcontrollers and embedded computers with the analog world of audio to investigate how sound can make machines intelligent.

Integrated System


Intelligence is achieved by AROSOL in two ways: low-level environment sensing and reaction, and high-level stimulus processing (vision). Fig 1 is a graphic describing each of the functional systems within AROSOL.

Fig 1

Mobile Platform



The platform comprises two levels. The top level holds the microphones, the Raspberry Pi, and the camera; this isolates the microphones from noise created by the motors and drivers and gives the camera unobstructed vision. The bottom level contains the bump switches, IR sensors, and microcontroller board, which minimizes cable routing. The simple platform design was chosen to maximize the area of each layer, spreading the microphones apart for better binaural processing. The mobile platform is shown below in Fig 2.

Fig 2

Actuation

AROSOL is a differential-drive robot with two drive wheels and a caster. Since the robot is intended only for indoor use on flat flooring, the motors do not need high torque or extremely high speed; AROSOL is a low-speed robot with fine control over its motion. Two Pololu 25 mm DC gearmotors with a 34:1 ratio power the drive wheels. These motors offer high pulling capacity for a light robot at low speeds, and the modest torque/RPM requirements keep each motor's power consumption low. Shaft encoders on each motor provide feedback to a software PID controller for very smooth driving.

Sensors

IR

  • Short range:
      • Power: 3.1–5.5 V DC
      • Range: 3–30 cm (1.18–11.8 in)
      • Output voltage: 0.3–3.1 V
      • Type: Analog
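As an illustration of how these analog sensors are used, the sketch below thresholds one IR reading to flag an obstacle. This is a minimal sketch under assumptions: adc_read_mv() is a hypothetical helper (the real, board-specific ADC setup is not shown), and the 1500 mV cutoff is a placeholder that would be tuned on the robot.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: filtered ADC reading for one IR sensor,
 * converted to millivolts (platform-specific setup not shown). */
uint16_t adc_read_mv(uint8_t channel);

/* These IR sensors output a higher voltage the closer the target is
 * (roughly 0.3-3.1 V over the 30-3 cm range), so a simple obstacle
 * check is a voltage threshold.  1500 mV is an assumed cutoff. */
#define IR_OBSTACLE_MV 1500

bool ir_obstacle(uint8_t channel)
{
    return adc_read_mv(channel) > IR_OBSTACLE_MV;
}
```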

Sound

  • Electret microphones:
      • Power: 5 V max
      • Sensitivity: -44 dB
      • Resistance: 680 Ohm
      • Size: 9.77 mm diameter
  • Audio op-amp: LT1632

Bump Sensors

  • Contact switches:
      • Mounted on mechanical bump arms (front and rear)

Motors/Drivers

  • 25 mm (25D) 34:1 metal DC gearmotor:
      • Power: 6 V
      • Speed: 165 RPM
      • Free-running current: 80 mA
      • Stall current: 2 A
      • Torque: 40 oz-in (2.9 kg-cm)
  • MC33926 motor driver carrier:
      • Channels: 1
      • Power: 5–28 V
      • Continuous current: 2.5 A
      • Peak current: 5 A
      • Current sensing: 0.525 V/A
      • Max PWM frequency: 20 kHz
      • Logic voltage: 2.5–5.5 V
      • Reverse-voltage protection
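The driver's 20 kHz maximum PWM frequency pairs naturally with the Xmega's 32 MHz clock. Below is a minimal single-channel sketch under assumptions: the carrier's PWM input is driven from TCC0 compare channel C (pin PC2), and IN1/IN2 sit on PC0/PC1 for direction. This pin mapping is illustrative, not AROSOL's actual wiring.

```c
#include <avr/io.h>
#include <stdint.h>

#define PWM_TOP 1600  /* 32 MHz / 1600 = 20 kHz, the driver's max PWM rate */

void motor_init(void)
{
    PORTC.DIRSET = PIN0_bm | PIN1_bm | PIN2_bm;  /* IN1, IN2, PWM pins */
    TCC0.PER     = PWM_TOP - 1;
    TCC0.CTRLB   = TC_WGMODE_SINGLESLOPE_gc | TC0_CCCEN_bm; /* PWM on PC2 */
    TCC0.CTRLA   = TC_CLKSEL_DIV1_gc;            /* start timer, no prescale */
}

/* speed: -PWM_TOP..+PWM_TOP; the sign selects direction via IN1/IN2 */
void motor_set(int16_t speed)
{
    if (speed >= 0) { PORTC.OUTSET = PIN0_bm; PORTC.OUTCLR = PIN1_bm; }
    else            { PORTC.OUTSET = PIN1_bm; PORTC.OUTCLR = PIN0_bm; speed = -speed; }
    TCC0.CCCBUF = (speed > PWM_TOP - 1) ? (PWM_TOP - 1) : (uint16_t)speed;
}
```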

Boards

  • MattairTech MT-X1S dev board:
      • ATxmega128A1U based
      • 3.3 V LDO (1 A)
      • USB-serial bridge
      • LUFA support
      • 32.768 kHz RTC crystal
      • 32 MHz internal clock
      • many more features
  • Raspberry Pi Model A:
      • Power: 5 V (1.2 A)
      • Clock: 700 MHz
      • GPU: OpenGL ES 2.0 and OpenVG compatible
      • Raspberry Pi camera module attached

Special Sensor

The special sensor is a 2nd-order, two-channel analog audio band-pass filter. The filter picks a "whistle" (f ≈ 8.7 kHz) out of the ambient noise of the room. The circuit schematic is shown below in Fig 3.


Fig 3

The frequency response is shown in Fig 4.

Fig 4

Notice that the center frequency is around 8.7 kHz with a 40 dB gain in the pass band. A graph of Vo vs. Vin is shown in Fig 5.

Fig 5

Notice the small amount of clipping in the negative half-cycle of the output wave. This occurs because the input signal attempts to drive the op-amp below its negative supply rail of 0 V. Clipping is a form of distortion and is usually undesirable in an audio system, but since the wave is not being reconstructed after processing, it is acceptable for basic binaural detection.

Experimental test results showed that the circuit had a bandwidth of roughly 1 kHz, a gain of around 50 V/V, and a center frequency of 8.7 kHz.

Due to component tolerances, the left channel had an observed center frequency of 8.7 kHz while the right channel's was 8 kHz.
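As a quick sanity check on these measurements (simple arithmetic from the values above, not additional data), the implied quality factor and pass-band gain in dB are

$$
Q = \frac{f_0}{\Delta f} = \frac{8.7\,\text{kHz}}{1\,\text{kHz}} \approx 8.7,
\qquad
A_{\text{dB}} = 20\log_{10}(50) \approx 34\,\text{dB},
$$

so the measured gain sits a few dB below the 40 dB pass-band gain of Fig 4, again consistent with component tolerances.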

Behaviors

There are four main states AROSOL operates in:

  1. Wait for whistle (roam and avoid obstacles)
  2. Track whistle location (proportional control from microphone data)
  3. Home in on whistle location (using camera/color detection)
  4. Mission complete (signal success, then restart)

When the robot boots up, it drives around avoiding obstacles and waiting for the sound of the whistle. Once it has heard the whistle, it continues to avoid obstacles while searching for the whistle's location. Once the whistle location is found and the bot is reasonably close, it turns on the camera and searches for its master, identified as the sound source with a purple rectangle around it. Once the master is found, AROSOL continues its approach, and once it is sufficiently close to the sound source it stops and enters the Mission Complete state, signifying that the first run of sound localization is complete. After the Mission Complete state runs for a few seconds, the robot re-enters the Sound state and restarts the process. A state diagram is shown below in Fig 6. Note: obstacle avoidance occurs in all states.


Fig 6
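For concreteness, a minimal C sketch of this state loop is shown below. The predicate and action functions (obstacle_detected(), track_whistle(), and so on) are hypothetical placeholders for the robot's real sensor and drive routines, not AROSOL's actual function names.

```c
#include <stdbool.h>

/* Hypothetical sensor/actuator routines standing in for the real code. */
bool obstacle_detected(void);
bool whistle_close(void);
bool target_reached(void);
void track_whistle(void);
void track_color(void);
void avoid_obstacle(void);
void celebrate(void);

typedef enum { ST_SOUND, ST_CAMERA, ST_OBSTACLE, ST_MISSION } state_t;

/* One pass through the behavior loop: obstacles preempt everything,
 * otherwise Sound -> Camera -> Mission Complete -> Sound. */
state_t step(state_t s)
{
    if (obstacle_detected())
        return ST_OBSTACLE;

    switch (s) {
    case ST_SOUND:    /* roam and steer toward the whistle */
        track_whistle();
        return whistle_close() ? ST_CAMERA : ST_SOUND;
    case ST_CAMERA:   /* home in on the purple marker */
        track_color();
        return target_reached() ? ST_MISSION : ST_CAMERA;
    case ST_OBSTACLE: /* back away or turn, then resume tracking */
        avoid_obstacle();
        return ST_SOUND;
    case ST_MISSION:  /* show off briefly, then restart */
        celebrate();
        return ST_SOUND;
    }
    return ST_SOUND;
}
```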

Experimental Layout and Results

The first challenge was making use of the sound data, and there were several hurdles along the way. The first was timing: the microcontroller needed to process data from the motor encoders, IR sensors, bump sensors, microphones, and serial port all at the same time, but the priority of this information varies. An interrupt system was implemented to achieve this. The PID algorithm was tuned and implemented using motor encoder information sampled ten times per second. The audio was polled fastest of all, at around 40 kHz, which satisfies the rate required by the Shannon sampling theorem for the 8.7 kHz whistle. To gather the data, the ADC was used in differential mode with the 1.25 V precision reference provided on board. Envelope detection was then performed, and the envelopes of the two signals were used to determine the location of the sound. Amplitude-based testing such as this is not as precise as time-delay testing, but it was still accurate enough for this application. Some experimental data is shown in Fig 7. Note that SumL, SumR, and Difference are 16-bit unsigned integer values derived from the Xmega's ADC.

Ear / Angle (deg) / Distance / SumL / SumR / Difference
Left / 0 / 2 ft / 532 / 34 / 498
Left / 0 / 2 ft / 571 / 43 / 528
Left / 0 / 2 ft / 782 / 38 / 754
Left / 45 / 2 ft / 246 / 41 / 202
Left / 45 / 2 ft / 219 / 41 / 178
Left / 45 / 2 ft / 219 / 42 / 177
Left / 60 / 2 ft / 143 / 42 / 101
Left / 60 / 2 ft / 123 / 43 / 80
Left / 60 / 2 ft / 140 / 42 / 98
Left / 90 / 2 ft / 81 / 41 / 40
Left / 90 / 2 ft / 85 / 42 / 43
Left / 90 / 2 ft / 83 / 43 / 40
Right / 0 / 2 ft / 25 / 601 / 576
Right / 0 / 2 ft / 25 / 592 / 567
Right / 0 / 2 ft / 27 / 566 / 539
Right / 45 / 2 ft / 29 / 112 / 83
Right / 45 / 2 ft / 32 / 108 / 76
Right / 45 / 2 ft / 27 / 96 / 69
Right / 60 / 2 ft / 27 / 108 / 81
Right / 60 / 2 ft / 27 / 108 / 81
Right / 60 / 2 ft / 27 / 109 / 82
Right / 90 / 2 ft / 30 / 67 / 37
Right / 90 / 2 ft / 29 / 59 / 30
Right / 90 / 2 ft / 28 / 41 / 13

Fig 7
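The rectify-and-average step that produces values like SumL and SumR can be sketched as follows. This is a minimal illustration under assumptions: adc_sample() is a hypothetical helper returning one signed differential sample of the filtered microphone signal, and the window length and steering gain are placeholders, not AROSOL's tuned values.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: one signed differential ADC sample of the
 * band-pass-filtered microphone signal (sampled at ~40 kHz). */
int16_t adc_sample(uint8_t channel);

#define WINDOW 256   /* assumed averaging window */

/* Rectify and average over a short window to approximate the envelope;
 * this yields values comparable to SumL/SumR in Fig 7. */
uint16_t envelope(uint8_t channel)
{
    uint32_t acc = 0;
    for (uint16_t i = 0; i < WINDOW; i++)
        acc += (uint16_t)abs(adc_sample(channel));
    return (uint16_t)(acc / WINDOW);
}

/* Proportional steering from the left/right envelope difference:
 * a louder left ear produces a positive command (turn left). */
int16_t steer_command(void)
{
    int32_t diff = (int32_t)envelope(0) - (int32_t)envelope(1);
    return (int16_t)(diff / 4);   /* placeholder proportional gain */
}
```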


The vision data was gathered using OpenCV. I used a custom-built OpenCV library written in C that uses modified MMAL calls for the Raspberry Pi camera to improve speed. No benchmarking was performed, but there is a significant performance boost over Python. Images were processed in the HSV color space because it separates color from brightness, making detection more independent of viewing angle and reflectance. However, it was found that for the vision to be precise, a constant light source needed to be added to the top of the robot. With the light source added, there were very few false positives and almost no missed detections. The area of the detected region was then calculated, and movement was based on the threshold lines shown below. A minimal sketch of this detection step follows.
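The sketch below uses the stock OpenCV C API (not the custom MMAL-based library) to threshold a frame for purple and return the detected area and centroid. The HSV bounds for "purple" are assumed values, not the tuned numbers used on AROSOL.

```c
#include <opencv/cv.h>

/* Threshold a BGR frame for the purple marker in HSV space; returns
 * the detected area (pixel count) and writes the centroid x into *cx.
 * The HSV bounds below are assumed values for "purple" under the
 * robot's onboard light. */
int find_purple(IplImage *frame, int *cx)
{
    IplImage *hsv  = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 3);
    IplImage *mask = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);

    cvCvtColor(frame, hsv, CV_BGR2HSV);
    cvInRangeS(hsv, cvScalar(130, 80, 60, 0),    /* assumed lower bound */
                    cvScalar(165, 255, 255, 0),  /* assumed upper bound */
               mask);

    int area = cvCountNonZero(mask);
    if (area > 0) {
        CvMoments m;
        cvMoments(mask, &m, 1);
        *cx = (int)(cvGetSpatialMoment(&m, 1, 0) /
                    cvGetSpatialMoment(&m, 0, 0));
    }
    cvReleaseImage(&hsv);
    cvReleaseImage(&mask);
    return area;  /* compared against the area threshold lines */
}
```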

PID tuning was performed in a debug state in which the robot runs under RC control, allowing it to be stopped quickly if the controller "blows up". The gains were tuned using the following method (a minimal controller sketch follows the list):

  1. Set Kp (proportional control term) such that the robot oscillates in a straight line path
  2. Set Kd (derivative term) such that the oscillations are nulled
  3. Set Ki (integral term) such that the robot reaches steady state quickly and in a stable manner
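A minimal sketch of the controller being tuned is shown below; the structure (error, integral, and derivative terms) follows the steps above, while the gain values in the usage comment are placeholders, not AROSOL's tuned constants.

```c
/* Minimal PID for wheel speed, run at the 10 Hz encoder sampling rate
 * described earlier. */
typedef struct {
    float kp, ki, kd;   /* gains tuned with steps 1-3 above */
    float integral;     /* accumulated error (Ki term) */
    float prev_err;     /* last error (Kd term) */
} pid_ctrl_t;

float pid_step(pid_ctrl_t *p, float setpoint, float measured, float dt)
{
    float err = setpoint - measured;
    p->integral += err * dt;
    float deriv = (err - p->prev_err) / dt;
    p->prev_err = err;
    return p->kp * err + p->ki * p->integral + p->kd * deriv;
}

/* Usage (placeholder gains): pid_ctrl_t pid = { 2.0f, 0.5f, 0.1f, 0, 0 };
 * every 100 ms: duty += pid_step(&pid, target_ticks, encoder_ticks, 0.1f); */
```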

Conclusion

AROSOL fully explored the challenges of intelligent sound localization performed by a simple microcontroller. While this is possible, the microcontroller often needs an external intelligence source such as a DSP or a high-level ARM board with a camera. With the current hardware setup, sound localization could be improved by moving the sound processing to the Raspberry Pi, using precision ADCs and the floating-point math optimizations available on the Pi. These improvements will be added in the future.

The Intelligent Machine Design Lab course at the University of Florida teaches the art of systems integration as it applies to machine intelligence. Multiple systems had to be interfaced to achieve AROSOL's goal, including microcontrollers, motors, motor drivers, sensors, microphones, high-level processors, and cameras. Controlling this varied hardware required a substantial amount of software, developed in C/C++.

Documentation

  • [1]
  • ATxmega128A1U:
  • Raspberry Pi:
  • Code used/borrowed:
      • Nick Cox's IMDL Xmega code:
      • OpenCV example code:
      • Custom-built OpenCV library:
      • WiringPi library:

Appendices

My Github (containing the project’s code)
