A Hybrid Annealing Algorithm for Tuning Walking Gaits

Machine Learning CS7333

Patrick McDowell

November 27, 2000

Introduction

Adaptation and learning are key to the success of robotic systems that operate in unstructured environments. Without the ability to adapt to changes in the environment, “low level” activities such as walking are difficult. Several institutions have studied the problem, and an extensive literature exists on the subject. The LSU Robotics Research Lab has reviewed some of this literature, mainly from MIT and McGill University, and built three prototype walking machines for the LSU Robo-Tiger project.

The prototypes were built and programmed in order to study the feasibility of building full-size walking machines that could operate autonomously in unstructured environments. There have been many issues, including materials, electronics, sensor integration, and computational resources. “Stubby”, the third prototype, is the culmination of the lab’s feasibility study. A full description is available in [4]. Figures 1 and 2 are photos of Stubby.

Figure 1. On the right-hand side of the picture is LSU’s “Stubby”. It is the third of three small prototypes developed during the LSU Robotic Tiger feasibility study. Note that Stubby has the same general size as its biological counterpart, Mitzy.

Figure 2. Top view of Stubby. The black boxes on its legs are the RC servo-motors. Also note the two controller boards above its rear legs.

Stubby can walk on smooth surfaces in an acceptable manner. Its electrical and mechanical systems are adequate for limited study of basic learning pertaining to walking.

The current software control system resembles MIT’s Subsumption Architecture [1] (see Appendix A) in that it is asynchronous and control is distributed. It is called the “Virtual Controller System” because each leg has a software controller that operates independently of the others, while a gait generator coordinates the legs. The whole system works acceptably well and is a vast improvement over the previous control systems. The robot can walk across a smooth surface using a trotting gait or a galloping gait. Currently, however, there is no method to adapt to surface textures or to tune the walking gaits. The gaits are tuned by hand, so they are not at all optimized, and irregularities in the surface cause the robot to stub its toes and work against itself, resulting in a labored walk.

The remainder of this paper discusses a proposed methodology of providing adaptation and learning in order to improve the efficiency of Stubby’s walking gaits.

Adaptation and Learning

Stubby’s stage of development could be compared to that of an animal that has just been born, in that it has a workable control system that has not yet adapted to its body or trained in its environment. The big difference is that animals have excellent sensors and brains that are pre-wired to immediately start the processes of adaptation and learning. For Stubby to walk better, it too needs to be able to adapt and learn.

In the following paragraphs, a “straw-man” design for implementing gait learning in Stubby is described. It relies on a reflex-triggered learning scheme: the learning algorithm is triggered when the robot’s reflexes fire frequently. This method relies on the general assumption that in biological systems, reflexes are used to correct short-term conditions. If the same conditions are detected repeatedly, then something must be wrong with the overall plan being executed. For example, if a person keeps spilling their coffee because every sip burns them and makes them jerk the cup away from their mouth, then the coffee is too hot and they should do something to change the situation. In Stubby’s case, the robot will detect and react to two error conditions associated with walking: toe stubbing, and pushing with a leg that is not on the ground. If these conditions arise frequently during walking, they will serve as a trigger to tune the walking gait.

Reflexes will be implemented by using subsumption-like extensions to the current virtual controller system. A new leg has been designed and prototyped that uses two switches, one for detection of a foot down situation and the other for detection of a toe stub.

When a micro switch detects a toe stub (a toe stub occurs when the robot is reaching forward with a leg but the toe scrapes the ground with enough force to actuate the micro switch), the reflexive behavior will subsume (take precedence over) the reach-forward behavior and pick the foot up higher. By doing this, the action of reaching forward will not push the robot backwards, as currently happens in some situations. The “find the ground” reflex will be implemented in a similar fashion, firing when Stubby is pulling with a leg but its weight is not on that leg.
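The precedence rule described above can be sketched in a few lines. The function and command names below are hypothetical stand-ins for the virtual controller's behaviors, not the actual control code:

```python
def leg_command(gait_cmd, toe_stub=False, no_ground=False):
    """Return the winning command for one leg controller.

    Reflexes subsume (override) the normal gait behavior: a toe stub
    triggers a lift-higher action, and pulling with no weight on the
    leg triggers a find-the-ground action. Otherwise the gait
    generator's command passes through unchanged.
    """
    if toe_stub:
        return "lift_higher"   # toe-stub reflex subsumes reach-forward
    if no_ground:
        return "find_ground"   # find-the-ground reflex subsumes pull
    return gait_cmd            # no reflex fired; pass gait command through
```

For example, `leg_command("reach", toe_stub=True)` yields `"lift_higher"` even though the gait generator asked the leg to reach.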

If the robot were to remain on a constant surface, toe stubbing and other walking problems could be attributed to timing errors in the walking gait, to the leg motions themselves being wrong, or to both. An optimal walking gait can be found by altering the timing and/or leg movements in a way that minimizes the number of times the reflexive actions are used. Figure 3 below shows a flow diagram of the control system using gait reflexive actions and gait optimization.

Figure 3. This diagram illustrates the proposed interaction of the reflexive actions and the leg motion controls. Reflexive actions will take priority over the leg motion controls. Gait adjustment will be triggered when the frequency of reflexive actions exceeds a user-defined threshold.

Gait adjustment using a Hybrid Annealing Algorithm

Stubby’s current walking gaits are based on work done at MIT’s Leg Lab. The Leg Lab has been able to generalize multi-leg walking into a “virtual biped” system [3]. That is, groups of legs are collected together and carry out the function of the right leg or left leg of a biped. For example, a trotting gait for a four-legged robot pairs the front left leg with the rear right and the front right with the rear left. For galloping, the front legs are a pair and the rear legs are the other pair. Appendix B illustrates the above concepts.

Currently, Stubby’s different gaits are selected using a simple bit scheme. There is a bit for each leg: a set of four bits, one per leg, makes a frame, and two frames describe a walking gait, so a gait for the four-legged Stubby takes up one byte. If a leg’s bit is set in a frame, that leg reaches forward; if it is clear, the leg pulls, and so forth. In order to tune the gait to the terrain and to the robot, we propose to add the following nine parameters.

  1. Time delay (msec) between gait frames.
  2. Time delay (msec) between leg movements (time between starting one joint moving and the next).
  3. Time delay (msec) between joint movements (individual joint speeds).
  4. Start time for the legs that are going to reach.
  5. Start time for the legs that are going to pull.
  6. Leg number that starts reaching first.
  7. Leg number that starts pulling first.
  8. Time delay (msec) for the 2nd leg to start reaching.
  9. Time delay (msec) for the 2nd leg to start pulling.

Since the current system uses only the bit frames and parameters 1 through 3, parameters 4 through 9 are effectively 0. That is, both legs that are going to pull, pull at the same time, and so forth.
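The one-byte gait encoding can be illustrated with a short decoder. The exact bit layout (low nibble as frame one, bit i for leg i) is an assumption for illustration; the paper does not specify the ordering:

```python
def decode_gait(gait_byte):
    """Decode a one-byte gait into two 4-bit frames.

    Bits 0-3 are frame one and bits 4-7 are frame two; within a frame,
    a set bit means that leg reaches forward and a clear bit means it
    pulls. (This particular bit layout is assumed for illustration.)
    """
    frames = []
    for frame in (gait_byte & 0x0F, (gait_byte >> 4) & 0x0F):
        frames.append(["reach" if frame & (1 << leg) else "pull"
                       for leg in range(4)])
    return frames

# A trot pairs front-left with rear-right (legs 0 and 3 here) against
# front-right with rear-left (legs 1 and 2): frame one is 0b1001 and
# frame two is 0b0110, giving the single gait byte below.
trot = 0b01101001
```

Decoding `trot` yields frame one with legs 0 and 3 reaching while 1 and 2 pull, and frame two with the roles swapped.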

The idea of the learning routine is based on simulated annealing. A perturbation to the system is kept or discarded based on a temperature-based probability function; if it is not kept, the system is returned to its previous state. Initially the system is assumed to be hot: changes that degrade system performance are unlikely to be kept, but there is a finite chance. As the system cools, the chance of keeping such changes approaches zero. The parameters to be perturbed are selected at random. Using this method, learning takes place in a constructive manner, one step at a time.

Simulated annealing does not guarantee that an optimal solution will be found, but the algorithm is well suited to a robot because it works on improving the present state of the system one step at a time. Genetic algorithms are similar in effect but are not as well suited to an online situation because of the many different solutions that need to be evaluated concurrently.

So, for Stubby, initially all parameters (1 through 9) would have an equal probability of being selected for alteration. Each time a change helps performance (as measured by the frequency of reflex actions), the probability of that parameter being selected again for modification will be increased; if a change harms performance, the opposite will take place. Note that a parameter change being kept is different from a parameter change improving performance: in annealing, as stated above, changes that do not help performance are occasionally kept. As the system converges on minimizing reflexive actions, the perturbation amount will be decreased, as will the likelihood of keeping non-productive changes to the system. The algorithm below describes the gait adjustment method.
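The two rules in this paragraph can be sketched concretely. The paper only requires a temperature-based probability of keeping worse solutions; the Metropolis-style exponential form below, and the function names, are assumptions for illustration:

```python
import math
import random

def keep_solution(reflex_freq, prev_reflex_freq, anneal_temp, rng=random.random):
    """Temperature-based keep rule (Metropolis-style; the exact form is
    an assumption). Improvements are always kept; worse solutions are
    kept with a probability that shrinks as the system cools."""
    if reflex_freq <= prev_reflex_freq:   # improvement (or no change): keep
        return True
    if anneal_temp <= 0:                  # cold system: never keep worse
        return False
    worse_by = reflex_freq - prev_reflex_freq
    return rng() < math.exp(-worse_by / anneal_temp)

def update_selection(weights, parm, improved, step=0.1):
    """Shift selection probability toward parameters whose changes
    helped, away from those whose changes hurt; return normalized
    probabilities (a floor keeps every parameter selectable)."""
    weights[parm] = max(0.01, weights[parm] + (step if improved else -step))
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}
```

Passing a deterministic `rng` (e.g. `lambda: 0.0`) makes the keep rule easy to test; on the robot, the default random source would be used.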

Input: Gait description (Reach/Pull bit frames, and Parameters 1 through 9)

Output: New set of Parameters 1 through 9

    Function LearnToWalk(Gait Description)
    {
        prevReflexFreq = reflexFreq = Get frequency of reflexive actions;

        if (reflexFreq == low)
            return;

        Make the probability of selecting any of the parameters 1 through 9 equal;
        Set annealing temperature annealTemp = hot;
        Set perturbation distance pDist = some;

        while (reflexFreq != low) {
            Select a parameter parmNum and modify it by up to pDist;
            reflexFreq = Get frequency of reflexive actions;

            /* Modify the selection probabilities. */
            if (reflexFreq < prevReflexFreq) {
                increase probability of selecting parmNum;
            }
            else {
                decrease probability of selecting parmNum;
            }

            /* Decide if the solution is to be kept. */
            if (keepSolution(reflexFreq, prevReflexFreq, annealTemp)) {
                prevReflexFreq = reflexFreq;
            }
            else {
                return system to previous state;
            }

            /* Decrease temperature and perturbation distance. */
            pDist--;
            annealTemp--;
        } /* End while. */
    } /* End LearnToWalk. */
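The pseudocode above can be rendered as a runnable toy in Python. Here the measurement of reflex frequency is replaced by a plain function of the parameters, and the multiplicative cooling schedule and exponential keep rule are assumptions for illustration, not the tuned values that would be used on the robot:

```python
import math
import random

def learn_to_walk(params, reflex_freq_of, low=1.0, temp=5.0, p_dist=2.0,
                  cooling=0.9, rng=random.Random(0)):
    """Toy version of LearnToWalk. `reflex_freq_of` stands in for
    measuring reflex firings on the robot. One parameter is perturbed
    at a time, selection probabilities adapt, and worse solutions are
    kept with a temperature-based probability."""
    prev = reflex_freq_of(params)
    if prev <= low:
        return params, prev                       # gait already adequate
    weights = {k: 1.0 for k in params}            # equal selection probability
    while prev > low and temp > 0.01:
        parm = rng.choices(list(weights), weights=list(weights.values()))[0]
        old = params[parm]
        params[parm] += rng.uniform(-p_dist, p_dist)   # perturb one parameter
        freq = reflex_freq_of(params)
        improved = freq < prev
        # Adapt this parameter's selection probability (floor keeps it selectable).
        weights[parm] = max(0.05, weights[parm] + (0.1 if improved else -0.1))
        # Keep the change if it helped, or occasionally even if it did not.
        if improved or rng.random() < math.exp(-(freq - prev) / temp):
            prev = freq
        else:
            params[parm] = old                    # return system to previous state
        p_dist *= cooling                         # decrease perturbation distance
        temp *= cooling                           # cool the system
    return params, prev
```

A simple sanity check: because unkept perturbations are reverted, the returned frequency always matches re-evaluating the returned parameters.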

Carrying this idea one step further, the pattern of the reflexive actions could be used to reference a starting point in an associative memory for the learning algorithm. For example, if the robot made the transition from a known smooth surface to a known rough surface, the type and timing of the reflexive actions that fire to correct the robot should form a somewhat distinct pattern. This is akin to walking blindfolded from a parking lot into a field of thick, tall grass: a person would naturally start picking their feet up a little higher. The first few steps might surprise them, but after that it would become routine, and they would not be surprised again until the walking surface changed. Using this supposition, we plan to create a system that matches the pattern generated by the firing of the reflexive actions to a gait that will minimize those actions.

So, when the frequency of reflexive actions rises above some threshold, the pattern will be used to key an associative memory that holds the closest matching gait. If the reflexive actions then decrease below the threshold, the new gait is adequate and the surface is known; if not, the gait optimization process will begin again using the new gait as a starting point.
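A minimal form of this lookup is nearest-neighbor matching over stored reflex patterns. The pattern encoding (a fixed-length tuple of firings) and the memory contents below are illustrative assumptions, since the paper leaves the associative memory's representation open:

```python
def closest_gait(reflex_pattern, memory):
    """Return the stored gait whose reflex-firing pattern is nearest
    to the observed one, using Hamming distance over fixed-length
    patterns (a simple stand-in for a real associative memory)."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(memory, key=lambda e: distance(e["pattern"], reflex_pattern))["gait"]

# Hypothetical memory: reflex patterns recorded on two known surfaces,
# each associated with the gait that minimized reflex firings there.
memory = [{"pattern": (1, 0, 0, 1), "gait": "smooth_gait"},
          {"pattern": (1, 1, 1, 1), "gait": "grass_gait"}]
```

An observed pattern of `(1, 1, 1, 0)` is closer to the grass entry, so the grass gait would be loaded as the starting point for further tuning.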

Summary

The LSU Robo-Tiger project has provided a platform from which to research, explore, and integrate the various technologies that will be used in creating truly useful machines that can operate autonomously in unstructured environments. Experience with earlier prototypes has shown that propulsion, balance, and control of a four-legged robot are not trivial. Using the subsumption-like Virtual Controller method, the robot's performance and walking manner improved tremendously over previous ad-hoc and genetic algorithm methods. Building on the Virtual Controller method, this paper presents a simple system of reflexes that trigger a hybrid annealing algorithm for tuning existing walking gaits online. It also describes how the pattern of the reflexive actions can be used to access situational memories that serve as starting points for the hybrid annealing algorithm.

Appendix A

The Subsumption Architecture is a method for controlling mobile robots that differs markedly from the centrally controlled, top-down approach of classical AI methods. It can be described as a hierarchical, asynchronous, distributed system whose components produce behaviors; the interactions of these behavioral components produce the system functionality. In general, sensor inputs are available throughout the system, so sensor input and actuator actions are closely coupled. The behavioral components are arranged in a hierarchy of layers in which higher layers can inhibit or subsume lower layers, hence the name Subsumption Architecture. Figure 4 illustrates the differing approaches of classical symbolic AI and the Subsumption Architecture.

Figure 4. The differences between the Classical AI approach and that of Subsumption are illustrated in this figure. Classical AI is much more of a top down, centrally controlled approach while Subsumption takes a bottom up distributed approach.

Learning under the Subsumption Architecture is primarily a matter of conflict resolution among the various behaviors, that is, deciding which behaviors should be active at any particular instant in time. Historically, developers resolve these conflicts at compile time or by using a description of the desired behavior selection; in both cases, the resulting behavior hierarchy is static [2].

Appendix B

Figure 5 below is an illustration of the virtual biped concept developed at the MIT Leg Lab.

References

[1] Brooks, Rodney A. "A Robust Layered Control System for a Mobile Robot", MIT AI Lab Memo 864, September 1985.

[2] Maes, P. and R. A. Brooks, "Learning to Coordinate Behaviors", AAAI, Boston, MA, August 1990, pp. 796--802.

[3] “Quadruped (1984–1987)”, MIT Leg Lab, http//

[4] “LSU Robotic Tiger: Results of Small Prototype Research and Development Phase”, May 1, 2000;