
Project Synopsis

Naturalistic Control of Naturalistic Mobility

A MORE NATURAL APPROACH TO ROBOT CONTROL


Currently, there is a shortage of convenient ways to control robots. Most robots are controlled either through a command-line interface or through proprietary controllers and software. This can make replacement parts gatekept or prohibitively expensive, and each unique robot imposes its own learning curve, since the operator must learn a new control method for every one. Our project is building a control interface that uses natural inputs to control a humanoid robot.

We are creating a Linux application that interfaces with the robot and allows it to be controlled through natural inputs: voice commands and a video interface that maps human movements to robot movements. Designing a system in which people can easily give commands lowers the barrier to entry for learning how to control a robot, and it also removes the barrier of proprietary software.

For the GUI/interface, we are using tkinter to create a simple view that shows the camera output, a visualized command graph, and buttons to turn the audio and video threads on and off. These threads run classification on camera or microphone input and submit the resulting movement commands to the movement queue. Confidence-weighted exponential averaging is then applied to those commands to produce a smoothed movement command, which is written to a CSV file as output. This output can be fed into a robot simulator to issue commands to the robot.

Figure 1. Mock-up of the GUI of our application
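As a rough illustration of the smoothing step described above, the sketch below shows one way confidence-weighted exponential averaging over queued commands could be implemented and written out to a CSV file. The MovementCommand fields, the blend factor alpha, and the CSV column names are illustrative assumptions, not our exact implementation.

```python
import csv
from dataclasses import dataclass


@dataclass
class MovementCommand:
    label: str          # e.g. "raise_left_arm" (illustrative name)
    value: float        # scalar target, e.g. a joint angle
    confidence: float   # classifier confidence in [0, 1]


def smooth_commands(commands, alpha=0.3):
    """Confidence-weighted exponential averaging over a command stream.

    Each sample is blended into the running estimate with a weight of
    alpha * confidence, so low-confidence detections move the output less.
    """
    smoothed = None
    for cmd in commands:
        weight = alpha * cmd.confidence
        if smoothed is None:
            smoothed = cmd.value
        else:
            smoothed = weight * cmd.value + (1.0 - weight) * smoothed
        yield cmd.label, smoothed


def write_commands_csv(rows, path="movement_commands.csv"):
    """Write smoothed commands so a robot simulator can replay them."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "smoothed_value"])
        writer.writerows(rows)


if __name__ == "__main__":
    queue = [
        MovementCommand("raise_left_arm", 0.9, 0.8),
        MovementCommand("raise_left_arm", 1.1, 0.4),
        MovementCommand("raise_left_arm", 1.0, 0.9),
    ]
    write_commands_csv(smooth_commands(queue))
```

Scaling the smoothing factor by confidence means a noisy, low-confidence classification nudges the output only slightly, while a confident one moves it quickly toward the new command.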

For our voice input, we are using OpenWakeWord to scan incoming audio for a wake word that activates the command queue. Commands are then transcribed with OpenAI's Whisper as the primary recognizer, with PocketSphinx run as an additional check whenever Whisper's recognition falls below a minimum confidence threshold. To build out our word list, we have created an additional program that lets users choose synonyms of words in the command set and add them as additional commands. This keeps the word set curated while still allowing it to grow substantially.

Figure 2. Command-line output of voice recognition
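The snippet below is a minimal sketch of the Whisper-first, PocketSphinx-fallback idea, assuming the wake word has already fired and the command audio has been saved to a WAV file. Using Whisper's per-segment average log-probability as the confidence check, and the MIN_AVG_LOGPROB value itself, are assumptions for illustration rather than our exact settings.

```python
import whisper                   # openai-whisper
import speech_recognition as sr  # provides a PocketSphinx backend

# Hypothetical confidence floor: below this average log-probability,
# Whisper's output is treated as unreliable and PocketSphinx is consulted.
MIN_AVG_LOGPROB = -1.0

whisper_model = whisper.load_model("base")
recognizer = sr.Recognizer()


def transcribe_command(wav_path):
    """Transcribe a recorded command, falling back to PocketSphinx when
    Whisper's confidence is too low."""
    result = whisper_model.transcribe(wav_path)
    segments = result.get("segments", [])
    if segments:
        avg_logprob = sum(s["avg_logprob"] for s in segments) / len(segments)
        if avg_logprob >= MIN_AVG_LOGPROB:
            return result["text"].strip()

    # Low confidence (or no speech detected): run the same audio through PocketSphinx.
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return ""
```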

For our video input, we are utilizing Google's Mediapipe to take in a video feed and map joint landmarks onto a person. We then apply an algorithm that classifies movements roughly every 20 frames; each candidate classification is passed through a custom binary classifier to further validate the movement against the corresponding frames. Once validated, the movement is passed to the parent thread of our GUI, which adds the pose classification to our movement queue.

Figure 3. Video Input with joint mapping and pose detection
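For reference, a minimal sketch of the video thread might look like the following. Here classify_window is a hypothetical stand-in for our movement classifier and binary validator, and the queue hand-off and 20-frame window size are simplified assumptions.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
WINDOW_SIZE = 20  # roughly the ~20-frame classification window described above


def classify_window(landmark_history):
    """Hypothetical stand-in for the movement classifier and binary validator."""
    return None  # would return a pose label such as "wave" once validated


def run_video_thread(movement_queue, camera_index=0):
    """Read frames, extract pose landmarks, and classify every ~20 frames."""
    cap = cv2.VideoCapture(camera_index)
    history = []
    with mp_pose.Pose() as pose:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # Mediapipe expects RGB input; OpenCV captures frames in BGR.
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                history.append(results.pose_landmarks.landmark)
            if len(history) >= WINDOW_SIZE:
                label = classify_window(history)
                if label is not None:
                    movement_queue.put(label)  # hand off to the GUI's parent thread
                history.clear()
    cap.release()
```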