
Moving Towards True Autonomy in Robots

By Lina Sorg

In recent years, technological advancements have allowed robots to better perceive, understand, and function in the world around them. As a result, sophisticated drones, self-driving car systems, Mars rovers, and even home and service robots are edging closer to becoming everyday realities. Lightweight electronics and powerful communication architectures are contributing to today’s higher-level autonomy capabilities. “It’s a really fun time to be a roboticist,” Nicholas Roy of the Massachusetts Institute of Technology said. During a minisymposium presentation at the 2021 SIAM Conference on Computational Science and Engineering, which is taking place virtually this week, Roy spoke about semantic developments that help robots effectively perceive their surroundings in a variety of situations.

Sensors are one of the biggest enabling technologies for the field of robotics. The LiDAR (light detection and ranging) devices on top of self-driving cars, which use pulsed laser waves to map the car’s distance from surrounding objects, are but one example. “This type of high-precision sensing has really enabled autonomous vehicles in the last 20 years or so,” Roy said. He then turned his attention to the Skydio 2 drone, which navigates urban settings at high speeds and generates a picture of its surroundings that includes trees and building structures. Rather than rely on heavy, expensive, high-accuracy sensing hardware, it utilizes a series of cameras that are positioned in a ring around the drone. More sophisticated robotic systems are beginning to embrace this kind of technology to glean a rich understanding of their environments.

Figure 1. Expansion of the initial control loop cycle. The green circles identify spots where the models can be learned from data. Information flow is embedded in each arrow.

Although society is certainly moving closer to achieving ubiquitous robotic autonomy, persistent issues with safety and reliability remain. “Most self-driving vehicles have the same structure,” Roy said. That structure, which comprises a sensor, estimator, motion planner, controller, and motors, works quite well at moving a vehicle (as a point in space) from one place to another but does not do much else. For example, if designers want their robot to conduct a more complicated task, such as picking up a passenger or exploring a Mars crater, they must expand the control loop. The expansion adds a slow-duty cycle on top of the existing fast-duty cycle: a symbolic planner receives the output of symbolic state extraction, and its plan is then fed into the motion planner of the initial loop. This extension runs very smoothly because researchers know how to verify symbolic state extraction.
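
This two-rate architecture can be summarized in a short Python sketch. All of the component names below (AutonomyStack, Estimator-style interfaces, and so on) are hypothetical placeholders meant only to illustrate how a slow symbolic planning cycle might sit on top of the fast sense-estimate-plan-control loop; this is a structural sketch, not Roy’s implementation.

```python
# Minimal structural sketch of the two-rate control loop described above.
# Every component name here is a hypothetical placeholder, not Roy's code.

class AutonomyStack:
    def __init__(self, sensor, estimator, symbolic_extractor,
                 symbolic_planner, motion_planner, controller, motors):
        self.sensor = sensor                      # e.g., LiDAR or camera ring
        self.estimator = estimator                # state estimation
        self.symbolic_extractor = symbolic_extractor
        self.symbolic_planner = symbolic_planner  # the slow-duty "expansion"
        self.motion_planner = motion_planner
        self.controller = controller
        self.motors = motors
        self.current_subgoal = None

    def slow_cycle(self, state_estimate):
        """Slow-duty expansion: extract symbols, plan a task-level subgoal."""
        symbols = self.symbolic_extractor.extract(state_estimate)
        self.current_subgoal = self.symbolic_planner.plan(symbols)

    def fast_cycle(self):
        """Fast-duty inner loop: sense -> estimate -> plan motion -> actuate."""
        measurement = self.sensor.read()
        state_estimate = self.estimator.update(measurement)
        trajectory = self.motion_planner.plan(state_estimate, self.current_subgoal)
        command = self.controller.track(trajectory, state_estimate)
        self.motors.apply(command)
        return state_estimate
```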

The green circles in Figure 1 identify places where the models can be learned from data; this is true throughout the control loop. “This is why we have such a good ability to make our robots move through their worlds,” Roy said. However, the two connections indicated by the arrows that lead to and from the expansion cycle typically cannot be learned from data and must be manually specified. “This is a big limitation in verifying and deriving trustworthy safety-critical autonomy,” Roy said. “Trustworthy safety-critical autonomy will require true understanding of the environment. And true understanding will require getting the representations right.” Unfortunately, researchers do not yet have a good theory for getting these representations entirely right.

Roy next delved into a discussion of “things” (objects) versus “stuff” (materials) to provide additional context. The computer vision world is quite adept at detecting “things” but tends to leave a lot of “stuff” out when mapping an environment. For instance, spatially extended objects like trees pose a problem because they move and cannot be neatly encompassed in a well-defined bounding box. And pavement’s lack of clean boundaries means that robots cannot easily sense or describe its “end.” Even “things” can be problematic, as robots might not understand the relationship between a partial view and the whole object, such as a window that is cut off by the edge of the frame or a door that is partially obscured by other parts of the scene. These nuances present a challenge for any kind of verification system.

Figure 2. Use of an RGB camera, a depth camera, and semantic segmentation to map an environment.

Fusing the notion of “stuff” with concepts from geometry offers a possible solution. For example, one can use an RGB camera with extended visual range to extract objects from their surroundings, then utilize a depth camera to paint those objects into a scene. Though this result alone is still not great, conducting semantic segmentation on the RGB camera image yields a meaningful representation that allows researchers to reason about the correctness of a vehicle’s performance (see Figure 2).
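
As a rough illustration of this kind of fusion, the sketch below back-projects a depth image through an assumed pinhole camera model and attaches a per-pixel semantic label to each 3D point. The camera intrinsics and the label array are placeholders, and the segmentation itself is assumed to come from some external model; this is not the specific pipeline shown in Figure 2.

```python
# Hedged sketch: fuse per-pixel semantic labels with a depth image to build a
# labeled 3D map, in the spirit of Figure 2. Intrinsics (fx, fy, cx, cy) and
# the segmentation labels are assumed inputs, not part of the talk.
import numpy as np

def labeled_point_cloud(depth, labels, fx, fy, cx, cy):
    """depth: HxW metric depth image; labels: HxW semantic class per pixel."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0                      # ignore pixels with no depth return
    z = depth[valid]
    x = (us[valid] - cx) * z / fx          # back-project with the pinhole model
    y = (vs[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)  # Nx3 points in the camera frame
    return points, labels[valid]           # each point tagged "thing" or "stuff"
```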

Beyond the concept of “things” versus “stuff,” robots must learn to reason about the relationships between objects and their environments. “There’s a lot more to be done with computer vision right now in terms of verification and trustworthiness,” Roy said. The following questions are therefore paramount:

  • How can a robot identify everything around it?
  • How can a robot reason at different levels of representation?
  • How can a robot use very sparse representations?

Given their knowledge and understanding of the world, humans can easily interpret a scene, extract valuable details, and attach meaningful semantics. For example, consider a scenario wherein one must move to an unspecified goal that is 100 meters away with two possible courses of action: straight down a corridor or through a door into a classroom (see Figure 3). People’s inherent familiarity with buildings tells them that they are unlikely to reach the goal just by entering the classroom, as classrooms typically are not 100 meters wide. Conversely, if the goal is only five meters away, humans know to go into the classroom rather than travel down the lengthy corridor.

Figure 3. Representation of a scenario in which one must select a symbolic action to reach a goal that is 100 meters away.

Systems that contain this type of learning component pose a huge challenge for autonomy, as even changes in weather or lighting can create unexpected conditions. Robots must ultimately learn from sparse corpora of data that represent the real world. “Learned models need a lot of data from all operating conditions,” Roy said. He then presented a video of a Learned Subgoal Planner robot navigating the halls of an apartment building. The robot understood that hallways are more likely to lead to its goal and had learned to ignore open doors to rooms.
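
As a toy illustration of the semantic reasoning in the Figure 3 scenario, the short sketch below scores symbolic actions against rough priors on how large classrooms and corridors typically are. The numbers and names are invented purely for illustration and are not part of the Learned Subgoal Planner.

```python
# Toy illustration of the corridor-vs-classroom reasoning: prefer the symbolic
# action whose typical spatial extent could plausibly contain the goal.
# The extents below are invented priors, not data from the talk.

TYPICAL_EXTENT_M = {"classroom": 15.0, "corridor": 200.0}

def choose_subgoal(goal_distance_m, options=("classroom", "corridor")):
    """Pick the option whose typical extent can contain the goal distance."""
    feasible = [o for o in options if TYPICAL_EXTENT_M[o] >= goal_distance_m]
    # If several options fit (e.g., a 5 m goal), prefer the smaller space.
    return min(feasible, key=lambda o: TYPICAL_EXTENT_M[o]) if feasible else "corridor"

print(choose_subgoal(100.0))  # -> "corridor"
print(choose_subgoal(5.0))    # -> "classroom"
```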

Moving forward, full autonomy will require a new mathematical theory of representation. Such a theory must allow joint inference over the continuous physical world and discrete concepts in a principled fashion; learn representations that correspond to the structure of the physical world and to human understanding of that world; and represent the unexpected without having seen it. “One of the challenges of our autonomous systems is that they can’t reason outside of what they’ve seen in the training set,” Roy said. This limitation will be a key focus area for future work.

 Lina Sorg is the managing editor of SIAM News.