SIAM News Blog

Classification Models Explore Human Trust in Autonomous Vehicles

By Lina Sorg

The autonomous vehicle market is not yet fully autonomous; at present, self-driving cars are meant to operate in conjunction with human drivers. Because factors like stress, exhaustion, and workload may degrade human performance, the decision to assign certain driving tasks to automated systems can improve both individual driver success and overall road safety. As such, researchers are studying the factors that facilitate human confidence in yielding some level of vehicular control to automation.

During a Workshop Celebrating Diversity minisymposium at the 2022 SIAM Annual Meeting, which is currently taking place in Pittsburgh, Pa., Carlos Bustamante-Orellana of Arizona State University utilized classification models to examine reliance on and trust in vehicular automation. He employed a testbed to study the factors that contribute to this complex problem (see Figure 1). Using a ride simulator, participants completed a simulated “leader-follower” task during which they applied automation—in the form of buttons and pedals—to follow a lead car at a steady distance. Completion of the course, which contained nine different zones, took about 12 minutes.

Figure 1. Participants used a ride simulator to complete a 12-minute, simulated “leader-follower” task during which they applied automation to follow a lead car at a steady distance while handling perturbations.

Participants relied on various types of automation so that Bustamante-Orellana could observe how each type influenced the decision-making process. Every participant completed five distinct simulations: manual (to obtain baseline physiological data), speed high, speed low, full high, and full low. During “speed” simulations, the automation only alters a participant’s speed in response to velocity changes in the lead vehicle. In contrast, “full” simulations allow the automation to adjust a participant’s speed and steering patterns to keep the vehicle centered in the lane. Finally, “high” and “low” refer to the level of variability in maintaining the following distance and lane position; a higher automation level should result in better performance.

Several environmental conditions—which served as perturbations—were then introduced to the course. These conditions allowed Bustamante-Orellana to analyze participant reactions and their consequential effects on automation use. Sample perturbations include incoming traffic, gusts of wind that push the car out of its lane, changes in the lead car’s speed, and static and dynamic pedestrians who appear on the side of the road or cross in front of the car. Drivers had to push a button to make the pedestrians disappear while also anticipating other hazards and maintaining the same distance behind the lead car.

Sixteen individuals took part in the simulations. Each person performed five trials, yielding a total of 80 individual data sets. Behavioral data—which pertained to acceleration, velocity, and position in the context of the participant’s car, lead car, and seven additional simulated vehicles—was collected at 60 hertz (60 samples per second). A performance evaluation scored participants’ utilization of automation at the appropriate times.

“The presented data and data collection were done with the goal of developing and validating models for measuring and drawing inferences about trust in automation,” Bustamante-Orellana said. “Although there is no clear definition of trust in automation, it is a variable that often determines the willingness of a human to rely on automation.” Confidence levels in automation depend on several factors that ultimately shape people’s willingness to adopt automation systems in their everyday lives.

Next, Bustamante-Orellana outlined his machine learning approach to predict automation use with human operators (see Figure 2). “We start with the pre-processing of the data, which includes data extraction, cleaning, and normalization,” he said. Once the data is processed, Bustamante-Orellana classifies the collected features into five different categories:

  • Same as target (these features are not considered for the classification models)
  • User/automation response to perturbations (including position, acceleration, and velocity of the participant’s car; button pushes; steering commands; and any driving violations)
  • Environment/perturbations (including distance and position of pedestrians; distance, position, acceleration, and velocity of the lead vehicle and traffic vehicles; and the appearance of perturbations at a given time)
  • Performance features (including score as well as counters for lane violations, incorrect button presses, and distance range violations)
  • Relative risk.
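The pre-processing steps that Bustamante-Orellana mentions—extraction, cleaning, and normalization—might look something like the following sketch. The function names, the use of z-score normalization, and the toy velocity stream are all illustrative assumptions, not details from the talk.

```python
def clean(samples):
    """Cleaning: drop missing (None) readings from a raw 60 Hz feature stream."""
    return [s for s in samples if s is not None]

def normalize(samples):
    """Normalization (z-score here, as one common choice): zero mean, unit variance."""
    n = len(samples)
    mean = sum(samples) / n
    std = (sum((s - mean) ** 2 for s in samples) / n) ** 0.5 or 1.0
    return [(s - mean) / std for s in samples]

# Hypothetical extracted feature: the participant's velocity, with two dropped samples.
raw_velocity = [12.0, None, 13.5, 12.5, None, 14.0]
processed = normalize(clean(raw_velocity))
```

In a real pipeline each behavioral feature (position, velocity, acceleration of every vehicle) would be processed this way before being grouped into the five categories above.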

Figure 2. A machine learning approach to predict automation use with human operators.

He used 11 methods—the most notable of which are Pearson correlation, recursive feature elimination, and selection from model—to identify the top features. “These methods are widely used in feature selection,” Bustamante-Orellana said. “Feature selection picks the most important features, which are those that have a normalized importance value bigger than 0.01.” He focuses on 20 of the 91 total features, since the top 20 are usually the most important. Additionally, the models run faster with fewer features and the difference in accuracy is minimal. The top features include score, risk, the linear velocity of the lead vehicle, and the distance to the lead vehicle. “This makes sense because these are highly determined by the performance of the participant,” Bustamante-Orellana said.
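The thresholding rule in the quote above can be sketched directly: normalize the raw importance scores so that they sum to one, keep everything above 0.01, and truncate to the top 20. The feature names and scores below are made up for illustration; the actual 91 features and their importance values come from the study's selection methods.

```python
def select_features(importances, threshold=0.01, top_k=20):
    """Keep features whose normalized importance exceeds the threshold,
    ranked by importance, truncated to the top_k best."""
    total = sum(importances.values())
    normalized = {f: v / total for f, v in importances.items()}
    kept = {f: v for f, v in normalized.items() if v > threshold}
    ranked = sorted(kept, key=kept.get, reverse=True)
    return ranked[:top_k]

# Hypothetical raw importance scores from some feature-ranking method.
scores = {"score": 5.0, "risk": 3.0, "lead_velocity": 2.0,
          "lead_distance": 1.5, "noise_feature": 0.01}
print(select_features(scores))
```

Here `noise_feature` falls below the 0.01 normalized-importance cutoff and is discarded, while the four features echoing the talk's top features survive.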

After identifying the top features, Bustamante-Orellana utilizes different classification models that incorporate the features from the previous stage and ultimately predict whether each participant will use automation at a given time. He splits the data into training and testing sets in two ways: with participant sampling and random sampling. The former trains the model with the data of 13 participants (roughly 80 percent) and uses the data of the remaining three participants (roughly 20 percent) for testing. In contrast, random sampling combines the data from all 16 participants into one data set for each condition and splits the set into 80 percent training and 20 percent testing. “Both the training and testing datasets contain information about all participants,” Bustamante-Orellana said. “But random sampling captures individuality more.”
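The two splitting strategies can be sketched as follows, assuming (hypothetically) that each condition's data is a flat list of `(participant_id, feature_vector)` records. Participant sampling holds out whole participants; random sampling pools everyone and cuts 80/20.

```python
import random

def participant_split(records, test_ids):
    """Participant sampling: hold out every record of the given participants."""
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test

def random_split(records, test_frac=0.2, seed=0):
    """Random sampling: pool all participants' records, shuffle, split 80/20."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

# Toy data: 16 participants, 10 records each (feature values are placeholders).
records = [(pid, [pid * 0.1, t]) for pid in range(16) for t in range(10)]
train_p, test_p = participant_split(records, test_ids={13, 14, 15})
train_r, test_r = random_split(records)
```

Under participant sampling the test participants are entirely unseen during training, which is why the random split—where every participant appears on both sides—can better capture individual behavior.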

Figure 3. Accuracy of different classification models under different training-testing splitting techniques.

Bustamante-Orellana then selected five common classification models—random forest, logistic regression, support vector machines, K-nearest neighbors, and naïve Bayes—to test the data and displayed the accuracy of each model with both random and participant sampling (see Figure 3). He found that random forest yields the highest accuracy, with a success rate of 94 to 99 percent via random sampling and 82 to 88 percent via participant sampling. 
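A minimal harness in the spirit of this comparison trains each candidate classifier on the training split and reports its test accuracy. The two toy classifiers below (a majority-class baseline and a one-feature nearest neighbor) merely stand in for the five models from the talk, and the synthetic "automation used when risk exceeds 0.5" data is invented for illustration.

```python
def majority_class(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

def one_nn(train):
    """1-nearest-neighbor on a single scalar feature."""
    return lambda x: min(train, key=lambda r: abs(r[0] - x))[1]

def accuracy(model, test):
    """Fraction of test records the fitted model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)

# Synthetic data: automation is used (label 1) when a risk feature exceeds 0.5.
data = [(i / 20, int(i / 20 > 0.5)) for i in range(20)]
train, test = data[:16], data[16:]
for name, fit in [("majority", majority_class), ("1-NN", one_nn)]:
    print(f"{name}: {accuracy(fit(train), test):.2f}")
```

Swapping the toy models for random forest, logistic regression, support vector machines, K-nearest neighbors, and naïve Bayes, and the toy split for the participant and random splits, would reproduce the shape of the comparison in Figure 3.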

Ultimately, Bustamante-Orellana’s study lends further insight into the potential of autonomous vehicles and people’s trust in them. “This analysis will serve to propose a model for human reliance on automation and validate it with the data,” he said. 

Acknowledgments: Carlos Bustamante-Orellana used data that was collected by the Army Research Laboratory to conduct his research.


Lina Sorg is the managing editor of SIAM News.