As technology advances and large amounts of data become increasingly available, artificial intelligence (AI) is becoming an everyday reality. Some people embrace AI while others fear it, but nearly everybody lives with it in one form or another.
During the Women in Data Science (WiDS) Conference held at Stanford University last Friday, Fei-Fei Li discussed the development and future of AI in a talk entitled “A Quest for Visual Intelligence in Computers.” Li is currently chief scientist of AI/machine learning at Google Cloud. She is also an associate professor in the Computer Science Department at Stanford, where she directs the Stanford Artificial Intelligence Lab and the Stanford Vision Lab.
Li began with a brief history of the field’s growth, starting when intelligent machines were merely a concept of science fiction. She credited Alan Turing with motivating humanity to pursue artificial intelligence with the question, “How can we build intelligent machines?” Turing believed that researchers needed to (i) provide intelligent machines with the best possible sense organs, and (ii) teach them to understand English. Vision and language are two important ingredients in AI technology, Li said. More than half of the human brain is occupied with visual processing at any given point, and intelligent machines must be able to communicate in human languages to function successfully.
Terry Winograd created a computer system called SHRDLU, which allows users to communicate with the computer in a simplistic “blocks world.” Image credit: Fei-Fei Li, WiDS presentation.
Li went on to praise Terry Winograd, a professor of computer science at Stanford, for his significant involvement in the advancement of AI research. “Terry operationalized Turing’s idea of building an intelligent machine,” she said. He put forward components of syntax, semantics, and inference, three critical parts of AI. “Syntax is about understanding the structure of the world, semantics is about the meaning of the world, and inference is the prediction,” Li said.
Winograd created a system called SHRDLU, an early computer program for understanding natural language that allows the user to communicate with the computer in a simplistic “blocks world.” Users can tell the computer to pick up the big red block, for example, and it will look at the world, understand what’s happening, infer the situation, and find and pick up the block. The system understands more complicated instructions as well, such as “find a block that’s taller than the red one, pick it up, and put it in the gray box,” Li said. SHRDLU was one of the first indications that researchers could actually build Turing’s imagined machines – machines that understand humans and perform human tasks.
Unfortunately, Winograd didn’t push SHRDLU much beyond laboratories, and AI didn’t significantly advance between 1970 and 1990. This slowdown was due in part to the limitations of hand-designed rules, which dominated the back end of intelligent machines. Scalability is one such limitation, as it is difficult to anticipate every scenario, including unexpected ones, that can arise when working with and training a machine. Adaptability is another hindrance because machines have trouble acclimating to new situations, such as new languages. Additionally, many hand-designed rules do not actually work in the so-called “real world,” confining some AI developments to closed-world scenarios.
Machine learning allows computers to learn by use of algorithms, rather than through traditional human programming. Image credit: Fei-Fei Li, WiDS presentation.
Luckily, the emergence of machine learning renewed the push for AI. “Machine learning came to the rescue,” Li said. “It became a blossoming field that injected new life into AI.” She then offered a brief overview of the differences between traditional programming and machine learning, which allows computers to learn without explicit programming. Deep learning and neural networks, for example, are very active areas of machine learning. Deep learning stems from neuroscience research, and was shaped in part by the ideas of Frank Rosenblatt in the 1950s and David Hubel and Torsten Wiesel in 1962. Research related to the neocognitron and convolutional neural networks (CNNs) in 1980 and 1998, respectively, brought about further developments.
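The contrast between traditional programming and machine learning can be made concrete with a toy example. The following is a minimal sketch, not anything shown in Li's talk: the block heights, labels, and function names are all hypothetical, and the learner is a single perceptron in the spirit of Rosenblatt's idea rather than a modern deep network. In the first function a person writes the rule; in the second, the program adjusts a weight and a bias until it reproduces the labels it was given.

```python
# Illustrative sketch only: data, names, and the threshold are hypothetical.

def rule_based_is_tall(height):
    """Traditional programming: a person hard-codes the decision rule."""
    return 1 if height > 10 else 0


def predict(w, b, height):
    """Perceptron decision: 1 if the weighted input is positive, else 0."""
    return 1 if w * height + b > 0 else 0


def train_perceptron(samples, labels, max_epochs=10_000):
    """Machine learning: fit weight w and bias b from labeled examples."""
    w, b = 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(w, b, x)  # -1, 0, or +1
            if error != 0:
                # Perceptron update: nudge the boundary toward the example.
                w += error * x
                b += error
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


# Hand-labeled toy data: block heights and whether each block counts as "tall".
heights = [2, 4, 6, 8, 12, 14, 16, 18]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_perceptron(heights, labels)
learned = [predict(w, b, h) for h in heights]
print("learned labels match the examples:", learned == labels)
```

Nobody tells the perceptron where the cutoff between short and tall lies; it infers a boundary from the examples. That also hints at why the approach scales differently from hand-designed rules: covering a new situation means supplying new examples, not writing new rules.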
Research in the 21st century focused largely on syntax: understanding the visual world and its 3D structure. Self-driving cars, for example, operate on a foundation laid by 20 years of 3D reconstruction research. However, 2012 was the defining renaissance moment for deep learning due to the creation of a CNN called AlexNet. AlexNet is similar to CNN work from the 1990s, but it combines three critical elements: a neural network algorithm, hardware (GPUs for parallel computing), and labeled data from ImageNet. Machine learning suddenly became a viable tool that embraced Winograd’s AI operation structure of syntax, semantics, and inference.
Li acknowledged that there is still much to be done, and her lab at Stanford is currently working to combine visuals and language in machines. “We have two works, and one is a dataset that benchmarks the syntax of vision and language,” she said. She also stressed the importance of awareness in preventing data bias, especially when working with such large datasets, and added that rule-based systems are not always safe, which makes AI and security an interesting topic for further scrutiny. Regardless, it’s clear that the time of intelligent machines is upon us. “I don’t think I have to convince you that we’ve entered the age of AI,” Li said.
Lina Sorg is the associate editor of SIAM News.