By Karthika Swamy Cohen
Yann LeCun of New York University and Facebook gave an overview of deep learning and the many challenges and opportunities it presents to mathematicians in his invited talk, aptly titled “Deep Learning: A Cornucopia of Applications and Mathematical Mysteries.” His presentation was part of the 2016 SIAM Annual Meeting.
Deep learning has greatly enhanced state-of-the-art techniques in speech recognition, natural language understanding, object detection, visual object recognition, and language translation. The technique has had an impact on disciplines ranging from artificial intelligence and imaging to biomedicine, drug discovery, and genomics.
LeCun began his presentation by relating learning representations and features of deep learning to traditional models of pattern recognition and machine learning. “Machine learning really has its roots in pattern recognition,” he said.
Deep learning involves computational models composed of multiple processing layers that learn representations of data with several levels of abstraction. At every layer, the system builds its features from the motifs detected at the layer before it. “The reason we want multiple stages is because the world is compositional,” said LeCun. “The various motifs can be assembled into parts of an object, like an arm or leg or wheel of a car.”
Thus, deep learning techniques discover intricate structure in large data sets and facilitate classification, organization, and prediction of data. The key aspect of deep learning is that these layers are not designed by humans but are rather learned from data using a general-purpose learning procedure.
LeCun proceeded to explain the realities and complexities of deep learning. Hundreds of millions of parameters are involved in deep learning problems. Recognizing each sample may take billions of operations, so the scale of the computation is very large. The individual operations, however, are simple, like addition and multiplication.
Deep learning architectures, such as multilayer neural networks, consist of multiple layers of simple units. Each unit computes a weighted sum of its inputs, which is passed through a nonlinear function.
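The description of a unit above can be sketched in a few lines of Python. This is only a minimal illustration, not code from the talk: the function names, the bias term, and the choice of tanh as the nonlinearity are all assumptions made for the example.

```python
import math


def unit(inputs, weights, bias=0.0):
    """One simple unit: a weighted sum of the inputs passed through a nonlinearity.

    The bias term and the tanh nonlinearity are assumptions for this sketch.
    """
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(total)


def layer(inputs, weight_rows, biases):
    """A layer is a list of units that all read the same inputs."""
    return [unit(inputs, w, b) for w, b in zip(weight_rows, biases)]


def network(inputs, layers):
    """Stack layers: the outputs of one layer are the inputs of the next."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical two-layer network: 2 inputs -> 3 hidden units -> 1 output unit.
hidden = ([[0.5, -0.25], [1.0, 1.0], [-1.0, 0.5]], [0.0, 0.0, 0.0])
output = ([[1.0, -1.0, 0.5]], [0.0])
result = network([1.0, 2.0], [hidden, output])
```

In a real system the weights would be learned from data rather than written by hand, which is the point LeCun stressed; the hand-picked numbers here only show the arithmetic each unit performs.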
LeCun then moved on to convolutional neural networks (CNN), which are composed of one or more convolutional layers, followed by one or more fully connected layers similar to standard multilayer neural networks. He began by explaining how he had first used convolutional networks in handwriting recognition.
In CNN, each layer applies hundreds or thousands of different filters to its input, producing feature maps, and combines their results. Because the same filters are applied at every position, convolutions can run over large images. This can also be applied to cursive writing.
Previous techniques required the breaking up of individual characters, but with convolutional networks, the network can run over the entire word.
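To make the idea of a filter producing a feature map concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the networks described in the talk: the function names and the tiny hand-written edge filter are hypothetical, and real CNNs learn their filters and add nonlinearities and pooling.

```python
def convolve2d(image, kernel):
    """Slide one filter over an image and return the resulting feature map.

    Uses "valid" positions only (no padding) and does not flip the kernel,
    which is the convention most deep learning libraries call convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the filter.
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        feature_map.append(row)
    return feature_map


def conv_layer(image, kernels):
    """Apply several filters to the same image: one feature map per filter."""
    return [convolve2d(image, k) for k in kernels]


# Hypothetical filter that responds to a dark-to-bright vertical edge.
edge_filter = [[-1, 1]]
small_image = [[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 1, 1]]
small_map = convolve2d(small_image, edge_filter)

# The same filter runs unchanged over a wider image.
wide_image = [row + [1, 0, 0] for row in small_image]
wide_map = convolve2d(wide_image, edge_filter)
```

Because the filter does not depend on the image size, the same weights can sweep across an entire word or scene, which is why the input does not have to be cut into individual characters first.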
In the late 1990s and early 2000s, CNN found application in face recognition technology. Unfortunately, this involved training on a large amount of data, and the computers were not fast enough to handle this. CNN can also be used to parse scenes both indoors and outdoors, as in long-range adaptive robot vision.
Researchers can use CNN in object recognition, but it needs many samples to train, and the available data sets were small. In this context, the “Large Scale Visual Recognition Challenge” came about in 2013. The project used 1.2 million training samples to train large neural networks on the ImageNet database; it yielded great results and pushed CNN to the forefront.
Researchers are now attempting to apply CNN to image captioning, the process of generating a descriptive sentence based on an image. At this juncture, CNN works within a limited range of images.
LeCun explained that Facebook has started using CNN for facial recognition via initial training on several thousand people. This learning phase produces features that can be used regardless of whose face is being examined. CNN can also detect key points on the body of a person to discern his or her pose.
Additionally, CNN is able to recognize objects and draw their outlines. “This makes me hope that someday a robot will pick out broccoli,” said LeCun of object recognition technology. The audience laughed as he added, “I don’t like broccoli.”
Scientists can also apply CNN in vision systems for cars; the Mobileye vision system enables cars to stay in lanes and visualize traffic. Medical imaging is another area where CNN is finding a great deal of use. “Convolutional networks will essentially revolutionize radiology,” LeCun said.
The current challenge comes with designing algorithms that employ optimization over multiple machines. Nevertheless, deep learning is making advances on problems that have long resisted artificial intelligence, and is at the core of computer vision and auditory perception by machines. It is used in self-driving cars, social network content filtering, and search engine ranking. And the technology is only getting better.
Karthika Swamy Cohen is the managing editor of SIAM News.