December 05, 2013

The Future of Computer Science

The information age is causing fundamental changes to all aspects of our lives, and I believe that those individuals, organizations, and nations who position themselves for the future will benefit enormously. In particular, there are major opportunities right now for those who are starting their careers in computer science research.

My professional career started in 1964 when I graduated from Stanford in electrical engineering. I was hired as an assistant professor at Princeton in an electrical engineering department. There were no computer science departments. Fortunately for me, Ed McCluskey asked me to teach a course in computer science. There were no books, so I had to ask what the content of such a course should be. He gave me four papers and told me that if I covered them it would be fine. What I did not realize is that teaching that course made me one of the world’s first computer scientists. Whenever someone was looking for a senior computer scientist, I was probably on the short list. That is probably why, in 1992, President George H.W. Bush appointed me to the National Science Board, which oversees the National Science Foundation. Imagine if I had been in high-energy particle physics. I would still be waiting today for the senior faculty ahead of me to retire so I could have some good opportunities. When I tell this story to students today, they respond by saying I was lucky to have started my career in 1964 when computer science was just beginning. The message I am giving is that those starting today are starting at a time of fundamental change, and if they position themselves for the future they will have great careers.

In the past 30 years, computer science was concerned with making computers useful: developing programming languages, compilers, operating systems, databases, and so on. Today it is focused on how computers are being used. We need to develop the science base to support the new directions and, of course, update curricula so that our students are trained in the relevant aspects.

Some of the topics for the future will be:

Tracking the flow of ideas in scientific literature
Tracking evolution of communities in social networks
Extracting information from unstructured data sources
Processing massive data sets and streams
Extracting signals from noise
Dealing with high dimensional data and dimension reduction

The field will become much more application-oriented. The theory to support the new directions will include:

Large graphs
Spectral analysis
High dimensions and dimension reduction
Clustering
Collaborative filtering
Extracting signal from noise
Sparse vectors
Learning theory

This post is based on a talk I gave at the Heidelberg Laureate Forum in September 2013. In that talk I discussed the science base for topics such as sparse vectors, digitization of medical records, communities in social networks, large graphs, high dimensional data, and dimension reduction. One example of the science base for high dimensions is the fact that the volume of a unit-radius sphere goes to zero as the dimension increases. If you consider data generated by a unit variance Gaussian centered at the origin, there will be essentially no probability mass within a sphere of radius one centered at the origin, even though the probability density has its maximum value there. When one integrates the density over the unit radius sphere, the integral is vanishingly small since the sphere has almost no volume. In fact, there is essentially no probability mass until one goes out a distance on the order of the square root of the dimension, since that is roughly the radius at which a sphere begins to have appreciable volume. The gist of this example is that intuition can fail us in high dimensions and we cannot always generalize results from low dimensions to high dimensions.
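
The high-dimensional example above can be checked numerically. What follows is a minimal sketch in Python, not code from the talk. It assumes the standard closed form for the volume of a unit-radius ball, pi^(d/2) / Gamma(d/2 + 1), and uses the fact that the squared length of a unit variance Gaussian vector in d dimensions follows a chi-square distribution with d degrees of freedom. The function names and the shell bounds of 0.5 and 1.5 times the square root of the dimension are illustrative choices, not anything from the original material.

```python
import math


def unit_ball_volume(d):
    """Volume of the radius-1 ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.exp(0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0))


def gaussian_mass_within(d, r):
    """P(|x| <= r) for x drawn from a unit variance Gaussian in d dimensions.

    |x|^2 is chi-square with d degrees of freedom, so this probability is the
    regularized lower incomplete gamma function P(d/2, r^2/2). It is evaluated
    here with its standard power series, whose terms are all positive.
    """
    a, x = 0.5 * d, 0.5 * r * r
    if x <= 0.0:
        return 0.0
    term, total, k = 1.0, 1.0, 0
    # Sum x^k / ((a+1)(a+2)...(a+k)) until the terms stop contributing.
    while term > 1e-17 * total:
        k += 1
        term *= x / (a + k)
        total += term
    # Combine in log space so the prefactor cannot underflow before scaling.
    log_p = -x + a * math.log(x) - math.lgamma(a + 1.0) + math.log(total)
    return min(1.0, math.exp(log_p))


if __name__ == "__main__":
    for d in (2, 10, 100):
        root = math.sqrt(d)
        # Illustrative shell: radii between 0.5*sqrt(d) and 1.5*sqrt(d).
        shell = gaussian_mass_within(d, 1.5 * root) - gaussian_mass_within(d, 0.5 * root)
        print(f"d={d:4d}  unit-ball volume={unit_ball_volume(d):.3e}  "
              f"mass within radius 1={gaussian_mass_within(d, 1.0):.3e}  "
              f"mass in shell around sqrt(d)={shell:.4f}")
```

Running the script for increasing d shows the unit-ball volume and the mass within radius one both shrinking rapidly, while nearly all of the mass ends up in a shell around radius sqrt(d).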

A video of my talk can be viewed here and the slides can be downloaded from “Future Directions in Computer Science Research” (Heidelberg (1)). For more details of many of the aspects mentioned in this post and the talk, see the textbook “Foundations of Data Science,” coauthored with Ravi Kannan, which can be freely downloaded from this link.

John Hopcroft is the IBM Professor of Engineering and Applied Mathematics in Computer Science at Cornell University.