The Evolution of SDM, a Premier Data Mining Conference

By Chandrika Kamath

Now in its 14th year, the SIAM International Data Mining Conference has evolved into a well-established and highly regarded meeting. A recent ranking of data mining conferences lists SDM as second only to the longer running, and much larger, Knowledge Discovery and Data Mining (ACM KDD) conference, in terms of the average number of citations per paper. KDD, which has accepted 1212 papers since its start in 1995, has 18.8 citations per paper, while SDM, which held its inaugural meeting in 2001, has 611 papers, with an average of 11.0 citations per paper.

This is indeed good news for both SDM and SIAM. The credit goes largely to the authors and the tireless efforts of the program chairs and program committee members, who each year carefully review the nearly 400 submissions and select the best for presentation at the meeting. Because the conference papers are available online, they are easily found through search engines, putting the latest research in the field at one’s fingertips.

SDM is unique among data mining conferences in focusing on the mathematical and statistical aspects of the field. Held in cooperation with the American Statistical Association, it is smaller than other data mining conferences, typically attracting 250–300 participants from universities, industry, and national laboratories in many countries. Student-oriented activities, such as the doctoral forum, have enabled many students to attend their first conference, where they have the opportunity to present their work and receive feedback from a friendly audience. The size of the meeting is conducive to informal interactions among participants; an evening reception, held in conjunction with the poster session, provides a venue for networking, as many participants prefer posters to talks as a medium for exchanging ideas. Travel support from the National Science Foundation, SIAM Student Travel Awards, IBM Research, and Google has been invaluable in making SDM accessible to students.

From a technical viewpoint, SDM has evolved over the years, keeping current with a rapidly changing field. The focus in the early years was on techniques like association rules and on applications in science and business. This evolved into an emphasis on the mining of text documents, prompted by the need to search web pages, and on bioinformatics, motivated by the sequencing of the human genome. More recently, algorithms for analyzing complex networks have become popular, and certain application domains, such as health care and cybersecurity, are providing a rich set of problems that cannot be solved without sophisticated data mining techniques.

Interestingly, classical algorithms, including classification and clustering techniques, remain popular research areas, perhaps reflecting their timelessness and utility in practical problems.

Of course, in this age of Big Data, scalable techniques are an active area of research, and the success of IBM’s Watson has prompted the coupling of data analysis with decision support, while placing the conclusions drawn from data on a sound mathematical footing. To help the community keep up with this constantly changing field, SDM provides tutorials on the latest techniques and application areas, as well as workshops, where participants interested in focused topics can present their ongoing work for discussion.

The new year marks several changes for SDM: A new steering committee, with Srinivasan Parthasarathy (Ohio State University) as chair, will oversee the selection and management of the organizing committee for the conference each year. Joining him will be the newly elected officers of the SIAG on Data Mining and Analytics—Zoran Obradovic, Jeremy Kepner, Kirk Borne, and Takashi Washio. They will help ensure that SDM remains a well-regarded, smoothly running conference and continues to meet the technical needs of the more mathematically inclined members of the data mining community. SIAM was prescient in starting the SDM conference in 2001, long before “data mining” became a household term. Its strong support of the conference ensures a bright future indeed for SDM. We welcome you to join us at SDM14, in Philadelphia, April 24–26, to learn about the latest in the field from highly cited authors.

Chandrika Kamath is a researcher at Lawrence Livermore National Laboratory, where she is involved in the analysis of data from scientific simulations, observations, and experiments. She chaired the SDM steering committee from 2007 to 2014 and from 2011 to 2013 was chair of SIAG/DMA.