SIAM News Blog

Gene Golub SIAM Summer School 2019

By Giulia Guidi

In recent years, the rapid growth in data volume and complexity has led to the emergence of large-scale data analytics, a field at the intersection of data analytics algorithms and high-performance computing systems.

Data analytics is the science of examining raw data to draw conclusions and insight from the underlying information. Unsurprisingly, this process relies on applied mathematical methods and approximation algorithms. For example, one might analyze two different data sets to identify a possible correlation between them. The goal is then to infer a relevant, straightforward interpretation from complex, high-volume, and high-dimensional data. The complexity and volume of such data have made the use of high-performance computing systems mandatory, so highly efficient algorithm implementations are crucial for obtaining results in a reasonable amount of time and having an impact in real-world scenarios.

The Gene Golub SIAM Summer School 2019 focused on the synergy between data analytics and high-performance computing systems while highlighting data-driven applications from scientific computing and machine learning.

Lecturers, organizers, and students during the first week of the summer school.

The program balanced academic and industry perspectives, creating a multifaceted atmosphere that offered students many opportunities for growth. Students were given chances to step out of their comfort zones, challenge themselves, and work through fine-grained details without losing sight of the overall system. The multidisciplinary approach of this summer school was perhaps the most valuable aspect of the experience. I learned about methods and algorithms that are not commonly applied in my current research but that could prove useful, such as tensors and tensor decomposition techniques. This multidisciplinary character was also reflected in the students’ diverse backgrounds, which encouraged fascinating discussions that could grow into future collaborations and friendships. The poster session on the first evening played an important role in this regard: students learned about each other’s research at the very beginning and thus had two full weeks to discuss common interests.

Left: Monica Dessole (University of Padova, Italy) during the poster session. Right: Khaled Kelany (University of Victoria, Canada) and Jan-Willem Buurlage (CWI Amsterdam, The Netherlands) during the poster session. Photo credits: Richard Vuduc.

The extracurricular activities, such as the Saturday visit to villages of the Rhône-Alpes, hikes, and after-dinner games, were great opportunities for team building. I cannot speak for everyone, but I am sure most students would agree that the summer school should have lasted another week.

Lecturers, organizers, and students visiting Fort Victor Emmanuel and typical villages around Aussois during the weekend. Photo credits: Richard Vuduc.

During the first week, Professor Haesun Park from Georgia Tech and Dr. Jack Poulson from Hodge Star Scientific focused on methods and algorithms for analyzing large volumes of data (e.g., low-rank approximation, dimension reduction, and determinantal point processes). More precisely, Prof. Park talked about her work on symmetric non-negative matrix factorization (NMF) and hybrid methods for clustering and community detection, as well as hierarchical methods and joint NMF for information fusion, semi-supervised clustering, and hypergraphs. Dr. Poulson talked about determinantal point processes (DPPs) and solvers for Gaussian low-rank inference, and about how his company applies these concepts to recommender systems.

The following week, Dr. Michele Dolfi from IBM introduced programming models and tools for implementing these algorithms on large-scale computers, such as OpenMP, MPI, Docker, and Kubernetes. Dr. Peter Staar, also from IBM, provided an overview of deep learning (DL) techniques and how his team applies them to text recognition. Finally, Dr. Tammy Kolda from Sandia National Laboratories taught tensor decomposition and provided examples of possible applications, such as neuroscience. More specifically, Dr. Kolda discussed alternating least squares (ALS) for canonical polyadic (CP) tensor decomposition (CP-ALS), direct optimization for CP (CP-OPT), CP with missing data (CP-WOPT), and generalized CP for binary, count, and non-negative data using direct optimization and stochastic gradient descent (GCP-OPT).

Each day combined theoretical lectures with lab sessions. The hands-on sessions were crucial for consolidating the theory and learning how it can be applied in research. Jan-Willem Buurlage, a Ph.D. student at CWI Amsterdam, made his lab solutions available on GitHub, and all lecture materials are available online.

Left: Nurbek Tazhimbetov (Stanford University, USA) explaining his solution during one of the lab sessions. Right: Group of students discussing during a lab session. From the left: Tim Werthmann (Aachen University, Germany), Ruhui Jin (University of Texas at Austin, USA), Anna Rörich (University of Stuttgart, Germany), and Carla Schenker (Simula Metropolitan Center for Digital Engineering, Norway). Photo credits: Richard Vuduc.

The Gene Golub SIAM Summer School 2019 has been one of the best experiences of my Ph.D. so far. I came away with fascinating new knowledge that I am thrilled to apply to my research, and with exceptional new colleagues whom I hope to stay in touch with for years to come. I want to thank the organizers, Laura Grigori (Inria and Sorbonne University), Matthew Knepley (University at Buffalo), Olaf Schenk (Università della Svizzera Italiana), and Rich Vuduc (Georgia Institute of Technology), for doing a great job designing this summer school.

Giulia Guidi is a first-year Ph.D. student in Computer Science at UC Berkeley and a graduate research assistant in the Computational Research Division of Lawrence Berkeley National Laboratory, where she is advised by Aydın Buluç and Kathy Yelick. Her research focuses on developing a novel, efficient algorithm for de novo genome assembly from long-read sequencing data. She is collaborating with another student on the distributed-memory parallelization of her algorithm. She received her B.Sc. (2016) and M.Sc. (2018) in Biomedical Engineering from Politecnico di Milano. Her research interests include high-performance computing, computational genomics, and applied mathematics.