October 01, 2019

Exploring New Avenues for Engagement

Data Mining in/for Africa at SDM19

The 2019 SIAM International Conference on Data Mining (SDM19), which took place earlier this year in Calgary, Canada, introduced a first-of-its-kind simulcast workshop on data mining in Africa. The workshop was livestreamed between Nairobi, Kenya (home of IBM Research-Africa) and Calgary, and also supported remote attendees in Japan, the U.K., and the U.S. The event featured several research sessions, a keynote presentation, and two panels.

The organizing committee consisted of Reginald E. Bryant, Sekou L. Remy, Tonya Nyakeya, and Evalyn Kemunto of IBM. Attendees at the Nairobi site represented various organizations from academia and industry, and traveled from as far away as Uganda. Prior to the simulcast portion, they networked and experienced two on-site demonstrations, courtesy of Remy and Joreen Arigye of Fenix International.

IBM researcher Aisha Walcott officially opened the event with a brief discussion about her career as a scientist and her recent work in applying artificial intelligence to Africa’s challenges. Featured speaker Michael Gitau of the Universitat Autònoma de Barcelona then presented his research with autoencoder data representation — a machine learning/deep learning technique. Using data from 220 patients with end-stage renal disease, Gitau reconstructed the condition’s temporal evolution to predict patient mortality.

A Panel of Domain Experts

During the “Panel of Domain Experts” segment of the workshop, discussion ensued in the areas of finance and healthcare — fields in which researchers are currently applying data science in East Africa. The two panelists, Moses Alobo of the African Academy of Sciences and John Olukuru of Strathmore University and @iLabAfrica, acquired their expertise by overseeing and commissioning data science projects.

Aisha Walcott of IBM Research delivers the opening address at the simulcast workshop on data mining in Africa, which took place in Nairobi, Kenya and was livestreamed during the 2019 SIAM International Conference on Data Mining in Calgary, Canada, earlier this year. Photo courtesy of Tonya Nyakeya.

In the open-question portion of the panel, Alobo highlighted a particular instance in which patient outcomes incentivize the collection and sharing of genetic health data. He referenced a dataset presently being amassed by the Human Heredity & Health in Africa consortium. Large western pharmaceutical companies can currently only collect genetic data with small variability from people of African descent (the data is closely tied to West Africa due to historic circumstances). Such limited data becomes problematic when doctors engineer and globally administer personalized drugs. Since this information does not account for the genetic variability on the African continent, individualized, gene-based drugs may have unintended consequences.

As an example, Alobo spoke about captopril, an antihypertensive medication administered to a particular population in Kenya. While Western studies indicated that black patients require higher doses of captopril to manage hypertension (compared to their white counterparts), those doses proved to be too high for Kenyans and unintentionally induced hypotension. Although this situation is just one case, other adverse drug reactions could occur if research studies continue to draw from non-representative data samples.

Alobo’s comments also exposed incentives for individuals and companies in the healthcare sector to share data for mutual benefit. However, he urged caution when establishing sharing mechanisms; people own their genetic data and should receive proper remuneration when pharmaceutical companies leverage said data to create medications that may ultimately be sold back to the patients.

Spurred by an audience question, the panel then broached the topic of politics as it relates to technology for societal good. An attendee expressed frustration at resistance to the development of a water-credit technology for the disadvantaged community in Kibera, Kenya amidst a shifting political climate. Olukuru acknowledged the challenges of interacting with government officials, and encouraged individuals to be patient and work on convincing decision-makers of the technology’s value. He further emphasized the importance of establishing trust as a technology producer; one must demonstrate and prove rather than simply build and tell.

Specific to Kenya, innovation that uses or incorporates assets from large companies like Safaricom (a major mobile network operator) can bias officials against startup technology. Without proper differentiation, the tools offered by startups may be indistinguishable from Safaricom technologies.

A Panel of Data Science Practitioners

“A Panel of Data Science Practitioners” convened following hors d’oeuvres and one-on-one conversations among Kenyan participants. Panelists included Leonida Mutuku of Intelipro, Samuel Kamande of Ajua (formerly mSurvey), and Chris Orwa of I&M Bank Kenya.

In response to the opening question that asked whether financial institutions in emerging markets are investing their energy into “front offices” (financial/loan products) or “back offices” (customer due diligence), Orwa offered insight into the operations of local and correspondent banks in East Africa. Most banks in the region are currently focused on improving back-office operations with machine learning and data science, rather than concentrating on front-office operations like customer experiences and new product offerings.

Kamande took the audience on Ajua’s naturally-evolving data science journey. Initially established as a way to collect data from disenfranchised, often-overlooked communities, the company has since grown into a customer management platform that effectively serves the needs of consumers throughout the socioeconomic strata in Kenya and other emerging economies around the globe.

Mutuku stressed the importance of data science education. Future competitive data scientists must have complementary business skills and be part of a team that is capable of building a machine learning/data science pipeline to address real industry issues, not just fleeting desires.

Final Observations

Late into the evening hours as the workshop drew to a close, people continued to arrive. It is worth noting that the event was held on the day after a state holiday, and traffic from town was at an all-time high.

Many participants would likely agree that this pioneering workshop was a learning experience for the organizers, attendees, and SDM19 organizing committee. At several points throughout the program, vibrant conversations echoed themes from SDM19’s agenda.

Moving forward, an additional goal for the workshop is to stimulate communities of practice and learning, and allow distributed participants to contribute more seamlessly to the meeting objectives. We hope to explore approaches that will accomplish these tasks more effectively and support increasingly personal interactions. However, the ball is rolling; awareness is the first step towards this aim. As members of the organizing committee, we were happy to facilitate one of three featured workshops at SDM19, but even more excited to report on developments in Africa on the world stage.

Reginald E. Bryant and Sekou L. Remy are research scientists at IBM Research-Africa. They were members of the organizing committee for the Simulcast Workshop on Data Mining in/for Africa at the 2019 SIAM International Conference on Data Mining.