Catching Social Butterflies

By Jonathan Popa

A social network is a graph that models a population with social relations. The vertices of the network represent individuals, and edges represent social links between two individuals; these social links can be relationships between family members, friends, co-workers, or other parties. Diffusion modeling across social networks has applications in various sciences, such as predicting the spread of contagious diseases and developing marketing strategies. In order to create a social communication network, researchers need data to understand how people interact.

I participated in a study that examined a dataset from the social networking service Meetup to analyze online and offline networks and to study the behavior of social butterflies, the most socially active individuals. Meetup allows users to create online groups and RSVP to offline events, so the resulting dataset captures user interactions in both online and offline settings. For this study, we filtered a Meetup dataset for six U.S. cities: Chicago, Dallas, Los Angeles, Miami, New York, and Philadelphia. For each city, we first identified residents using the users' longitude/latitude data, then constructed an online network and an offline network. For the sake of simplicity, we assumed complete mixing when connecting users in the online and offline networks, meaning that all users interact with all other users present. Under this assumption, the online network is formed by creating an edge between any two users who share membership in an online group. Analogously, the offline network is formed by creating an edge between any two users who attend the same event.
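The complete-mixing construction can be illustrated with a minimal sketch. The function names, the `(user, container)` pair format, and the toy data below are assumptions made for illustration, not the study's actual pipeline; a "container" stands for an online group or an offline event.

```python
from collections import defaultdict
from itertools import combinations


def complete_mixing_edges(memberships):
    """Build an undirected edge set from (user, container) pairs.

    Every pair of users sharing a container (a group for the online
    network, an event for the offline network) is joined by one edge.
    """
    members = defaultdict(set)
    for user, container in memberships:
        members[container].add(user)

    edges = set()
    for users in members.values():
        # Sorting gives each undirected edge a single canonical form.
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges


def degrees(edges):
    """Count the number of distinct neighbors of each user."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


# Hypothetical toy data: users and the online groups they joined.
group_rows = [("ana", "g1"), ("ben", "g1"), ("cy", "g1"),
              ("cy", "g2"), ("dee", "g2")]
online_edges = complete_mixing_edges(group_rows)
print(sorted(online_edges))
print(degrees(online_edges))
```

Passing (user, event) pairs to the same function would give the offline network. Because two users who share several groups are stored as a single edge, the degree here counts distinct neighbors rather than shared memberships.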

When comparing the degree distributions of the cities, we observed that the shape of each distribution was similar; however, the online networks exhibited some noticeable scattering. In our analysis of Meetup users' social activity and interaction patterns, we next examined the metric of active membership. A user has active membership in a group if they attend an offline event hosted by that group. Since users can join online groups without participating in events, the active membership metric provides a comparison of online and offline user behavior. From our statistical analysis, we observed that there were fewer users with many active memberships than users with many group memberships. This result seems natural, considering the ease of joining an online group compared to traveling to events.
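As a rough illustration of the active membership metric, the sketch below counts, for each user, the groups they joined and the groups whose events they attended. The function name, the input layout (pairs of user and group, pairs of user and event, and a hypothetical `event_host` mapping from event to hosting group), and the toy data are all assumptions.

```python
from collections import defaultdict


def membership_counts(group_members, attendance, event_host):
    """Return {user: (group_memberships, active_memberships)}.

    A membership is counted as active when the user attended at least
    one event hosted by that group.
    """
    joined = defaultdict(set)
    for user, group in group_members:
        joined[user].add(group)

    active = defaultdict(set)
    for user, event in attendance:
        active[user].add(event_host[event])

    users = set(joined) | set(active)
    return {u: (len(joined.get(u, ())), len(active.get(u, ())))
            for u in users}


# Hypothetical toy data.
group_members = [("ana", "hike"), ("ana", "chess"), ("ana", "books"),
                 ("ben", "hike")]
attendance = [("ana", "e1"), ("ana", "e2"), ("ben", "e1")]
event_host = {"e1": "hike", "e2": "hike"}

print(membership_counts(group_members, attendance, event_host))
```

In this toy example, one user belongs to three groups but is active in only one, which mirrors the gap between group memberships and active memberships described above.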

In the interest of identifying social butterflies, we measured each user's social activity by the number of events attended. From analyzing the percentiles of user attendance, we observed that, in contrast to the majority of users (who have attended fewer than three events), the social butterflies have attended hundreds of events. To study clustering of social butterflies, we measured event activity by defining an event score (ES) and a robust event score (RES). The ES of an event is the mean of its attendees' social activity measures, and the RES is the median of those measures. Figure 1 compares the 95th percentile of ES and RES versus attendance for each of the six cities in this study. We observed that events with fewer attendees were more likely to have higher event scores than large events. This indicates that social butterflies make up a relatively large share of the attendees at small events.
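The two event scores defined above can be sketched in a few lines. This is a minimal illustration assuming attendance is available as (user, event) pairs; the function name and the toy data are hypothetical, and the 95th-percentile summary plotted in the study is not reproduced here.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def event_scores(attendance):
    """Compute attendance, ES, and RES for each event.

    A user's social activity is the number of distinct events attended.
    ES is the mean and RES the median of the attendees' activities.
    """
    pairs = set(attendance)  # ignore duplicate (user, event) records
    activity = Counter(user for user, _ in pairs)

    per_event = defaultdict(list)
    for user, event in pairs:
        per_event[event].append(activity[user])

    return {
        event: {"attendance": len(vals),
                "ES": mean(vals),
                "RES": median(vals)}
        for event, vals in per_event.items()
    }


# Hypothetical toy data: ana attends 3 events, cy 2, ben 1.
rows = [("ana", "e1"), ("ana", "e2"), ("ana", "e3"),
        ("ben", "e1"), ("cy", "e1"), ("cy", "e2")]
for event, score in sorted(event_scores(rows).items()):
    print(event, score)
```

The median is less sensitive than the mean to a single very active attendee, which is presumably why the RES is described as robust.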

Figure 1. Event score (ES) versus attendance and robust event score (RES) versus attendance.

In the future, we can conduct further analysis in a variety of directions. Evaluating event scores with respect to particular interest groups and examining the evolution of network attributes over time would be of interest. Additionally, we could compare Meetup with other online social media datasets, such as those from Twitter and Facebook, via graph matching procedures to provide a deeper overall understanding of social network structures.

Acknowledgments: Kusha Nezafati, Yulia R. Gel, John Zweck (all of the Department of Mathematical Sciences at the University of Texas at Dallas), and Georgiy Bobashev (RTI International) co-authored this research. The research presented here was conducted as part of the Enriched Doctoral Training Program, funded by the National Science Foundation.

Jonathan Popa is a Ph.D. student at the University of Texas at Dallas studying applied mathematics.