Monday, August 4 – Friday, August 8
University of North Carolina at Chapel Hill

Overview

The research collaboration workshop “Women in Data Science and Mathematics” (WiSDM 2025) will be held on the UNC-Chapel Hill campus on August 4-8, 2025. It builds on the successes of previous workshops: WiSDM 2017 and 2019, held at the Institute for Computational and Experimental Research in Mathematics at Brown University, and WiSDM 2023, held at the Institute for Pure and Applied Mathematics at UCLA.

Attendees will engage deeply with interdisciplinary, data-oriented research, tackling real-world challenges that demand collaborative, cross-disciplinary expertise. Participants will have the opportunity to work closely with experts from fields such as statistics, optimization, machine learning, and domain-specific areas, fostering a dynamic exchange of ideas.  

This one-week workshop consists primarily of time spent in small research groups actively working to solve problems in data science. Participants typically range from senior researchers to early graduate students, collaborating as equals and building relationships centered on shared research interests and complementary research skills. Details of each research project can be found in the Projects tab.  

Participation

Information about the application process and criteria will be posted soon. Applications will be due in January 2025.

All junior and senior researchers who support the mission of WiSDM – fostering an atmosphere that encourages the free expression and exchange of scientific ideas – are welcome to apply.

Organizing Committee:

  • Karolyn Babalola, Booz Allen Hamilton, Chief Engineer at Chief Technology Office AI IMT 
  • Cristina Garcia-Cardona, Los Alamos National Laboratory (LANL), Scientist at Computer, Computational and Statistical Sciences Division 
  • Harlin Lee, UNC-CH, Assistant Professor at School of Data Science and Society 
  • Kathryn Leonard, Occidental College, Professor of Computer Science, President of Association for Women in Mathematics 
  • Yifei Lou, UNC-CH, Associate Professor at the Department of Mathematics & the School of Data Science and Society 
  • Linda Ness, Visiting Researcher at the DIMACS Center, Rutgers University, Chief Research Scientist at Vencore Labs (retired) 

2025 Schedule

Monday, August 4

Morning Session
9:00 – 9:30 a.m. Welcome
9:30 – 11:30 a.m. Project Introduction
11:30 a.m. – 1:00 p.m. Lunch
Afternoon Session
1:00 – 5:00 p.m. Individual Group Work

Tuesday, August 5

Morning Session
9:00 – 11:30 a.m. Individual Group Work
11:30 a.m. – 1:00 p.m. Lunch
Afternoon Session
1:00 – 5:00 p.m. Individual Group Work

Wednesday, August 6

Morning Session
9:00 – 11:30 a.m. Individual Group Work
11:30 a.m. – 1:00 p.m. Lunch
Afternoon Session
1:00 – 4:00 p.m. Present Work to Date
4:00 – 7:00 p.m. Outing

Thursday, August 7

Morning Session
9:00 – 11:30 a.m. Individual Group Work
11:30 a.m. – 1:00 p.m. Lunch
Afternoon Session
1:00 – 5:00 p.m. Individual Group Work

Friday, August 8

Morning Session
9:00 – 11:30 a.m. Group Presentations
11:30 a.m. – 1:00 p.m. Lunch

2025 Research Projects

Redistricting is the process by which, every ten years, a state is partitioned into zones, called districts, in which elections are held to select representatives. Gerrymandering is the practice of manipulating district lines in a way that overtly advantages a particular political party or disadvantages a class of people. Given the stakes involved in ensuring fair representation, mathematicians and other computational scientists have shown significant interest over the past two decades in quantifying gerrymandering, that is, in using advanced mathematical, statistical, and computational tools to study notions of fairness, broadly construed, in electoral redistricting. One dominant approach is ensemble analysis, which uses Markov chain Monte Carlo methods to generate large collections of redistricting plans. In this project, we explore the topological and geometric properties of the space of all valid maps.

Optimal transport (OT) studies the most cost-effective way of moving mass from one location to another. In recent years, OT has emerged as a fundamental tool in machine learning and data science and has proved valuable for analyzing single-cell data, enabling more nuanced comparisons of cell populations. This project aims to advance the application of OT in single-cell biology. We will develop OT-based algorithms for analyzing high-dimensional omics data, focusing on improving cell type classification and trajectory inference. Our approaches will seek to make comparisons between cellular populations more accurate, potentially revealing new insights into developmental processes and disease progression at the single-cell level.
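As a rough illustration of "moving mass at minimal cost", the sketch below computes an approximate transport plan between two tiny discrete distributions. It uses entropy-regularized OT solved by Sinkhorn iterations, which is one common computational approach and not necessarily what the project will use. The function name `sinkhorn`, the parameters `eps` and `iters`, and the toy points are all assumptions made for this example.

```python
import math

def sinkhorn(a, b, cost, eps=0.1, iters=500):
    """Approximate entropy-regularized OT plan between histograms a and b.

    a, b  : lists of nonnegative weights, each summing to 1
    cost  : cost[i][j] is the cost of moving one unit of mass from i to j
    eps   : regularization strength (illustrative default, an assumption)
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy data (hypothetical): two source points and three target points on a line.
src = [0.0, 1.0]
dst = [0.0, 0.5, 1.0]
a = [0.5, 0.5]
b = [0.25, 0.5, 0.25]
cost = [[(s - t) ** 2 for t in dst] for s in src]

plan = sinkhorn(a, b, cost)
total_cost = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(3))
```

Each entry `plan[i][j]` is the amount of mass sent from source point `i` to target point `j`; the row sums recover `a` and the column sums recover `b`. In a single-cell setting the points would be cells in a high-dimensional feature space rather than positions on a line.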

Variational models and deep learning are two distinct approaches to image analysis and computer vision, each with its own strengths and limitations. Variational models minimize energy functionals that encode desired properties such as smoothness and boundary sharpness. They offer clear interpretation of, and control over, image features through the choice of one or more regularization terms. Variational methods do not require large training datasets: they work well with limited data and rely on mathematical priors rather than learned features. Deep learning (DL) models, by contrast, do not rely on handcrafted features but learn image features from data during training. These models can map input images directly to desired outputs, making the process highly automated and efficient. Deep learning excels in tasks that involve large-scale datasets and complex patterns, outperforming traditional models in most image processing and computer vision tasks. However, DL requires a large amount of labeled data for training, which may not always be available. DL models are black boxes that lack interpretability: it is unclear exactly how a model arrives at its decisions. Training deep models is computationally expensive and requires substantial hardware resources. Without enough data or proper regularization, models may overfit the training data and produce results that lack regularity. We plan to focus on fusing the strengths of variational models and deep learning to obtain methods with better mathematical interpretability, less overfitting, and better regularity. For example, deep learning models can be used to learn better regularization terms or priors in variational formulations, or variational models can be used to impose constraints on neural networks.
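To make "minimizing an energy functional" concrete, here is a minimal sketch for a 1D signal. The energy combines a data-fidelity term with a quadratic smoothness regularizer and is minimized by plain gradient descent. The specific energy, the names `energy` and `denoise`, the weight `lam`, and the sample signal are assumptions chosen for simplicity; image models typically use richer regularizers such as total variation.

```python
def energy(u, f, lam):
    """E(u) = 0.5 * sum (u_i - f_i)^2 + 0.5 * lam * sum (u_{i+1} - u_i)^2."""
    fidelity = 0.5 * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    smoothness = 0.5 * lam * sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1))
    return fidelity + smoothness

def denoise(f, lam=2.0, iters=500):
    """Minimize the energy above by gradient descent, starting from the data f."""
    u = list(f)
    n = len(u)
    # 1 + 4 * lam bounds the gradient's Lipschitz constant, so this step
    # size guarantees that the energy never increases.
    step = 1.0 / (1.0 + 4.0 * lam)
    for _ in range(iters):
        grad = []
        for i in range(n):
            g = u[i] - f[i]                      # fidelity term
            if i > 0:
                g += lam * (u[i] - u[i - 1])     # left neighbor
            if i < n - 1:
                g += lam * (u[i] - u[i + 1])     # right neighbor
            grad.append(g)
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u

# Hypothetical noisy step signal.
noisy = [0.0, 0.3, -0.2, 0.1, 1.2, 0.8, 1.1, 0.9]
smooth = denoise(noisy)
```

Here the regularization weight `lam` is the interpretable control knob: larger values trade fidelity to the data for a smoother result. A learned regularizer, as the project proposes, would replace the hand-chosen smoothness term.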

Diffusion generative models (DGMs) are a recent machine learning technique that is able to learn to represent highly complex, high-dimensional data distributions, and are capable of generating or drawing new instances from this distribution. They have been successful in a wide range of applications, including face image generation, denoising, inpainting and medical image reconstruction. In this project we will explore new approaches to incorporate DGMs into a regularized inverse problems framework, to allow for a rapid exploration of the solution space. Furthermore, DGMs offer a natural mechanism for uncertainty quantification (UQ) by allowing the evaluation of a distribution of solutions rather than a single realization. To enhance our understanding of the internal mechanisms of DGMs, we will use probing methods to investigate how DGMs encode data structures, further refining UQ estimations. We will deploy DGMs in imaging inverse problems, such as computed tomography, and evaluate their UQ estimations while comparing their performance with other data-driven model-based methods, including unrolling and plug-and-play.

Modern computational problems increasingly originate in large-scale regimes where data cannot be read into or held entirely in working memory. In this regime, randomized and iterative methods which use a small portion of the data in each iteration can shine. These methods can offer a small memory footprint and good convergence guarantees. Recently, there has been a significant push to develop and analyze randomized iterative methods for computational problems defined for input data naturally formatted as a tensor – a multi-modal generalization of a matrix. This project will study randomized iterative methods for linear computational problems defined with tensor data and tensor variables, such as the matrix or tensor linear discriminant analysis models. 
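A classic matrix example of such a method is randomized Kaczmarz, which solves a consistent linear system by touching a single row per iteration. The sketch below is only the familiar matrix case, shown for orientation; it is not the tensor method the project will study. The function name `randomized_kaczmarz`, its parameters, and the small test system are assumptions.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Solve a consistent system A x = b using one row of A per iteration."""
    rng = random.Random(seed)
    n_cols = len(A[0])
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * n_cols
    indices = range(len(A))
    for _ in range(iters):
        # Sample a row with probability proportional to its squared norm.
        i = rng.choices(indices, weights=row_norms_sq, k=1)[0]
        row = A[i]
        # Project the current iterate onto the hyperplane <row, x> = b[i].
        residual = b[i] - sum(r * xj for r, xj in zip(row, x))
        scale = residual / row_norms_sq[i]
        x = [xj + scale * r for xj, r in zip(x, row)]
    return x

# Hypothetical overdetermined but consistent system with known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]

x_hat = randomized_kaczmarz(A, b)
```

Only one row of `A` and one entry of `b` are read in each step, which is what keeps the memory footprint small when the full data cannot be held in working memory.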

The proposed research aims to develop novel techniques for robust manifold estimation and inference in the presence of noisy and high-dimensional data. Manifolds, which offer a low-dimensional representation of complex data, are increasingly used in fields such as genetics, computer vision, medical imaging, and signal processing. For such applications, leveraging the underlying manifold structure is essential for exploratory visualization, scalability, and accurate inference. However, real-world data often suffer from noise, missing values, and measurement errors, which can significantly affect the accuracy and reliability of manifold-based models. Various approaches have been developed to mitigate this challenge including diffusion, manifold fitting methods, projection onto local tangent planes, landmark kernels, shared nearest neighbor graphs, and local averaging. However, how noise impacts manifold learning algorithms is still not well understood, even for the relatively easy case of isotropic Gaussian noise, much less realistic noise models. The proposed research will investigate best practices for denoising and analyze how noise impacts manifold learning algorithms and prediction on noisy manifolds. By integrating tools from differential geometry and statistical learning theory, this project will explore novel mathematical frameworks and scalable algorithms which enhance the robustness of manifold learning methods.