Minh Tang

Statistics

Associate Professor

Statistical Network Analysis

SAS Hall 5236

919.515.1923 [email protected]

Bio

Minh Tang is an associate professor in the Department of Statistics at North Carolina State University. He earned a Ph.D. in Computer Science from Indiana University Bloomington. Before joining NC State, he held research and teaching positions at Johns Hopkins University. His research focuses on statistical pattern recognition, dimensionality reduction and graph-based statistical inference. In particular, he develops methods that help researchers analyze complex networks and large datasets. As a result, his work advances statistical learning and data science. He has published widely in leading journals in statistics and machine learning. In addition, he has received research support from the National Science Foundation, DARPA and Microsoft Research. He also teaches courses in probability, statistical inference, data science and graph analytics while mentoring graduate students and early-career researchers.

Education

Ph.D. Computer Science Indiana University, Bloomington 2010

M.S. Computer Science University of Wisconsin-Milwaukee 2004

B.S. Computer Science Assumption University 2001

Area(s) of Expertise

Minh Tang specializes in statistical pattern recognition, dimensionality reduction, and statistical inference on graphs. He develops methods that identify patterns in complex datasets and improve data analysis. In addition, he creates techniques that simplify high-dimensional data while preserving important information. He also studies graph-structured data to uncover relationships and support reliable statistical conclusions. As a result, his work helps researchers better understand and analyze large, complex networks.

Publications

An omnibus embedding of multiple random graphs and implications for multiscale network inference, Electronic Journal of Statistics (2026)
Nonparametric two-sample hypothesis testing for low-rank random graphs of differing sizes, Electronic Journal of Statistics (2026)
Out-of-Sample Embedding with Proximity Data: Projection Versus Restricted Reconstruction, Journal of Computational and Graphical Statistics (2026)
Perturbation Analysis of Randomized SVD and its Applications to Statistics, Journal of the American Statistical Association (2026)
Chain-Linked Multiple Matrix Integration via Embedding Alignment, Journal of the American Statistical Association (2025)
Eigenvector fluctuations and limit results for random graphs with infinite rank kernels, arXiv (Cornell University) (2025)
Novel network trimming for robust vertex nomination in contaminated networks, Electronic Journal of Statistics (2025)
Chain-linked Multiple Matrix Integration via Embedding Alignment, arXiv (Cornell University) (2024)
Regression for matrix-valued data via Kronecker products factorization, arXiv (Cornell University) (2024)
A Theoretical Analysis of DeepWalk and Node2vec for Exact Recovery of Community Structures in Stochastic Blockmodels, IEEE Transactions on Pattern Analysis and Machine Intelligence (2023)

Grants

Date: 08/01/22 - 07/31/25

Amount: $150,000.00

Funding Agencies: National Science Foundation (NSF)

Accurate statistical inference on large, complex networks is a vitally important, interdisciplinary research area that has witnessed exponential growth over the last several years, ranging from the construction of a plethora of random graph models themselves to a host of approaches for inference of graph model parameters. Nevertheless, many graph estimation techniques are somewhat ad hoc: maximum likelihood estimates for certain exponential random graph models, for instance, or spectral methods for combinatorial graph analysis. But a mere regression coefficient here, a parametric estimate there, a clustering here, and an upper bound there do not constitute a unified, parsimonious approach to random graph inference. Thus the synthesis of disparate models and methods into a more comprehensive and familiar paradigm for graph inference is both necessary and welcome. This proposal addresses the need for such a foundational approach to graph inference. We focus on the development of a unified spectral framework for mathematical statistics on graphs, itself inspired by cornerstones of classical Euclidean inference. In particular, for random graphs with independent edges, we use low-rank approximations of their adjacency matrices to build estimates of underlying model parameters. We then systematically address the graph-inferential analogues of the central tenets of Euclidean inference: consistency of estimators; asymptotic normality or appropriate limit distributions of estimators; asymptotic relative efficiency and optimality; one-, two- and multi-sample graph hypothesis testing; and robustness.

Date: 04/11/20 - 04/10/21

Amount: $19,795.00

Funding Agencies: Defense Advanced Research Projects Agency (DARPA)

This proposal aims to develop methodologies for automated inference in high-dimensional and complex data. The proposal is part of the D3M (Data Driven Discovery of Models) program, in which we have just raw data as input and need to discover primitives -- simple yet robust and agile procedures that can be easily combined to form sophisticated frameworks/methodologies -- and generate models for presentation to domain experts for feedback and selection, all of this done without a data scientist's assistance. As an example of the applicability of such a framework, consider our experience with linear regression, where there is a well-understood pipeline to take multivariate linear regression data and automatically generate plots and diagnostics that assist the non-expert user. For this proposal we will consider datasets such as (a) multivariate time series together with event-of-interest time points (t1, t2, ..., tn), (b) multispectral imagery together with event-of-interest locations (x1, x2, ..., xn), and (c) a relational network together with event-of-interest nodes (v1, v2, ..., vn). We will first develop methodologies to automatically discover primitives for these types of data. We will then develop methodologies to automatically compose these discovered primitives into a collection of models for performing subsequent inference. The final result is a discoverable archive of data modeling primitives, procedures for automatic selection of primitives, and frameworks for composition of primitives into complex modeling pipelines.
