Subhashis Ghoshal

Statistics

Goodnight Distinguished Professor

SAS Hall 4276

919.513.0190

Bio

Subhashis Ghoshal is a Goodnight Distinguished Professor of Statistics and Operations Research at North Carolina State University.

Education

Ph.D. Statistics Indian Statistical Institute 1995

M.S. Statistics Indian Statistical Institute 1990

B.S. Statistics Indian Statistical Institute 1988

Area(s) of Expertise

High and Infinite Dimensional Models, Bayesian Inference, Asymptotic Statistics, Image Processing, Functional Data Analysis and Multiple Hypotheses Testing

Grants

Date: 09/01/22 - 8/31/25
Amount: $180,000.00
Funding Agencies: National Science Foundation (NSF)

Overview: High-dimensional time series data often occur in modern applications in various fields, including brain imaging, finance, and satellite images. A suitable lower-dimensional structure in the time series must be utilized for sensible inference. In this proposal, the observed series will be represented as an appropriate linear combination of several independent stationary latent processes. The individual latent time series are flexible, with unspecified spectral densities. Full conditional independence of the process will also be obtained by modeling a sparse spectral density matrix and putting priors on the corresponding objects. The causality of the time series in the temporal sense will be addressed for vector autoregressive processes by characterizing the condition in terms of the coefficients and putting priors accordingly. The causality over nodes will be addressed by a directed acyclic graph and modeling the causal residual process. The causal model will be applied to analyze the resting state of the human brain. In the research team, PI Dr. Subhashis Ghoshal will provide expertise on Bayesian nonparametric and graphical models, PI Dr. Anindya Roy will provide expertise on time series analysis, and Dr. Arkaprava Roy will provide expertise on Bayesian computation, modeling, and neuroimaging.
Intellectual Merit: The proposed construction of the high-dimensional time series through a set of independent latent processes is novel and is well suited to the Bayesian framework. Markov chain Monte Carlo methods take advantage of the decoupling by using the Whittle likelihood. The formulation seamlessly addresses a mixed-frequency sampling situation, which is difficult to incorporate in competing methods. The proposed framework efficiently addresses both temporal and nodal causality, respectively by a characterization in terms of Schur complementation and by using a directed acyclic graph, allowing a natural interpretation. The full conditional independence is also addressed through a deep characterization as a mixture of Markov processes.
Broader Impacts: The nodal causality structure will be used to analyze the resting state network of the human brain. Other applications include finance, where a high-dimensional time series with an anticipated lower-dimensional structure often appears. The PIs will disseminate results through statistics journals and talks at various conferences. They will develop free software packages for easy use by future researchers. Graduate students will be trained in Bayesian statistics, time-series modeling, and neuroimaging. The PIs have a consistent record of commitment to doctoral student advising, supporting young researchers, and promoting diversity, through graduate student support, obtaining conference grants to fund young researchers' travel to conferences, and involvement in the REU program. The proposed research will continue and strengthen such activities.

Date: 06/01/21 - 5/31/24
Amount: $331,858.00
Funding Agencies: US Army - Army Research Office

Advances in technology have resulted in massive datasets collected from all aspects of modern life such as internet search, mobile apps, social networking, cloud computing, and wearable devices, as well as from more traditional sources such as bar-code scanning, satellite imaging, air traffic control, banking and finance, and genomics. The complexity of such data warrants the use of flexible models involving many parameters. New challenges arise in the computation and statistical analysis of such data. Finding a lower-dimensional structure in the data is key to analyzing these complex data. Specifically, in regression problems with a large number of predictors, selecting relevant variables, making accurate inference and prediction, and quantifying uncertainty are problems of interest. Learning the structure of a graph in a graphical model describing dependence among variables conditional on other variables is extremely important. Bayesian methods can naturally quantify uncertainty in prediction and structure learning along with estimates, which is very desirable. However, Bayesian methods can be computationally intensive and their theoretical properties are less understood. Recently, Bayesian methods for finding lower-dimensional structure have seen a lot of interest, and their properties are being studied theoretically. Questions that arise are whether the posterior distribution contracts near the true values of the parameters at the minimax optimal rate, and whether the correct lower-dimensional structure is selected with high posterior probability. It is also of special interest to know whether a credible region constructed from the posterior distribution has adequate coverage in the frequentist sense. Results of this type can identify good Bayesian methods, detect possible pitfalls, and guide appropriate choices of prior distributions to avoid undesirable properties.
In this proposed research, results on properties of posterior distributions will be extended to cover a wide variety of models useful in practical applications such as high dimensional linear models with added complications like measurement error, missing values, generalized linear models, graphical models with measurement errors, hidden Gaussian graphical models, exponential trace-class models, hub-and-spoke graphical models, multiple change-point models and recovery of functional signals varying over a graph. New technical tools, as well as computer packages for ready use, will be developed through the proposed research. Graduate students will be trained and will write doctoral theses. The methods will be applied to datasets potentially interesting for defense applications.

Date: 08/01/19 - 7/31/23
Amount: $200,000.00
Funding Agencies: National Science Foundation (NSF)

In many contexts of statistical modeling, the shape of a function used in modeling plays a key role. Shape restrictions like monotonicity, convexity, log-concavity or unimodality may arise naturally. For instance, monotone functions are used in modeling the melting of the Arctic ice sheet and the rising sea levels under climate change. Many inverse problems such as deconvolution, or estimation under censoring, also lead to shape restrictions on the functions concerned. Shape-restricted inference has been studied well from the maximum likelihood perspective, but Bayesian methods have been less developed. In the Bayesian approach, additional information in the form of the qualitative shape restriction may be naturally blended into the prior. Uncertainty in the functions concerned can be quantified by Bayesian credible regions, which are relatively easy to obtain from posterior sampling. The frequentist coverage of such sets is important to know. In this proposal, a new computationally advantageous Bayesian approach based on a "projection posterior" will be adopted, which will also be easier to analyze theoretically. Suitable priors for shape-restricted inference such as those obtained from step functions and B-spline series will be developed for both univariate and multivariate shape restrictions, and the projection posterior will be studied. Local and global posterior contraction rates will be established. Asymptotic frequentist coverage of Bayesian credible intervals for a regression or density function at a point under monotonicity or other shape constraints will be obtained. A recalibration step will be used to adjust the coverage to meet a targeted value. Asymptotically optimal and computationally advantageous Bayesian tests for shape restrictions will be developed.
Results will be extended to other types of univariate shape restrictions like convexity or log-concavity and to multivariate monotonicity and convexity settings in regression, density estimation, and survival analysis. The methods developed will be applied in diverse contexts including climate change and medical data. The proposed research may open up a completely new path for the Bayesian approach in shape-restricted inference, reconcile Bayesian and frequentist uncertainty quantification under shape restriction, and serve as a seed for further development in the years to come. The results will be applied in various fields of interest. The proposed research, apart from developing new ideas, methods and computational techniques for answering related mathematical questions, will have a significant impact on decision making in various applications such as climate change, tumor size monitoring, and censored data. Research findings will be disseminated through arXiv preprints, journal publications, talks at conferences and various institutions, and special topics courses. Software will be developed and distributed for free through CRAN and the PI's website. The PI is highly committed to doctoral student advising and promoting diversity, especially the participation of women and underrepresented groups. Twenty-five doctoral students have already graduated and five are currently working with him. The PI's NSF grants also supported his doctoral students' travel to conferences. The PI also has a track record of promoting the representation of women and minorities through the conference support grants he obtained. In total, 21 women researchers, 4 researchers from under-represented groups, and many young U.S. participants were supported. The PI will continue promoting diversity in research related to this proposal.

Date: 07/01/15 - 8/31/19
Amount: $240,000.00
Funding Agencies: National Science Foundation (NSF)

Statistical data in modern contexts appear in increasing size, form and complexity, such as images, videos, functions and trees, from diverse sources including barcodes, internet searches, social networks, mobile devices, satellites, genomics and medical scans. Such data sometimes reach terabytes in size and are invariably very high dimensional. Nevertheless, such data typically have lower-dimensional structures within them. For instance, in a regression model, only a handful of predictors may have an effect, or a covariance matrix or its inverse may have a large number of zeros in the off-diagonal positions. This valuable sparsity allows valid model-based inference even with a comparatively small sample size. Recent years have seen a surge in research on the analysis of high dimensional data, but the overwhelming majority of it uses non-Bayesian techniques. Bayesian methods use qualitative information on sparsity in the form of a prior, and automatically provide a measure of model uncertainty in inference. However, theoretical and computational challenges remain. Commonly used approaches such as those based on Markov chain Monte Carlo (MCMC) methods do not scale well in the high dimensional setting, especially when a large number of models are involved in the analysis, since the Monte Carlo runs can only cover a limited area of the huge model space. Newer methods such as continuous shrinkage and those involving asymptotic posterior approximations are emerging to handle these problems. Establishing convergence properties of the Bayesian procedures and a frequentist interpretation of the corresponding uncertainty quantification is extremely challenging in the high dimensional context. Only a few recent studies have attempted to answer some of these questions. The proposed research will have all-round involvement in theory, computation and application concerning Bayesian analysis of high dimensional data of various types.
Both parametric and nonparametric models will be considered, and important issues of estimation, prediction, clustering and assessing model uncertainty will be addressed for a variety of data types including graphs, networks, pathways and trees. Techniques of prior construction, scalable computation and uncertainty quantification will be developed, and the study of frequentist convergence properties of the resulting procedures will be initiated. Construction of a prior distribution that leads to a computable posterior and desirable posterior convergence properties is extremely delicate in nonparametric and high dimensional models. The past research of the PI supported by his last three NSF grants and results of some other researchers identified some flexible families of default priors which lead to desirable properties of the posterior distributions for nonparametric models in terms of computation and convergence. Many of these ideas can be used in analyzing high dimensional models. The proposed research will connect various concepts and synthesize them into a powerful Bayesian approach for analyzing high dimensional data, appropriate for subject-specific and interdisciplinary research in STEM disciplines. Proposed methods will be applied to real datasets arising in different contexts. The proposed research will have significant impact on studying relations between variables in human brain development, gene-pathway analysis and other applications. Computational packages will be developed and posted to allow users free access to them. Results will be disseminated through articles, seminars and talks given at various places. The forthcoming 10th Conference on Bayesian Nonparametrics in Raleigh in June 2015, for which the PI is the chair of the local organizing committee, will be instrumental in exchanging ideas and developing collaboration.
The educational component of the proposal will impact human resource development in the form of graduate student advising and offering of special topics courses. The PI is committed to involving female students and students from under-represented groups to promote diversity.

Date: 06/01/15 - 5/31/16
Amount: $9,199.00
Funding Agencies: US Navy-Office Of Naval Research

Bayesian nonparametrics has evolved as one of the fastest growing areas of research in modern statistics. Its application areas include genetics, finance, survival analysis, sociology, networks and machine learning. The 10th Conference on Bayesian Nonparametrics is going to be held in Raleigh, NC, from June 22 to 26, 2015. Bayesian nonparametrics is at the cutting edge of research in Bayesian statistics, mathematical data sciences and machine learning. The conference is the most important meeting of researchers working in theory, methodology and all types of applications of Bayesian nonparametrics all over the world. This proposal seeks funding to support registration and local hospitality of some invited speakers and junior researchers working in U.S. institutions (graduate students, postdoctoral researchers and junior faculty generally within three years of completion of their terminal degree) to participate in the conference. Participation in this meeting is critical for junior researchers working in this area. The primary objective of this conference is to bring together experts and young researchers, as well as theoreticians and practitioners, who use Bayesian nonparametric techniques. The conference is supported by the International Society for Bayesian Analysis and the Institute of Mathematical Statistics. The conference has a well-structured, balanced program covering various areas of the subject. The scientific committee of the conference consists of renowned international experts on Bayesian nonparametrics and related topics. The meeting will include four overview plenary talks, twenty-four invited talks, several contributed talks and two large contributed poster sessions. Many of the invited speakers, including a plenary speaker, are women.
Providing support for junior researchers who do not have access to other sources of funding to attend the most important international gathering of scientists working on one of the fastest growing areas of statistical sciences is key to maintaining the current leadership of American institutions in this field. These conferences in the past were always characterized by a congenial atmosphere particularly supportive of junior researchers. The conference will include a series of activities especially designed to maximize the active participation of young researchers and to provide them with many opportunities for interaction with other young researchers and with more senior colleagues. The conference will also provide American researchers an opportunity to exchange ideas with leading researchers from elsewhere in the world such as Europe, Asia and Latin America. In addition, the conference will provide opportunities for young researchers to disseminate widely the results of their work, not only through contributed talks and posters, but also by facilitating the publication of peer-reviewed papers and a proposed special issue of a leading statistics journal. The extensive poster session and some slots for contributed talks are especially reserved for young researchers. Women and minorities are highly encouraged to take part in the conference through the registration and local support obtained through this grant proposal.

Date: 05/01/15 - 4/30/16
Amount: $10,000.00
Funding Agencies: US Army - Army Research Office

Bayesian nonparametrics has evolved as one of the fastest growing areas of research in modern statistics. Its application areas include genetics, finance, survival analysis, sociology, networks and machine learning. Bayesian nonparametrics is at the cutting edge of research in Bayesian statistics, mathematical data sciences and machine learning. The conference is the most important meeting of researchers working in theory, methodology and all types of applications of Bayesian nonparametrics all over the world. The conference is supported by the International Society for Bayesian Analysis and the Institute of Mathematical Statistics. The conference has a well-structured, balanced program covering various areas of the subject. The scientific committee of the conference consists of renowned international experts on Bayesian nonparametrics and related topics. The meeting will include four overview plenary talks, twenty-four invited talks, several contributed talks and two large contributed poster sessions. These conferences in the past were always characterized by a congenial atmosphere particularly supportive of junior researchers. The conference will include a series of activities especially designed to maximize the active participation of young researchers and to provide them with many opportunities for interaction with other young researchers and with more senior colleagues. This proposal seeks funding to pay registration fees and provide accommodation for some invited speakers, many of whom are young researchers. It is customary in Bayesian nonparametrics meetings to provide such hospitality to invited speakers. Organizing this conference is key to maintaining the current leadership of American institutions in this field. The conference will also provide American researchers an opportunity to exchange ideas with leading researchers from elsewhere in the world such as Europe, Asia and Latin America.
In addition, the conference will provide opportunities for young researchers to disseminate widely the results of their work, not only through contributed talks and posters, but also by facilitating the publication of peer-reviewed papers and a proposed special issue of a leading statistics journal. The extensive poster session and some slots for contributed talks are especially reserved for young researchers.

Date: 05/01/15 - 4/30/16
Amount: $15,000.00
Funding Agencies: National Science Foundation (NSF)

Bayesian nonparametrics has evolved as one of the fastest growing areas of research in modern statistics. Its application areas include genetics, finance, survival analysis, sociology, networks and machine learning. The 10th Conference on Bayesian Nonparametrics is going to be held in Raleigh, NC, from June 22 to 26, 2015. Bayesian nonparametrics is at the cutting edge of research in Bayesian statistics, mathematical data sciences and machine learning. The conference is the most important meeting of researchers working in theory, methodology and all types of applications of Bayesian nonparametrics all over the world. This proposal seeks funding to support registration and local hospitality of some invited speakers and junior researchers working in U.S. institutions (graduate students, postdoctoral researchers and junior faculty generally within three years of completion of their terminal degree) to participate in the conference. Participation in this meeting is critical for junior researchers working in this area. The primary objective of this conference is to bring together experts and young researchers, as well as theoreticians and practitioners, who use Bayesian nonparametric techniques. The conference is supported by the International Society for Bayesian Analysis and the Institute of Mathematical Statistics. The conference has a well-structured, balanced program covering various areas of the subject. The scientific committee of the conference consists of renowned international experts on Bayesian nonparametrics and related topics. The meeting will include four overview plenary talks, twenty-four invited talks, several contributed talks and two large contributed poster sessions. Many of the invited speakers, including a plenary speaker, are women.
Providing support for junior researchers who do not have access to other sources of funding to attend the most important international gathering of scientists working on one of the fastest growing areas of statistical sciences is key to maintaining the current leadership of American institutions in this field. These conferences in the past were always characterized by a congenial atmosphere particularly supportive of junior researchers. The conference will include a series of activities especially designed to maximize the active participation of young researchers and to provide them with many opportunities for interaction with other young researchers and with more senior colleagues. The conference will also provide American researchers an opportunity to exchange ideas with leading researchers from elsewhere in the world such as Europe, Asia and Latin America. In addition, the conference will provide opportunities for young researchers to disseminate widely the results of their work, not only through contributed talks and posters, but also by facilitating the publication of peer-reviewed papers and a proposed special issue of a leading statistics journal. The extensive poster session and some slots for contributed talks are especially reserved for young researchers. Women and minorities are highly encouraged to take part in the conference through the registration and local support obtained through this grant proposal.

Date: 06/01/11 - 5/31/15
Amount: $250,000.00
Funding Agencies: National Science Foundation (NSF)

This proposal establishes a comprehensive framework for finding meaningful structures in the analysis of object data following a Bayesian approach. By object data, we mean data types which go beyond univariate and multivariate data. Images, sets, functional data, shape data, random graphs and trees are common instances of object data. In this proposal, we shall mainly study images, functional data and set data. The project will provide definitive guidelines for constructing prior distributions on the parameters controlling the distributions of object data, and obtain the resulting posterior distributions in an efficient manner. Since the distribution of object data is fairly complex due to the complex nature of the data, finding meaningful structures is important. The structure may come in the form of sparsity, setting many parameter values to a null value like zero, or by making adjacent values equal, thus effectively reducing the number of parameters to handle. Moreover, the structure describes qualitative features of the data very well. In the context of images, finding a structure means finding meaningful objects in the image in the presence of noise and other aberrations. In a Bayesian setting, this may be accomplished by encouraging the underlying values of the neighboring intensities to be equal with positive probability through an appropriate prior specification. The main idea of the proposed research is to use an auxiliary stochastic process such as the Chinese restaurant process or the Indian buffet process to control the ties in intensity parameters of images or coefficients in basis expansions of functional data. The proposed method is able not only to produce clusters, but also to determine to what extent a pair of values will be tied. The proposed research will have significant impact on the processing of astronomical images or medical scans. The educational component of the proposal includes graduate student advising and offering of special topics courses.

Date: 01/27/12 - 9/30/14
Amount: $68,998.00
Funding Agencies: National Security Agency

Variable selection in linear models is a major statistical issue in contemporary data analysis because modern data typically involve a lot of predictors, many of which are nearly irrelevant. Such a sparse structure of the regression function actually allows us to estimate the regression function fairly accurately even when the number of predictors far exceeds the number of available observations. Removing irrelevant variables from the predictive model is essential since the presence of too many variables may cause overfitting and multicollinearity, which lead to poor prediction of future outcomes. Moreover, the presence of too many variables in the regression function makes the relation hard to interpret. In this project, we focus on variable selection for challenging situations where the number of predictors p is much larger than the size of the available sample n, which can be referred to as "large p small n problems" or "high dimensional problems". High dimensional data are encountered more and more in the real world, due to the rapid advance of scientific technologies and computing power, such as in image data analysis, biomedical data, and financial data. However, the curse of dimensionality makes both model estimation and inference hard. Other main difficulties associated with high dimensional modeling are that the predictors are necessarily correlated when p is greater than n, and the computation time is typically prohibitive. In this work, we propose a new statistical framework for building sparse and highly predictive models for high dimensional data. A recursive method is suggested to identify important variables effectively, and the new scheme is quite general and can be applied to various problems. We plan to study the performance and properties of the new estimator in four important statistical contexts: linear regression, classification, generalized linear regression, and learning with multiple-type data.
Preliminary study suggests that, compared to existing approaches, the new method is more accurate in variable selection and computationally efficient when the dataset is massive.

Date: 04/01/13 - 3/31/14
Amount: $20,000.00
Funding Agencies: National Science Foundation (NSF)

9th Conference on Bayesian Nonparametrics (BNP 2013), Amsterdam, The Netherlands, June 10-14, 2013
This proposal seeks funding to support junior researchers currently working in U.S. institutions (graduate students, postdoctoral researchers and junior faculty generally within three years of completion of their terminal degree) to participate in the 9th Conference on Bayesian Nonparametrics, to be held in Amsterdam, The Netherlands, from June 10 to 14, 2013. The conference is the most important meeting of researchers working in theory, methodology and all types of applications of Bayesian nonparametrics all over the world. Participation in this meeting is critical for junior researchers in this area. The primary objective of this conference is to bring together experts and young researchers, as well as theoreticians and practitioners, who use Bayesian nonparametric techniques. The conference is supported by the International Society for Bayesian Analysis (ISBA). The conference has a well-structured, balanced program covering various areas of the subject. The scientific committee of the conference consists of renowned international experts on Bayesian nonparametrics and related topics. The meeting will include four overview plenary talks, forty-two invited talks, six contributed talks and a contributed poster session. Many of the invited speakers, including a plenary speaker, are women.
Intellectual Merit: Bayesian nonparametrics (BNP) has evolved as one of the fastest growing areas of research in modern statistics. The advent of powerful computing methods like Markov chain Monte Carlo, rapidly developed since the mid-nineties, landmark results on convergence properties of posterior distributions studied in the last fifteen years, and the construction of elegant stochastic processes suitable for modeling complex random structures have made BNP methods remarkably accessible and attractive for researchers. BNP methods combine the advantages of Bayesian modeling (e.g., ability to incorporate prior information, full and exact inference for any sample size, ready extensions to hierarchical settings) with the appeal of nonparametric inference (e.g., limited number of modeling assumptions). Applications include areas as diverse as genetics, finance, survival analysis, sociology, networks and machine learning. The strong growth of BNP led to the formation of the BNP chapter within the ISBA. The conference being organized is the 9th in a series of (mostly) biennial meetings held internationally since 1997, and is the premier venue for dissemination of research in Bayesian nonparametrics.
Broader Impacts: Providing support for junior researchers who do not have access to other sources of funding to attend the most important international gathering of scientists working on one of the fastest growing areas of statistical sciences is key to maintaining the current leadership of American institutions in this field. BNP workshops in the past were always characterized by a congenial atmosphere particularly supportive of junior researchers. The conference will include a series of activities specially designed to maximize the active participation of young researchers and to provide them with many opportunities for interaction with other young researchers and with more senior colleagues. The conference will also provide American researchers an opportunity to exchange ideas with leading researchers from elsewhere in the world such as Europe, Asia and Latin America. In addition, the conference will provide opportunities for young researchers to disseminate widely the results of their work, not only through contributed talks and posters, but also by facilitating the publication of peer-reviewed papers. The extensive poster session and some slots for contributed talks are especially reserved for young researchers. Women and minorities will be particularly encouraged to apply for the travel support.

