# Department of Statistics

Chair

- Yali Amit

Professors

- Yali Amit
- Mihai Anitescu, Argonne National Laboratory
- Nicolas Brunel
- Paul Fischer, Argonne National Laboratory
- Lars Hansen, Economics
- John Lafferty
- Steven P. Lalley
- Gregory Lawler, Mathematics
- Peter McCullagh
- Mary Sara McPeek
- Per Mykland
- Dan Liviu Nicolae, Medicine
- John Reinitz
- Michael Leonard Stein
- Matthew Stephens
- Stephen M. Stigler
- Ronald Thisted, Health Studies
- Kirk Wolter
- Wei Biao Wu

Assistant Professors

- Jian Ding
- Nina Hinrichs
- Imre Risi Kondor, Computer Science
- Lek-Heng Lim
- Debashis Mondal

Senior Lecturers

- Linda Brant Collins
- Mei Wang

The Department of Statistics offers an exciting and revamped graduate program that prepares students for cutting-edge interdisciplinary research in a wide variety of fields. The field of statistics has become a core component of research in the biological, physical, and social sciences, as well as in traditional computer science domains such as artificial intelligence. In light of this, the Department of Statistics is currently undergoing a major expansion of approximately ten new faculty into fields of Computational and Applied Mathematics. The massive increase in the data acquired, through scientific measurement on one hand and through web-based collection on the other, makes the development of statistical analysis and prediction methodologies more relevant than ever. Our graduate program aims to prepare students to address these issues through rigorous training in theory, methodology, and applications of statistics; rigorous training in scientific computation; and research projects in core methodology of statistics and computation as well as in a wide variety of interdisciplinary fields.

The Department of Statistics offers two tracks of graduate study, one leading to the Master of Science (M.S.) degree, the other to the Doctor of Philosophy (Ph.D.). The M.S. degree is a professional degree. Students who receive this degree are prepared for nonacademic careers in which the use of advanced statistical and computational methods is of central importance. The program also prepares students for possible further graduate study.

During the first year of the Ph.D. program, students are given a thorough grounding in material that forms the foundations of modern statistics and scientific computation, including data analysis, mathematical statistics, probability theory, applied probability and modeling, and computational methods. Throughout the entire program, students attend a weekly consulting seminar where researchers from across the University come to get advice on modeling, statistical analysis, and computation. This seminar is often the source of interesting and ongoing research projects.

In the second year, students have a wide range of choices of topics they can pursue further, based on their interests, through advanced courses and reading courses with faculty. During the second year, students will typically identify their subfield of interest, take some advanced courses in the subject, and interact with the relevant faculty members. The Department maintains very strong connections to numerous other units on campus, either through joint appointments of the faculty or through ongoing collaborations. Students have easy access to faculty in other departments, which allows them to expand their interactions and develop new interdisciplinary research projects. Examples include joint projects with Human Genetics, Ecology and Evolution, Neurobiology, Chemistry, Economics, Health Studies, and Astronomy.

**Programs and Requirements for the Ph.D.**

All sufficiently well-prepared students take 3 of 4 sequences in their first year:

- Applied Statistics
- Theoretical Statistics
- Probability
- Computation and Machine Learning

All students pass prelim exams in 2 of the 4 subjects by the beginning of their second year. Well-prepared students may be allowed to pass one or both of their exams upon arrival. Students should take a distribution requirement of up to two courses in their second year and are otherwise encouraged to explore the great variety of graduate courses on offer, both inside the department and in other departments.

Starting in their second year, students should find a topic for a Ph.D. dissertation and establish a relationship with a Ph.D. adviser. Taking courses with potential advisers is part of this process. The detailed process is listed here.

**The Ph.D.: Training in Teaching, Presentation, and Consulting**

Part of every statistician's job is to evaluate the work of others and to communicate knowledge, experience, and insights. Every statistician is, to some extent, an educator, and the department provides graduate students with training for this aspect of their professional lives. The department expects all doctoral students, regardless of their professional objectives and sources of financial support, to take part in a graduated program of participation in some or all phases of instruction, from grading, course assisting, and conducting discussion sections, to being a lecturer with responsibility for an entire course.

Students also receive training in how to present research in short seminars in the first and second years of study. Later, students present their own work in a dissertation proposal and, eventually, in a thesis defense. The student seminars are listed here.

Ph.D. students should also participate in the department's consulting program, which is led by faculty members and exposes the students to empirical projects inside the university. Projects are carried out by groups of students under the guidance of a faculty member. The client is a researcher in an applied area, usually associated with the university. An informal seminar meets regularly over lunch to provide a forum for presenting and discussing problems, solutions, and topics in statistical consultation. Students present interesting or difficult consulting problems to the seminar as a way of stimulating wider consideration of the problem and as a means of developing familiarity with the kinds of problems and lines of attack involved. Often the client will participate in the presentation and discussion.

**Programs and Requirements for the M.S. degree**

The main requirements of the M.S. program are a sequence of at least nine approved courses plus a Master's paper. Students may take up to two years of courses. A detailed set of regulations can be found here. A substantial fraction of available courses are the same as for the Ph.D. degree.

**Facilities**

Almost all departmental activities–classes, seminars, computation, and student and faculty offices–are located in Eckhart Hall or neighboring Ryerson Hall. Each student is assigned a desk in one of several offices. A small departmental library and conference room is a common meeting place for formal and informal gatherings of students and faculty. The major computing facilities of the department are based upon a network of PCs running mainly Linux. One computer room currently houses many of these PCs; it is intended primarily for graduate students in the Statistics Department. In addition, all student offices have limited computer facilities. For further information, consult the department’s computing policies.

**Statistics Throughout the University**

In addition to the courses, seminars, and programs in the Department of Statistics, courses and workshops of direct interest to statisticians occur throughout the University, most notably in the programs in statistics and econometrics in the Booth School of Business and in the research programs in Health Studies, Human Genetics, Financial Mathematics and Econometrics, Computer Science, Economics, and NORC (formerly the National Opinion Research Center). The large number of statistics-related seminars is perhaps the best indication of the vibrancy of the statistics research community here at the University of Chicago.

### Statistics Courses

**STAT 30100. Mathematical Statistics I. 100 Units.**

This course is part of a two-quarter sequence on the theory of statistics. Topics will include exponential, curved exponential, and location-scale families; mixtures, hierarchical and conditional modeling including compatibility of conditional distributions; multivariate normal and joint distributions of quadratic forms of multivariate normal; principles of estimation; identifiability, sufficiency, minimal sufficiency, ancillarity, completeness; properties of the likelihood function and likelihood-based inference, both univariate and multivariate, including examples in which the usual regularity conditions do not hold; multivariate information inequality. Part of the course will be devoted to elementary asymptotic methods that are useful in the practice of statistics, including methods to derive asymptotic distributions of various estimators and test statistics, such as Pearson's chi-square, standard and nonstandard asymptotics of maximum likelihood estimators, asymptotics of order statistics and extreme order statistics, Cramer’s theorem including situations in which the second-order term is needed, asymptotic efficiency. Other topics (e.g., methods for dependent observations) may be covered if time permits.

Terms Offered: Winter

Prerequisite(s): STAT 30400 or consent of instructor

**STAT 30200. Mathematical Statistics II. 100 Units.**

This course continues the development of Mathematical Statistics, with an emphasis on Bayesian inference. Topics include Bayesian Inference and Computation, Frequentist Inference, Decision theory, admissibility and Stein’s paradox, the Likelihood principle, Exchangeability and De Finetti’s theorem, multiple comparisons and False Discovery Rates. The mathematical level will generally be at that of an easy advanced calculus course. We will assume familiarity with standard statistical distributions (e.g., Normal, Poisson, Binomial, Exponential), with the laws of probability, expectation, conditional expectation, etc. Concepts will be illustrated mainly by instructive “toy” examples, where calculations can be done by hand. However, we will also study more complex, practical applications of Bayesian statistics. Although some basic methods of computation will be discussed, the primary focus will be on concepts and not on computation.

Terms Offered: Spring

Prerequisite(s): STAT 30400 or consent of instructor

**STAT 30400. Distribution Theory. 100 Units.**

This course is a systematic introduction to random variables and probability distributions. Topics include standard distributions (i.e., uniform, normal, beta, gamma, *F, t,* Cauchy, Poisson, binomial, and hypergeometric); moments and cumulants; characteristic functions; exponential families; modes of convergence; central limit theorem; and Laplace’s method.

Terms Offered: Autumn

Prerequisite(s): STAT 24500 and MATH 20500, or consent of instructor

**STAT 30600. Advanced Statistical Inference-1. 100 Units.**

The course is concerned with statistical inference in high-dimensional settings, which has been one of the dominant themes of statistical research in the last decade. The objective is to get an understanding of theoretical underpinnings as well as computational aspects of the methods that have been developed. Additional topics include other regularization methods and other models (e.g., generalized linear models, graphical models).

Terms Offered: Autumn

Prerequisite(s): Consent of instructor

**STAT 30750. Numerical Linear Algebra. 100 Units.**

This course is devoted to the basic theory of linear algebra and its significant applications in scientific computing. The main objective is to provide a working knowledge of linear algebra and matrix computation suitable for advanced studies in which numerical methods are in demand, such as in statistics, econometrics, and scientific data organization and computation. Topics covered will include: Gaussian elimination, LU decomposition, vector spaces, linear transformations and their matrix representations, orthogonality and projections, QR factorization, eigenvectors and eigenvalues, diagonalization of real symmetric and complex Hermitian matrices, the spectral theorem, Cholesky decomposition, and Singular Value Decomposition. In addition, students will program in MATLAB or R using basic algorithms for linear systems, eigenvalue problem, matrix factorization, and sensitivity analysis.

Terms Offered: Autumn

Prerequisite(s): Multivariate calculus (MATH 19520 or 20000, or equivalent)

Equivalent Course(s): STAT 24300

**STAT 30800. Advanced Statistical Inference-2. 100 Units.**

This course will discuss the following topics in high-dimensional statistical inference: random matrix theory and asymptotics of its eigen-decompositions, estimation and inference of high-dimensional covariance matrices, large dimensional factor models, multiple testing and false discovery control, and high-dimensional semiparametrics. On the methodological side, probability inequalities, including exponential, Nagaev, and Rosenthal-type inequalities, will be introduced.

Terms Offered: Winter

Prerequisite(s): STAT 30200 or consent of instructor

**STAT 30900. Mathematical Computation I: Matrix Computation Course. 100 Units.**

This course covers the theory and practice of matrix computation, starting with the LU and Cholesky decompositions, the QR decompositions with applications to least squares, iterative methods for solving eigenvalue problems, iterative methods for solving large systems of equations, and (time permitting) the basics of the fast Fourier and fast wavelet transforms. The mathematical theory underlying the algorithms is emphasized, as well as their implementation in code.

Terms Offered: Autumn

Prerequisite(s): Linear algebra (STAT 24300 or equivalent) and some previous experience with statistics

Equivalent Course(s): CMSC 37810

**STAT 31000. Mathematical Computation II: Optimization and Simulation. 100 Units.**

This course covers the fundamentals of continuous optimization, including constrained optimization, and introduces the use of Monte Carlo methods in computer simulation and combinatorial optimization problems. Several substantial programming projects (using MATLAB) are completed during the course.

Terms Offered: Winter

Prerequisite(s): Solid grounding in multivariate calculus, linear algebra, and probability theory

Equivalent Course(s): CMSC 37811

**STAT 31100. Mathematical Computation III: Numerical Methods for PDE's. 100 Units.**

The first part of this course introduces basic properties of PDE’s; finite difference discretizations; and stability, consistency, convergence, and Lax’s equivalence theorem. We also cover examples of finite difference schemes; simple stability analysis; convergence analysis and order of accuracy; consistency analysis and errors (i.e., dissipative and dispersive errors); and unconditional stability and implicit schemes. The second part of this course includes solution of stiff systems in 1, 2, and 3D; direct vs. iterative methods (i.e., banded and sparse LU factorizations); and Jacobi, Gauss-Seidel, multigrid, conjugate gradient, and GMRES iterations.

Terms Offered: Spring

Prerequisite(s): Some prior exposure to differential equations and linear algebra

Equivalent Course(s): CMSC 37812

**STAT 31200. Introduction to Stochastic Processes I. 100 Units.**

This course introduces stochastic processes not requiring measure theory. Topics include branching processes, recurrent events, renewal theory, random walks, Markov chains, Poisson, and birth-and-death processes.

Terms Offered: Winter

Prerequisite(s): STAT 25100 and MATH 20500; STAT 30400 or consent of instructor

**STAT 31300. Introduction to Stochastic Processes II. 100 Units.**

Topics include continuous-time Markov chains, Markov chain Monte Carlo, discrete-time martingales, and Brownian motion and diffusions. Our emphasis is on defining the processes and calculating or approximating various related probabilities. The measure theoretic aspects of these processes are not covered rigorously.

Terms Offered: Spring

Prerequisite(s): STAT 31200 or consent of instructor

**STAT 31700. Introduction to Probability Models. 100 Units.**

This course introduces stochastic processes as models for a variety of phenomena in the physical and biological sciences. Following a brief review of basic concepts in probability, we introduce stochastic processes that are popular in applications in sciences (e.g., discrete time Markov chain, the Poisson process, continuous time Markov process, renewal process and Brownian motion).

Terms Offered: Winter

Prerequisite(s): STAT 24400 or 25100

Equivalent Course(s): STAT 25300

**STAT 31900. Causal Inference. 100 Units.**

This course is designed for graduate students and advanced undergraduate students from social sciences, health science, public policy, and social services administration who will be or are currently involved in quantitative research and are interested in studying causality. The course begins by introducing Rubin’s causal model. A major emphasis will be placed on conceptualizing causal questions including intent-to-treat effect, differential treatment effect, mediated treatment effect, and cumulative treatment effect. In addition to comparing alternative experimental, quasi-experimental, and non-experimental designs, we will clarify the assumptions under which a causal effect can be identified and estimated from non-experimental data. Students will become familiar with causal inference techniques suitable for evaluating binary treatments, concurrent multi-valued treatments, continuous treatments, or time-varying treatments in quasi-experimental or non-experimental data. These include propensity score matching and stratification, inverse-probability-of-treatment weighting (IPTW) and marginal mean weighting through stratification (MMW-S), regression discontinuity design, and the instrumental variable (IV) method. The course is aimed at equipping students with preliminary knowledge and skills necessary for appraising and conducting causal comparative studies. (M)

Instructor(s): G. Hong Terms Offered: Autumn

Prerequisite(s): Intermediate Statistics

Equivalent Course(s): CHDV 30102

**STAT 33100. Sample Surveys. 100 Units.**

This course covers random sampling methods; stratification, cluster sampling, and ratio estimation; and methods for dealing with nonresponse and partial response.

Terms Offered: Autumn

Prerequisite(s): Consent of instructor

**STAT 33560. Chaos and Predictability. 100 Units.**

This course provides an introduction to the analysis both of nonlinear dynamical systems and of actual systems best described by nonlinear models. A geometric view of linear and nonlinear time series analysis is developed. Mathematical chaos will be defined and then used to exemplify the strengths, weaknesses, and risks of applying linear intuitions in a nonlinear context. Prediction, predictability, and forecast evaluation will also be considered in this context. The student will develop a software toolkit for analysis and modelling, and will consider questions of which methods to employ (linear/non-linear, deterministic/stochastic). The efficacy of modern methods applied to more tractable mathematical systems is contrasted with their application to the analysis and prediction of actual time series of observations. Options for dealing with the fundamental limitations of applied analysis due to model inadequacy are compared.

Instructor(s): Leonard A. Smith Terms Offered: Winter

Prerequisite(s): STAT 24500 or equivalent (can be taken concurrently)

**STAT 33600. Time Dependent Data. 100 Units.**

This course considers the modeling and analysis of data that are ordered in time. The main focus is on quantitative observations taken at evenly spaced intervals and includes both time-domain and spectral approaches.

Terms Offered: Spring

Prerequisite(s): MATH 15300 and STAT 24400, STAT 24500 or 22400, or consent of instructor

Note(s): Some previous exposure to Fourier series is helpful but not required.

Equivalent Course(s): STAT 26100

**STAT 33610. Asymptotics for Time Series. 100 Units.**

This course will present a systematic asymptotic theory for time series analysis. In particular, the class will discuss asymptotics for sample mean, sample variances, banded covariance matrices estimates, inference of trends, periodograms, spectral density estimates, quantile estimation, nonparametric estimates, VaR and long-range dependent processes. Some asymptotic theory for non-stationary processes and functional linear models will also be presented.

Terms Offered: Autumn

Prerequisite(s): BUSF 30200 and STAT 31300 or consent of instructor

**STAT 33970. Statistics of High-Frequency Financial Data. 100 Units.**

This course is an introduction to the econometric analysis of high-frequency financial data. This is where the stochastic models of quantitative finance meet the reality of how the process really evolves. The course is focused on the statistical theory of how to connect the two, but there will also be some data analysis. With some additional statistical background (which can be acquired after the course), the participants will be able to read articles in the area. The statistical theory is longitudinal, and it thus complements cross-sectional calibration methods (implied volatility, etc.). The course also discusses volatility clustering and market microstructure.

Terms Offered: Spring

Prerequisite(s): STAT 39000/FINM 34500, also some statistics/econometrics background as in STAT 24400–24500, or FINM 33150 and FINM 33400, or equivalent, or consent of instructor.

Equivalent Course(s): FINM 33170

**STAT 34300. Applied Linear Statistical Methods. 100 Units.**

This course introduces the theory, methods, and applications of fitting and interpreting multiple regression models. Topics include the examination of residuals, the transformation of data, strategies and criteria for the selection of a regression equation, nonlinear models, biases due to excluded variables and measurement error, and the use and interpretation of computer package regression programs. The theoretical basis of the methods, the relation to linear algebra, and the effects of violations of assumptions are studied. Techniques discussed are illustrated by examples involving both physical and social sciences data.

Terms Offered: Autumn

Prerequisite(s): STAT 24500 or equivalent, and linear algebra (STAT 24300 or equivalent)

**STAT 34500. Design and Analysis of Experiments. 100 Units.**

This course introduces the methodology and application of linear models in experimental design. We emphasize the basic principles of experimental design (e.g., blocking, randomization, incomplete layouts). Many of the standard designs (e.g., fractional factorial, incomplete block, split unit designs) are studied within this context. The analysis of these experiments is developed as well, with particular emphasis on the role of fixed and random effects. Additional topics may include response surface analysis, the use of covariates in the analysis of designed experiments, and spatial analysis of field trials.

Terms Offered: Winter

Prerequisite(s): STAT 34300

**STAT 34700. Generalized Linear Models. 100 Units.**

This applied course covers factors, variates, contrasts, and interactions; exponential-family models (i.e., variance function); definition of a generalized linear model (i.e., link functions); specific examples of GLMs; logistic and probit regression; cumulative logistic models; log-linear models and contingency tables; inverse linear models; Quasi-likelihood and least squares; estimating functions; and partially linear models.

Terms Offered: Spring

Prerequisite(s): STAT 34300 or consent of instructor

**STAT 35000. Principles of Epidemiology. 100 Units.**

This course does not meet requirements for the biological sciences major. Epidemiology is the study of the distribution and determinants of health and disease in human populations. This course introduces the basic principles of epidemiologic study design, analysis, and interpretation through lectures, assignments, and critical appraisal of both classic and contemporary research articles.

Instructor(s): L. Kurina Terms Offered: Autumn

Prerequisite(s): Introductory statistics recommended or Consent of Instructor

Equivalent Course(s): HSTD 30900, BIOS 29318, ENST 27400, PPHA 36400

**STAT 35201. Introduction to Clinical Trials. 100 Units.**

This course will review major components of clinical trial conduct, including the formulation of clinical hypotheses and study endpoints, trial design, development of the research protocol, trial progress monitoring, analysis, and the summary and reporting of results. Other aspects of clinical trials to be discussed include ethical and regulatory issues in human subjects research, data quality control, meta-analytic overviews and consensus in treatment strategy resulting from clinical trials, and the broader impact of clinical trials on public health.

Instructor(s): J. Dignam Terms Offered: Spring

Prerequisite(s): HSTD 32100 or STAT 22000; Introductory Statistics or Consent of Instructor

Note(s): Not offered in 2012-13

**STAT 35400. Gene Regulation. 100 Units.**

This course covers the fundamental theory of gene expression in prokaryotes and eukaryotes through lectures and readings in the primary literature. Natural and synthetic genetic systems arising in the context of *E. coli* physiology and Drosophila development will be used to illustrate fundamental biological problems together with the computational and theoretical tools required for their solution. These tools include large-scale optimization, image processing, ordinary and partial differential equations, the chemical Langevin and Fokker-Planck equations, and the chemical master equation. A central theme of the class is the art of identifying biological problems which require theoretical analysis and choosing the correct mathematical framework with which to solve the problem.

Terms Offered: Winter

Prerequisite(s): Consent of instructor

Note(s): Not offered in 2012-13

Equivalent Course(s): ECEV 35400, MGCB 35401

**STAT 35500. Statistical Genetics. 100 Units.**

This is an advanced course in statistical genetics. It is recommended that students have either Human Genetics 47100 or both STAT 24400 and 24500 as prerequisites. This is a discussion course and student presentations will be required. Topics vary and may include, but are not limited to, statistical problems in genetic association mapping, population genetics, microarray analysis, and genetic models for complex traits.

Terms Offered: Spring

Prerequisite(s): HGEN 47100, STAT 24400–24500 or equivalent recommended. Students without this background should consult instructor.

**STAT 35600. Applied Survival Analysis. 100 Units.**

This course will provide an introduction to the principles and methods for the analysis of time-to-event data. This type of data occurs extensively in both observational and experimental biomedical and public health studies, as well as in industrial applications. While some theoretical statistical detail is given (at the level appropriate for a Master's student in statistics), the primary focus will be on data analysis. Problems will be motivated from an epidemiologic and clinical perspective, concentrating on the analysis of cohort data and time-to-event data from controlled clinical trials.

Instructor(s): H. Cao Terms Offered: Autumn

Prerequisite(s): HSTD 32100 or STAT 22000; introductory statistics or consent of instructor

Equivalent Course(s): HSTD 33100

**STAT 35700. Epidemiologic Methods. 100 Units.**

This course expands on the material presented in "Principles of Epidemiology," further exploring issues in the conduct of epidemiologic studies. The student will learn the application of both stratified and multivariate methods to the analysis of epidemiologic data. The final project will be to write the "specific aims" and "methods" sections of a research proposal on a topic of the student's choice.

Instructor(s): D. Huo Terms Offered: Winter

Prerequisite(s): HSTD 30700 or HSTD 30900 AND HSTD 32400 or applied statistics courses through multivariate regression.

Equivalent Course(s): HSTD 31001

**STAT 35800. Statistical Applications. 100 Units.**

This course provides a transition between statistical theory and practice. The course will cover statistical applications in medicine, mental health, environmental science, analytical chemistry, and public policy. Lectures are oriented around specific examples from a variety of content areas. Opportunities for the class to work on interesting applied problems presented by U of C faculty will be provided. Although an overview of relevant statistical theory will be presented, emphasis is on the development of statistical solutions to interesting applied problems.

Instructor(s): R. Gibbons Terms Offered: Spring

Prerequisite(s): HSTD 32700/STAT 22700 or STAT 34700 or consent of instructor.

Note(s): Not offered in 2012-13

Equivalent Course(s): HSTD 33500

**STAT 36700. History of Statistics. 100 Units.**

This course covers topics in the history of statistics, from the eleventh century to the middle of the twentieth century. We focus on the period from 1650 to 1950, with an emphasis on the mathematical developments in the theory of probability and how they came to be used in the sciences. Our goals are both to quantify uncertainty in observational data and to develop a conceptual framework for scientific theories. This course includes broad views of the development of the subject and closer looks at specific people and investigations, including reanalyses of historical data.

Instructor(s): S. Stigler Terms Offered: Spring

Prerequisite(s): Prior statistics course

Equivalent Course(s): STAT 26700, CHSS 32900, HIPS 25600

**STAT 36900. Applied Longitudinal Data Analysis. 100 Units.**

Longitudinal data consist of multiple measures over time on a sample of individuals. This type of data occurs extensively in both observational and experimental biomedical and public health studies, as well as in studies in sociology and applied economics. This course will provide an introduction to the principles and methods for the analysis of longitudinal data. Whereas some supporting statistical theory will be given, emphasis will be on data analysis and interpretation of models for longitudinal data. Problems will be motivated by applications in epidemiology, clinical medicine, health services research, and disease natural history studies.

Instructor(s): R. Thisted Terms Offered: Autumn

Prerequisite(s): HSTD 32400/STAT 22400 or equivalent, AND HSTD 32600/STAT 22600 or HSTD 32700/STAT 22700 or equivalent; or consent of instructor.

Equivalent Course(s): HSTD 33300

**STAT 37400. Nonparametric Inference. 100 Units.**

Nonparametric inference is about developing statistical methods and models that make weak assumptions. A typical nonparametric approach estimates a nonlinear function from an infinite dimensional space rather than a linear model from a finite dimensional space. This course gives an introduction to nonparametric inference, with a focus on density estimation, regression, confidence sets, orthogonal functions, random processes, and kernels. The course treats nonparametric methodology and its use, together with theory that explains the statistical properties of the methods.

Terms Offered: Winter

Prerequisite(s): STAT 22400 or 24400

Equivalent Course(s): STAT 27400

**STAT 37500. Pattern Recognition. 100 Units.**

This course treats statistical models and methods for pattern recognition and machine learning. Topics include a review of the multivariate normal distribution, graphical models, computational methods for inference in graphical models, in particular the EM algorithm for mixture models and HMM’s, and the sum-product algorithm. Linear discriminant analysis and other discriminative methods, such as decision trees and SVM’s, are covered as well.

Terms Offered: Spring

Prerequisite(s): Linear algebra at the level of STAT 24300. Knowledge of probability and statistical estimation techniques (e.g., maximum likelihood and linear regression) at the level of STAT 24400-24500.

Equivalent Course(s): STAT 24610

**STAT 37601. Machine Learning and Large-Scale Data Analysis. 100 Units.**

This course is an introduction to machine learning and the analysis of large data sets using distributed computation and storage infrastructure. Basic machine learning methodology and relevant statistical theory will be presented in lectures. Homework exercises will give students hands-on experience with the methods on different types of data. Methods include algorithms for clustering, binary classification, and hierarchical Bayesian modeling. Data types include images, archives of scientific articles, online ad clickthrough logs, and public records of the City of Chicago. Programming will be based on Python and R, but previous exposure to these languages is not assumed.

Instructor(s): J. Lafferty Terms Offered: Spring

Prerequisite(s): CMSC 15400 (or CMSC 12200 and either STAT 22400 or STAT 24400), or consent of the instructor

Equivalent Course(s): CMSC 25025

**STAT 37710. Machine Learning. 100 Units.**

This course introduces the theory and practice of machine learning, emphasizing statistical approaches to the problem. Topics include pattern recognition, empirical risk minimization and the Vapnik-Chervonenkis theory, neural networks, decision trees, genetic algorithms, unsupervised learning, and multiple classifiers.

Instructor(s): J. Lafferty Terms Offered: Winter

Prerequisite(s): Consent of department counselor. CMSC 25010 or consent of instructor.

Equivalent Course(s): CMSC 35400

**STAT 37900. Computer Vision. 100 Units.**

This course covers deformable models for detecting objects in images. Topics include one-dimensional models to identify object contours and boundaries; two-dimensional models for image matching; and sparse models for efficient detection of objects in complex scenes. Mathematical tools needed to define the models and associated algorithms are developed. Applications include detecting contours in medical images, matching brains, and detecting faces in images. Neural network implementations of some of the algorithms are presented, and connections to the functions of the biological visual system are discussed.

Instructor(s): Y. Amit Terms Offered: Winter. Not offered 2012–13.

Prerequisite(s): Consent of department counselor and instructor

Equivalent Course(s): CMSC 35500, CMSC 25050

**STAT 38100. Measure-Theoretic Probability I. 100 Units.**

This course provides a detailed, rigorous treatment of probability from the point of view of measure theory, as well as existence theorems, integration and expected values, characteristic functions, moment problems, limit laws, Radon-Nikodym derivatives, and conditional probabilities.

Terms Offered: Autumn

Prerequisite(s): STAT 31300 or consent of instructor

**STAT 38300. Measure-Theoretic Probability III. 100 Units.**

This course continues material covered in STAT 38100, with topics that include Lp spaces, Radon-Nikodym theorem, conditional expectation, and martingale theory.

Terms Offered: Winter

Prerequisite(s): STAT 38100

**STAT 38500. Advanced Topics: Probability. 100 Units.**

This course will include the following topics: continuous-time martingales, Brownian motion, Levy processes, Ito integral and stochastic calculus, and stochastic differential equations and diffusions. Topics may vary.

Terms Offered: Spring

Equivalent Course(s): MATH 38509

**STAT 38600. Topics in Stochastic Processes. 100 Units.**

This will be a course in “high-dimensional” probability aimed at introducing some of the mathematics of empirical processes, concentration, Gaussian random fields, large random matrices, and compressed sensing.

Terms Offered: Winter

Prerequisite(s): Basic probability and analysis, discrete-time martingales (STAT 30400 and 31300)

**STAT 38650. Random Matrices and Related Topics. 100 Units.**

This course will be an introduction to the spectral theory of large random matrices and related topics in probability. The first part of the course will be devoted to “bulk” spectral properties of Wigner and sample covariance matrices (that is, the empirical distribution of their eigenvalues), leading to the Wigner semi-circle law and the Marchenko-Pastur theorem. The second part will focus on the Gaussian orthogonal and unitary ensembles and on the distribution theory of the top eigenvalue (Tracy-Widom theory). This will lead to the study of orthogonal polynomials, Fredholm determinants, determinantal point processes, and Toeplitz matrices. Relationships to various combinatorial problems in probability, including asymmetric exclusion processes, last-passage percolation, and various stochastic models of growth and deposition, will be studied. Several other related topics may be discussed, depending on the interests and backgrounds of the audience and the instructor.

**STAT 39000. Stochastic Calculus. 100 Units.**

The course starts with a quick introduction to martingales in discrete time, and then Brownian motion and the Ito integral are defined carefully. The main tools of stochastic calculus (Ito's formula, Feynman-Kac formula, Girsanov theorem, etc.) are developed. The treatment includes discussions of simulation and the relationship with partial differential equations. Some applications are given to option pricing, but much more on this is done in other courses. The course ends with an introduction to jump processes (Levy processes) and the corresponding integration theory.

Instructor(s): Greg Lawler Terms Offered: Winter

Equivalent Course(s): FINM 34500

**STAT 39800. Field Research. Variable Units.**

This Summer Quarter course offers graduate students in the Statistics Department the opportunity to apply statistics knowledge that they have acquired to a real industry or business situation. During the summer quarter in which they are registered for the course, students complete a paid or unpaid internship of at least six weeks. Prior to the start of the work experience, students secure faculty consent for an independent study project to be completed during the internship quarter.

Terms Offered: Summer only

Prerequisite(s): Consent of instructor and faculty advisor

**STAT 39900. Master's Seminar. Variable Units.**

This course is for Statistics Master's students to carry out directed reading or guided work on topics related to their Master's papers.

**STAT 40100. Reading/Research: Statistics. Variable Units.**

This course allows doctoral students to receive credit for advanced work related to their dissertation topics. Students register for one of the listed faculty sections with prior consent from the respective instructor. Students may work with faculty from other departments; however, they still must obtain permission from and register with one of the listed faculty members in the Department of Statistics.

Terms Offered: All quarters

Prerequisite(s): Consent of instructor

**STAT 42500. Theoretical Neuroscience: Dynamics of Neurons and Networks. 100 Units.**

This course will introduce students to basic models of neurons and neural networks. It will cover basic mathematical tools that are useful for analyzing such models. The course will start with models of single neurons and synapses. It will then move to network models and describe how external inputs, single-neuron dynamics, and synaptic dynamics shape the collective dynamics at the network level in various types of network architectures. The last part of the course will focus on how learning shapes the dynamics at the neuron and network levels.

Instructor(s): N. Brunel Terms Offered: Winter

Note(s): Consent of Instructor required

Equivalent Course(s): CPNS 35500

**STAT 42600. Theoretical Neuroscience: Statistics and Information Theory. 100 Units.**

Instructor(s): N. Brunel, S. Palmer Terms Offered: Spring

Prerequisite(s): CPNS 35500

Equivalent Course(s): CPNS 35600

**STAT 45800. Workshop on Collaborative Research in Statistics, Computing, and Science. 100 Units.**

This course aims to bring together researchers with expertise in a variety of disciplines (statistics, computing, biology) to work together to produce solutions to a particular scientific problem. The problem we will focus on is identifying differences in the results of a high-throughput sequencing assay between groups of samples. No knowledge of this problem is assumed: it will be introduced in full at the start of the class, together with an outline for an initial proposed approach to addressing the problem. We will work together to implement, test, document and improve this proposed approach. It is expected that each student will bring one or more relevant skills to the table (see list below), as well as an enthusiasm to learn new relevant skills. An ambitious goal is that by the end of the class we will have functional and well-documented software implementing methods that work for the problem in hand. A less ambitious goal is that we will have learned something about the benefits and challenges of working together with people with different skill sets, as well as being exposed to an important type of data (high-throughput sequencing) that is likely to play a major role in biological sciences during the next decade.

Questions to the instructor: Matthew Stephens, mstephens@uchicago.edu

Here's a nonexhaustive list of relevant skills. It is expected that each student will have expertise in one or more of these, and enthusiasm to learn others (from each other!).

Statistics:

- Wavelets
- Generalized linear (mixed) models
- Shrinkage
- Hierarchical models
- Bayesian methods

Statistical Computing:

- R programming
- R package writing
- R vignettes
- Interfacing R with C++

Computing:

- Scripting languages (e.g., Perl, Python)
- C++
- Version control and software sharing (git)
- Other software engineering practices I may not know about!

Bioinformatics:

- Tools for dealing with high-throughput sequence data
- BAM files, SAM files, etc.

Biological Assays:

- DNase-seq
- ChIP-seq
- RNA-seq

Terms Offered: Winter