|
|
|
September
October
November
December
I'll talk
about some recent work with E. Allman and C. Matias on identifiability of
statistical models with hidden variables. Many statistical models incorporate hidden
(unobservable) variables with some sort of structure in which observed
variables are independent when conditioned on the hidden ones. Examples include
simple latent class models, including those with continuous observed variables,
hidden Markov models, phylogenetic models, and a random graph model. Proving
such models are identifiable is generally difficult. However, a theorem of J.
Kruskal provides a good algebraic tool which can be used to address the
question for all these models (and presumably more). After an introduction to
why the identifiability question is difficult for such models, and a
presentation of Kruskal's theorem, I'll explain the method and illustrate its
use on HMMs. Though we mostly recover known theorems for HMMs, the general method
is quite powerful and gives both stronger results and easier proofs for many
other models.
|
|
|
October 20, |
We
will provide a full semi-algebraic description of graphical models for binary
data in the case when the graph is a tree and all the inner nodes are hidden.
It occurs that the inequalities are just simple constraints on
correlations between the manifest variables. Our work suggests that exploring
the semi-algebraic structure of statistical models will probably involve using
coordinate systems other than the raw probabilities.
|
October 27 |
Techniques from algebraic geometry were introduced in design of experiments by Pistone and Wynn (1996). The techniques they introduced identify models of a low degree and provide a natural extension to basic concepts such as confounding and fractions of designs.
I intend to describe the basic ideas of the algebraic method, as well as presenting some of our recent work on aberration which leads to Nyquist-type lower bounds for models identified with algebra.
|
November 10 |
I will describe a range of new
finiteness results for statistical models, in particular, results which say
that, up to symmetry, many models in random variables with state space
"tending to infinity" have finite algebraic descriptions. The focus
will be on applications of these ideas to Markov bases (that is, generating
sets of toric ideals), but the techniques apply to many other statistical models.
The results follow from studying polynomial rings in infinitely many
indeterminates under the action of the infinite symmetric group, which leads to
a theory that is of independent interest. I will also mention a bunch of
open problems and research directions. This is joint work with Chris
Hillar.
|
November 17 |
The p1 model describes
dyadic interactions in a social network, summarized in the form of a directed
graph. It can be represented as a log-linear model. The main goal
of this project (joint work with S. Fienberg and A. Rinaldo) is to
understand this model from the viewpoint of algebraic statistics: describe
Markov bases, study the MLE etc.
In this talk, we will recall what Markov bases are and why we might want to
compute them. Then, we will reveal an algebraic-geometric description of
the Markov bases. Finally, to extract the statistically meaningful part
of this formal description is still work in progress, and we might see a
glimpse of combinatorics coming into play.
December 2008
|
December 8 |
I will aim to connect these three notions as a means to study properties of multidimensional contingency tables, with attention to statistical disclosure limitation (SDL). Mathematical models will be given for the principal SDL methods for tables: controlled rounding and perturbation, complementary cell suppression, and controlled tabular adjustment. Each method relies on the existence of sufficiently many alternative tables (solutions) to the original table, or existence of alternative tables sufficiently far from the original, or both. We are interested in moves between solutions and Markov bases, but instead of dealing directly with Markov or Groebner bases I will introduce an equivalent notion in linear programming. I will discuss a class of tables modeled as networks and, consequently, whose minimal Markov moves are square-free and easily computed. I will characterize Markov moves in general tables as linear programs restricted to a unit hypercube, and illustrate this notion for complete independence log linear models, two-way tables with subtotal constraints, and other non-network examples. Based on the De Loera-Onn characterization of rational polytopes as transportation polytopes, I will open discussion on the utility (or not) of focusing study of Markov moves in tables to thin (network) and slim (non-network) 3-dimensional planar transportation problems (aka no 3-way interaction log linear models). Small (3x3x3) examples will be used as illustrations.
|
February 23 |
A partially specified contingency
table is the set of all contingency tables satisfying fixed marginal totals
(minimal sufficient statistics). We present theory and methods for drawing a
random sample of tables from this set based on linear programming. Diaconis and
Sturmfels (1998) presented the original method which is based on uniform random
sampling from a Markov basis for the set of all feasible moves between tables.
The Markov basis is obtained from a corresponding Groebner basis, which
consequently must be available prior to sampling. Our approach is to construct
random moves one at a time based on a linear programming formulation that
assures irreducibility and reversibility of the Markov chain. Consequently, the
set of all moves constructible by our procedure is a Markov basis. The method
has the following features: it does not require a Groebner basis; it is not
affected by complexities of table design or unduly limited by table size; and,
by virtue of an additional linear constraint specific to the current solution,
it does not create infeasible moves that must be immediately discarded. We
illustrate related points through specific examples.
April 2009
|
April 6 |
The main focus of this
talk is on partial information release strategy in which data owners due to
confidentiality reasons may opt to release
relevant marginal and conditional tables along with sample size instead of a
full contingency table. We will discuss some results on calculation of bounds
on cell entries, and on problems of counting and sampling of the tables from a fiber
defined by given observed conditional values. We will draw comparisons between
marginal and conditional table releases, and discuss implications for
disclosure risk assessment and data utility.
Entire site © 2002-2008, Statistical and Applied Mathematical Sciences Institute. All Rights Reserved.
This page updated on September 19, 2008 10:57 AM