From: Cox, Lawrence H. [lgc9@cdc.gov]
Sent: Thursday, March 23, 2006 9:06 AM
To: Bahjat F. Qaqish; Data Confidentiality Group
Subject: RE: initial thoughts on utility and risk

Agree.

A slight twist on the orthogonality: in perturbation, orthogonality is important, viz., keeping noise perpendicular to the original data so that original and perturbed data correlate (perfectly), as in controlled tabular adjustment, and similarly for perturbing microdata.

So, both user and intruder are interested in making inferences; the agency uses orthogonality to mask; and the user can obtain his inferences despite orthogonality, whereas orthogonality thwarts the intruder.

-----Original Message-----
From: Bahjat F. Qaqish [mailto:qaqish@bios.unc.edu]
Sent: Tuesday, March 21, 2006 4:04 PM
To: Data Confidentiality Group
Subject: initial thoughts on utility and risk

Hello DC Group,

I've been taking a general look at utility and risk. I haven't gotten very far, but here it is.

In some sense, both the legitimate user and the intruder are, or will be, generally trying to "make some statistical inferences" from the released data. Suppose that a legitimate user is trying to make inference about L while an intruder is interested in I. At this point, leave I and L vague. Here we have one specific instance of "user" and "intruder" (or L and I). To what extent is it possible to help one while foiling the other?

The extreme cases are easy. If I==L, anything we do to help the legitimate user will help the intruder, and vice versa. But is I==L a realistic scenario? Another extreme case is that L is (in some sense) "orthogonal" to I. Here, we _probably_ can do very well.

Further, there is the vague notion that I may be "sharper" than L. "I" may involve prediction of random variables, while L may involve estimation of parameters. For example, in terms of linear regression, estimation of the population mean at X is a less sharp inference than prediction of an outcome Y at that X. If var(Y) is large, even knowing the population mean exactly doesn't help much. Similarly, estimation of pr(income > 50K | X) is less sharp than predicting a binary Y at X.

So, we have "orthogonality" and "sharpness". All of the above is for one instance of L and I. Going to a whole ensemble of I's and L's makes it harder.
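
The "sharpness" contrast in the linear regression example can be illustrated with a small simulation. The following is only a rough sketch, not part of the original discussion: the model y = 1 + 2x + noise, the noise level sigma = 5, the point x0 = 5, and the helper names (fit_line, standard_errors, simulate) are all assumptions made for illustration. It uses the textbook simple-regression formulas to compare the standard error of the estimated mean at x0 with the standard error of predicting a new outcome Y at x0.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Returns (a, b, s2, xbar, sxx), where s2 is the residual variance estimate.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)  # unbiased estimate of var(Y | X)
    return a, b, s2, xbar, sxx


def standard_errors(xs, ys, x0):
    """Standard error of the estimated mean at x0, and of a predicted new Y at x0."""
    n = len(xs)
    _, _, s2, xbar, sxx = fit_line(xs, ys)
    leverage = 1.0 / n + (x0 - xbar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)          # shrinks toward 0 as n grows
    se_pred = math.sqrt(s2 * (1.0 + leverage))  # never drops below sqrt(s2)
    return se_mean, se_pred


def simulate(n, sigma, seed=0):
    """Hypothetical data: y = 1 + 2x + Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


if __name__ == "__main__":
    for n in (50, 5000):
        xs, ys = simulate(n, sigma=5.0)
        se_mean, se_pred = standard_errors(xs, ys, x0=5.0)
        print(f"n={n}: se(mean at x0)={se_mean:.3f}, se(new Y at x0)={se_pred:.3f}")
```

Under these assumptions, more data makes the estimate of the mean arbitrarily precise, while the uncertainty about an individual outcome stays near sigma. That is the sense in which knowing the population mean exactly does not help much when var(Y) is large.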