2014 LUDIAAnAggregateConstrainedLowR


Subject Headings:


Cited By


Author Keywords


In the past few years, the government and other agencies have publicly released a prodigious amount of data that can potentially be mined to benefit society at large. However, data such as health records are typically provided only at aggregated levels (e.g., per State, per Hospital Referral Region) to protect privacy. Unfortunately, aggregation can severely diminish the utility of such data when modeling or analysis is desired on a per-individual basis. So, not surprisingly, despite the increasing abundance of aggregate data, there have been very few successful attempts at exploiting them for individual-level analyses. This paper introduces LUDIA, a novel low-rank approximation algorithm that utilizes aggregation constraints in addition to auxiliary information in order to estimate or "reconstruct" the original individual-level values from aggregate data. If the reconstructed data are statistically similar to the original individual-level data, off-the-shelf individual-level models can be readily and reliably applied for subsequent predictive or descriptive analytics. LUDIA is more robust to nonlinear estimates and random effects than other reconstruction algorithms. It solves a Sylvester equation and leverages multi-level (also known as hierarchical or mixed-effect) modeling approaches efficiently. A novel graphical model is also introduced to provide a probabilistic viewpoint of LUDIA. Experimental results using a Texas inpatient dataset show that individual-level data can be reasonably reconstructed from county-, hospital-, and zip code-level aggregate data. Several factors affecting the reconstruction quality are discussed, along with the implications of this work for current aggregation guidelines.
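The abstract does not state LUDIA's objective, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a toy setting in which group averages `S = X A` are released, and it reconstructs individual values by minimizing a hypothetical ridge-style objective `||X A - S||^2 + lam * ||X||^2`. Setting the gradient to zero gives a Sylvester equation of the form `P X + X Q = C`. All names (`solve_sylvester_kron`, `A`, `S`, `lam`) and the synthetic data are illustrative assumptions.

```python
import numpy as np

def solve_sylvester_kron(P, Q, C):
    """Solve P X + X Q = C by vectorization (suitable for small problems only)."""
    d, n = C.shape
    # With column-major vec: vec(P X) = (I_n kron P) vec(X),
    # and vec(X Q) = (Q^T kron I_d) vec(X).
    K = np.kron(np.eye(n), P) + np.kron(Q.T, np.eye(d))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((d, n), order="F")

# Synthetic individual-level data (hypothetical): d features, n individuals.
rng = np.random.default_rng(0)
d, n, g = 3, 8, 2
X_true = rng.normal(size=(d, n))

# Averaging matrix A: column j averages the individuals belonging to group j.
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
A = np.zeros((n, g))
for j in range(g):
    members = groups == j
    A[members, j] = 1.0 / members.sum()

S = X_true @ A  # the only data that is "released": per-group means, shape (d, g)

# Assumed objective: ||X A - S||_F^2 + lam * ||X||_F^2.
# Its stationarity condition is lam * X + X (A A^T) = S A^T.
lam = 1e-3
P = lam * np.eye(d)
Q = A @ A.T
C = S @ A.T

X_hat = solve_sylvester_kron(P, Q, C)

print("aggregate residual:", np.abs(X_hat @ A - S).max())
print("reconstruction error:", np.abs(X_hat - X_true).max())
```

In this sketch every individual in a group receives nearly the same value (approximately the group mean), because nothing distinguishes individuals within a group. This illustrates why aggregation constraints alone are not sufficient and why the abstract emphasizes auxiliary information and low-rank structure.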



Author: Yubin Park and Joydeep Ghosh
Title: LUDIA: An Aggregate-constrained Low-rank Reconstruction Algorithm to Leverage Publicly Released Health Data
Conference: KDD-2014
Proceedings: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
DOI: 10.1145/2623330.2623659
Year: 2014