Abstract for Niklas Norén's master's thesis

A Monte Carlo Method for Bayesian Dependency Derivation

Author: Niklas Norén

Abstract

Dependency derivation is the search for combinations of variables (or states of variables) in a database that co-occur unexpectedly often. In Bayesian dependency derivation, indications are ranked primarily by their estimated strengths, but an adjustment is made to account for uncertainty when data is scarce. This reduces the risk of highlighting spurious associations.

This report presents refined methods for IC analysis, which is one method for Bayesian dependency derivation. The disproportionality measure in IC analysis is the Information Component (IC) [EJCP,54(4):315-321,1998]. It relates the observed joint frequency of two particular states of two different variables to the frequency expected under the assumption of independence.
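As a rough illustration of the measure just described, the sketch below computes a plain point estimate of the IC from raw counts, assuming the usual base-2 log-ratio form of observed to expected joint frequency. The function name and argument names are hypothetical, and the sketch leaves out the Bayesian adjustment for scarce data that IC analysis applies.

```python
import math


def information_component(n_xy, n_x, n_y, n_total):
    """Point estimate of the IC from raw counts (no Bayesian adjustment).

    n_xy    : records where both states occur together
    n_x     : records where the first state occurs
    n_y     : records where the second state occurs
    n_total : total number of records
    """
    if min(n_xy, n_x, n_y, n_total) <= 0:
        raise ValueError("all counts must be positive")
    observed = n_xy / n_total
    # Joint frequency expected if the two states were independent.
    expected = (n_x / n_total) * (n_y / n_total)
    return math.log2(observed / expected)
```

With n_xy = 40, n_x = 100, n_y = 200 and n_total = 1000, the observed joint frequency is twice the expected one, so the IC is 1 up to rounding. An IC of 0 corresponds to exact agreement with independence.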

In the current implementation of IC analysis, estimates of the lower limit of the 95% credibility interval are derived from a normal approximation to the posterior IC distribution [CSDA,34(4):473-493,2000]. In this report, the validity of this approximation is examined through Monte Carlo simulation. Monte Carlo simulation is also proposed and used as a general tool to study the IC distribution.

For accurate derivation of the lower credibility interval limit over the entire domain of possible parameter values, two Monte Carlo based approaches are proposed: brute force simulation and a tabular method. These methods differ in execution time and in the parameter ranges over which they give accurate results. The optimal combination and implementation of the available approaches depends strongly on the characteristics of the database of interest.
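The brute force idea can be sketched as follows: draw cell probabilities from a posterior, compute the IC for each draw, and read off an empirical quantile. This is only a minimal sketch under stated assumptions, not the thesis implementation. It assumes a multinomial model over the four cells of a 2x2 table with a symmetric Dirichlet prior; the pseudo-count `prior=0.5`, the number of draws, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def ic_lower_limit(n11, n10, n01, n00, prior=0.5, n_draws=20000,
                   level=0.95, seed=0):
    """Monte Carlo estimate of the lower credibility limit of the IC.

    n11 = both states present, n10 = only the first, n01 = only the
    second, n00 = neither. `prior` is an assumed symmetric Dirichlet
    pseudo-count, not a value taken from the thesis.
    """
    rng = random.Random(seed)
    alphas = [n + prior for n in (n11, n10, n01, n00)]
    ics = []
    for _ in range(n_draws):
        p11, p10, p01, _p00 = sample_dirichlet(alphas, rng)
        # IC of this draw: joint probability over product of marginals.
        ics.append(math.log2(p11 / ((p11 + p10) * (p11 + p01))))
    ics.sort()
    tail = (1.0 - level) / 2.0  # 2.5% in each tail for a 95% interval
    return ics[int(tail * n_draws)]
```

The cost grows linearly with the number of draws and the whole simulation is repeated for every pair of states, which is one reason a faster alternative such as a precomputed table can be attractive for large databases.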

Furthermore, this report shows that, for a certain choice of non-informative priors, the multinomial and the Poisson data models yield equivalent posterior IC distributions, and that Monte Carlo simulation under these circumstances is equivalent to the Bayesian bootstrap.
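The equivalence with the Bayesian bootstrap can be checked numerically in a small sketch. The abstract does not say which priors are meant, so the code assumes the improper limit in which all prior pseudo-counts go to zero, where the multinomial posterior is Dirichlet(n11, n10, n01, n00). All names are hypothetical, and every cell count must be positive for the Gamma draws to be defined.

```python
import math
import random


def ic_from_probs(p11, p10, p01):
    """IC for one set of cell probabilities of a 2x2 table."""
    return math.log2(p11 / ((p11 + p10) * (p11 + p01)))


def bayesian_bootstrap_ic(records, n_draws=2000, seed=0):
    """Sorted IC draws from the Bayesian bootstrap over single records.

    `records` is a list of (x, y) pairs of 0/1 indicators.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        # Dirichlet(1, ..., 1) weights: normalised Exp(1) variates.
        w = [rng.expovariate(1.0) for _ in records]
        total = sum(w)
        cell = {(1, 1): 0.0, (1, 0): 0.0, (0, 1): 0.0, (0, 0): 0.0}
        for weight, rec in zip(w, records):
            cell[rec] += weight / total
        draws.append(ic_from_probs(cell[1, 1], cell[1, 0], cell[0, 1]))
    return sorted(draws)


def dirichlet_posterior_ic(n11, n10, n01, n00, n_draws=2000, seed=1):
    """Sorted IC draws from a Dirichlet(n11, n10, n01, n00) posterior.

    Assumes the improper zero-pseudo-count prior. The unnormalised
    Gamma(n, 1) draws can also be read as independent Poisson-rate
    posteriors under a prior proportional to 1/rate.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        g = [rng.gammavariate(n, 1.0) for n in (n11, n10, n01, n00)]
        total = sum(g)
        draws.append(ic_from_probs(g[0] / total, g[1] / total, g[2] / total))
    return sorted(draws)
```

Summing Dirichlet(1, ..., 1) weights within each cell gives a Dirichlet vector whose parameters are the cell counts, so the two samplers target the same distribution and their quantiles should agree up to Monte Carlo error.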

Relevant aspects of the multiple comparisons issue and problems related to stratification and confounding variables are also discussed.

Niklas Norén

Last modified: Sun Feb 16 23:26:30 CET 2003