Dependency derivation is the search for combinations of variables (or states of variables) in a database that co-occur unexpectedly often. In Bayesian dependency derivation, indications are ranked primarily by their estimated strength, but an adjustment is made to account for uncertainty when data are scarce. This reduces the risk of highlighting spurious associations.

This report presents refinements to *IC* analysis, one approach to
Bayesian dependency derivation. The disproportionality measure in
*IC* analysis is the Information Component (*IC*)
[EJCP,54(4):315-321,1998].
It relates the observed joint frequency of two particular states of two
different variables to the frequency expected under the assumption of
independence.
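As a minimal illustration of the quantity being estimated, the sketch below computes a plain plug-in estimate of the *IC*: log2 of the observed joint count divided by the joint count expected under independence. It assumes the counts come from a single collection of n records; the function name and signature are hypothetical, and this point estimate is not the Bayesian *IC* estimate, since it ignores the uncertainty that the Bayesian treatment accounts for.

```python
import math


def ic_point_estimate(n_xy: int, n_x: int, n_y: int, n: int) -> float:
    """Plug-in estimate of IC = log2(P(x, y) / (P(x) * P(y))).

    n_xy: records in which both states occur together
    n_x, n_y: records in which each state occurs (marginal counts)
    n: total number of records
    """
    if min(n_x, n_y, n) <= 0:
        raise ValueError("marginal and total counts must be positive")
    if n_xy == 0:
        return -math.inf  # log2(0): the plug-in estimate breaks down here
    expected = n_x * n_y / n  # joint count expected under independence
    return math.log2(n_xy / expected)
```

For example, `ic_point_estimate(20, 100, 120, 10_000)` gives log2(20 / 1.2), roughly 4.06: the pair co-occurs about 17 times more often than independence predicts. With n_xy = 0 the estimate is minus infinity, one symptom of the scarce-data problem that motivates the Bayesian adjustment.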

In the current implementation of *IC* analysis, the lower limit of the
95% credibility interval is estimated from a normal approximation to the
posterior *IC* distribution
[CSDA,34(4):473-493,2000].
In this report, the validity of this approximation is examined
through Monte Carlo simulation. Monte Carlo simulation is also
proposed and used as a general tool for studying the *IC*
distribution.
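A minimal sketch of Monte Carlo simulation of the posterior *IC* distribution, assuming a multinomial model for the four cells of the 2x2 table (n11, n10, n01, n00) with a symmetric Dirichlet prior. The prior parameter 0.5 and all names are illustrative assumptions, not the settings used in the report. The 2.5% quantile of the simulated values estimates the lower 95% credibility limit. For comparison, the function also returns mean - 1.96 sd computed from the same samples, a normal-approximation limit; this only mimics the normality assumption and does not reproduce the analytic moment approximations of the current implementation.

```python
import numpy as np


def mc_ic_lower_limits(counts, prior=0.5, draws=100_000, seed=0):
    """Simulate the posterior IC distribution for cells (n11, n10, n01, n00).

    Assumes a multinomial likelihood with a symmetric Dirichlet(prior) prior,
    so the posterior over cell probabilities is Dirichlet(counts + prior).
    Returns (Monte Carlo lower 95% limit, normal-approximation lower limit).
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(counts, dtype=float) + prior
    p = rng.dirichlet(alpha, size=draws)          # shape (draws, 4), rows sum to 1

    joint = p[:, 0]                               # P(x, y)
    x_marginal = p[:, 0] + p[:, 1]                # P(x)
    y_marginal = p[:, 0] + p[:, 2]                # P(y)
    ic = np.log2(joint / (x_marginal * y_marginal))

    mc_lower = float(np.quantile(ic, 0.025))      # empirical 2.5% quantile
    normal_lower = float(ic.mean() - 1.96 * ic.std(ddof=1))  # normal tail
    return mc_lower, normal_lower
```

`mc_ic_lower_limits((20, 80, 100, 9800))` corresponds to n_xy = 20, n_x = 100, n_y = 120 and N = 10 000. Re-running it with a sparse cell such as `(2, 98, 118, 9782)` is one way to see whether the two limits drift apart when the posterior is skewed.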

To derive accurate lower credibility interval limits over the entire domain of possible parameter values, two Monte Carlo-based approaches are proposed: brute-force simulation and a tabular method. The two differ in execution time and in the parameter ranges over which they give accurate results. The best way to combine and implement these approaches depends strongly on the characteristics of the database of interest.
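The trade-off between the two approaches can be sketched as follows, under clearly stated assumptions. The report's actual table layout is not given here, so `LowerLimitTable` is a hypothetical scheme: it simulates once for every count tuple in a chosen grid and answers those queries by lookup, falling back to brute-force simulation (`mc_lower_limit`, a Dirichlet posterior with an illustrative prior of 0.5) for tables outside the grid.

```python
import numpy as np


def mc_lower_limit(counts, prior=0.5, draws=20_000, seed=0):
    """Brute force: lower 95% IC limit by direct simulation from an assumed
    Dirichlet(counts + prior) posterior over cells (n11, n10, n01, n00)."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.asarray(counts, dtype=float) + prior, size=draws)
    ic = np.log2(p[:, 0] / ((p[:, 0] + p[:, 1]) * (p[:, 0] + p[:, 2])))
    return float(np.quantile(ic, 0.025))


class LowerLimitTable:
    """Hypothetical tabular scheme: pay the simulation cost up front for a
    grid of count tuples, then answer queries on the grid by lookup."""

    def __init__(self, grid, **mc_options):
        self.mc_options = mc_options
        self.table = {tuple(c): mc_lower_limit(c, **mc_options) for c in grid}

    def lower_limit(self, counts):
        key = tuple(int(c) for c in counts)
        if key in self.table:
            return self.table[key]                       # fast path: lookup
        return mc_lower_limit(key, **self.mc_options)    # slow path: simulate
```

For a fixed pair of marginals (n_x = 50, n_y = 60, N = 5000), the grid `[(k, 50 - k, 60 - k, 4890 + k) for k in range(6)]` covers n11 = 0 to 5; lookups there cost a dictionary access, while any other table pays for a full simulation. Which ranges to tabulate is exactly the database-dependent choice mentioned above.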

Furthermore, this report shows that, for a certain choice of
non-informative priors, the multinomial and the Poisson data models
yield equivalent posterior *IC* distributions, and that Monte Carlo
simulation under these circumstances is equivalent to the Bayesian
bootstrap.
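One way to see these equivalences numerically is sketched below. It assumes all prior parameters tend to zero, so the multinomial model gives a Dirichlet(counts) posterior and each Poisson cell rate a Gamma(count) posterior with a common scale; this limiting choice is an assumption for illustration, not necessarily the exact prior used in the report. Because the *IC* depends only on ratios of cell rates, and independent Gamma variables with a common scale normalize to a Dirichlet vector, the first two samplers target the same distribution. The Bayesian bootstrap places Dirichlet(1, ..., 1) weights on the individual records, and summing those weights within cells again gives Dirichlet(counts). The counts are made up.

```python
import numpy as np

COUNTS = np.array([5, 15, 25, 155])  # illustrative cells (n11, n10, n01, n00)


def ic(w):
    """IC from (possibly unnormalized) cell weights; invariant to rescaling."""
    total = w.sum(axis=-1)
    x_marginal = w[..., 0] + w[..., 1]
    y_marginal = w[..., 0] + w[..., 2]
    return np.log2(w[..., 0] * total / (x_marginal * y_marginal))


def multinomial_dirichlet(rng, draws):
    # Multinomial likelihood, Dirichlet prior parameters -> 0:
    # posterior cell probabilities are Dirichlet(counts).
    return ic(rng.dirichlet(COUNTS.astype(float), size=draws))


def poisson_gamma(rng, draws):
    # Independent Poisson cells, Gamma priors with shape -> 0 and a common
    # rate: posterior rates are Gamma(counts) with a common scale.
    return ic(rng.gamma(shape=COUNTS, scale=1.0, size=(draws, COUNTS.size)))


def bayesian_bootstrap(rng, draws):
    # Dirichlet(1, ..., 1) weights on the N individual records, then summed
    # within each cell (aggregation property gives Dirichlet(counts)).
    labels = np.repeat(np.arange(COUNTS.size), COUNTS)  # one label per record
    record_weights = rng.dirichlet(np.ones(labels.size), size=draws)
    cell_weights = record_weights @ np.eye(COUNTS.size)[labels]
    return ic(cell_weights)


rng = np.random.default_rng(1)
samples = {f.__name__: f(rng, 20_000)
           for f in (multinomial_dirichlet, poisson_gamma, bayesian_bootstrap)}
for name, s in samples.items():
    print(f"{name:22s} median={np.median(s):.3f} "
          f"lower={np.quantile(s, 0.025):.3f}")
```

The three medians and lower limits should agree up to Monte Carlo error. Note that the multinomial/Poisson match does not require the zero-parameter limit (any Dirichlet prior paired with Gamma priors of equal shapes and a common rate gives the same normalized posterior), whereas the match with the Bayesian bootstrap relies on it.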

Relevant aspects of the multiple comparisons problem, as well as problems related to stratification and confounding variables, are also discussed.

Niklas Norén Last modified: Sun Feb 16 23:26:30 CET 2003