Kingfisher: an efficient algorithm for searching for both positive and negative dependency rules with statistical significance measures

Research output: Contribution to journalArticleScientificpeer-review

Abstract

Statistical dependency analysis is the basis of all empirical science. A commonly occurring problem is to find the most significant dependency rules, which describe either positive or negative dependencies between categorical attributes. For example, in medical science one is interested in genetic factors, which can either predispose or prevent diseases. The requirement of statistical significance is essential, because the discoveries should hold also in the future data. Typically, the significance is estimated either by Fisher’s exact test or the χ2 -measure. The problem is computationally very difficult, because the number of all possible dependency rules increases exponentially with the number of attributes. As a solution,
different kinds of restrictions and heuristics have been applied, but a general, scalable search method has been missing.
In this paper, we introduce an efficient algorithm, called Kingfisher, for searching for the best non-redundant dependency rules with statistical significance measures. The rules can express either positive or negative dependencies between a set of positive attributes and a single consequent attribute. The algorithm itself is independent from the used goodness measure, but we concentrate on Fisher’s exact test and the χ2 -measure. The algorithm is based on an application of the branch-and-bound search strategy, supplemented by several pruning properties. Especially, we prove a new lower-bound for Fisher’s p, and introduce a new effective pruning principle. According to our experiments on classical benchmark data, the algorithm is well scalable and can efficiently handle even dense and high-dimensional data sets. An interesting observation was that Fisher’s exact test does not only produce more reliable
rules than the χ2-measure, but it also performs the search much faster.
Original languageEnglish
Pages (from-to)383-414
JournalKnowledge and Information Systems
Volume32
Issue number2
Publication statusPublished - 2012
MoE publication typeA1 Journal article-refereed

Keywords

  • Association rule
  • statistical dependence
  • significance testing
  • search algorithms
  • Fisher's exact test
  • chi-squared

Fingerprint

Dive into the research topics of 'Kingfisher: an efficient algorithm for searching for both positive and negative dependency rules with statistical significance measures'. Together they form a unique fingerprint.

Cite this