Phd-Defense Wim Le Page: "Mining Patterns in Relational Databases"

Public defense of the PhD-thesis: “Mining Patterns in Relational Databases”, by Wim Le Page.

The defense is public and takes place in aula Jan Fabre (G0.10) of building G, Middelheimlaan 1, 2020 Antwerpen.

Abstract:

The Information Age has provided us with huge data repositories which cannot longer be analysed manually. The potential high business value of the knowledge that can be gained, drives the research for automated analysis methods that can handle large amounts of data. Most of the data of industry, education and government is stored in relational database management systems (RDBMSs). This motivates the need for data mining algorithms that can work with arbitrary relational
databases, without the need for manual transformation and preprocessing of the data. In this dissertation we introduce two data mining approaches towards the goal of mining interesting patterns in arbitrary relational databases.

We first examine frequent query mining. Within this setting we propose a novel algorithm Conqueror, that is capable of efficiently generating and pruning the search space of all simple conjunctive queries, a pattern class capable of expressing
many different kinds of interesting patterns, such as, but not limited to, functional and inclusion dependencies. We present experiments, showing the feasibility of our approach, as well as the expressiveness of simple conjunctive queries.
Using the knowledge that our algorithm is capable of detecting functional dependencies and foreign keys that were not given initially, we extend our algorithm, enabling it to use these newly discovered, previously unknown functional dependencies and foreign keys to produce a concise set of interesting patterns, free of redundant information. We implement and test our updated algorithm, Conqueror+, and show that it efficiently reduces the number of queries generated. Furthermore we experimentally show that this reduction has the added benefit that Conqueror+ also outperforms Conqueror in timing.
As a second approach we investigate frequent relational itemset mining. We introduce a novel efficient propagation-based depth-first algorithm, called SMuRFIG that is capable of mining frequent relational itemsets as well as confident relational association rules. We introduce a new support measure and show its usefulness in expressing interesting patterns. In addition we define the deviation measure to address the statistical pattern blow-up intrinsic to the relational case, and show that it works as promised. We theoretically and experimentally show that the SMuRFIG algorithm is scalable with respect to time and memory. Moreover, we generalise some popular redundancy measures – closure-based redundancy and minimal improvement – to the multi-relational setting, and confirm that they can reduce the output when complete results are not desired.
This dissertation shows that both approaches provide us with practical algorithms towards the ultimate goal of discovering patterns in arbitrary relational databases.

Event date and time: 
December 16, 2009 - 16:00 - 18:00
Last updated on 05/12/2009, 15:28.