Public defense of the PhD thesis "An Inductive Database System Based on Virtual Mining Views" by Adriana Prado.
The defense is public and takes place in aula Jan Fabre (G0.10) of building G, Middelheimlaan 1, 2020 Antwerpen.
Abstract: Data mining is an interactive process in which different tasks may be performed sequentially, and the outputs of earlier tasks may be combined and used as input for subsequent ones. To support this knowledge discovery process effectively, data mining needs to be integrated into database systems. Inductive database systems have been proposed to achieve this integration. In contrast to the many proposals for dedicated data mining query languages, this thesis presents an inductive database system in which the query language is standard SQL. We propose a system in which the user can query the collection of all possible patterns as if they were stored in traditional relational tables.
The main challenge is how to implement this storage effectively, since the number of possible patterns can be far too large to store. In the concrete case of itemsets, for example, the number of itemsets that would need to be stored is exponential in the number of items. To solve this problem, we propose to keep these tables virtual: as far as the user is concerned, all possible patterns are stored, but on the physical layer no such complete tables exist. Whenever the user queries such a table, or virtual mining view, the database system triggers an efficient data mining algorithm, which materializes the table with at least those tuples needed to answer the query. The query can then be executed as if the patterns had been there all along. Note that this approach assumes that the user imposes certain constraints in the query, asking for only a subset of all possible patterns; the system should detect these constraints so that the data mining algorithms can exploit them. To this end, we propose an algorithm to extract constraints from SQL queries.
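The idea of a virtual mining view can be illustrated with a minimal sketch. It is not the system described in the thesis: it uses Python with SQLite rather than PostgreSQL, a naive level-wise miner in place of an efficient algorithm, and a regular expression in place of the proposed constraint extraction algorithm. The table name itemsets, its columns items and support, and all function names are assumptions made for this illustration only.

```python
import re
import sqlite3
from itertools import combinations


def extract_min_support(sql):
    """Toy constraint extraction: find 'support >= N' in the query text.
    (Assumption: a regex stands in for a real constraint extraction algorithm.)"""
    match = re.search(r"support\s*>=\s*(\d+)", sql, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def mine_frequent_itemsets(transactions, min_support):
    """Naive level-wise miner: returns (itemset, support) pairs whose
    support is at least min_support."""
    items = sorted({item for t in transactions for item in t})
    result = []
    size = 1
    while items:
        level = []
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                level.append((candidate, support))
        if not level:
            break
        result.extend(level)
        # Only items that occur in a frequent itemset of this size can
        # occur in a frequent itemset of the next size.
        items = sorted({item for candidate, _ in level for item in candidate})
        size += 1
    return result


def query_mining_view(conn, transactions, sql):
    """Materialize the hypothetical 'itemsets' view with the tuples needed
    to answer the query, then run the query as ordinary SQL."""
    min_support = extract_min_support(sql)
    if min_support is None:
        raise ValueError("the query must constrain support, e.g. support >= 2")
    conn.execute("DROP TABLE IF EXISTS itemsets")
    conn.execute("CREATE TABLE itemsets (items TEXT, support INTEGER)")
    rows = [(",".join(itemset), support)
            for itemset, support in mine_frequent_itemsets(transactions, min_support)]
    conn.executemany("INSERT INTO itemsets VALUES (?, ?)", rows)
    return conn.execute(sql).fetchall()


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
conn = sqlite3.connect(":memory:")
sql = "SELECT items, support FROM itemsets WHERE support >= 2 ORDER BY items"
for row in query_mining_view(conn, transactions, sql):
    print(row)
```

In this sketch only the five itemsets that satisfy the support constraint are ever stored. The seven non-empty itemsets over the three items are never materialized as a complete table, which is the point of keeping the view virtual.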
The system was implemented in the PostgreSQL database system. It currently allows the user to mine frequent itemsets, association rules and decision trees. We illustrate the interactive and iterative capabilities of our system with two data mining scenarios.