I'm currently working on my bachelor's thesis on the use of sequential pattern / sequential rule mining techniques for automotive predictive maintenance.
I want to answer the research question: how much data is necessary to extract meaningful / significant sequential patterns or rules from a sequence database?
I don't have access to real-life automotive predictive maintenance data, so I thought about using a web-page click-stream dataset instead. My idea is to take a sequential dataset and then gradually shrink it.
Let's assume that with a dataset of 1000 sequences it's possible to extract 50 sequential rules with confidence 0.8; with half of the data (500 sequences) only 35 rules with confidence 0.8 can be extracted; with 50 sequences only 10 rules; and so on. I think I will only vary the size and the variability of the dataset.
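To make the idea concrete, here is a minimal sketch of such an experiment in Python. It is not tied to any particular mining library: it implements only the simplest kind of sequential rule (single item a followed later by single item b, as in RuleGrowth-style rules), builds a synthetic sequence database with one planted rule, and counts how many rules survive the support/confidence thresholds as the database is subsampled. All item names, thresholds, and the data generator are illustrative assumptions, not part of the original question.

```python
import random
from collections import Counter

def mine_rules(db, min_sup=0.1, min_conf=0.8):
    """Mine simple sequential rules a -> b (item a occurs before
    item b in a sequence), with per-sequence support and confidence.
    This is a toy miner, not a replacement for SPMF-style algorithms."""
    n = len(db)
    antecedent_count = Counter()  # sequences containing item a
    pair_count = Counter()        # sequences where a occurs before b
    for seq in db:
        first_pos = {}
        for i, item in enumerate(seq):
            first_pos.setdefault(item, i)
        for a in first_pos:
            antecedent_count[a] += 1
            for b in set(seq):
                # a -> b holds if some occurrence of b comes after a's first occurrence
                if a != b and any(x == b for x in seq[first_pos[a] + 1:]):
                    pair_count[(a, b)] += 1
    rules = []
    for (a, b), c in pair_count.items():
        sup, conf = c / n, c / antecedent_count[a]
        if sup >= min_sup and conf >= min_conf:
            rules.append((a, b, sup, conf))
    return rules

# Synthetic database with one planted rule: when 'a' occurs,
# 'b' follows it 90% of the time.
random.seed(0)
items = list("abcde")

def make_seq():
    seq = random.sample(items, k=3)
    if "a" in seq and random.random() < 0.9:
        seq.append("b")
    return seq

full_db = [make_seq() for _ in range(1000)]

# Shrink the database and watch how many rules survive the thresholds.
for size in (1000, 500, 100, 50):
    sub = random.sample(full_db, size)
    print(size, len(mine_rules(sub)))
```

The key design point for the thesis question: keep the thresholds (min_sup, min_conf) fixed and vary only the sample size, repeating each size with several random subsamples so the rule counts are averages rather than single draws.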
Is my reasoning correct, or am I missing something?
I would greatly appreciate any answers.