I have sales data of a company suitable for multi-label association, which is available with temporal and location information. I want to find optimal rules.
I think in multi-label data set frequent item set based association mining can be applied and has been used in past in parallel and distributed and can be improved using B+-Tree data structure.
link to similar paper but it uses Trie Data Structures
I think in multi-label data set frequent item set based association mining can be applied and has been used in past in parallel and distributed and can be improved using B+-Tree data structure.
link to similar paper but it uses Trie Data Structures
Apriori algorithm successfully finds the frequent elements from the database. But as the dimensionality of the database increase with the number of items then:
• More search space is needed and I/O cost will increase.
• Number of database scan is increased thus candidate generation will increase results in increase in computational cost.
Therefore many variations have been takes place in the Apriori algorithm to minimize the above limitations arises due to increase in size of database. These subsequently proposed algorithms adopt similar database scan level by level as in Apriori algorithm, while the methods of candidate generation and pruning, support counting and candidate representation may differ. The algorithms improve the Apriori algorithms by:
• Reduce passes of transaction database scans
• Shrink number of candidates
• Facilitate support counting of candidates
Direct Hashing and Pruning (DHP):
It is absorbed that reducing the candidate items from the database is one of the important task for increasing the efficiency. Thus a DHP technique was proposed [7] to reduce the number of candidates in the early passes C# for k 9 1 and thus the size of database. In this method, support is counted by mapping the items from the candidate list into the buckets which is divided according to support known as Hash table structure. As the new item set is encountered if item exist earlier then increase the bucket count else insert into new bucket. Thus in the end the bucket whose support count is less the minimum support is removed from the candidate set. In this way it reduce the generation of candidate sets in the earlier stages but as the level increase the size of bucket also increase thus difficult to manage hash table as well candidate set.
Partitioning Algorithm:
Partitioning algorithm is based to find the frequent elements on the basis partitioning
of database in n parts. It overcomes the memory problem for large database which do not fit into main memory because small parts of database easily fit into main memory. This algorithm divides into two passes,
1 In the first pass whole database is divided into n number of parts.
2 Each partitioned database is loaded into main memory one by one and local frequent elements are found.
3 Combine the all locally frequent elements and make it globally candidate set.
4 Find the globally frequent elements from this candidate set.
It should be noted that if the minimum support for transactions in whole database is
min_sup then the minimum support for partitioned transactions is min-sup number of transaction in that partition. A local frequent itemset may or may not be frequent with respect to the entire database thus any itemset which is potentially frequent must include in any one of the frequent partition
Bharat Gupta (2011), A BETTER APPROACH TO MINE FREQUENT ITEM SETS USING APRIORI AND FP-TREE APPROACH, COMPUTER SCIENCE AND ENGINEERING DEPARTMENT, THAPAR UNIVERSITY, PATIALA