An Optimized Framework for Precise Motif Discovery by Merging SBF with Pruning Method


Abstract:- Motif mining has gained popularity and attention in the data mining field because of its real-world applications, such as health prediction and locating previous patterns in time series databases. Motifs are the most correlated pairs of subsequences in a sequence object. Motif discovery is hard in emerging applications that have long sequences or in which queries arrive rapidly. Because correlation computation and subsequence pruning techniques require different orderings for examining subsequence pairs, existing works cannot achieve fast correlation computation and pruning of subsequence pairs at the same time. In this paper we propose a new framework called FMotif (Fast-Motif), which uses a two-level approach: pruning of subsequences at the outer level and fast correlation computation at the inner level. In our experimental results, our framework performed 3X faster than existing methods.

Keywords – Motif Discovery, Motif mining, Smart Brute Force, Reference Indices.

I. INTRODUCTION

Motif discovery has a large set of uses in data mining, including rule discovery, classification, clustering, and sequence summarization. Motifs are the most correlated pairs of subsequences in a sequence object. The correlation between two subsequences is measured by a correlation metric. The figure below shows a motif discovered from a power consumption dataset. The motif discovery problem has received attention from the data mining community [1-5]. Motif discovery is the backbone of human and animal activity discovery and is also useful in surveillance and sports training [3]. Clustering enumerated motifs is more meaningful than clustering all subsequences of a large sequence dataset [7]. Motif discovery is time consuming because a sequence object of length m contains (m - l + 1) subsequences of length l. The brute force method enumerates all pairs of subsequences and computes the correlation of each pair. This takes a long time and becomes impractical for long sequences. Most existing algorithms concentrate on fast motif discovery [1],[2],[8] but lack accuracy and do not guarantee an accurate result.
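To make the cost of the brute force baseline concrete, the following is a minimal sketch of it, not the FMotif framework itself. It assumes Pearson correlation as the correlation metric (the text does not name one) and skips overlapping pairs so that a subsequence is not matched with a shifted copy of itself; both choices, and the function names `pearson` and `brute_force_motif`, are illustrative assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def brute_force_motif(series, l):
    """Return (correlation, i, j) for the most correlated pair of length-l subsequences.

    Examines every pair of start positions, so the work grows quadratically
    with the number of subsequences, m - l + 1.
    """
    count = len(series) - l + 1  # number of subsequences of length l
    best = (-2.0, None, None)    # any real correlation is >= -1
    for i in range(count):
        # Assumption: only non-overlapping pairs (j >= i + l) are considered.
        for j in range(i + l, count):
            c = pearson(series[i:i + l], series[j:j + l])
            if c > best[0]:
                best = (c, i, j)
    return best


if __name__ == "__main__":
    # Toy series with the pattern [1, 4, 2, 8] planted at positions 1 and 10.
    series = [5, 1, 4, 2, 8, 0, 7, 3, 3, 9, 1, 4, 2, 8, 6]
    print(brute_force_motif(series, 4))
```

Each correlation costs time proportional to l, and the number of pairs grows with the square of (m - l + 1), which is why this baseline becomes impractical for long sequences.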