Twitter Text Objectionable Content Detection using Domain Based Probability and Correlation Model


Abstract— Text mining techniques are used to detect required patterns in text documents. Text documents are unstructured: their data do not follow a predefined schema. Text mining can be used for information retrieval, information extraction, search, classification, and categorization. This work proposes an application of text mining that analyzes the context in which words are used and assigns that context as a class label. The proposed work is a text classification model for detecting the illegal use of words in text communication. The technique works in two modules: first, the model is trained on text drawn from different contexts; then, the learned features are used to classify incoming test text. During training, the probability of each word and its domain-wise probability are estimated, and this information is stored in a database for use during testing. In the testing phase, the system uses the training database together with a test set supplied by the experimenter. Each sentence in the test dataset is evaluated to compute its sentence probability and a correlation estimate. Both parameters are then used to compute weights, which are converted into a separate indicator called the weight transform. Finally, a threshold is computed to make the classification decision. The proposed objectionable content detection technique, based on a probability model and correlation, is implemented in Java. The implemented model is evaluated and compared with the classical version of objectionable content detection. The results show that the proposed model improves on the traditional approach in terms of accuracy, which suggests that it is suitable for real-world applications.

Keywords- text mining; content detection; objectionable content detection; pattern matching; text classification
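The training and testing steps described in the abstract can be illustrated with a minimal Java sketch. It assumes that the word probability is the word's relative frequency in the training set, that the domain-wise probability is P(domain | word), and that a sentence is scored by averaging P(objectionable | word) over the words seen during training. The class name DomainProbabilityModel, the domain labels, and the threshold value are hypothetical, and in-memory maps stand in for the training database. The correlation estimation and weight-transform steps are not specified in enough detail to reproduce, so this sketch omits them and compares the sentence score directly against a fixed threshold.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the formulas, names, and threshold are illustrative
// assumptions, not the paper's exact method (correlation and weight transform omitted).
class DomainProbabilityModel {
    // Occurrences of each word across all training sentences.
    private final Map<String, Integer> wordCounts = new HashMap<>();
    // Occurrences of each word within each domain (class label).
    private final Map<String, Map<String, Integer>> domainWordCounts = new HashMap<>();
    private int totalWords = 0;

    // Training: update global and domain-wise counts from one labelled sentence.
    void train(String sentence, String domain) {
        for (String w : tokenize(sentence)) {
            wordCounts.merge(w, 1, Integer::sum);
            domainWordCounts.computeIfAbsent(domain, d -> new HashMap<>())
                            .merge(w, 1, Integer::sum);
            totalWords++;
        }
    }

    // P(w): relative frequency of the word in the whole training set.
    double wordProbability(String w) {
        return totalWords == 0 ? 0.0 : wordCounts.getOrDefault(w, 0) / (double) totalWords;
    }

    // P(domain | w): share of the word's occurrences that fall in the given domain.
    double domainProbability(String w, String domain) {
        int total = wordCounts.getOrDefault(w, 0);
        if (total == 0) return 0.0;
        int inDomain = domainWordCounts.getOrDefault(domain, Map.of()).getOrDefault(w, 0);
        return inDomain / (double) total;
    }

    // Sentence score: mean P(domain | w) over words seen during training.
    double sentenceScore(String sentence, String domain) {
        double sum = 0.0;
        int known = 0;
        for (String w : tokenize(sentence)) {
            if (wordCounts.containsKey(w)) {
                sum += domainProbability(w, domain);
                known++;
            }
        }
        return known == 0 ? 0.0 : sum / known;
    }

    // Decision rule: flag the sentence when its score reaches the threshold.
    boolean isObjectionable(String sentence, double threshold) {
        return sentenceScore(sentence, "objectionable") >= threshold;
    }

    // Lowercase, replace non-alphanumerics with spaces, split on whitespace.
    static List<String> tokenize(String s) {
        String cleaned = s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9\\s]", " ").trim();
        return cleaned.isEmpty() ? List.of() : List.of(cleaned.split("\\s+"));
    }

    public static void main(String[] args) {
        DomainProbabilityModel model = new DomainProbabilityModel();
        model.train("you are stupid idiot", "objectionable");
        model.train("you are kind", "normal");
        model.train("have a nice day", "normal");
        double threshold = 0.5; // hypothetical fixed threshold
        for (String s : List.of("you stupid idiot", "have a nice day")) {
            System.out.printf("%s -> score %.3f, objectionable=%b%n",
                    s, model.sentenceScore(s, "objectionable"), model.isObjectionable(s, threshold));
        }
    }
}
```

Keeping raw counts rather than precomputed probabilities lets new training sentences be added incrementally; a database-backed version would persist the same two count tables.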

I. INTRODUCTION

Text mining techniques are used in various applications to discover valuable patterns [1]. Beyond categorizing and classifying content, they are applied to tasks such as terror attack detection, user sentiment analysis, and the analysis of user reviews of products [2] and services [3]. In this work, text mining techniques are studied in order to find objectionable content in text communication. In text mining, standard data mining techniques perform the core processing; however, the raw text must first be transformed into a format that the algorithm can accept and process.
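The transformation step described above can be sketched as a small pipeline, assuming lowercasing, punctuation removal, whitespace tokenization, stop-word filtering, and term counting. The class TextPreprocessor and its stop-word list are illustrative assumptions rather than the paper's preprocessing; the output is a term-frequency map, one common format that classification algorithms accept.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical preprocessing sketch; the steps and stop-word list are assumptions.
class TextPreprocessor {
    // A tiny illustrative stop-word list (not taken from the paper).
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "is", "are", "to", "of", "and");

    // Converts raw text into a term-frequency map, preserving first-seen order.
    static Map<String, Integer> toTermFrequencies(String raw) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        // Lowercase and replace anything that is not a letter, digit, or whitespace.
        String cleaned = raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9\\s]", " ");
        for (String token : cleaned.trim().split("\\s+")) {
            // Skip the empty token produced by blank input, and skip stop words.
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        // Prints {movie=1, great=2, acting=1, weak=1, plot=1}
        System.out.println(toTermFrequencies("The movie is great, GREAT acting and a weak plot!"));
    }
}
```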