Business Problem: As social media usage grows, moderating conversations and removing abusive input is becoming more demanding. A predictive model was needed to automate the removal and blocking of insulting content.
Solution: A predictive model was built to identify insulting or abusive content in a conversation or Twitter input so that it can be removed automatically, reducing moderation effort.
The solution used “Text Mining”. The text data was cleansed by removing common English words, punctuation, digits, etc., and then converted into a matrix using a term-frequency approach and Term Frequency-Inverse Document Frequency (TF-IDF) weighting.
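The cleansing and TF-IDF steps described above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the project, and it is not the project's actual code. The stop-word list, the sample comments, and the function names `clean` and `tf_idf_matrix` are hypothetical, and the weighting shown (relative term frequency multiplied by log(N / document frequency)) is only one common TF-IDF variant, which may differ from the one the project applied.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = {"a", "an", "the", "is", "are", "you", "and", "to", "of", "this", "it"}


def clean(text):
    """Lowercase the text, replace punctuation and digits with spaces,
    and drop common English words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tf_idf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of
    vocab[j] in docs[i]."""
    tokenized = [clean(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against documents that end up empty
        row = [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


# Made-up sample comments, not data from the project.
docs = [
    "You are an idiot!!! 123",
    "Thanks, this is a great post.",
    "You idiot, great job breaking it.",
]
vocab, matrix = tf_idf_matrix(docs)
for doc, row in zip(docs, matrix):
    print(doc, [round(w, 3) for w in row])
```

Each row of the resulting matrix represents one comment and each column one term, which is the form a linear classifier expects as input. Terms that appear in every document receive a weight of zero under this variant, so words that are frequent everywhere contribute little to the model.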
Technology: R 2.15.1, with the textir and LiblineaR packages.