Machine Learning Models
Crossmod currently uses fastText as its backend model.
We currently have 100 models trained on comments from 100 subreddits on Reddit and 8 models trained on macro norms. All of these models are pre-trained on data we collected earlier. Crossmod uses the predictions from all of them and takes action according to the user's manual configuration.
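This section does not specify how the individual predictions are combined. The sketch below assumes one plausible scheme: each group score (agreement_score for the subreddit models, norm_violation_score for the macro norm models) is the fraction of models in that group whose predicted probability meets a threshold, and a comment is reported only when the user has enabled action for its subreddit and a score crosses a user-chosen threshold. The Model type, fraction_flagged, ActionConfig, decide, and all threshold values are hypothetical, and the models are stubs rather than real fastText models.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical model interface: maps cleaned comment text to the probability
# that the comment would be removed (subreddit model) or violates a norm
# (macro norm model). Real fastText models would need a wrapper to fit this.
Model = Callable[[str], float]


def fraction_flagged(text: str, models: Dict[str, Model], threshold: float = 0.5) -> float:
    """Return the share of models whose probability is at least `threshold` (assumed aggregation)."""
    if not models:
        raise ValueError("at least one model is required")
    flagged = sum(1 for model in models.values() if model(text) >= threshold)
    return flagged / len(models)


@dataclass
class ActionConfig:
    """Hypothetical per-subreddit settings chosen manually by the user."""
    take_action: bool = False
    agreement_threshold: float = 0.85
    norm_violation_threshold: float = 0.5


def decide(text: str, subreddit_models: Dict[str, Model],
           norm_models: Dict[str, Model], config: ActionConfig) -> Dict[str, object]:
    """Compute both scores and decide whether to report, following the user's config."""
    agreement = fraction_flagged(text, subreddit_models)
    norm_violation = fraction_flagged(text, norm_models)
    report = config.take_action and (
        agreement >= config.agreement_threshold
        or norm_violation >= config.norm_violation_threshold
    )
    return {
        "agreement_score": agreement,
        "norm_violation_score": norm_violation,
        "report": report,
    }


# Usage with stub models: 90 of the 100 subreddit models flag the comment,
# and none of the 8 macro norm models do.
subreddit_models = {
    f"subreddit_{i}": (lambda text, p=(0.9 if i < 90 else 0.1): p)
    for i in range(100)
}
norm_models = {f"norm_{i}": (lambda text: 0.1) for i in range(8)}
result = decide("example comment", subreddit_models, norm_models, ActionConfig(take_action=True))
print(result)
```

Counting votes rather than averaging probabilities keeps each score interpretable as "how many communities would act on this comment", but averaging would be an equally simple alternative under this interface.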
The main prediction process is subreddit_monitor.py. It can listen to multiple subreddits and take action on a subset of them; the list of subreddits is stored in the database and can be changed there. The main steps are:
- Crossmod uses praw to fetch comments from multiple subreddits.
- It then checks the whitelist option and filters the comments with helpers/filters.py, which removes links, emojis, and other symbols that are not useful for detecting violations.
- It then calls our API service, which runs the backend models and returns two scores: agreement_score from the 100 subreddit models and norm_violation_score from the 8 macro norm models.
- Crossmod stores the data in the database so that it can be used for retraining and data analysis in the future.
- (Optional) If the user has chosen to take action on this comment's subreddit, Crossmod uses praw to report the comment to that subreddit's moderators on Reddit.
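The steps above can be sketched as a single per-comment function. This is a minimal sketch, not the code in subreddit_monitor.py: the regular expressions only approximate what helpers/filters.py is described as doing, the whitelist is assumed to be a set of author names, the API call is replaced by a score callable, the report threshold is a made-up value, and Comment, process_comment, and the comments table schema are hypothetical. In the real monitor, comments would come from praw (for example reddit.subreddit("a+b").stream.comments()) and reporting would go through praw's Comment.report(reason).

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# Rough stand-in for helpers/filters.py (its real rules may differ).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Drop anything that is not a word character, whitespace, or basic punctuation;
# this removes emojis and most other symbols.
SYMBOL_RE = re.compile(r"[^\w\s.,!?'-]")


def clean_text(text: str) -> str:
    """Strip links and symbols, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)
    return " ".join(text.split())


@dataclass
class Comment:
    """Hypothetical stand-in for the fields read from a praw comment."""
    id: str
    subreddit: str
    author: str
    body: str


Scorer = Callable[[str], Tuple[float, float]]  # returns (agreement_score, norm_violation_score)
Reporter = Callable[[Comment, str], None]


def process_comment(comment: Comment, whitelist: Set[str], active_subreddits: Set[str],
                    score: Scorer, report: Reporter, db: sqlite3.Connection,
                    agreement_threshold: float = 0.85) -> Optional[Tuple[float, float]]:
    """Run one comment through the monitoring steps; return its scores, or None if skipped."""
    if comment.author in whitelist:  # assumed meaning of the whitelist option
        return None
    text = clean_text(comment.body)
    if not text:
        return None
    agreement, norm_violation = score(text)  # in Crossmod, a call to the API service
    db.execute(
        "INSERT INTO comments (id, subreddit, body, agreement_score, norm_violation_score) "
        "VALUES (?, ?, ?, ?, ?)",
        (comment.id, comment.subreddit, text, agreement, norm_violation),
    )
    db.commit()
    # Only report in subreddits where the user enabled taking action.
    if comment.subreddit in active_subreddits and agreement >= agreement_threshold:
        report(comment, f"Crossmod agreement_score={agreement:.2f}")
    return agreement, norm_violation


# Usage with an in-memory database, a stub scorer, and a reporter that records IDs.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (id TEXT PRIMARY KEY, subreddit TEXT, body TEXT, "
           "agreement_score REAL, norm_violation_score REAL)")
reported: List[str] = []
process_comment(
    Comment("c1", "science", "alice", "Check this https://example.com \U0001F600 now!"),
    whitelist={"trusted_bot"}, active_subreddits={"science"},
    score=lambda text: (0.9, 0.1),
    report=lambda c, reason: reported.append(c.id),
    db=db,
)
print(db.execute("SELECT body FROM comments").fetchall(), reported)
```

Passing the scorer and reporter in as callables keeps the flow testable without network access; the real monitor would plug in the API client and praw instead.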
Because we have collected many comments from different subreddits, we can retrain on this data to improve the accuracy of our backend models. Our plan is:
- Use the newly collected data to train a new subreddit model and add it to our model list.
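One way to prepare that data is to export the stored comments in fastText's supervised training format, where each line is one example and labels are tokens prefixed with __label__. This sketch assumes the labels (here kept and removed) come from whether moderators removed each comment; that labeling scheme and the write_fasttext_file helper are hypothetical.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasttext_file(rows: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write (text, label) pairs as fastText supervised examples; return how many were written."""
    count = 0
    for text, label in rows:
        # fastText reads one example per line, so collapse newlines and extra spaces.
        flat = " ".join(text.split())
        if not flat:
            continue
        # Labels are whitespace-delimited tokens, so they must not contain spaces.
        out.write(f"__label__{label.replace(' ', '_')} {flat}\n")
        count += 1
    return count


# Hypothetical rows exported from the database: (comment text, moderator outcome).
rows = [("thanks for the source", "kept"), ("buy cheap\nfollowers now", "removed")]
buffer = io.StringIO()
written = write_fasttext_file(rows, buffer)
print(written)
print(buffer.getvalue(), end="")
```

Written to a file instead of a StringIO, the output could be passed to fasttext.train_supervised(input=path) from the official fastText Python package; the resulting model could then be saved with save_model and added to the model list.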