Researchers at Queensland University of Technology in Australia have developed an algorithm to detect posts with misogynistic content on Twitter.
The team mined a dataset of one million tweets, searching for keywords such as 'whore', 'slut' and 'rape'. The million-plus tweets were filtered down to 5,000, which were then categorised as misogynistic or not based on context and intent.
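A minimal sketch of what such a keyword-filtering step might look like; the keyword list and the tweets here are illustrative, as the researchers' actual pipeline and full keyword set were not published.

```python
# Hypothetical keyword filter: keep only tweets containing a target term.
KEYWORDS = {"whore", "slut", "rape"}

def contains_keyword(tweet: str) -> bool:
    """Return True if any word in the tweet matches a target keyword."""
    words = tweet.lower().split()
    return any(word.strip(".,!?") in KEYWORDS for word in words)

tweets = [
    "What a lovely day",
    "You absolute slut",
]
# Narrow the dataset to candidate tweets for manual labelling.
filtered = [t for t in tweets if contains_keyword(t)]
print(filtered)
```

In practice this kind of filter only narrows the candidate pool; as the article notes, the categorisation itself still depended on human judgement of context and intent.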
These labelled tweets were then fed into a machine learning classifier, which used them to build its classification model.
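To illustrate the idea of building a classifier from labelled samples, here is a toy text classifier written with only the Python standard library. The samples and labels are invented, and the method (a simple Naive Bayes over word counts) stands in for the researchers' actual deep learning model, which is far more sophisticated.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, label) pairs.
    Returns per-label word counts and label frequencies."""
    counts = defaultdict(Counter)
    labels = Counter()
    for text, label in samples:
        labels[label] += 1
        counts[label].update(text.lower().split())
    return counts, labels

def classify(text, counts, labels):
    """Pick the label maximising log prior + log likelihood,
    with add-one smoothing for unseen words."""
    vocab = {w for c in counts.values() for w in c}
    total = sum(labels.values())
    best, best_score = None, float("-inf")
    for label in labels:
        n = sum(counts[label].values())
        score = math.log(labels[label] / total)
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Invented labelled samples standing in for the researchers' 5,000 tweets.
samples = [
    ("go back to the kitchen", "misogynistic"),
    ("women belong in the kitchen", "misogynistic"),
    ("i love cooking in the kitchen", "not"),
    ("great game last night", "not"),
]
counts, labels = train(samples)
print(classify("back to the kitchen", counts, labels))
```

The point of the sketch is only the workflow: labelled examples go in, a statistical model comes out, and new text is scored against it. A word-count model like this cannot capture the contextual distinctions the article goes on to describe.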
“We developed a text mining system where the algorithm learns the language as it goes, first by developing a base-level understanding then augmenting that knowledge with both tweet-specific and abusive language,” Associate Professor Richi Nayak said.
Researchers used a deep learning algorithm that understood the terminology and adjusted its model as it learnt.
While the system started with a base dictionary and built its vocabulary from there, context and intent had to be carefully monitored to ensure that the algorithm could differentiate between abuse, sarcasm and friendly use of aggressive terminology.
“The key challenge in misogynistic tweet detection is understanding the context of a tweet due to its complex and noisy nature,” Nayak said.
Teaching an algorithm to understand natural language is a hefty task, as language changes and evolves constantly, and much of its meaning depends on context and tone.
When the algorithm identified 'go back to the kitchen' as misogynistic, the team was pleased to see that its context learning was working.
Currently, the responsibility for reporting abuse rests with the user, but Nayak and the team hope their research could translate into a platform-level policy that would see Twitter remove any tweets the algorithm identifies as misogynistic.
Researchers said that their model identifies misogynistic content with 75% accuracy, and that the approach could be applied in other contexts, such as racism, homophobia, or abuse directed at people with disabilities.