Tina Yazdizadeh and Wei Shi, Carleton University, Canada
Communication using modern internet technologies has revolutionized the ways humans exchange information. Despite the numerous advantages offered by such technology, its applicability is still limited by problems stemming from personal attacks and pseudoattacks. On social media platforms, such toxic content may take the form of text (e.g., online chats, emails), speech, and even images and movie clips. Because cyberbullying an individual with such toxic digital content may have severe consequences, it is essential to design and implement techniques that automatically detect cyberbullying on social media using machine learning. Text analysis relies on word embedding techniques, which typically represent each word as a real-valued vector that encodes its meaning. The extracted embeddings are then used to decide whether a digital input contains cyberbullying content. Supplying strong word representations to classification methods is therefore a key facet of such detection approaches. In this paper, we evaluate the ELMo word embedding against three other word embeddings, namely TF-IDF, Word2Vec, and BERT, using three basic machine learning models and four deep learning models. The results show that the ELMo word embeddings perform best when combined with neural network-based machine learning models.
Cyberbullying, Natural Language Processing, Word Embeddings, ELMo, Machine Learning.
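As an illustration of the pipeline the abstract describes (text is turned into real-valued vectors, which a classifier then uses to flag cyberbullying), the following is a minimal sketch using TF-IDF, the simplest of the four representations named above. It is not the authors' implementation: the nearest-centroid classifier is a stand-in for the models evaluated in the paper, the four training sentences are invented toy data, and all function names (`fit_idf`, `tfidf_vector`, `predict`, and so on) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; an assumption made for brevity.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for every term seen in training.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Sparse TF-IDF vector (dict of term -> weight), scaled to unit length.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(vectors):
    # Mean of a list of sparse vectors.
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def similarity(a, b):
    # Dot product of two sparse vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


# Invented toy data: label 1 = cyberbullying, label 0 = benign.
train = [
    ("you are a wonderful friend", 0),
    ("thanks for the great help", 0),
    ("you are a stupid loser", 1),
    ("nobody likes you loser", 1),
]

idf = fit_idf([text for text, _ in train])
centroids = {
    label: centroid([tfidf_vector(text, idf) for text, y in train if y == label])
    for label in (0, 1)
}


def predict(text):
    # Assign the label whose class centroid is most similar to the input vector.
    vec = tfidf_vector(text, idf)
    return max(centroids, key=lambda label: similarity(vec, centroids[label]))
```

Under these assumptions, `predict("such a stupid loser")` returns 1 and `predict("thanks friend")` returns 0. Replacing `tfidf_vector` with a contextual embedding such as ELMo or BERT would change only the representation step, which is the component the paper compares.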