Knowledge Agora



Similar Articles

Title The Effect of Training Data Size on Disaster Classification from Twitter
ID_Doc 65149
Authors Effrosynidis, D; Sylaios, G; Arampatzis, A
Year 2024
Published Information, 15, 7
Abstract In the realm of disaster-related tweet classification, this study presents a comprehensive analysis of various machine learning algorithms, shedding light on crucial factors influencing algorithm performance. The exceptional efficacy of simpler models is attributed to the quality and size of the dataset, enabling them to discern meaningful patterns. While powerful, complex models are time-consuming and prone to overfitting, particularly with smaller or noisier datasets. Hyperparameter tuning, notably through Bayesian optimization, emerges as a pivotal tool for enhancing the performance of simpler models. A practical guideline for algorithm selection based on dataset size is proposed, consisting of Bernoulli Naive Bayes for datasets below 5000 tweets and Logistic Regression for larger datasets exceeding 5000 tweets. Notably, Logistic Regression shines with 20,000 tweets, delivering an impressive combination of performance, speed, and interpretability. A further improvement of 0.5% is achieved by applying ensemble and stacking methods.
PDF https://doi.org/10.3390/info15070393