How well do ratings reflect sentiment? Evidence from a large Italian review corpus
Biasetton, Nicolò; Salmaso, Luigi
2026
Abstract
Understanding whether numerical ratings reliably reflect the sentiment expressed in user-generated product reviews is critical for the accurate interpretation of online feedback. Although star ratings provide immediate, quantifiable signals to consumers and businesses, they may not fully convey the nuanced sentiment contained in the text. We therefore investigate the relationship between review ratings and underlying sentiment using a large corpus of Italian online product reviews. Since review corpora typically lack explicit sentiment labels, we develop a predictive framework for sentiment: a BERT-based encoder (specifically, AlBERTo) fine-tuned on our large, domain-specific corpus, combined with a multi-task CORAL ordinal regression head trained on a sample with multiple human annotations. Finally, we use Correspondence Analysis to compare user ratings with the predicted sentiment scores. Our sentiment model performs strongly on the validation set when evaluated on a five-point ordinal scale, achieving an MAE below 0.62 and an RMSE below 0.82. The comparison between ratings and sentiment predictions shows that ratings and textual sentiment are generally aligned at the extreme and neutral points, but notable discrepancies exist for mid-scale evaluations, where ratings often fail to capture underlying textual nuances.
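The CORAL approach mentioned in the abstract reduces K-class ordinal prediction to K-1 cumulative binary decisions sharing one set of weights, and the five-point evaluation uses MAE and RMSE over integer scores. As an informal sketch of both steps (function names and toy values are ours, not from the paper):

```python
import numpy as np

def coral_logits_to_rank(logits):
    # CORAL emits K-1 cumulative logits, one per threshold P(y > k).
    # The predicted rank on a 1..K scale is 1 plus the number of
    # thresholds passed with probability above 0.5.
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return 1 + int(np.sum(probs > 0.5))

def ordinal_errors(y_true, y_pred):
    # MAE and RMSE treat the five-point scale as equally spaced integers.
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(diff))), float(np.sqrt(np.mean(diff ** 2)))

# Toy example on a five-point scale (illustrative values only).
rank = coral_logits_to_rank([2.0, 1.0, -1.0, -3.0])  # two thresholds passed -> rank 3
mae, rmse = ordinal_errors([1, 3, 5, 4, 2], [1, 3, 4, 4, 2])
```

Because adjacent-class confusions cost less than distant ones under MAE/RMSE, these metrics reward the ordinal structure that a CORAL head is designed to respect.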




