In this paper we propose a new method for human action categorization by using an effective combination of novel gradient and optic flow descriptors, and creating a more effective codebook modeling the ambiguity of feature assignment in the traditional bag-of-words model. Recent approaches have represented video sequences using a bag of spatio-temporal visual words, following the successful results achieved in object and scene classification. Codebooks are usually obtained by k-means clustering and hard assignment of visual features to the best representing codeword. Our main contribution is two-fold. First, we define a new 3D gradient descriptor that combined with optic flow outperforms the state-of-the-art, without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient because cluster centers are attracted by the denser regions of the sample distribution, providing a non-uniform description of the feature space and thus failing to code other informative regions. Therefore, we apply a radius-based clustering method and a soft assignment that considers the information of two or more relevant candidates. This approach generates a more effective codebook resulting in a further improvement of classification performances. We extensively test our approach on standard KTH and Weizmann action datasets showing its validity and outperforming other recent approaches.

Effective Codebooks for Human Action Categorization

BALLAN, LAMBERTO;
2009

Abstract

In this paper we propose a new method for human action categorization by using an effective combination of novel gradient and optic flow descriptors, and creating a more effective codebook modeling the ambiguity of feature assignment in the traditional bag-of-words model. Recent approaches have represented video sequences using a bag of spatio-temporal visual words, following the successful results achieved in object and scene classification. Codebooks are usually obtained by k-means clustering and hard assignment of visual features to the best representing codeword. Our main contribution is two-fold. First, we define a new 3D gradient descriptor that combined with optic flow outperforms the state-of-the-art, without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient because cluster centers are attracted by the denser regions of the sample distribution, providing a non-uniform description of the feature space and thus failing to code other informative regions. Therefore, we apply a radius-based clustering method and a soft assignment that considers the information of two or more relevant candidates. This approach generates a more effective codebook resulting in a further improvement of classification performances. We extensively test our approach on standard KTH and Weizmann action datasets showing its validity and outperforming other recent approaches.
2009
Proc. of IEEE International Conference on Computer Vision Workshops (ICCV-W)
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3242021
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 29
  • ???jsp.display-item.citation.isi??? ND
social impact