2007 FilteringProductReviewsFromWebSearchResults


Subject Headings: Text Categorization Algorithm, Product Review Classification Task

Notes

Cited By

Quotes

Abstract

This study seeks to develop an automatic method to identify product reviews on the Web using the snippets (summary information) returned by search engines. Determining whether a snippet is a review or non-review is a challenging task, since the snippet usually does not contain many useful features for identifying review documents. Firstly we applied a common machine learning technique, SVM (Support Vector Machine), to investigate which features of snippets are useful for the classification. Then we employed a heuristic approach utilizing domain knowledge and found that the heuristic approach performs equally well as the machine learning approach. A hybrid approach which combines the machine learning technique and domain knowledge performs slightly better than the machine learning approach alone.
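The heuristic approach mentioned in the abstract can be pictured with a minimal sketch. The abstract does not list the actual rules, so everything below is an illustrative assumption: the cue phrases, the `title`/`url`/`summary` fields, the scoring scheme, and the function names are hypothetical and are not taken from the paper.

```python
import re
from typing import Iterable

# Hypothetical cue lists: illustrative assumptions, not the rules used in the paper.
REVIEW_CUES = ("review", "rating", "pros and cons", "verdict", "stars")
NON_REVIEW_CUES = ("buy now", "add to cart", "free shipping", "press release")


def cue_count(text: str, cues: Iterable[str]) -> int:
    """Count how many cue phrases occur in the text as whole words or phrases."""
    lowered = text.lower()
    return sum(
        1 for cue in cues
        if re.search(r"\b" + re.escape(cue) + r"\b", lowered)
    )


def is_review_snippet(title: str, url: str, summary: str) -> bool:
    """Label a search-result snippet as review (True) or non-review (False)."""
    text = " ".join((title, summary))
    # Review cues add to the score; shopping and news cues subtract from it.
    score = cue_count(text, REVIEW_CUES) - cue_count(text, NON_REVIEW_CUES)
    # Assumption: a URL that mentions "review" counts as one extra piece of evidence.
    if "review" in url.lower():
        score += 1
    return score > 0
```

A hybrid variant along the lines the abstract describes could pass such cue counts to a learned classifier as additional features, rather than thresholding the score directly.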

References

  • Choi, B. and Yao, Z. Web Page Classification, Foundations and Advances in Data Mining, Studies in Fuzziness and Soft Computing 180, 2005, 221--274, Springer Berlin/Heidelberg.
  • Aidan Finn, Nicholas Kushmerick, Barry Smyth, Genre Classification and Domain Transfer for Information Filtering, Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval, p.353-362, March 25-27, 2002
  • Thorsten Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features, Proceedings of the 10th European Conference on Machine Learning, p.137-142, April 21-23, 1998
  • Brett Kessler, Geoffrey Nunberg, Hinrich Schütze, Automatic detection of text genre, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, p.32-38, July 07-12, 1997, Madrid, Spain
  • Jin-Cheon Na, Christopher S. G. Khoo, Syin Chan, Norraihan Bte Hamzah, Sentiment-based search in digital libraries, Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, June 07-11, 2005, Denver, CO, USA doi:10.1145/1065385.1065416
  • Jones, K. S. and Willett, P. Readings in Information Retrieval, Morgan Kaufmann, 1997.
  • Bo Pang, Lillian Lee, Shivakumar Vaithyanathan, Thumbs up?: sentiment classification using machine learning techniques, Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, p.79-86, July 06, 2002 doi:10.3115/1118693.1118704
  • J. Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1993
  • Fabrizio Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys (CSUR), v.34 n.1, p.1-47, March 2002 doi:10.1145/505282.505283


  • Page: 2007 FilteringProductReviewsFromWebSearchResults
  • Author: Tun Thura Thet, Jin-Cheon Na, Christopher S. G. Khoo
  • Title: Filtering Product Reviews from Web Search Results
  • Journal: Proceedings of the 2007 ACM symposium on Document Engineering
  • DOI: 10.1145/1284420.1284467
  • Year: 2007