2016 EnrichingProductAdswithMetadata

From GM-RKB

Subject Headings: Product Offer Title Parsing.

Notes

Cited By

Quotes

Abstract

Product ads are a popular form of search advertising offered by major search engines, including Yahoo, Google and Bing. Unlike traditional search ads, product ads include structured product specifications, which allow search engine providers to perform better keyword-based ad retrieval. However, the level of completeness of the product specifications varies and strongly influences the performance of ad retrieval.

On the other hand, online shops are increasingly adopting semantic markup languages such as Microformats, RDFa and Microdata to annotate their content, making large amounts of product description data publicly available. In this paper, we present an approach for enriching product ads with structured data extracted from thousands of online shops offering Microdata annotations. In our approach we use structured product ads as supervision for training feature extraction models able to extract attribute-value pairs from unstructured product descriptions. We use these features to identify matching products across different online shops and enrich product ads with the extracted data. Our evaluation on three product categories related to electronics shows promising results in terms of enriching product ads with useful product data.
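The idea of using structured product ads as supervision can be illustrated with a minimal sketch. It assumes, purely for illustration, that training labels are produced by matching attribute values from a structured specification against the tokens of an unstructured description. The function name `bio_label`, the attribute names, and the example product are hypothetical and are not taken from the paper, which may generate its training data differently.

```python
def bio_label(tokens, spec):
    """Tag tokens with BIO labels by exact matching of spec values.

    tokens: list of description tokens.
    spec:   dict mapping attribute name -> attribute value (a string).
    Both the matching rule and the label scheme are illustrative assumptions.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for attr, value in spec.items():
        value_tokens = value.lower().split()
        n = len(value_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            # Only label a span that matches and is not already labeled.
            span_free = all(label == "O" for label in labels[i:i + n])
            if lowered[i:i + n] == value_tokens and span_free:
                labels[i] = "B-" + attr
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + attr
    return labels


# Hypothetical structured specification and unstructured title.
spec = {"brand": "Samsung", "display_size": "40 inch"}
tokens = "Samsung 40 inch LED TV".split()
labels = bio_label(tokens, spec)
for token, label in zip(tokens, labels):
    print(token, label)
```

Token sequences labeled this way could then serve as training input for a sequence model such as a CRF. Exact matching is the simplest possible rule; real product data would likely need normalization of units and spelling variants.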

Similar to our CRF feature extraction approach, the authors in (Melli, 2014) propose an approach for annotating product descriptions based on a BIO sequence tagging model, following an NLP text chunking process. Specifically, the authors train a linear-chain conditional random field model on a manually annotated training dataset to identify only 8 general classes of terms. However, the approach is not able to extract explicit attribute-value pairs.
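For readers unfamiliar with BIO tagging, the following sketch shows how a tagged token sequence can be decoded into labeled chunks, which is the step that turns tagger output into attribute-value pairs. The function `decode_bio`, the labels, and the example title are hypothetical; they are not the classes used in (Melli, 2014) or in this paper, and the tagger itself (for example a linear-chain CRF) is assumed to have already produced the tags.

```python
def decode_bio(tokens, tags):
    """Collect (label, text) chunks from parallel token and BIO tag lists."""
    chunks = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always closes any open chunk and starts a new one.
            if current_label is not None:
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag continues the open chunk of the same label.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open chunk
            # and is itself treated as outside any chunk.
            if current_label is not None:
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        chunks.append((current_label, " ".join(current_tokens)))
    return chunks


# Hypothetical tagger output for a product title.
tokens = ["Apple", "iPhone", "6", "16", "GB", "new"]
tags = ["B-brand", "B-model", "I-model", "B-storage", "I-storage", "O"]
pairs = decode_bio(tokens, tags)
print(pairs)
```

Treating an inconsistent `I-` tag as outside is one of several possible conventions; some decoders instead start a new chunk at that position.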

References

  • 1. Marnix De Bakker, Flavius Frasincar, Damir Vandic, A Hybrid Model Words-driven Approach for Web Product Duplicate Detection, Proceedings of the 25th International Conference on Advanced Information Systems Engineering, June 17-21, 2013, Valencia, Spain doi:10.1007/978-3-642-38709-8_10
  • 2. van Bezu, R., Borst, S., Rijkse, R., Verhagen, J., Vandic, D., Frasincar, F.: Multi-component Similarity Method for Web Product Duplicate Detection 2015
  • 3. Sayan Bhattacharya, Sreenivas Gollapudi, Kamesh Munagala, Consideration Set Generation in Commerce Search, Proceedings of the 20th International Conference on World Wide Web, March 28-April 01, 2011, Hyderabad, India doi:10.1145/1963405.1963452
  • 4. Chawla, N.V.: Data Mining for Imbalanced Datasets: An Overview. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 853-867. Springer, Heidelberg 2005
  • 5. Jenny Rose Finkel, Trond Grenager, Christopher Manning, Incorporating Non-local Information Into Information Extraction Systems by Gibbs Sampling, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, p.363-370, June 25-30, 2005, Ann Arbor, Michigan doi:10.3115/1219840.1219885
  • 6. Rayid Ghani, Katharina Probst, Yan Liu, Marko Krema, Andrew Fano, Text Mining for Product Attribute Extraction, ACM SIGKDD Explorations Newsletter, v.8 n.1, p.41-48, June 2006 doi:10.1145/1147234.1147241
  • 7. Robert Isele, Christian Bizer, Learning Linkage Rules Using Genetic Programming, Proceedings of the 6th International Conference on Ontology Matching, p.13-24, October 24, 2011, Bonn, Germany
  • 8. Anitha Kannan, Inmar E. Givoni, Rakesh Agrawal, Ariel Fuxman, Matching Unstructured Product Offers to Structured Product Specifications, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 21-24, 2011, San Diego, California, USA doi:10.1145/2020408.2020474
  • 9. (Melli, 2014) ⇒ Gabor Melli, Shallow Semantic Parsing of Product Offering Titles (for Better Automatic Hyperlink Insertion), Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2014, New York, New York, USA doi:10.1145/2623330.2623343
  • 10. Robert Meusel, Petar Petrovski, Christian Bizer, The WebDataCommons Microdata, RDFa and Microformat Dataset Series, Proceedings of the 13th International Semantic Web Conference - Part I, October 19-23, 2014 doi:10.1007/978-3-319-11964-9_18
  • 11. Meusel, R., Primpeli, A., Meilicke, C., Paulheim, H., Bizer, C.: Exploiting Microdata Annotations to Consistently Categorize Product Offers at Web Scale. In: Stuckenschmidt, H., Jannach, D. (eds.) EC-Web 2015. LNBIP, vol. 239, pp. 83-93. Springer, Heidelberg 2015
  • 12. Hoa Nguyen, Ariel Fuxman, Stelios Paparizos, Juliana Freire, Rakesh Agrawal, Synthesizing Products for Online Catalogs, Proceedings of the VLDB Endowment, v.4 n.7, p.409-418, April 2011 doi:10.14778/1988776.1988777
  • 13. Petar Petrovski, Volha Bryl, Christian Bizer, Integrating Product Data from Websites Offering Microdata Markup, Proceedings of the 23rd International Conference on World Wide Web, April 07-11, 2014, Seoul, Korea doi:10.1145/2567948.2579704
  • 14. Petrovski, P., Bryl, V., Bizer, C.: Learning Regular Expressions for the Extraction of Product Attributes from E-commerce Microdata 2014
  • 15. Disheng Qiu, Luciano Barbosa, Xin Luna Dong, Yanyan Shen, Divesh Srivastava, Dexter: Large-scale Discovery and Extraction of Product Specifications on the Web, Proceedings of the VLDB Endowment, v.8 n.13, p.2194-2205, September 2015 doi:10.14778/2831360.2831372
  • 16. Damir Vandic, Jan-Willem Van Dam, Flavius Frasincar, Faceted Product Search Powered by the Semantic Web, Decision Support Systems, v.53 n.3, p.425-437, June, 2012 doi:10.1016/j.dss.2012.02.010



2016 EnrichingProductAdswithMetadata
Author: Peter Mika, Petar Ristoski
Title: Enriching Product Ads with Metadata from HTML Annotations
DOI: 10.1007/978-3-319-34129-3_10
Year: 2016