HTML Hyperlink Extraction System

From GM-RKB

An HTML Hyperlink Extraction System is an information extraction system that can extract HTML hyperlinks from an HTML item.

  • Example(s):
    • a shell pipeline that prints each <a> element of index.html on its own line: cat index.html | iconv -c -f utf-8 -t ascii | perl -0777 -ne 's/\s*\n\s*/ /g; s/<\/?(?:font|span|td|img)\b[^>]*>//gi; print "$1\n" while /(<a\b[^>]*>.*?<\/a\s*>)/gis'
  • Counter-Example(s):
  • See: Web Crawler.
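
The example pipeline above matches anchor elements with regular expressions, which can miss unusual markup. As an illustrative alternative, the following is a minimal sketch of a parser-based extractor built on Python's standard html.parser module. The names HyperlinkExtractor and extract_hyperlinks are hypothetical, chosen for this sketch only; it assumes that a hyperlink is an <a> element with an href attribute and that anchors are not nested.

```python
from html.parser import HTMLParser


class HyperlinkExtractor(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Keep text only while inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_hyperlinks(html):
    """Return a list of (href, anchor text) pairs found in an HTML string."""
    parser = HyperlinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

For instance, extract_hyperlinks(open("index.html", encoding="utf-8").read()) would return the links of the same file used in the pipeline example. Anchors without an href attribute are skipped, and inline markup inside an anchor (such as <span>) contributes only its text.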