Patent content provided by Intellectual Property Publishing House
Patent title: Method and system for identifying targeted data on a web page
Inventors: Bradley John Perry, Nancy Ann Perry, Daniel Carl Marriott
Application No.: US11234026
Filing date: 20050923
Publication No.: US20070073758A1
Publication date: 20070329
Abstract: A method and system is provided that in a fully automated manner crawls websites and identifies specific types of web pages, then extracts targeted data from those web pages. One or more text nodes containing product-related information on a first web page are first identified, and the locations of those text nodes are described using one or more vectors. The vectors are then analyzed to identify one or more patterns and to generate a model from those patterns that discriminates between text nodes that contain product-related information and text nodes that do not contain product-related information on a second web page. The model can then be used to crawl web sites to identify and extract targeted data, or the model can be installed on a user's computer to identify and extract targeted information from web sites as the user is browsing.
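The pipeline in the abstract (locate labeled text nodes, describe their positions as vectors, derive a model, apply it to a second page) can be illustrated with a minimal Python sketch. It is not the patented method. The abstract does not say what the vectors contain or how patterns are found, so this sketch assumes that a vector is simply the tuple of enclosing tag names and that the "model" is the set of such tuples seen on labeled product nodes. The names TextNodeCollector, learn_model, and extract, and both sample pages, are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextNodeCollector(HTMLParser):
    """Collect (path, text) pairs; path is the tuple of enclosing tag names.

    Assumption: the tag path stands in for the abstract's location vector.
    """

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def text_nodes(html):
    """Return every non-empty text node on the page with its path vector."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_model(html, product_texts):
    """Hypothetical model: the set of paths whose text was labeled as product data."""
    return {path for path, text in text_nodes(html) if text in product_texts}


def extract(model, html):
    """Return text nodes on another page whose path matches the learned model."""
    return [text for path, text in text_nodes(html) if path in model]


# Made-up sample pages sharing one layout.
first_page = (
    "<html><body><div><h1>Acme Kettle</h1><span>$19.99</span></div>"
    "<p>About us</p></body></html>"
)
second_page = (
    "<html><body><div><h1>Zed Toaster</h1><span>$34.50</span></div>"
    "<p>Contact</p></body></html>"
)

model = learn_model(first_page, {"Acme Kettle", "$19.99"})
found = extract(model, second_page)
print(found)
```

Exact path matching only generalizes across pages that share a template. A real system would presumably need richer vectors and a more tolerant pattern analysis, which the abstract leaves unspecified.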
Applicants: Bradley John Perry, Nancy Ann Perry, Daniel Carl Marriott
Addresses: Boulder CO US, Boulder CO US, New York NY US
Nationalities: US, US, US