User: wdtsf -- 2011-02-01 |
Hits: 4527 |
Type: Text file parser |
Description: |
How can I extract all specified links from an HTML file? Thanks! |
Input Sample: |
<div class="main_w"> <div class="content_a"> <div class="rankTitle"> <h1>天津美食点评(最近好评)</h1> <div class="right">排序: <strong><a href="javascript:void(0);" id="orderTitleDiv">最近好评</a></strong> <div id="odrop"><ul><li><a href="/reviewlist/10/10_ac1" class="B">回应数</a></li><li><a href="/reviewlist/10/10_bc1" class="B">鲜花数</a></li><li><a href="/reviewlist/10/10_cc1" class="B">时间</a></li></ul></div> </div> </div> <dl id="rev_25979207" class="contList"><dt><div cla |
Output Sample: |
http://www.thankyou.com/shop/4212402
http://www.thankyou.com/shop/3445258
http://www.thankyou.com/shop/2192851
http://www.thankyou.com/shop/3369571
http://www.thankyou.com/shop/4282263
http://www.thankyou.com/shop/4129080
http://www.thankyou.com/shop/4193263
http://www.thankyou.com/shop/4281592
http://www.thankyou.com/shop/2339239
http://www.thankyou.com/shop/1945840 |
Answer: |
Hint: You need to download and install "Replace Pioneer" on Windows to complete the following steps. |
The following procedure extracts all links that contain "shop":
1. Press ctrl-o to open the html file.
2. Press ctrl-h to open the 'replace' window.
   * set 'replace with pattern' to:
3. Click 'replace'. Done.
4. Press ctrl-s to save to file.
Note: if you need to remove the # mark after the http address and remove duplicate addresses, use: |
Screenshot 1: Replace_Window |
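For readers without Replace Pioneer, the same idea (keep only links containing "shop", drop anything after a # mark, and remove duplicates) can be sketched in Python with the standard library. This is only an illustrative sketch, not the tool's own method: the names `LinkCollector` and `extract_links` are made up for this example, and the base URL `http://www.thankyou.com/` is an assumption taken from the output sample, used to turn relative links into full addresses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, keyword="shop", base_url="http://www.thankyou.com/"):
    """Return unique absolute links containing `keyword`, without #fragments.

    `base_url` is an assumption (taken from the output sample); it is only
    used to resolve relative hrefs such as "/shop/4212402".
    """
    collector = LinkCollector()
    collector.feed(html)
    seen = set()
    result = []
    for href in collector.hrefs:
        if keyword not in href:
            continue
        # Resolve against the base URL, then strip the "#..." part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:  # keep first occurrence only
            seen.add(absolute)
            result.append(absolute)
    return result


if __name__ == "__main__":
    # Hypothetical snippet, not the real page from the question.
    sample = (
        '<a href="/shop/4212402#reviews">A</a>'
        '<a href="/reviewlist/10/10_ac1">B</a>'
        '<a href="/shop/4212402">A again</a>'
        '<a href="http://www.thankyou.com/shop/3445258">C</a>'
    )
    for link in extract_links(sample):
        print(link)
```

To run it on a saved page, read the file into a string first and pass it to `extract_links`; the file's encoding has to be supplied to `open()` and is not known from the question.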