852. Text file parser -- How to filter out all web pages that do not contain specified words?

User: bruce lee -- 2011-09-09
Hits: 1707
Type: Text file parser   
I have many web URLs and want to know whether every page has "about", "product" and "contact" hyperlinks, and whether the word "chemical" appears on the "about" page. If not, I want to delete that URL.
Input Sample:
Output Sample:
Hint: You need to download and install "Replace Pioneer" on the Windows platform to complete the following steps.
The question is a little complicated, so here we only provide a solution that removes all web pages that do not contain certain words, such as 'about', 'product' and 'contact'.
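Outside Replace Pioneer, the core rule ("keep a page only if it contains every one of the words") can be sketched in a few lines of Python. This is only an illustration, not the tool's own method: the function names are hypothetical, and treating a match as a whole word with case ignored is an assumption, since the original search pattern is not shown.

```python
import re

# Example word list taken from the text; replace it with your own words.
REQUIRED_WORDS = ("about", "product", "contact")


def contains_all_words(page_text: str, words=REQUIRED_WORDS) -> bool:
    """True if every word occurs in page_text as a whole word, ignoring case."""
    # \b marks a word boundary, so "about" does not match inside "roundabout".
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", page_text, re.IGNORECASE)
        for word in words
    )


def filter_pages(pages: dict, words=REQUIRED_WORDS) -> list:
    """Return the URLs (in input order) whose page text contains all the words."""
    return [url for url, text in pages.items() if contains_all_words(text, words)]


pages = {
    "http://a.example/": '<a href="/about">About</a> <a href="/product">Products</a> <a href="/contact">Contact</a>',
    "http://b.example/": '<a href="/about">About</a> <a href="/contact">Contact</a>',
}
print(filter_pages(pages))  # ['http://a.example/']
```

The second page is dropped because "product" never appears in it. Whole-word matching avoids false hits such as "about" inside "roundabout", at the cost of not matching plurals like "Products" on their own.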
1. Prepare a webpage list file in which each line contains one address starting with 'http'.
2. Open the 'Tools->Batch Runner' menu and select 'import list' to import the webpage list file.
3. Click the 'fast replace' button to open the 'fast replace' window.
4. Click 'add' to add a new rule:
* Set 'search' to:

* Set 'replace' to:

Click 'OK'.
5. Check the 'reg exp' and 'extract' options.
6. Click 'start', select 'output to single file', and choose a file for the output. Done.
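For readers without Replace Pioneer, the six steps above can be approximated by a standalone Python script: read the list file, download each page, keep the URLs whose page contains all the words, and write the survivors to a single output file. This is a sketch under stated assumptions, not a reproduction of the tool: the function names are hypothetical, pages are assumed to be UTF-8, matching is assumed to be whole-word and case-insensitive, and unreachable pages are simply dropped.

```python
import re
import urllib.request
from pathlib import Path
from typing import Callable

# Example word list taken from the text; replace it with your own words.
REQUIRED_WORDS = ("about", "product", "contact")


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as UTF-8 (assumed), replacing bad bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def has_all_words(text: str, words) -> bool:
    """True if every word appears as a whole word in text, ignoring case."""
    return all(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE) for w in words)


def filter_url_list(list_file: Path, output_file: Path, words=REQUIRED_WORDS,
                    fetch: Callable[[str], str] = fetch_url) -> list:
    """Keep the URLs from list_file whose pages contain all words; save them."""
    # Like step 1: one address per line; ignore lines not starting with 'http'.
    urls = [line.strip() for line in list_file.read_text(encoding="utf-8").splitlines()
            if line.strip().startswith("http")]
    kept = []
    for url in urls:
        try:
            text = fetch(url)
        except OSError:
            # Network errors (URLError, timeouts) are OSError subclasses;
            # an unreachable page is treated as not matching and dropped.
            continue
        if has_all_words(text, words):
            kept.append(url)
    # Like step 6: write every surviving URL to a single output file.
    output_file.write_text("\n".join(kept) + ("\n" if kept else ""), encoding="utf-8")
    return kept
```

A call such as `filter_url_list(Path("urls.txt"), Path("kept_urls.txt"))` (both file names hypothetical) fetches every listed page over the network. The `fetch` parameter exists so the download step can be swapped out, for example for a local cache or a test stub.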

Screenshot 1:  Fast_Replace_Window

Similar Examples:
How to extract all sentences that contain specified words? (73%)
How to find out all lines that contain specified words in multiple files? (68%)
How to extract all lines that contain specific words in a file? (67%)
How to extract all lines that contain specified words or phrases? (67%)
How to remove all the lines that do not contain any of words in a list? (65%)
How to extract all records that contain specified field values? (63%)
How to list out all the lines that contain a specified keyword? (63%)
How to extract all lines that do not contain a list of keywords? (62%)