LINUX - termsearch - search terms script - using grep

This is a working program that I use to scour through lots of text for keywords. I put the keywords in the terms file, one per line (keywords or key phrases, so they can contain spaces). Each line launches a recursive grep through whatever folder I need to search for that phrase. Best part: all of the searches are launched at the same time, so you don't have to wait for one to finish before the next begins.
This is based on: http://www.kossboss.com/linux---grep-for-words-at-the-same-time
The main script is look.sh. It also contains every other script as a commented-out copy (kept for redundancy). It needs the terms file next to it, which you edit to include your terms. The helper scripts are also given as separate files.
Run ./look.sh <folder or file to search thru> to launch the searches, which make the result files.
Each result file is named _allSS_<term looked for>_<folder or file name given>.txt (spaces in the term and slashes in the path are replaced with -). You get one result file for every line in terms that holds a valid term.
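To make the launch step concrete, here is a minimal sketch of the same idea: one background grep per terms line, each writing to its own result file. This is not the real look.sh; search_terms is a made-up name, and unlike look.sh it waits for all of the greps to finish before returning.

```shell
# Sketch only: search_terms is a hypothetical helper, not the real look.sh.
# Usage: search_terms <terms file> <folder or file to search>
search_terms() {
    local terms_file="$1" target="${2%/}"   # drop one trailing slash, as look.sh does
    local suffix term name
    suffix=$(printf '%s' "$target" | tr '/' '-')    # "/" cannot appear in a file name
    while IFS= read -r term || [ -n "$term" ]; do
        [ -n "$term" ] || continue                  # skip blank lines
        name=$(printf '%s' "$term" | tr ' ' '-')    # spaces become "-" in the name
        # one background grep per term, all running at the same time
        grep -nir -- "$term" "$target" > "_allSS_${name}_${suffix}.txt" 2>&1 &
    done < "$terms_file"
    wait    # unlike look.sh, block here until every grep has finished
}
```

Run it from the folder that holds terms, for example: search_terms terms this/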
A valid terms file looks like:
term1
term2
term3
this is a phrase1 that has term4
phrase2 has a space as well
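If you want to check what a terms file will turn into before a long run, a small sketch like this prints the lines that count as terms (non-blank lines, kept whole, spaces included). list_terms is a hypothetical helper name and is not part of termsearch.

```shell
# Sketch only: list_terms is a hypothetical helper, not one of the termsearch scripts.
# Prints each non-blank line of a terms file exactly as written, spaces included.
list_terms() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] && printf '%s\n' "$line"
    done < "$1"
    return 0
}
```

Piping it through wc -l (list_terms terms | wc -l) gives the number of result files to expect.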
Then use any of the 3 monitoring scripts to monitor progress:
./monitor1-tail.sh - this shows the processes and also the last few lines of each result file (live as terms/phrases are being found)
./monitor2-lines.sh - this shows the processes and also the number of lines found in each result file (number of times each term was found, also live info)
./monitor3-processes.sh - this just shows the processes information that you get in monitor1 and monitor2 scripts (also it has system load and memory)
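The monitor scripts refresh forever with watch. If you only want a single snapshot (for a log or a cron job, say), the line counts can be taken once with a sketch like the one below. hit_counts is a made-up name; it assumes the result files are in the current folder and start with _allSS_.

```shell
# Sketch only: hit_counts is a hypothetical helper, not one of the monitor scripts.
# Prints "<line count> <file>" once for every result file in the current folder.
hit_counts() {
    local f
    for f in _allSS_*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        printf '%s %s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
    done
}
```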
When you're done you can concatenate all of your results into one file with ./together.sh
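together.sh appends to _allTOGETHER_.txt, so running it twice duplicates everything in that file. A possible variant that can be re-run safely is sketched below; combine_results is a hypothetical name, and the only real change is that the output file is emptied first.

```shell
# Sketch only: combine_results is a hypothetical variant of together.sh.
combine_results() {
    local dst="_allTOGETHER_.txt" f
    : > "$dst"                             # start empty so re-runs do not duplicate
    for f in _allSS_*; do
        [ -e "$f" ] || continue            # no result files yet
        printf 'TERM: %s\n###################\n' "$f" >> "$dst"
        cat "$f" >> "$dst"
        echo >> "$dst"
    done
}
```

The original result files are left in place, same as with together.sh.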
Move them to a folder (which will be created) with ./move.sh <folder name> - it's a simple mkdir and mv script
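A slightly more defensive version of the move step might look like this. archive_results is a hypothetical name; the sketch assumes the results and the terms file are in the current folder, refuses to run without a folder name, and does not fail if the folder already exists.

```shell
# Sketch only: archive_results is a hypothetical variant of move.sh.
# Usage: archive_results <folder name>
archive_results() {
    local dest="$1" f
    [ -n "$dest" ] || { echo "usage: archive_results <folder name>" >&2; return 1; }
    mkdir -p -- "$dest" || return 1        # fine if the folder already exists
    for f in _all*; do
        [ -f "$f" ] && mv -- "$f" "$dest"/ # result files and the combined file
    done
    cp -- terms "$dest"/                   # copy, so terms stays for the next run
}
```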
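Let's say you launched a whole group of grep commands and they are already making result files, and you need to clean up? Run ./cleanup.sh (it kills every running grep, even ones not started by look.sh, and deletes all of the result files)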
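Since cleanup.sh uses killall, it also stops greps that have nothing to do with your search. One way around that, sketched here under my own assumptions, is to write down the PID of every background job as it starts and later kill only those. PID_FILE, start_tracked and stop_tracked are made-up names, not part of termsearch.

```shell
# Sketch only: PID_FILE, start_tracked and stop_tracked are hypothetical names.
PID_FILE=".termsearch.pids"

# Run a command in the background and remember its PID.
start_tracked() {
    "$@" &
    echo "$!" >> "$PID_FILE"
}

# Stop only the recorded jobs, then forget them.
stop_tracked() {
    local pid
    [ -f "$PID_FILE" ] || return 0
    while read -r pid; do
        kill "$pid" 2>/dev/null            # ignore jobs that already finished
    done < "$PID_FILE"
    rm -f -- "$PID_FILE"
}
```

Usage would be something like: start_tracked grep -nir term1 this > _allSS_term1_this.txt, and later stop_tracked.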
./print.script just outputs what you see below (completely optional) - it formats the output with separators and gives statistics about each file (word count, line count, etc.)
./backup.script is my own script that I run after I modify any of the source code - it makes it easy to bundle everything up



NOTE: the look.sh program below contains every script. It's the main script, and it has all of the optional scripts commented out (for redundancy).



TERMSEARCH SOURCE CODE
#######################
#######################

SOURCE CODE PRINTED USING print.script ON Fri Jan 10 00:01:24 PST 2014
Printing Code From the following files:
cleanup.sh extract.sh look.sh monitor1-tail.sh monitor2-lines.sh monitor3-processes.sh move.sh nicelook-highpriority.sh nicelook.sh renice-high-priority.sh renice-normal.sh renice.sh together.sh terms backup.script print.script



***************************************************
***************************************************
SOURCE CODE OF FILE: cleanup.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:24 PST 2014
# of lines: 5 cleanup.sh
# of bytes: 66 cleanup.sh
# of chars: 66 cleanup.sh
# of words: 11 cleanup.sh
Longer line length: 23 cleanup.sh
***************************************************
***************************************************

#!/bin/bash
# last update: 1/3/2014

killall -9 grep
rm -rf _all*



***************************************************
***************************************************
SOURCE CODE OF FILE: look.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 227 look.sh
# of bytes: 10725 look.sh
# of chars: 10725 look.sh
# of words: 1767 look.sh
Longer line length: 239 look.sh
***************************************************
***************************************************

#!/bin/bash
# last update 1/9/2014
# as per: http://www.kossboss.com/linux---grep-for-words-at-the-same-time

# make sure to have a terms file called "terms" in the directory 1 level above the directory you want to search - just like in this layout
# /some/path/search/this/<everything here>
# /some/path/search/look.sh - look.sh will look for the keywords listed in the "terms" file, and only thru everything inside the folder "this" (/some/path/search/this/)
# note the search location must not include the current working directory (the result files are written there), so either search something 1 level deep from the current working directory, or some other branch in the filesystem tree
# the search location is the first argument to look.sh and is stored in the variable RELDIR1; in the above example I would run ./look.sh this or ./look.sh /some/path/search/this - note I didn't put the slash / at the end (it's optional)
# also note it can be a relative or absolute path
# in the program below (the runnable part that is not commented out - aka the main part of the look program) the only value that changes between runs is RELDIR1, which is filled in from the argument...
# when I wrote this (12-11-2013) it was the relative path of the folder I was looking through.
# It sits right next to the look.sh and terms files (so the terms file, look.sh, the result files and the search directory are all in the same folder;
# the folder that they are all in doesn't matter - just like the part /some/path/search/ doesn't matter in the whole example - unless of course I gave the full path as the argument). Just like in this example
# /some/path/search/terms - note the terms file has one keyword or phrase per line (phrases can have spaces; also realize that the search is case insensitive)
# /some/path/search/<results go here>
# /some/path/search/monitor1-tail.sh - optional used to just monitor the long look.sh operation - shows the end of each result file on the go (as terms are being searched thru)
# /some/path/search/monitor2-lines.sh - optional used to just monitor the long look.sh operation - shows the line count in each result file on the go
# /some/path/search/monitor3-processes.sh - optional used to just monitor the long look.sh operation - just shows the processes and memory load (this info included in both of the above monitor scripts)
# /some/path/search/cleanup.sh - optional, cleans (kills all grep commands - even ones not started by look.sh, also deletes all files that start with _all aka the result files) - good for starting fresh if messed something up
# /some/path/search/together.sh - optional, concats all _allSS files in current directory into _allTOGETHER_.txt - only run once, as every run appends all of the results to that file again - so make sure a cleanup is run first - it still keeps originals
# /some/path/search/move.sh - optional, this moves all of the results (all files with _all*) and copies terms file into a new directory (directory is made as well)
# if you do this a lot, I recommend doing the runs like this: edit terms, run clean up, run look, run monitor (immediately after or slightly after look script), then run the together script if you want it all together, then the move script.
# you will notice nice and priority scripts:
# ./nicelook-highpriority.sh <path>: runs look and all greps with nice of -19, so high priority, good for short ops (a negative nice value normally requires root)
# ./nicelook.sh <path>: runs look and all greps with nice of 19, so low priority, good for long ops
# ./renice-high-priority.sh: if job already running can change priorities to high if taking too long (note might impact cpu a lot)
# ./renice-normal.sh: changes all jobs to nice 0 (as if it was run without nice or priority, so it's like running the app with just ./look.sh)
# ./renice.sh: if job already running can change priorities to super low (aka nice) (good for long ops)

################################
# THE LOOK PROGRAM             #
#                              #
# - the heart of the program   #
#                              #
#                              #
################################

#
# check if argument is there
# if not show usage
# check if argument is a directory or a file
# if not show usage
#

# usage function
usage123 () {
ME123=`basename $0`
echo "termsearch - by kostia - 2014"
echo "You're using ./look.sh - for you it has the name of ./$ME123"
echo "Usage:"
echo "./$ME123 <directory or file to look thru>"
echo "RESULTS: It will output files with name _allSS_<term>_<filename or directory given>"
echo "REQUIREMENT: The terms it will look thru are listed ONE PER LINE in a 'terms' file that sits in the same folder as this file"
echo "NOTE: each term needs to be separated with a new line"
echo "--DO NOT PUT SEVERAL TERMS ON ONE LINE LIKE term1 term2 term3 (that line is searched as one phrase)--"
echo "Example of terms file:"
echo "# cat terms"
echo "term1"
echo "term2"
echo "* After running this, all search tasks - ran with grep - will be in the background"
echo "* You can monitor your operation with MONITOR1 or MONITOR2 script"
echo "* You can clean up everything with the CLEAN UP script - stops all grep operations and deletes all result files"
echo "* You can concat all of the results together into 1 file - still keeping the originals - with TOGETHER script"
echo "* Finally you can move all of your results and together file - if you made one - into a folder using the move script"
exit 1
}
# main code

if [ $# -eq 0 ] # 0 args
then
usage123
fi

if [ $# -gt 1 ] # more than 1 arg
then
usage123
fi

if [ -z "$1" ] # no argument in first arg
then
usage123
fi

if [ -d "$1" ] ; then # if directory
echo "Will look thru DIRECTORY: $1"
else
if [ -f "$1" ] ; then
echo "Will look thru FILE: $1"
else
usage123
fi
fi

if [ -f "terms" ] ; then
logger "termsearch thru $1 for $(wc -l < terms) term[s]"
else
echo "ERROR: 'terms' file is missing - it has to be in this directory and nowhere else"
echo "More info on terms below"
echo "---"
usage123
fi

# can't have file names with /, or else that's a directory, so strip the trailing / - this doesn't affect filenames
DIR123=${1%/}
# change / to - for filename
FILESUF123=`echo ${DIR123} | tr / -`
echo "**** the file suf is kostia__${FILESUF123}__kostia ****"

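# IFS controls how the for loop splits the output of cat terms; usually it splits on spaces too, but for now it's new line (plus backspace, which keeps the new line from being stripped by $(...))
# This way terms can have spaces in each line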
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")

RELDIR1=${DIR123};
for i in `cat terms`; do
echo "=======================";
date;
echo "Looking for: ${i}";
echo "=======================";
TERM4FILE=`echo -n ${i} | tr ' ' -`
time (grep -nir -- "${i}" "${RELDIR1}") >& "_allSS_${TERM4FILE}_${FILESUF123}.txt" &
echo "----------------------";
echo "Started PID: $!"; # PID of the background job that wraps this grep
(ps awfuxx | egrep 'grep|USER' | nl;);
echo;
done;
echo "##############################";
echo "FINAL JOBS:";
(ps awfuxx | egrep 'grep|USER' | nl;);
echo
echo "Looking for `wc -l terms` at the same time!"

IFS=$SAVEIFS

exit 0

# the monitor scripts
# they are below

# the clean up script
# cleanup.sh (copy whole section to new file and remove only the first hash mark)
##!/bin/bash
#killall -9 grep
#rm -rf _all*

# the put together script - puts all results together to 1 file
# together.sh (copy whole section to new file and remove only the first hash mark)
##!/bin/bash
#DST1="_allTOGETHER_.txt"; for i in `ls | grep _allSS`; do echo "CONCATTING FILE: $i >> $DST1"; echo -e "TERM: $i\n###################" >> $DST1; cat $i >> $DST1; echo >> $DST1; done;

# The move script, this will move all of the results (including the combined file if you used the together script) but not the terms file (the terms file is just copied) - all into one directory with the given name. You can use a relative or absolute path.
# example ./move.sh search-nodeleaf - then all of the
# this is the only helper script that needs an input argument
##!/bin/bash
## USE: ./move.sh <path-to-move-to>
## last update: 01-05-2014
#mkdir -p "$1"
#mv _all* "$1"
## note only copy terms, so terms file stays for next round
#cp terms "$1"

##############################
# UPDATES ON MONITOR SCRIPTS #
##############################

# just remove 1 hash mark to make em work

#MONITOR SCRIPT: monitor1-tail.sh
##!/bin/bash
## last update: 1/7/2014
#watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS:"; echo "==========="; tail _all*;'
## if you want to do with while loop:
## while true; do clear; echo "=====`date`======";  echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS:"; echo "==========="; tail _all*; sleep 1; done;
#
#MONITOR SCRIPT: monitor2-lines.sh
##!/bin/bash
## last update 1/7/2014
#watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS (number of lines):"; echo "==========================";  wc -l _allS*;'
## if you want to do it with a while loop:
## while true; do clear; echo "=====`date`======";  echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*; sleep 1; done;

#MONITOR SCRIPT: monitor3-processes.sh
##!/bin/bash
## last update 1/7/2014
#watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;);'
## if you want to do it with a while loop:
## while true; do clear; echo "=====`date`======";  echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); sleep 1; done;

##SCRIPT FOR CPU CONTROL: nicelook-highpriority.sh
###############################################
##!/bin/bash
## update: 1/9/2014
## usage: ./nicelook-highpriority.sh <path>
## just run look with more priority
#nice -n -19 ./look.sh ${1}

##SCRIPT FOR CPU CONTROL: nicelook.sh
###############################################
##!/bin/bash
## update: 1/9/2014
## usage: ./nicelook.sh <path>
## just run look with nice
#nice -n 19 ./look.sh ${1}

##SCRIPT FOR CPU CONTROL: renice-high-priority.sh
###############################################
##!/bin/bash
## update: 1/9/2014
## make nice of the program more cpu
## prioritize
#for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n -19 -p $i; done;

##SCRIPT FOR CPU CONTROL: renice-normal.sh
###############################################
##!/bin/bash
## update: 1/9/2014
## this puts nice back to the default of 0, so it runs as if it was run by ./look.sh and not ./nicelook*
#for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 0 -p $i; done;

##SCRIPT FOR CPU CONTROL: renice.sh
###############################################
##!/bin/bash
## update: 1/9/2014
## make nice of the program less cpu
## good for huge ops
#for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 19 -p $i; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: monitor1-tail.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 5 monitor1-tail.sh
# of bytes: 427 monitor1-tail.sh
# of chars: 427 monitor1-tail.sh
# of words: 67 monitor1-tail.sh
Longer line length: 181 monitor1-tail.sh
***************************************************
***************************************************

#!/bin/bash
# last update: 1/7/2014
watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS:"; echo "==========="; tail _all*;'
# if you want to do with while loop:
# while true; do clear; echo "=====`date`======";  echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS:"; echo "==========="; tail _all*; sleep 1; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: monitor2-lines.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 5 monitor2-lines.sh
# of bytes: 494 monitor2-lines.sh
# of chars: 494 monitor2-lines.sh
# of words: 74 monitor2-lines.sh
Longer line length: 216 monitor2-lines.sh
***************************************************
***************************************************

#!/bin/bash
# last update 1/7/2014
watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;); echo; echo "RESULTS (number of lines):"; echo "==========================";  wc -l _allS*;'
# if you want to do it with a while loop:
# while true; do clear; echo "=====`date`======";  echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); echo; echo "RESULTS (number of lines):"; echo "=========================="; wc -l _allS*; sleep 1; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: monitor3-processes.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 5 monitor3-processes.sh
# of bytes: 403 monitor3-processes.sh
# of chars: 403 monitor3-processes.sh
# of words: 63 monitor3-processes.sh
Longer line length: 216 monitor3-processes.sh
***************************************************
***************************************************

#!/bin/bash
# last update 1/7/2014
watch 'top -c -b -n1 | head -n5; echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep | egrep -v egrep;);'
# if you want to do it with a while loop:
# while true; do clear; echo "=====`date`======";  echo "PROCESSES:"; echo "==========="; (ps awfuxx | egrep grep;); sleep 1; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: move.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 8 move.sh
# of bytes: 172 move.sh
# of chars: 172 move.sh
# of words: 29 move.sh
Longer line length: 59 move.sh
***************************************************
***************************************************

#!/bin/bash
# USE: ./move.sh <path-to-move-to>
# last update: 01-05-2014
mkdir -p "$1"
mv _all* "$1"
# note only copy terms, so terms file stays for next round
cp terms "$1"




***************************************************
***************************************************
SOURCE CODE OF FILE: together.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 3 together.sh
# of bytes: 218 together.sh
# of chars: 218 together.sh
# of words: 34 together.sh
Longer line length: 182 together.sh
***************************************************
***************************************************

#!/bin/bash
# last update 1/3/2014
DST1="_allTOGETHER_.txt"; for i in `ls | grep _allSS`; do echo "CONCATTING FILE: $i >> $DST1"; echo -e "TERM: $i\n###################" >> $DST1; cat $i >> $DST1; echo >> $DST1; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: nicelook-highpriority.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 5 nicelook-highpriority.sh
# of bytes: 136 nicelook-highpriority.sh
# of chars: 136 nicelook-highpriority.sh
# of words: 20 nicelook-highpriority.sh
Longer line length: 42 nicelook-highpriority.sh
***************************************************
***************************************************

#!/bin/bash
# update: 1/9/2014
# usage: ./nicelook-highpriority.sh <path>
# just run look with more priority
nice -n -19 ./look.sh ${1}



***************************************************
***************************************************
SOURCE CODE OF FILE: nicelook.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 5 nicelook.sh
# of bytes: 113 nicelook.sh
# of chars: 113 nicelook.sh
# of words: 19 nicelook.sh
Longer line length: 29 nicelook.sh
***************************************************
***************************************************

#!/bin/bash
# update: 1/9/2014
# usage: ./nicelook.sh <path>
# just run look with nice
nice -n 19 ./look.sh ${1}



***************************************************
***************************************************
SOURCE CODE OF FILE: renice-high-priority.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 5 renice-high-priority.sh
# of bytes: 177 renice-high-priority.sh
# of chars: 177 renice-high-priority.sh
# of words: 37 renice-high-priority.sh
Longer line length: 96 renice-high-priority.sh
***************************************************
***************************************************

#!/bin/bash
# update: 1/9/2014
# make nice of the program more cpu
# prioritize
for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n -19 -p $i; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: renice-normal.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:25 PST 2014
# of lines: 4 renice-normal.sh
# of bytes: 215 renice-normal.sh
# of chars: 215 renice-normal.sh
# of words: 47 renice-normal.sh
Longer line length: 94 renice-normal.sh
***************************************************
***************************************************

#!/bin/bash
# update: 1/9/2014
# this puts nice back to the default of 0, so it runs as if it was run by ./look.sh and not ./nicelook*
for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 0 -p $i; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: renice.sh
---------------------------------------------------
Date of code: Fri Jan 10 00:01:26 PST 2014
# of lines: 5 renice.sh
# of bytes: 183 renice.sh
# of chars: 183 renice.sh
# of words: 40 renice.sh
Longer line length: 95 renice.sh
***************************************************
***************************************************

#!/bin/bash
# update: 1/9/2014
# make nice of the program less cpu
# good for huge ops
for i in `ps ax | egrep grep | egrep -v egrep | awk '{print $1}'`; do renice -n 19 -p $i; done;



***************************************************
***************************************************
SOURCE CODE OF FILE: terms
---------------------------------------------------
Date of code: Fri Jan 10 00:01:26 PST 2014
# of lines: 3 terms
# of bytes: 18 terms
# of chars: 18 terms
# of words: 3 terms
Longer line length: 5 terms
***************************************************
***************************************************

term1
term2
term3



***************************************************
***************************************************
SOURCE CODE OF FILE: backup.script
---------------------------------------------------
Date of code: Fri Jan 10 00:01:26 PST 2014
# of lines: 19 backup.script
# of bytes: 984 backup.script
# of chars: 984 backup.script
# of words: 118 backup.script
Longer line length: 115 backup.script
***************************************************
***************************************************

#!/bin/bash
SCRIPTS="cleanup.sh look.sh monitor1-tail.sh monitor2-lines.sh monitor3-processes.sh move.sh together.sh"
SCRIPTS="$SCRIPTS nicelook-highpriority.sh nicelook.sh renice-high-priority.sh renice-normal.sh renice.sh"
echo "* REMOVING OLD TGZ FILE"
rm -f termsearch.tgz
echo "* RENAMING TERMS TO .tempterms123"
mv terms .tempterms123
echo "* MAKING SHOWCASE terms FILE WITH TERMS: term1 THRU term3"
echo -e "term1\nterm2\nterm3" > terms
echo "* GRABBING SOURCE CODE"
./print.script > termsearch-all-code.txt
echo "* TAR GZIPPING EVERY SCRIPT AND terms FILE INTO termsearch.tgz"
tar -zcvf termsearch.tgz termsearch-all-code.txt ${SCRIPTS} terms backup.script print.script
echo "* COPYING termsearch.tgz AND ALL SOURCE CODE termsearch-all-code.txt TO WEB SERVER /var/www"
cp termsearch.tgz termsearch-all-code.txt /var/www/
echo "* RENAME TEMP TERMS BACK TO ORIGINAL terms FILE - REMOVING SHOWCASE terms FILE AND RESTORING YOUR terms FILE"
mv .tempterms123 terms
echo "* DONE!"




***************************************************
***************************************************
SOURCE CODE OF FILE: print.script
---------------------------------------------------
Date of code: Fri Jan 10 00:01:26 PST 2014
# of lines: 35 print.script
# of bytes: 1127 print.script
# of chars: 1127 print.script
# of words: 124 print.script
Longer line length: 130 print.script
***************************************************
***************************************************

#!/bin/bash
# update 1/9/2014
SCRIPTS="cleanup.sh look.sh monitor1-tail.sh monitor2-lines.sh monitor3-processes.sh move.sh together.sh" #cant include extract.sh
SCRIPTS="$SCRIPTS nicelook-highpriority.sh nicelook.sh renice-high-priority.sh renice-normal.sh renice.sh"
echo "TERMSEARCH SOURCE CODE"
echo "#######################"
echo "#######################"
echo
echo "SOURCE CODE PRINTED USING `basename $0` ON `date`"
echo "Printing Code From the following files:"
echo *sh terms backup.script print.script
echo
echo
echo
for i in ${SCRIPTS} terms backup.script print.script
do
echo "***************************************************"
echo "***************************************************"
echo "SOURCE CODE OF FILE: $i"
echo "---------------------------------------------------"
echo "Date of code: `date`"
echo "# of lines: `wc -l $i`"
echo "# of bytes: `wc -c $i`"
echo "# of chars: `wc -m $i`"
echo "# of words: `wc -w $i`"
echo "Longer line length: `wc -L $i`"
echo "***************************************************"
echo "***************************************************"
echo
cat $i
echo
echo
echo
done
