Monday, 30 May 2011

Count Regular Expression matches per line

Count regular expression matches per line in several files using a batch file and Perl, and report the highest count per file:
REM @ECHO OFF

ECHO # #################### _count.bat #################### #
ECHO # Script to count regexp matches per line in all filtered files of a given directory.
ECHO # (requires Perl 5.10)
ECHO # (C) 2011/05/30 by Frank Glaser
ECHO # #################################################### #

REM BATCH
REM %d% directory
REM %e% extension
REM %~dp0 drive and path of this batch file (with trailing backslash)
REM %1 first batch parameter (directory to scan)
REM %~1 first batch parameter without surrounding quotes
REM for /f loop over the lines of a command's output (here: the file list)
REM for "delims=" keep whole lines, so filenames with spaces are not split
REM %%F current input file in the for loop
REM dir /b bare format (file names only)
REM dir /d wide format sorted by column
REM > redirect stdout into a file

REM PERL
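REM perl -e run the following code as a one-liner
REM perl -l chomp each input line and append a newline to every print
REM perl -n loop over all input lines without printing them
REM $c match count of the current line
REM $m highest count so far
REM $_ current line of the file
REM $. current line number
REM $ARGV name of the current input file
REM \t tab character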

REM change working folder to the folder of this batch file
cd /d "%~dp0"

REM check batch parameter: use it as directory, else the folder of this batch file
IF [%1]==[] (
SET d=%~dp0
) ELSE (
SET d=%~1\
)

REM set file extension
SET e=*.txt

REM set regexp (here: a literal dot)
SET r=\.

REM for all files: count regexp matches per line, find the maximum and write it to the log
(for /f "delims=" %%F in ('dir "%d%%e%" /b /d') do call perl.exe -lne "$c=0; $c++ while $_ =~ /%r%/g; $m=$m>$c?$m:$c; END{print \"$ARGV \t $m\"}" "%d%%%F") > _count.log

REM print log file to the console
echo.
type _count.log
echo.

REM open log file in its associated program
_count.log

pause

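Attention:
The output file must have a different extension from the input files. The redirection creates _count.log before the loop runs, so a log file matching the filter (e.g. _count.txt with *.txt) would be counted as well.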
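
For readers without Windows or Perl, the same counting logic can be sketched in Python. This is a minimal sketch, not the author's script: the names `max_matches`, `count_dir` and `main` are hypothetical, the default directory is assumed to be the current directory rather than the batch file's folder, input files are assumed to be UTF-8, and Python's `re` syntax is assumed to be close enough to Perl's for simple patterns such as `\.`. As in the Perl one-liner, it counts non-overlapping matches on each line and keeps the maximum per file.

```python
import re
import sys
from pathlib import Path


def max_matches(lines, pattern):
    """Return the highest number of non-overlapping matches of pattern on any
    single line (0 if there are no lines)."""
    regex = re.compile(pattern)
    best = 0
    for line in lines:
        # Counterpart of Perl's "$c++ while $_ =~ /re/g" for one line.
        count = sum(1 for _ in regex.finditer(line))
        best = max(best, count)
    return best


def count_dir(directory, file_filter="*.txt", pattern=r"\."):
    """Yield (file name, max matches per line) for each file matching the filter."""
    for path in sorted(Path(directory).glob(file_filter)):
        if path.is_file():
            # Assumption: input is UTF-8; undecodable bytes are replaced.
            with open(path, encoding="utf-8", errors="replace") as fh:
                yield path.name, max_matches(fh, pattern)


def main(argv):
    # Optional first argument: directory to scan (assumed default: current directory).
    directory = argv[0] if argv else "."
    for name, best in count_dir(directory):
        print(f"{name}\t{best}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as, for example, `python count_matches.py C:\data > _count.log` (the script name is hypothetical); the shell redirection plays the role of `> _count.log` in the batch file. Unlike the Perl one-liner, which prints the path it was given, this sketch prints only the file name, and it reports 0 for a file without lines.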

Portable Perl for Windows:
http://sourceforge.net/projects/perlportable/
Alternative:
http://strawberryperl.com

Update:
If you want to use awk instead, replace the Perl line with:
(for /f "delims=" %%F in ('dir "%d%%e%" /b /d') do call gawk.exe "{c=0; while(match($0, /%r%/)){c++; $0=substr($0, RSTART+RLENGTH)} m=m>c?m:c;} END{print FILENAME\" \"m}" "%d%%%F") > _count.log
Gawk for Windows:
http://gnuwin32.sourceforge.net/packages/gawk.htm
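
The awk line has no equivalent of Perl's `/g` modifier, so it finds a match with `match()`, cuts the line after it using `RSTART` and `RLENGTH`, and searches again. The following Python sketch mirrors that loop; the function name `count_by_advancing` is hypothetical. It assumes the pattern does not match the empty string (true for `\.`): an empty match at the start of the remaining text would never advance the loop, so the sketch raises `ValueError` there, where the awk line could loop forever.

```python
import re


def count_by_advancing(line, pattern):
    """Count matches of pattern in line the way the awk loop does:
    find a match, keep only the text after it, and search again."""
    regex = re.compile(pattern)
    count = 0
    rest = line
    while True:
        m = regex.search(rest)  # awk: match($0, /re/)
        if m is None:
            return count
        if m.end() == 0:
            # Empty match at the start: cutting would not shorten the text.
            raise ValueError("pattern matched the empty string; loop cannot advance")
        count += 1
        # awk: $0 = substr($0, RSTART + RLENGTH)
        rest = rest[m.end():]
```

Because each search starts on a fresh copy of the remaining text, an anchored pattern behaves differently from the Perl loop: `count_by_advancing("aaa", "^a")` returns 3, as the awk loop does, while the Perl one-liner counts 1 for a line `aaa`. For unanchored patterns such as `\.` both approaches give the same counts. Note also that gawk uses POSIX extended regular expressions, not Python's `re` syntax.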
