Tuesday, 30 October 2012

Sed and AWK Scripts

1. Sed

sed - stream editor for filtering and transforming text

Sed is a non-interactive stream editor. It receives text input, whether from stdin or from a file, performs certain operations on specified lines of the input, one line at a time, then outputs the result to stdout or to a file. Within a shell script, sed is usually one of several tool components in a pipe.

Of all the operations in the sed toolkit, we will focus primarily on the three most commonly used ones: printing (to stdout), deletion, and substitution.

Table 1. Basic sed operators

Operator                                Name         Effect
[address-range]/p                       print        Print [specified address range]
[address-range]/d                       delete       Delete [specified address range]
s/pattern1/pattern2/                    substitute   Substitute pattern2 for first instance of pattern1 in a line
[address-range]/s/pattern1/pattern2/    substitute   Substitute pattern2 for first instance of pattern1 in a line, over address-range
[address-range]/y/pattern1/pattern2/    transform    Replace any character in pattern1 with the corresponding character in pattern2, over address-range (equivalent of tr)
g                                       global       Operate on every pattern match within each matched line of input

Note: Unless the g (global) operator is appended to a substitute command, the substitution operates only on the first instance of a pattern match within each line.
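To make the difference concrete, here is a minimal sketch. The sample line and the variable names are made up for illustration; the last command also shows the y (transform) operator from Table 1.

```shell
# Sample input line (made up for illustration).
line='one fish two fish'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/fish/cat/')

# With g: every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/fish/cat/g')

# y maps characters one-to-one (o -> O, i -> I), much like tr.
transformed=$(printf '%s\n' "$line" | sed 'y/oi/OI/')

printf '%s\n' "$first_only" "$every_match" "$transformed"
```

The three lines printed should be "one cat two fish", "one cat two cat", and "One fIsh twO fIsh".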

sed -e '/^$/d' "$filename"
# The -e option causes the next string to be interpreted as an editing instruction.
#  (If passing only a single instruction to sed, the "-e" is optional.)
#  The "strong" quotes ('') protect the RE characters in the instruction
#+ from interpretation by the shell.
# Operates on the text contained in file $filename.

Note: Sed uses the -e option to specify that the following string is an instruction or set of instructions. If the string contains only a single instruction, -e may be omitted.

sed -n '/xzy/p' "$filename"
# The -n option suppresses sed's automatic printing of every input line,
#+ so only the lines matching the pattern (printed by 'p') appear.
# Otherwise all input lines would print, and matching lines would print twice.
# The -e option is not necessary here since there is only a single editing instruction.
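The effect of -n is easiest to see side by side. The following sketch uses three made-up input lines; the variable names are only for illustration. The last command shows the case where -e is actually needed, because two instructions are passed in one run.

```shell
# Three sample lines (made up for illustration).
input=$(printf '%s\n' 'alpha' 'xzy here' 'omega')

# With -n, only the line printed explicitly by p appears.
with_n=$(printf '%s\n' "$input" | sed -n '/xzy/p')

# Without -n, every line is auto-printed and the matching line shows up twice.
without_n=$(printf '%s\n' "$input" | sed '/xzy/p')

# Two instructions in one run: each gets its own -e.
two_cmds=$(printf '%s\n' "$input" | sed -e '/alpha/d' -e 's/omega/OMEGA/')

printf '%s\n' "--- with -n" "$with_n" "--- without -n" "$without_n" "--- two -e" "$two_cmds"
```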

Table 2. Examples of sed operators

Notation            Effect
8d                  Delete 8th line of input.
/^$/d               Delete all blank lines.
1,/^$/d             Delete from beginning of input up to, and including first blank line.
/Jones/p            Print only lines containing "Jones" (with -n option).
s/Windows/Linux/    Substitute "Linux" for first instance of "Windows" found in each input line.
s/BSOD/stability/g  Substitute "stability" for every instance of "BSOD" found in each input line.
s/ *$//             Delete all spaces at the end of every line.
s/00*/0/g           Compress all consecutive sequences of zeroes into a single zero.
/GUI/d              Delete all lines containing "GUI".
s/GUI//g            Delete all instances of "GUI", leaving the remainder of each line intact.
Note: The usual delimiter that sed uses is /. However, sed allows other delimiters, such as %. This is useful when / is part of the pattern or the replacement string, as in a file path name.
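A short sketch of the alternate delimiter follows. The path is hypothetical and chosen only because it contains slashes; both commands should produce the same result.

```shell
# Hypothetical path, used only for illustration.
path='/usr/local/bin/tool'

# With / as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$path" | sed 's/\/usr\/local/\/opt/')

# With % as the delimiter, the slashes need no escaping.
readable=$(printf '%s\n' "$path" | sed 's%/usr/local%/opt%')

printf '%s\n' "$escaped" "$readable"
```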

Shell Wrappers

A wrapper is a shell script that embeds a system command or utility, and that accepts and passes a set of parameters to that command.

Wrapping a script around a complex command-line simplifies invoking it.

1. This simple script removes blank lines from a file.

sed -e '/^$/d' "$1"
# Same as
#    sed -e '/^$/d' filename
# invoked from the command-line.

#  The '-e' means an "editing" command follows (optional here).
#  '^' indicates the beginning of line, '$' the end.
#  This matches lines with nothing between the beginning and the end --
#+ blank lines.
#  The 'd' is the delete command.
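As a sketch of how such a wrapper might guard against a missing argument, the same sed command can be put inside a function. The function name, the usage message, and the return code are assumptions made for this example, not part of the original script.

```shell
# Hypothetical function form of the wrapper; name and usage text are assumptions.
strip_blank_lines() {
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "Usage: strip_blank_lines target-file" >&2
    return 1
  fi
  sed -e '/^$/d' "$1"
}

# Demonstration on a temporary file containing blank lines.
tmpfile=$(mktemp)
printf 'first\n\nsecond\n\n\nthird\n' > "$tmpfile"
stripped=$(strip_blank_lines "$tmpfile")
rm -f "$tmpfile"

printf '%s\n' "$stripped"
```

Only the three non-blank lines should remain in the output.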

2. A script that substitutes one pattern for another in a file.

#  Here is where the heavy work gets done.
#  (Assumes old_pattern, new_pattern, and file_name were set earlier in the script.)
sed -e "s/$old_pattern/$new_pattern/g" "$file_name"

#  's' is, of course, the substitute command in sed,
#+ and /pattern/ invokes address matching.
#  The 'g', or global, flag causes substitution for EVERY
#+ occurrence of $old_pattern on each line, not just the first.
#  Read the 'sed' docs for an in-depth explanation.

exit $?  # Redirect the output of this script to write to a file.
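For illustration, the substitution wrapper can be sketched as a function that takes the two patterns and the file name as arguments. The function name and the argument order are assumptions; the excerpt does not show how the three variables are set.

```shell
# Hypothetical function form; the argument order (old, new, file) is an assumption.
subst() {
  if [ "$#" -ne 3 ] || [ ! -f "$3" ]; then
    echo "Usage: subst old-pattern new-pattern filename" >&2
    return 1
  fi
  # Double quotes let the shell expand the variables before sed sees them.
  # A "/" inside either pattern would break this instruction.
  sed -e "s/$1/$2/g" "$3"
}

# Demonstration: redirect stdout to a file to keep the result.
infile=$(mktemp)
outfile=$(mktemp)
printf 'red car, red bike\nblue car\n' > "$infile"
subst red green "$infile" > "$outfile"
result=$(cat "$outfile")
rm -f "$infile" "$outfile"

printf '%s\n' "$result"
```

Because sed writes to stdout and leaves the input file untouched, the redirection is what preserves the edited text.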

2. Awk

Awk is a full-featured text processing language with a syntax reminiscent of C. It possesses an extensive set of operators and capabilities, but only a few of them are covered here.

Awk breaks each line of input passed to it into fields. By default, a field is a string of consecutive characters delimited by whitespace.

Awk parses and operates on each separate field. This makes it ideal for handling structured text files -- especially tables -- data organized into consistent chunks, such as rows and columns.
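A small sketch of field handling on tabular data follows. The table contents and variable names are made up for illustration; NF and -F are standard awk features.

```shell
# Sample table (made up): name, quantity, unit price.
table=$(printf '%s\n' 'apples 3 0.50' 'pears 10 0.25' 'plums 7 0.40')

# Print the first and third columns of every row.
names_prices=$(printf '%s\n' "$table" | awk '{print $1, $3}')

# NF holds the number of fields on the current line.
field_counts=$(printf '%s\n' "$table" | awk '{print NF}')

# -F changes the field separator, here to a colon.
second=$(printf 'a:b:c\n' | awk -F: '{print $2}')

printf '%s\n' "$names_prices" "$field_counts" "$second"
```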

[sankar@new-host ~]$ echo one two
one two
[sankar@new-host ~]$ echo one two | awk '{print $1}'
one
[sankar@new-host ~]$ echo one two | awk '{print $0}'
one two

This shows the awk print command in action: $1 selects the first field, and $0 the entire line. The only other feature of awk we need to deal with here is variables. Awk handles variables similarly to shell scripts, though a bit more flexibly.
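As a minimal sketch of awk variables: they need no declaration, and an unset variable counts as 0 in arithmetic. The input rows and the names total, n, and tag are made up for this example.

```shell
# Sum the second column; total starts out unset, which awk treats as 0.
total=$(printf '%s\n' 'a 3' 'b 10' 'c 7' | awk '{ total += $2 } END { print total }')

# -v passes a shell value into an awk variable before the program runs.
label='sum'
labelled=$(printf '%s\n' 'a 3' 'b 10' | awk -v tag="$label" '{ n += $2 } END { print tag "=" n }')

printf '%s\n' "$total" "$labelled"
```

The first command should print 20 and the second sum=13.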
