May 10, 2014
Tinysnip #1: Colored diff with mksh or awk
For the record, when I started this article I wanted to "share" an awk snippet. It was almost ready for publishing, but I had another idea just before pushing the "nuke" button. I'm going to show you how to proceed in two ways (mksh and awk).
NOTE: This article has been improved since its first version. Nowadays, I tend to use the awk(1) snippet. The mksh(1) variant is still provided here but is less "powerful"; therefore, that first snippet was removed from the repository.
I often need to compare two files and I'm unable to find something better than diff -u. The output is pretty clear and it can be improved with colordiff, a perl(1) wrapper which adds color. Now, you have no excuse to misunderstand the changes.
However, there is a problem… perl(1) is not installed by default on all machines (some distros ship it in the base system) and I find it quite "heavy" for this trivial task.
Anyway, my "dirty" and "hacky" mind found two solutions.
* mksh
Why not use mksh(1), the awesome shell?
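# print -r outputs its arguments raw (no escape processing), so the
# color sequences are built once with plain print, which turns \033
# into the ESC character, and stored in variables.
red=$(print '\033[1;31m') green=$(print '\033[0;32m')
magenta=$(print '\033[1;35m') reset=$(print '\033[0m')
while IFS= read -r LINE; do
	CHAR="${LINE::1}"	# first character of the line
	case $CHAR in
	\-) print -r -- "${red}${LINE}${reset}" ;;
	\+) print -r -- "${green}${LINE}${reset}" ;;
	\@) print -r -- "${magenta}${LINE}${reset}" ;;
	*) print -r -- "$LINE" ;;
	esac
done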
We extract the first character of each line with CHAR="${LINE::1}" and apply syntax highlighting according to it: red for deletions, green for additions and bright magenta for range information (the @@ hunk headers). Lines that match none of these cases are simply printed as they are.
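The same first-character dispatch can be sketched in plain POSIX sh, where case matches a glob against the whole line, so ${LINE::1} is not needed at all. The helper name line_color is hypothetical; it only prints the SGR color code the mksh snippet would use for a given line.

```shell
# Hypothetical helper (not part of yiff): print the SGR code the mksh
# snippet would use for one unified-diff line, or an empty line otherwise.
line_color() {
	case $1 in
	-*) printf '%s\n' '1;31' ;; # deletion: bold red
	+*) printf '%s\n' '0;32' ;; # addition: green
	@*) printf '%s\n' '1;35' ;; # hunk header: bold magenta
	*) printf '\n' ;;           # context or anything else: no color
	esac
}

# Usage: line_color '@@ -1,5 +1,6 @@'    prints 1;35
```

Note that the "---" and "+++" file headers also start with - and +, so they get the deletion and addition colors, exactly as in the snippet.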
Once a colored line has been printed, the default colors must be restored; fortunately, \033[0m does the trick. The script can be invoked like this: diff -u file1 file2 | yiff (yiff is the script itself).
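To see what the reset actually does, here is a minimal POSIX sh sketch (the wrap_color name is hypothetical) that wraps one line the way the snippets do: color on, the text, then \033[0m. Passing the text through printf's %s keeps any backslashes in the line literal, which is the same reason the mksh snippet uses print -r.

```shell
# Hypothetical helper: print $2 wrapped in the SGR code $1, then reset.
# \033[...m selects the color; \033[0m restores the default attributes,
# so the following line is not painted with the same color.
wrap_color() {
	printf '\033[%sm%s\033[0m\n' "$1" "$2"
}

# Usage: make the otherwise invisible escape bytes visible with od(1)
#   wrap_color '0;32' '+line 8 nothing' | od -c
```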
* awk
Why not use awk(1), the "venerable" tool?
{
	symb = substr($0,1,1)	# first char: "-", "+", "@" or " " (every line, blank ones included)
	# Highlight end line space(s)/tab(s), skipping the diff prefix column
	rest = substr($0, 2)
	gsub(/[\t ]+$/, "\033[0;41m \033[0m", rest)
	$0 = symb rest
	if (symb == "@") {
		printf("\033[1;35m%s\033[0m\n", $0)
	} else if (symb == "-") {
		printf("\033[0;31m%s\033[0m\n", $0)
	} else if (symb == "+") {
		printf("\033[0;32m%s\033[0m\n", $0)
	} else {
		printf("%s\n", $0)
	}
}
The more I work with awk(1), the more I love it... It's really simple, powerful, fast and, IMHO, more understandable than sed(1).
symb = substr($0,1,1) stores the first character of every line (-, +, @ or just a space) in a variable, which will be used later (don't be in a hurry, comrade). Then, we wrap the line in ANSI escape codes to color it as a whole.
The colors are red for deletions, green for additions and magenta for range information; they can easily be changed.
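Since the colors are just SGR codes, one way to make them easy to change is to keep them in an awk array indexed by the first character. The sketch below is an assumption, not the script from the repository: the yiff_awk function name and the YIFF_DEL, YIFF_ADD and YIFF_HUNK environment variables are hypothetical, and the trailing-whitespace highlighting is left out for brevity.

```shell
# Hypothetical variant: colors live in an awk array keyed by the first
# character and can be overridden through (hypothetical) environment variables.
yiff_awk() {
	awk -v del="${YIFF_DEL:-0;31}" -v add="${YIFF_ADD:-0;32}" \
	    -v hunk="${YIFF_HUNK:-1;35}" '
	BEGIN { color["-"] = del; color["+"] = add; color["@"] = hunk }
	{
		symb = substr($0, 1, 1)
		if (symb in color)   # "-", "+" or "@": wrap the line, then reset
			printf("\033[%sm%s\033[0m\n", color[symb], $0)
		else                 # context and blank lines pass through
			print
	}'
}

# Usage: diff -u file1 file2 | YIFF_HUNK='0;36' yiff_awk
```

Testing symb in color before reading color[symb] avoids creating empty array entries for context lines.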
When a colored line ends, we have to restore the default foreground. Otherwise, the next line would be highlighted too, even if it didn't change, and you'd be confused. That's why "\033[0m" is appended to the "special" lines; unchanged lines don't need it.
For testing purposes, let's replace "\033[0m" with " endline":
--- file1 2014-05-09 14:58:12.066021088 +0200 endline
+++ file2 2014-05-09 14:59:16.299352783 +0200 endline
@@ -1,5 +1,6 @@ endline
 line 1 coco
 line 2 moo
-line 3 what endline
+line 8 nothing endline
 line 4 shell
-line 5 bro endline
+line 10 hehe endline
+line 24 limit endline
As you can see, endline is only "echoed" on lines starting with one of the expected characters. I also wanted to color spaces and tabs at the end of lines, because they are almost invisible: trailing whitespace is matched and replaced by a single space on a red background. That was useful to me several times.
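The endline trick can be replayed without editing the script by passing the reset sequence as an awk variable. In this sketch, the yiff_reset function name and its argument are hypothetical; an empty argument falls back to \033[0m. It looks for trailing blanks only after the first column, so the single leading space of an empty context line is not flagged.

```shell
# Hypothetical test helper: same colors as the awk snippet, but the reset
# sequence comes from $1 so it can be replaced by a visible marker.
yiff_reset() {
	awk -v reset="$1" '
	BEGIN { if (reset == "") reset = "\033[0m" }
	{
		symb = substr($0, 1, 1)
		rest = substr($0, 2)   # the line without its diff prefix
		gsub(/[\t ]+$/, "\033[0;41m \033[0m", rest)   # trailing blanks: red block
		$0 = symb rest
		if (symb == "@")      printf("\033[1;35m%s%s\n", $0, reset)
		else if (symb == "-") printf("\033[0;31m%s%s\n", $0, reset)
		else if (symb == "+") printf("\033[0;32m%s%s\n", $0, reset)
		else                  print
	}'
}

# Usage: diff -u file1 file2 | yiff_reset ' endline'
```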
The snippet was created on Debian with mawk(1), and it works with mawk(1), nawk(1) and gawk(1). As usual, it's available on git.
If you prefer the "traditional" diff(1) output, replace + with > and - with <. By the way, do I need to mention that my "hacks" are faster than colordiff(1)?
$ time diff -u file{1,2} | yiff
0m0.09s real 0m0.00s user 0m0.00s system
$ time diff -u file{1,2} | yiffawk
0m0.08s real 0m0.00s user 0m0.00s system
$ time colordiff -u file{1,2}
0m0.14s real 0m0.03s user 0m0.00s system
It was tested on a file of about 900 lines (20 differences). Yeah, I know the test isn't representative at all. colordiff(1) includes more options, but I'm not using them (big deal, right?). I'm glad to see I can do things faster...
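For the "traditional" diff(1) output mentioned earlier, the substitution boils down to testing < and > instead of - and +. A minimal sketch, assuming the same colors as the awk snippet; the yiff_normal name is hypothetical, and change commands such as 3c3 as well as the --- separator are simply left uncolored.

```shell
# Hypothetical variant for plain "diff file1 file2" output:
# "<" marks a line from file1 (deleted), ">" a line from file2 (added).
yiff_normal() {
	awk '{
		symb = substr($0, 1, 1)
		if (symb == "<")
			printf("\033[0;31m%s\033[0m\n", $0)
		else if (symb == ">")
			printf("\033[0;32m%s\033[0m\n", $0)
		else            # change commands (3c3) and the --- separator
			print
	}'
}

# Usage: diff file1 file2 | yiff_normal
```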
N.B.: A reader asked me "Why not add a POSIX version?". Hmm, on this blog I usually try to focus on (m)ksh(1) (or awk(1)), because I really "love" it and also because ksh is not widely used or known in the Linux world. In this article, the awk(1) version is a solid and efficient alternative when your shell doesn't support ${var::1} (or you can just use cut(1) instead).
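For readers who still want a POSIX sh version, here is a minimal sketch, assuming any POSIX shell: ${LINE::1} becomes the portable ${LINE%"${LINE#?}"}, and printf replaces mksh's print -r. The yiff_sh function name is hypothetical, and like the mksh snippet it does not highlight trailing whitespace.

```shell
# Hypothetical POSIX sh port of the mksh snippet.
yiff_sh() {
	while IFS= read -r LINE; do
		# ${LINE#?} is the line minus its first char; stripping that
		# suffix from LINE leaves only the first char ("" if empty).
		CHAR=${LINE%"${LINE#?}"}
		case $CHAR in
		-) printf '\033[1;31m%s\033[0m\n' "$LINE" ;;
		+) printf '\033[0;32m%s\033[0m\n' "$LINE" ;;
		@) printf '\033[1;35m%s\033[0m\n' "$LINE" ;;
		*) printf '%s\n' "$LINE" ;;
		esac
	done
}

# Usage: diff -u file1 file2 | yiff_sh
```

Quoting the inner expansion makes the remainder a literal suffix, so glob characters inside the line cannot change what gets stripped.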