How to make ‘git diff’ ignore comments

Here is a solution that is working well for me. I’ve written up the solution and some additional missing documentation on the git (log|diff) -G<regex> option.

It uses the same approach as the previous answers, but targets comment lines that start with a * or a # (sometimes with a space before the *), while still letting changes to #ifdef, #include, etc. through.

Look-ahead and look-behind do not seem to be supported by the -G option, nor does ? in general, and I have had problems with * too. + seems to work well, though.

(Note, tested on Git v2.7.0)

Multi-Line Comment Version

git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])'
  • -w ignore whitespace
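  • -G only show files whose diff has an added or removed line matching the following regex
  • (^[^\*# /]) any line that does not start with a star, a hash, a space or a slash
  • (^#\w) any line that starts with # immediately followed by a word character (letter, digit or underscore), e.g. #include
  • (^\s+[^\*#/]) any line that starts with some whitespace followed by a character that is not a comment character (star, hash or slash)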
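To see which kinds of changed lines the three alternatives let through, here is a minimal sketch. It assumes Python's re module behaves like Git's regex engine for these particular constructs (it is not the same engine), and the sample lines and the is_shown helper are made up for illustration.

```python
import re

# The pattern from the command above, copied verbatim.
PATTERN = re.compile(r"(^[^\*# /])|(^#\w)|(^\s+[^\*#/])")


def is_shown(changed_line: str) -> bool:
    """Return True if the line content (without the leading +/-) matches."""
    return PATTERN.search(changed_line) is not None


# Hypothetical changed lines, one per case.
samples = [
    "int x = 1;",          # ordinary code
    "#include <stdio.h>",  # preprocessor directive
    "# a shell comment",   # hash followed by a space
    "* block comment",     # star in the first column
    " * block comment",    # one space, then a star
    "  * block comment",   # two spaces, then a star
]

for line in samples:
    print(f"{is_shown(line)!s:5}  {line!r}")
```

Only the three middle comment samples are rejected. The last one matches because \s+ can stop after the first space and [^\*#/] then accepts the second space, so the third alternative only filters comment lines indented by exactly one whitespace character.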

Background: an SVN hook currently touches every file on the way in and out and rewrites the multi-line comment blocks in each one. Now I can diff my changes against SVN without the FYI information that SVN drops into the comments.

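There are two caveats. Because of (^#\w), Python and Bash comments written without a space after the hash, like #TODO, will still be shown in the diff. And a C++ line that begins with a division operator is ignored if the / is in the first column or follows exactly one whitespace character:

a = b
 / c;

With two or more leading whitespace characters the line matches anyway, because [^\*#/] also accepts whitespace.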

Also the documentation on -G in Git seemed pretty lacking, so the information here should help:

git diff -G<regex>

-G<regex>

Look for differences whose patch text contains added/removed lines that match <regex>.

To illustrate the difference between -S<regex> --pickaxe-regex and -G<regex>,
consider a commit with the following diff in the same file:

+    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
...
-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);

While git log -G"regexec\(regexp" will show this commit,
git log -S"regexec\(regexp" --pickaxe-regex will not
(because the number of occurrences of that string did not change).

See the pickaxe entry in gitdiffcore(7) for more information.
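The difference between the two options can be sketched in a few lines. This is a simplified model, not Git's implementation: the helper names are hypothetical, Python's re stands in for Git's regex engine, and the "before" and "after" texts contain only the two changed lines from the example rather than whole files.

```python
import re


def pickaxe_s_regex(old_text: str, new_text: str, pattern: str) -> bool:
    """Model of -S<regex> --pickaxe-regex: hit if the match count changed."""
    regex = re.compile(pattern)
    return len(regex.findall(old_text)) != len(regex.findall(new_text))


def pickaxe_g(added: list[str], removed: list[str], pattern: str) -> bool:
    """Model of -G<regex>: hit if any added or removed line matches."""
    regex = re.compile(pattern)
    return any(regex.search(line) for line in added + removed)


added = ["    return !regexec(regexp, two->ptr, 1, &regmatch, 0);"]
removed = ["    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);"]
pattern = r"regexec\(regexp"

# Simplification: treat the changed lines as the whole file content.
old_text = "\n".join(removed)
new_text = "\n".join(added)

print("-S --pickaxe-regex:", pickaxe_s_regex(old_text, new_text, pattern))
print("-G:", pickaxe_g(added, removed, pattern))
```

The pattern occurs once before and once after the change, so the count-based check reports no hit, while the line-based check does.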

(Note, tested on Git v2.7.0)

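  • -G compiles its pattern with REG_EXTENDED (see the source excerpt under UPDATE below), so it is a POSIX extended regular expression, not a basic one.
  • In my tests ?, *, { and } did not behave as I expected, even though POSIX extended syntax defines them; ! is not a regex operator at all. + worked reliably.
  • Grouping with () and alternation between groups with | work.
  • Character-class shorthands such as \s, \W, etc. are supported (these are extensions beyond POSIX, so they may depend on the regex library Git was built with).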
  • Look-ahead and look-behind are not supported.
  • Beginning and ending line anchors ^$ work.
  • Feature has been available since Git 1.7.4.

Excluded Files vs. Excluded Diffs

Note that the -G option filters which files are diffed, not which lines are shown.

Once a file is selected because at least one changed line matches, its whole diff is shown, including the changed lines that the regex was meant to exclude.
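The file-level behaviour can be modelled roughly as follows. This is a sketch under assumptions, not Git's code: files_selected_by_g and the sample data are hypothetical, and Python's re stands in for Git's regex engine.

```python
import re


def files_selected_by_g(changed_lines_by_file: dict[str, list[str]],
                        pattern: str) -> list[str]:
    """Return the files that have at least one changed line matching pattern.

    A selected file is shown with all of its changes, not only the
    lines that matched.
    """
    regex = re.compile(pattern)
    return [
        path
        for path, lines in changed_lines_by_file.items()
        if any(regex.search(line) for line in lines)
    ]


# Hypothetical changed lines per file (without the leading +/-).
changes = {
    "a.c": ["# generated header", "int foo = 1;"],
    "b.sh": ["# only a comment changed"],
}

selected = files_selected_by_g(changes, r"^[^#]")
print(selected)
```

Here a.c is selected because of its code line, so its comment change would appear in the diff as well; b.sh is dropped entirely because its only change starts with #.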

Examples

Only show file differences with at least one line that mentions foo.

git diff -G'foo'

Show files that have at least one changed line that does not start with a #

git diff -G'^[^#]'

Show files that have differences mentioning FIXME or TODO

git diff -G'(FIXME)|(TODO)'

See also git log -G, git grep, git log -S, --pickaxe-regex, and --pickaxe-all

UPDATE: Which regular expression tool is in use by the -G option?

https://github.com/git/git/search?utf8=%E2%9C%93&q=regcomp&type=

https://github.com/git/git/blob/master/diffcore-pickaxe.c

if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
        cflags |= REG_ICASE;
    regcomp_or_die(&regex, needle, cflags);
    regexp = &regex;

// and in the regcomp_or_die function
regcomp(regex, needle, cflags);

http://man7.org/linux/man-pages/man3/regexec.3.html

   REG_EXTENDED
          Use POSIX Extended Regular Expression syntax when interpreting
          regex.  If not set, POSIX Basic Regular Expression syntax is
          used.

// …

   REG_NEWLINE
          Match-any-character operators don't match a newline.

          A nonmatching list ([^...])  not containing a newline does not
          match a newline.

          Match-beginning-of-line operator (^) matches the empty string
          immediately after a newline, regardless of whether eflags, the
          execution flags of regexec(), contains REG_NOTBOL.

          Match-end-of-line operator ($) matches the empty string
          immediately before a newline, regardless of whether eflags
          contains REG_NOTEOL.
