mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-10-31 14:27:54 +01:00

302 lines

9.6 KiB

C

Raw Normal View History

diff: cache textconv output Running a textconv filter can take a long time. It's particularly bad for a large file which needs to be spooled to disk, but even for small files, the fork+exec overhead can add up for something like "git log -p". This patch uses the notes-cache mechanism to keep a fast cache of textconv output. Caches are stored in refs/notes/textconv/$x, where $x is the userdiff driver defined in gitattributes. Caching is enabled only if diff.$x.cachetextconv is true. In my test repo, on a commit with 45 jpg and avi files changed and a textconv to show their exif tags: [before] $ time git show >/dev/null real 0m13.724s user 0m12.057s sys 0m1.624s [after, first run] $ git config diff.mfo.cachetextconv true $ time git show >/dev/null real 0m14.252s user 0m12.197s sys 0m1.800s [after, subsequent runs] $ time git show >/dev/null real 0m0.352s user 0m0.148s sys 0m0.200s So for a slight (3.8%) cost on the first run, we achieve an almost 40x speed up on subsequent runs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-02 02:12:15 +02:00			`#include "cache.h"`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`#include "userdiff.h"`
			`#include "attr.h"`

			`static struct userdiff_driver *drivers;`
			`static int ndrivers;`
			`static int drivers_alloc;`

Change the spelling of "wordregex". Use "wordRegex" for configuration variable names. Use "word_regex" for C language tokens. Signed-off-by: Boyd Stephen Smith Jr. <bss@iguanasuicide.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-21 05:59:54 +01:00			`#define PATTERNS(name, pattern, word_regex) \`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`{ name, NULL, -1, { pattern, REG_EXTENDED }, \`
			`word_regex "\|[^[:space:]]\|[\xc0-\xff][\x80-\xbf]+" }`
userdiff.c: add builtin fortran regex patterns This adds fortran xfuncname and wordRegex patterns to the list of builtin patterns. The intention is for the patterns to be appropriate for all versions of fortran including 77, 90, 95. The patterns can be enabled by adding the diff=fortran attribute to the .gitattributes file for the desired file glob. This also adds a new macro named IPATTERN which is just like the PATTERNS macro except it sets the REG_ICASE flag so that case will be ignored. The test code in t4018 and the docs were updated as appropriate. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-10 18:18:14 +02:00			`#define IPATTERN(name, pattern, word_regex) \`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`{ name, NULL, -1, { pattern, REG_EXTENDED \| REG_ICASE }, \`
			`word_regex "\|[^[:space:]]\|[\xc0-\xff][\x80-\xbf]+" }`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`static struct userdiff_driver builtin_drivers[] = {`
Add userdiff patterns for Ada Add Ada xfuncname and wordRegex patterns to the list of builtin patterns. Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-16 05:54:15 +02:00			`IPATTERN("ada",`
userdiff: update Ada patterns - Allow extra space in "is new" and "is separate" - Fix bug in word regex for numbers Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-02-03 12:33:16 +01:00			`"!^(.[ \t])?(is[ \t]+new\|renames\|is[ \t]+separate)([ \t].)?$\n"`
Add userdiff patterns for Ada Add Ada xfuncname and wordRegex patterns to the list of builtin patterns. Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-16 05:54:15 +02:00			`"!^[ \t]with[ \t].$\n"`
			`"^[ \t]((procedure\|function)[ \t]+.)$\n"`
			`"^[ \t]((package\|protected\|task)[ \t]+.)$",`
			`/* -- */`
			`"[a-zA-Z][a-zA-Z0-9_]*"`
userdiff: update Ada patterns - Allow extra space in "is new" and "is separate" - Fix bug in word regex for numbers Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-02-03 12:33:16 +01:00			`"\|[-+]?[0-9][0-9#_.aAbBcCdDeEfF]*([eE][+-]?[0-9_]+)?"`
Add userdiff patterns for Ada Add Ada xfuncname and wordRegex patterns to the list of builtin patterns. Signed-off-by: Adrian Johnson <ajohnson@redneon.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-09-16 05:54:15 +02:00			`"\|=>\|\\.\\.\|\\\\\|:=\|/=\|>=\|<=\|<<\|>>\|<>"),`
userdiff.c: add builtin fortran regex patterns This adds fortran xfuncname and wordRegex patterns to the list of builtin patterns. The intention is for the patterns to be appropriate for all versions of fortran including 77, 90, 95. The patterns can be enabled by adding the diff=fortran attribute to the .gitattributes file for the desired file glob. This also adds a new macro named IPATTERN which is just like the PATTERNS macro except it sets the REG_ICASE flag so that case will be ignored. The test code in t4018 and the docs were updated as appropriate. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-10 18:18:14 +02:00			`IPATTERN("fortran",`
			`"!^([C]\|[ \t]!)\n"`
			`"!^[ \t]*MODULE[ \t]+PROCEDURE[ \t]\n"`
			`"^[ \t]*((END[ \t]+)?(PROGRAM\|MODULE\|BLOCK[ \t]+DATA"`
			`"\|([^'\" \t]+[ \t]+)(SUBROUTINE\|FUNCTION))[ \t]+[A-Z].)$",`
			`/* -- */`
			`"[a-zA-Z][a-zA-Z0-9_]*"`
			`"\|\\.([Ee][Qq]\|[Nn][Ee]\|[Gg][TtEe]\|[Ll][TtEe]\|[Tt][Rr][Uu][Ee]\|[Ff][Aa][Ll][Ss][Ee]\|[Aa][Nn][Dd]\|[Oo][Rr]\|[Nn]?[Ee][Qq][Vv]\|[Nn][Oo][Tt])\\."`
			`/* numbers and format statements like 2E14.4, or ES12.6, 9X.`
			`* Don't worry about format statements without leading digits since`
			`* they would have been matched above as a variable anyway. */`
			`"\|[-+]?[0-9.]+([AaIiDdEeFfLlTtXx][Ss]?[-+]?[0-9.])?(_[a-zA-Z0-9][a-zA-Z0-9_])?"`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|//\|\\\\\|::\|[/<>=]="),`
userdiff: add support for Fountain documents Add support for Fountain, a plain text screenplay format. Git facilitates not just programming specifically, but creative writing in general, so it makes sense to also support other plain text documents besides source code. In the structure of a screenplay specifically, scenes are roughly analogous to functions, in the sense that it makes your job easier if you can see which ones were changed in a given range of patches. More information about the Fountain format can be found on its official website, at http://fountain.io . Signed-off-by: Zoë Blade <zoe@bytenoise.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-07-21 15:22:46 +02:00			`IPATTERN("fountain", "^((\\.[^.]\|(int\|ext\|est\|int\\.?/ext\|i/e)[. ]).*)$",`
			`"[^ \t-]+"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`PATTERNS("html", "^[ \t](<[Hh][1-6][ \t].>.*)$",`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"[^<>= \t]+"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`PATTERNS("java",`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`"!^[ \t]*(catch\|do\|for\|if\|instanceof\|new\|return\|switch\|throw\|while)\n"`
avoid exponential regex match for java and objc function names In the old regex ^[ \t](([ \t][A-Za-z_][A-Za-z_0-9]){2,}[ \t]\([^;])$ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ you can backtrack arbitrarily from [A-Za-z_0-9] into [A-Za-z_], thus causing an exponential number of backtracks. Ironically it also causes the regex not to work as intended; for example "catch" can match the underlined part of the regex, the first repetition matching "c" and the second matching "atch". The replacement regex avoids this problem, because it makes sure that at least a space/tab is eaten on each repetition. In other words, a suffix of a repetition can never be a prefix of the next repetition. Signed-off-by: Paolo Bonzini <bonzini@gnu.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-17 16:26:06 +02:00			`"^[ \t](([A-Za-z_][A-Za-z_0-9][ \t]+)+[A-Za-z_][A-Za-z_0-9][ \t]\\([^;]*)$",`
			`/* -- */`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`"[a-zA-Z_][a-zA-Z0-9_]*"`
			`"\|[-+0-9.e]+[fFlL]?\|0[xXbB]?[0-9a-fA-F]+[lL]?"`
			`"\|[-+*/<>%&^\|=!]="`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|--\|\\+\\+\|<<=?\|>>>?=?\|&&\|\\\|\\\|"),`
Add built-in diff patterns for MATLAB code MATLAB is often used in industry and academia for scientific computations motivating it being included as a built-in pattern. Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-15 21:15:03 +01:00			`PATTERNS("matlab",`
			`"^[[:space:]]((classdef\|function)[[:space:]].)$\|^%%[[:space:]].*$",`
			`"[a-zA-Z_][a-zA-Z0-9_]\|[-+0-9.e]+\|[=~<>]=\|\\.[/\\^']\|\\\|\\\|\|&&"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`PATTERNS("objc",`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`/* Negate C statements that can look like functions */`
			`"!^[ \t]*(do\|for\|if\|else\|return\|switch\|while)\n"`
			`/* Objective-C methods */`
			`"^[ \t]([-+][ \t]\\([ \t][A-Za-z_][A-Za-z_0-9 \t]\\)[ \t][A-Za-z_].*)$\n"`
			`/* C functions */`
avoid exponential regex match for java and objc function names In the old regex ^[ \t](([ \t][A-Za-z_][A-Za-z_0-9]){2,}[ \t]\([^;])$ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ you can backtrack arbitrarily from [A-Za-z_0-9] into [A-Za-z_], thus causing an exponential number of backtracks. Ironically it also causes the regex not to work as intended; for example "catch" can match the underlined part of the regex, the first repetition matching "c" and the second matching "atch". The replacement regex avoids this problem, because it makes sure that at least a space/tab is eaten on each repetition. In other words, a suffix of a repetition can never be a prefix of the next repetition. Signed-off-by: Paolo Bonzini <bonzini@gnu.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-06-17 16:26:06 +02:00			`"^[ \t](([A-Za-z_][A-Za-z_0-9][ \t]+)+[A-Za-z_][A-Za-z_0-9][ \t]\\([^;]*)$\n"`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`/* Objective-C class/protocol definitions */`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`"^(@(implementation\|interface\|protocol)[ \t].*)$",`
			`/* -- */`
			`"[a-zA-Z_][a-zA-Z0-9_]*"`
			`"\|[-+0-9.e]+[fFlL]?\|0[xXbB]?[0-9a-fA-F]+[lL]?"`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|[-+*/<>%&^\|=!]=\|--\|\\+\\+\|<<=?\|>>=?\|&&\|\\\|\\\|\|::\|->"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`PATTERNS("pascal",`
userdiff: match Pascal class methods Class declarations were already covered by the second pattern, but class methods have the 'class' keyword in front too. Account for it. Signed-off-by: Alexey Shumkin <zapped@mail.ru> Acked-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 09:53:59 +01:00			`"^(((class[ \t]+)?(procedure\|function)\|constructor\|destructor\|interface\|"`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`"implementation\|initialization\|finalization)[ \t].)$"`
			`"\n"`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`"^(.=[ \t](class\|record).*)$",`
			`/* -- */`
			`"[a-zA-Z_][a-zA-Z0-9_]*"`
			`"\|[-+0-9.e]+\|0[xXbB]?[0-9a-fA-F]+"`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|<>\|<=\|>=\|:=\|\\.\\."),`
diff: funcname and word patterns for perl The default function name discovery already works quite well for Perl code... with the exception of here-documents (or rather their ending). sub foo { print <<END here-document END return 1; } The default funcname pattern treats the unindented END line as a function declaration and puts it in the @@ line of diff and "grep --show-function" output. With a little knowledge of perl syntax, we can do better. You can try it out by adding "*.perl diff=perl" to the gitattributes file. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-26 10:07:31 +01:00			`PATTERNS("perl",`
userdiff/perl: catch sub with brace on second line Accept sub foo { } as an alternative to a more common style that introduces perl functions with a brace on the first line (and likewise for BEGIN/END blocks). The new regex is a little hairy to avoid matching # forward declaration sub foo; while continuing to match "sub foo($;@) {" and sub foo { # This routine is interesting; # in fact, the lines below explain how... While at it, pay attention to Perl 5.14's "package foo {" syntax as an alternative to the traditional "package foo;". Requested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 21:38:26 +02:00			`"^package .*\n"`
			`"^sub [[:alnum:]_':]+[ \t]*"`
			`"(\\([^)]\\)[ \t])?" /* prototype */`
			`/*`
			`* Attributes. A regex can't count nested parentheses,`
			`* so just slurp up whatever we see, taking care not`
			`* to accept lines like "sub foo; # defined elsewhere".`
			`*`
			`* An attribute could contain a semicolon, but at that`
			`* point it seems reasonable enough to give up.`
			`*/`
			`"(:[^;#]*)?"`
			`"(\\{[ \t])?" / brace can come here or on the next line */`
			`"(#.)?$\n" / comment */`
userdiff/perl: tighten BEGIN/END block pattern to reject here-doc delimiters A naive method of treating BEGIN/END blocks with a brace on the second line as diff/grep funcname context involves also matching unrelated lines that consist of all-caps letters: sub foo { print <<'EOF' text goes here ... EOF ... rest of foo ... } That's not so great, because it means that "git diff" and "git grep --show-function" would write "=EOF" or "@@ EOF" as context instead of a more useful reminder like "@@ sub foo {". To avoid this, tighten the pattern to only match the special block names that perl accepts (namely BEGIN, END, INIT, CHECK, UNITCHECK, AUTOLOAD, and DESTROY). The list is taken from perl's toke.c. Suggested-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-22 19:29:32 +02:00			`"^(BEGIN\|END\|INIT\|CHECK\|UNITCHECK\|AUTOLOAD\|DESTROY)[ \t]*"`
userdiff/perl: catch sub with brace on second line Accept sub foo { } as an alternative to a more common style that introduces perl functions with a brace on the first line (and likewise for BEGIN/END blocks). The new regex is a little hairy to avoid matching # forward declaration sub foo; while continuing to match "sub foo($;@) {" and sub foo { # This routine is interesting; # in fact, the lines below explain how... While at it, pay attention to Perl 5.14's "package foo {" syntax as an alternative to the traditional "package foo;". Requested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 21:38:26 +02:00			`"(\\{[ \t])?" / brace can come here or on the next line */`
			`"(#.*)?$\n"`
userdiff/perl: match full line of POD headers The builtin perl userdiff driver is not greedy enough about catching POD header lines. Capture the whole line, so instead of just declaring that we are in some "@@ =head1" section, diff/grep output can explain that the enclosing section is about "@@ =head1 OPTIONS". Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 21:35:51 +02:00			`"^=head[0-9] .", / POD */`
diff: funcname and word patterns for perl The default function name discovery already works quite well for Perl code... with the exception of here-documents (or rather their ending). sub foo { print <<END here-document END return 1; } The default funcname pattern treats the unindented END line as a function declaration and puts it in the @@ line of diff and "grep --show-function" output. With a little knowledge of perl syntax, we can do better. You can try it out by adding "*.perl diff=perl" to the gitattributes file. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-26 10:07:31 +01:00			`/* -- */`
			`"[[:alpha:]_'][[:alnum:]_']*"`
			`"\|0[xb]?[0-9a-fA-F_]*"`
			`/* taking care not to interpret 3..5 as (3.)(.5) */`
			`"\|[0-9a-fA-F_]+(\\.[0-9a-fA-F_]+)?([eE][-+]?[0-9_]+)?"`
			`"\|=>\|-[rwxoRWXOezsfdlpSugkbctTBMAC>]\|~~\|::"`
			`"\|&&=\|\\\|\\\|=\|//=\|\\\\="`
			`"\|&&\|\\\|\\\|\|//\|\\+\\+\|--\|\\\\\|\\.\\.\\.?"`
			`"\|[-+*/%.^&<>=!\|]="`
			`"\|=~\|!~"`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|<<\|<>\|<=>\|>>"),`
diff: Support visibility modifiers in the PHP hunk header regexp Starting with PHP5, class methods can have a visibility modifier, which caused the methods not to be matched by the existing regexp, so extend the regexp to match those modifiers. And while we're at it, allow the "static" modifier as well. Since the "static" modifier can appear either before or after the visibility modifier, let's just allow any number of modifiers to appear in any order, as that simplifies the regexp and shouldn't cause any false positives. Signed-off-by: BjÃ¶rn Steinbrink <B.Steinbrink@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-23 20:05:40 +02:00			`PATTERNS("php",`
			`"^[\t ](((public\|protected\|private\|static)[\t ]+)function.*)$\n"`
			`"^[\t ](class.)$",`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`/* -- */`
			`"[a-zA-Z_][a-zA-Z0-9_]*"`
			`"\|[-+0-9.e]+\|0[xXbB]?[0-9a-fA-F]+"`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|[-+*/<>%&^\|=!.]=\|--\|\\+\\+\|<<=?\|>>=?\|===\|&&\|\\\|\\\|\|::\|->"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`PATTERNS("python", "^[ \t]((class\|def)[ \t].)$",`
			`/* -- */`
			`"[a-zA-Z_][a-zA-Z0-9_]*"`
			`"\|[-+0-9.e]+[jJlL]?\|0[xX]?[0-9a-fA-F]+[lL]?"`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|[-+/<>%&^\|=!]=\|//=?\|<<=?\|>>=?\|\\\\*=?"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`/* -- */`
			`PATTERNS("ruby", "^[ \t]((class\|module\|def)[ \t].)$",`
			`/* -- */`
			`"(@\|@@\|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"`
			`"\|[-+0-9.e]+\|0[xXbB]?[0-9a-fA-F]+\|\\?(\\\\C-)?(\\\\M-)?."`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|//=?\|[-+*/<>%&^\|=!]=\|<<=?\|>>=?\|===\|\\.{1,3}\|::\|[!=]~"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]\\{{0,1}[ \t][^ \t\"@',\\#}{~%]).$",`
			`"[={}\"]\|[^={}\" \t]+"),`
			`PATTERNS("tex", "^(\\\\((sub)section\|chapter\|part)\\{0,1}\\{.*)$",`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\\\\[a-zA-Z@]+\|\\\\.\|[a-zA-Z0-9\x80-\xff]+"),`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`PATTERNS("cpp",`
			`/* Jump targets or access declarations */`
userdiff: have 'cpp' hunk header pattern catch more C++ anchor points The hunk header pattern 'cpp' is intended for C and C++ source code, but it is actually not particularly useful for the latter, and even misses some use-cases for the former. The parts of the pattern have the following flaws: - The first part matches an identifier followed immediately by a colon and arbitrary text and is intended to reject goto labels and C++ access specifiers (public, private, protected). But this pattern also rejects C++ constructs, which look like this: MyClass::MyClass() MyClass::~MyClass() MyClass::Item MyClass::Find(... - The second part matches an identifier followed by a list of qualified names (i.e. identifiers separated by the C++ scope operator '::') separated by space or '' followed by an opening parenthesis (with space between the tokens). It matches function declarations like struct item get_head(... int Outer::Inner::Func(... Since the pattern requires at least two identifiers, GNU-style function definitions are ignored: void func(... Moreover, since the pattern does not allow punctuation other than '', the following C++ constructs are not recognized: . template definitions: template<class T> int func(T arg) . functions returning references: const string& get_message() . functions returning templated types: vector<int> foo() . operator definitions: Value operator+(Value l, Value r) - The third part of the pattern finally matches compound definitions. But it forgets about unions and namespaces, and also skips single-line definitions struct random_iterator_tag {}; because no semicolon can occur on the line. Change the first pattern to require a colon at the end of the line (except for trailing space and comments), so that it does not reject constructor or destructor definitions. Notice that all interesting anchor points begin with an identifier or keyword. But since there is a large variety of syntactical constructs after the first "word", the simplest is to require only this word and accept everything else. Therefore, this boils down to a line that begins with a letter or underscore (optionally preceded by the C++ scope operator '::' to accept functions returning a type anchored at the global namespace). Replace the second and third part by a single pattern that picks such a line. This has the following desirable consequence: - All constructs mentioned above are recognized. and the following likely desirable consequences: - Definitions of global variables and typedefs are recognized: int num_entries = 0; extern const char help_text; typedef basic_string<wchar_t> wstring; - Commonly used marco-ized boilerplate code is recognized: BEGIN_MESSAGE_MAP(CCanvas,CWnd) Q_DECLARE_METATYPE(MyStruct) PATTERNS("tex",...) (The last one is from this very patch.) but also the following possibly undesirable consequence: - When a label is not on a line by itself (except for a comment) it is no longer rejected, but can appear as a hunk header if it occurs at the beginning of a line: next:; IMO, the benefits of the change outweigh the (possible) regressions by a large margin. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-21 22:07:22 +01:00			`"!^[ \t][A-Za-z_][A-Za-z_0-9]:[[:space:]]($\|/[/])\n"`
			`/* functions/methods, variables, and compounds at top level */`
			`"^((::[[:space:]])?[A-Za-z_].)$",`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`/* -- */`
			`"[a-zA-Z_][a-zA-Z0-9_]*"`
userdiff: support unsigned and long long suffixes of integer constants Do not split constants such as 123U, 456ll, 789UL at the first U or second L. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-21 22:07:14 +01:00			`"\|[-+0-9.e]+[fFlL]?\|0[xXbB]?[0-9a-fA-F]+[lLuU]*"`
userdiff: support C++ ->* and .* operators in the word regexp The character sequences ->* and .* are valid C++ operators. Keep them together in --word-diff mode. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-03-21 22:07:13 +01:00			`"\|[-+/<>%&^\|=!]=\|--\|\\+\\+\|<<=?\|>>=?\|&&\|\\\|\\\|\|::\|->\\?\|\\.\\*"),`
Userdiff patterns for C# Add userdiff patterns for C#. This code is an improved version of code by Adam Petaccia from 21 June 2009 mail to the list. Signed-off-by: Petr Onderka <gsvick@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-08-16 19:01:02 +02:00			`PATTERNS("csharp",`
			`/* Keywords */`
			`"!^[ \t]*(do\|while\|for\|if\|else\|instanceof\|new\|return\|switch\|case\|throw\|catch\|using)\n"`
			`/* Methods and constructors */`
			`"^[ \t](((static\|public\|internal\|private\|protected\|new\|virtual\|sealed\|override\|unsafe)[ \t]+)[][<>@.~_[:alnum:]]+[ \t]+[<>@._[:alnum:]]+[ \t]\\(.\\))[ \t]*$\n"`
			`/* Properties */`
			`"^[ \t](((static\|public\|internal\|private\|protected\|new\|virtual\|sealed\|override\|unsafe)[ \t]+)[][<>@.~_[:alnum:]]+[ \t]+[@._[:alnum:]]+)[ \t]*$\n"`
			`/* Type definitions */`
			`"^[ \t](((static\|public\|internal\|private\|protected\|new\|unsafe\|sealed\|abstract\|partial)[ \t]+)(class\|enum\|interface\|struct)[ \t]+.*)$\n"`
			`/* Namespace */`
			`"^[ \t](namespace[ \t]+.)$",`
			`/* -- */`
			`"[a-zA-Z_][a-zA-Z0-9_]*"`
			`"\|[-+0-9.e]+[fFlL]?\|0[xXbB]?[0-9a-fA-F]+[lL]?"`
userdiff: simplify word-diff safeguard git's diff-words support has a detail that can be a little dangerous: any text not matched by a given language's tokenization pattern is treated as whitespace and changes in such text would go unnoticed. Therefore each of the built-in regexes allows a special token type consisting of a single non-whitespace character [^[:space:]]. To make sure UTF-8 sequences remain human readable, the builtin regexes also have a special token type for runs of bytes with the high bit set. In English, non-ASCII characters are usually isolated so this is analogous to the [^[:space:]] pattern, except it matches a single _multibyte_ character despite use of the C locale. Unfortunately it is easy to make typos or forget entirely to include these catch-all token types when adding support for new languages (see v1.7.3.5~16, userdiff: fix typo in ruby and python word regexes, 2010-12-18). Avoid this by including them automatically within the PATTERNS and IPATTERN macros. While at it, change the UTF-8 sequence token type to match exactly one non-ASCII multi-byte character, rather than an arbitrary run of them. Suggested-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-01-11 22:48:50 +01:00			`"\|[-+*/<>%&^\|=!]=\|--\|\\+\\+\|<<=?\|>>=?\|&&\|\\\|\\\|\|::\|->"),`
userdiff: add built-in pattern for CSS CSS is widely used, motivating it being included as a built-in pattern. It must be noted that the word_regex for CSS (i.e. the regex defining what is a word in the language) does not consider '.' and '#' characters (in CSS selectors) to be part of the word. This behavior is documented by the test t/t4018/css-rule. The logic behind this behavior is the following: identifiers in CSS selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#' character are not part of the identifier, but an indicator of the nature of the identifier in HTML/XML (class or id). Diffing ".class1" and ".class2" must show that the class name is changed, but we still are selecting a class. Logic behind the "pattern" regex is: 1. reject lines ending with a colon/semicolon (properties) 2. if a line begins with a name in column 1, pick the whole line Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most of the tests. Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr> Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-03 14:32:26 +02:00			`IPATTERN("css",`
			`"![:;][[:space:]]*$\n"`
			`"^[_a-z0-9].*$",`
			`/* -- */`
			`/*`
			`* This regex comes from W3C CSS specs. Should theoretically also`
			`* allow ISO 10646 characters U+00A0 and higher,`
			`* but they are not handled in this regex.`
			`*/`
			`"-?[_a-zA-Z][-_a-zA-Z0-9]" / identifiers */`
			`"\|-?[0-9]+\|\\#[0-9a-fA-F]+" /* numbers */`
			`),`
diff: introduce diff.<driver>.binary The "diff" gitattribute is somewhat overloaded right now. It can say one of three things: 1. this file is definitely binary, or definitely not (i.e., diff or !diff) 2. this file should use an external diff engine (i.e., diff=foo, diff.foo.command = custom-script) 3. this file should use particular funcname patterns (i.e., diff=foo, diff.foo.(x?)funcname = some-regex) Most of the time, there is no conflict between these uses, since using one implies that the other is irrelevant (e.g., an external diff engine will decide for itself whether the file is binary). However, there is at least one conflicting situation: there is no way to say "use the regular rules to determine whether this file is binary, but if we do diff it textually, use this funcname pattern." That is, currently setting diff=foo indicates that the file is definitely text. This patch introduces a "binary" config option for a diff driver, so that one can explicitly set diff.foo.binary. We default this value to "don't know". That is, setting a diff attribute to "foo" and using "diff.foo.funcname" will have no effect on the binaryness of a file. To get the current behavior, one can set diff.foo.binary to true. This patch also has one additional advantage: it cleans up the interface to the userdiff code a bit. Before, calling code had to know more about whether attributes were false, true, or unset to determine binaryness. Now that binaryness is a property of a driver, we can represent these situations just by passing back a driver struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:36 +02:00			`{ "default", NULL, -1, { NULL, 0 } },`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`};`
color-words: make regex configurable via attributes Make the --color-words splitting regular expression configurable via the diff driver's 'wordregex' attribute. The user can then set the driver on a file in .gitattributes. If a regex is given on the command line, it overrides the driver's setting. We also provide built-in regexes for the languages that already had funcname patterns, and add an appropriate diff driver entry for C/++. (The patterns are designed to run UTF-8 sequences into a single chunk to make sure they remain readable.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-01-17 17:29:48 +01:00			`#undef PATTERNS`
userdiff.c: add builtin fortran regex patterns This adds fortran xfuncname and wordRegex patterns to the list of builtin patterns. The intention is for the patterns to be appropriate for all versions of fortran including 77, 90, 95. The patterns can be enabled by adding the diff=fortran attribute to the .gitattributes file for the desired file glob. This also adds a new macro named IPATTERN which is just like the PATTERNS macro except it sets the REG_ICASE flag so that case will be ignored. The test code in t4018 and the docs were updated as appropriate. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-09-10 18:18:14 +02:00			`#undef IPATTERN`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00
			`static struct userdiff_driver driver_true = {`
			`"diff=true",`
			`NULL,`
diff: introduce diff.<driver>.binary The "diff" gitattribute is somewhat overloaded right now. It can say one of three things: 1. this file is definitely binary, or definitely not (i.e., diff or !diff) 2. this file should use an external diff engine (i.e., diff=foo, diff.foo.command = custom-script) 3. this file should use particular funcname patterns (i.e., diff=foo, diff.foo.(x?)funcname = some-regex) Most of the time, there is no conflict between these uses, since using one implies that the other is irrelevant (e.g., an external diff engine will decide for itself whether the file is binary). However, there is at least one conflicting situation: there is no way to say "use the regular rules to determine whether this file is binary, but if we do diff it textually, use this funcname pattern." That is, currently setting diff=foo indicates that the file is definitely text. This patch introduces a "binary" config option for a diff driver, so that one can explicitly set diff.foo.binary. We default this value to "don't know". That is, setting a diff attribute to "foo" and using "diff.foo.funcname" will have no effect on the binaryness of a file. To get the current behavior, one can set diff.foo.binary to true. This patch also has one additional advantage: it cleans up the interface to the userdiff code a bit. Before, calling code had to know more about whether attributes were false, true, or unset to determine binaryness. Now that binaryness is a property of a driver, we can represent these situations just by passing back a driver struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:36 +02:00			`0,`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`{ NULL, 0 }`
			`};`

			`static struct userdiff_driver driver_false = {`
			`"!diff",`
			`NULL,`
diff: introduce diff.<driver>.binary The "diff" gitattribute is somewhat overloaded right now. It can say one of three things: 1. this file is definitely binary, or definitely not (i.e., diff or !diff) 2. this file should use an external diff engine (i.e., diff=foo, diff.foo.command = custom-script) 3. this file should use particular funcname patterns (i.e., diff=foo, diff.foo.(x?)funcname = some-regex) Most of the time, there is no conflict between these uses, since using one implies that the other is irrelevant (e.g., an external diff engine will decide for itself whether the file is binary). However, there is at least one conflicting situation: there is no way to say "use the regular rules to determine whether this file is binary, but if we do diff it textually, use this funcname pattern." That is, currently setting diff=foo indicates that the file is definitely text. This patch introduces a "binary" config option for a diff driver, so that one can explicitly set diff.foo.binary. We default this value to "don't know". That is, setting a diff attribute to "foo" and using "diff.foo.funcname" will have no effect on the binaryness of a file. To get the current behavior, one can set diff.foo.binary to true. This patch also has one additional advantage: it cleans up the interface to the userdiff code a bit. Before, calling code had to know more about whether attributes were false, true, or unset to determine binaryness. Now that binaryness is a property of a driver, we can represent these situations just by passing back a driver struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:36 +02:00			`1,`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`{ NULL, 0 }`
			`};`

			`static struct userdiff_driver userdiff_find_by_namelen(const char k, int len)`
			`{`
			`int i;`
			`for (i = 0; i < ndrivers; i++) {`
			`struct userdiff_driver *drv = drivers + i;`
			`if (!strncmp(drv->name, k, len) && !drv->name[len])`
			`return drv;`
			`}`
			`for (i = 0; i < ARRAY_SIZE(builtin_drivers); i++) {`
			`struct userdiff_driver *drv = builtin_drivers + i;`
			`if (!strncmp(drv->name, k, len) && !drv->name[len])`
			`return drv;`
			`}`
			`return NULL;`
			`}`

			`static int parse_funcname(struct userdiff_funcname f, const char k,`
			`const char *v, int cflags)`
			`{`
			`if (git_config_string(&f->pattern, k, v) < 0)`
			`return -1;`
			`f->cflags = cflags;`
drop odd return value semantics from userdiff_config When the userdiff_config function was introduced in be58e70 (diff: unify external diff and funcname parsing code, 2008-10-05), it used a return value convention unlike any other config callback. Like other callbacks, it used "-1" to signal error. But it returned "1" to indicate that it found something, and "0" otherwise; other callbacks simply returned "0" to indicate that no error occurred. This distinction was necessary at the time, because the userdiff namespace overlapped slightly with the color configuration namespace. So "diff.color.foo" could mean "the 'foo' slot of diff coloring" or "the 'foo' component of the "color" userdiff driver". Because the color-parsing code would die on an unknown color slot, we needed the userdiff code to indicate that it had matched the variable, letting us bypass the color-parsing code entirely. Later, in 8b8e862 (ignore unknown color configuration, 2009-12-12), the color-parsing code learned to silently ignore unknown slots. This means we no longer need to protect userdiff-matched variables from reaching the color-parsing code. We can therefore change the userdiff_config calling convention to a more normal one. This drops some code from each caller, which is nice. But more importantly, it reduces the cognitive load for readers who may wonder why userdiff_config is unlike every other config callback. There's no need to add a new test confirming that this works; t4020 already contains a test that sets diff.color.external. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-07 19:23:02 +01:00			`return 0;`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`}`

diff: introduce diff.<driver>.binary The "diff" gitattribute is somewhat overloaded right now. It can say one of three things: 1. this file is definitely binary, or definitely not (i.e., diff or !diff) 2. this file should use an external diff engine (i.e., diff=foo, diff.foo.command = custom-script) 3. this file should use particular funcname patterns (i.e., diff=foo, diff.foo.(x?)funcname = some-regex) Most of the time, there is no conflict between these uses, since using one implies that the other is irrelevant (e.g., an external diff engine will decide for itself whether the file is binary). However, there is at least one conflicting situation: there is no way to say "use the regular rules to determine whether this file is binary, but if we do diff it textually, use this funcname pattern." That is, currently setting diff=foo indicates that the file is definitely text. This patch introduces a "binary" config option for a diff driver, so that one can explicitly set diff.foo.binary. We default this value to "don't know". That is, setting a diff attribute to "foo" and using "diff.foo.funcname" will have no effect on the binaryness of a file. To get the current behavior, one can set diff.foo.binary to true. This patch also has one additional advantage: it cleans up the interface to the userdiff code a bit. Before, calling code had to know more about whether attributes were false, true, or unset to determine binaryness. Now that binaryness is a property of a driver, we can represent these situations just by passing back a driver struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:36 +02:00			`static int parse_tristate(int b, const char k, const char *v)`
			`{`
			`if (v && !strcasecmp(v, "auto"))`
			`*b = -1;`
			`else`
			`*b = git_config_bool(k, v);`
drop odd return value semantics from userdiff_config When the userdiff_config function was introduced in be58e70 (diff: unify external diff and funcname parsing code, 2008-10-05), it used a return value convention unlike any other config callback. Like other callbacks, it used "-1" to signal error. But it returned "1" to indicate that it found something, and "0" otherwise; other callbacks simply returned "0" to indicate that no error occurred. This distinction was necessary at the time, because the userdiff namespace overlapped slightly with the color configuration namespace. So "diff.color.foo" could mean "the 'foo' slot of diff coloring" or "the 'foo' component of the "color" userdiff driver". Because the color-parsing code would die on an unknown color slot, we needed the userdiff code to indicate that it had matched the variable, letting us bypass the color-parsing code entirely. Later, in 8b8e862 (ignore unknown color configuration, 2009-12-12), the color-parsing code learned to silently ignore unknown slots. This means we no longer need to protect userdiff-matched variables from reaching the color-parsing code. We can therefore change the userdiff_config calling convention to a more normal one. This drops some code from each caller, which is nice. But more importantly, it reduces the cognitive load for readers who may wonder why userdiff_config is unlike every other config callback. There's no need to add a new test confirming that this works; t4020 already contains a test that sets diff.color.external. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-07 19:23:02 +01:00			`return 0;`
diff: introduce diff.<driver>.binary The "diff" gitattribute is somewhat overloaded right now. It can say one of three things: 1. this file is definitely binary, or definitely not (i.e., diff or !diff) 2. this file should use an external diff engine (i.e., diff=foo, diff.foo.command = custom-script) 3. this file should use particular funcname patterns (i.e., diff=foo, diff.foo.(x?)funcname = some-regex) Most of the time, there is no conflict between these uses, since using one implies that the other is irrelevant (e.g., an external diff engine will decide for itself whether the file is binary). However, there is at least one conflicting situation: there is no way to say "use the regular rules to determine whether this file is binary, but if we do diff it textually, use this funcname pattern." That is, currently setting diff=foo indicates that the file is definitely text. This patch introduces a "binary" config option for a diff driver, so that one can explicitly set diff.foo.binary. We default this value to "don't know". That is, setting a diff attribute to "foo" and using "diff.foo.funcname" will have no effect on the binaryness of a file. To get the current behavior, one can set diff.foo.binary to true. This patch also has one additional advantage: it cleans up the interface to the userdiff code a bit. Before, calling code had to know more about whether attributes were false, true, or unset to determine binaryness. Now that binaryness is a property of a driver, we can represent these situations just by passing back a driver struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:36 +02:00			`}`

diff: cache textconv output Running a textconv filter can take a long time. It's particularly bad for a large file which needs to be spooled to disk, but even for small files, the fork+exec overhead can add up for something like "git log -p". This patch uses the notes-cache mechanism to keep a fast cache of textconv output. Caches are stored in refs/notes/textconv/$x, where $x is the userdiff driver defined in gitattributes. Caching is enabled only if diff.$x.cachetextconv is true. In my test repo, on a commit with 45 jpg and avi files changed and a textconv to show their exif tags: [before] $ time git show >/dev/null real 0m13.724s user 0m12.057s sys 0m1.624s [after, first run] $ git config diff.mfo.cachetextconv true $ time git show >/dev/null real 0m14.252s user 0m12.197s sys 0m1.800s [after, subsequent runs] $ time git show >/dev/null real 0m0.352s user 0m0.148s sys 0m0.200s So for a slight (3.8%) cost on the first run, we achieve an almost 40x speed up on subsequent runs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-02 02:12:15 +02:00			`static int parse_bool(int b, const char k, const char *v)`
			`{`
			`*b = git_config_bool(k, v);`
drop odd return value semantics from userdiff_config When the userdiff_config function was introduced in be58e70 (diff: unify external diff and funcname parsing code, 2008-10-05), it used a return value convention unlike any other config callback. Like other callbacks, it used "-1" to signal error. But it returned "1" to indicate that it found something, and "0" otherwise; other callbacks simply returned "0" to indicate that no error occurred. This distinction was necessary at the time, because the userdiff namespace overlapped slightly with the color configuration namespace. So "diff.color.foo" could mean "the 'foo' slot of diff coloring" or "the 'foo' component of the "color" userdiff driver". Because the color-parsing code would die on an unknown color slot, we needed the userdiff code to indicate that it had matched the variable, letting us bypass the color-parsing code entirely. Later, in 8b8e862 (ignore unknown color configuration, 2009-12-12), the color-parsing code learned to silently ignore unknown slots. This means we no longer need to protect userdiff-matched variables from reaching the color-parsing code. We can therefore change the userdiff_config calling convention to a more normal one. This drops some code from each caller, which is nice. But more importantly, it reduces the cognitive load for readers who may wonder why userdiff_config is unlike every other config callback. There's no need to add a new test confirming that this works; t4020 already contains a test that sets diff.color.external. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-07 19:23:02 +01:00			`return 0;`
diff: cache textconv output Running a textconv filter can take a long time. It's particularly bad for a large file which needs to be spooled to disk, but even for small files, the fork+exec overhead can add up for something like "git log -p". This patch uses the notes-cache mechanism to keep a fast cache of textconv output. Caches are stored in refs/notes/textconv/$x, where $x is the userdiff driver defined in gitattributes. Caching is enabled only if diff.$x.cachetextconv is true. In my test repo, on a commit with 45 jpg and avi files changed and a textconv to show their exif tags: [before] $ time git show >/dev/null real 0m13.724s user 0m12.057s sys 0m1.624s [after, first run] $ git config diff.mfo.cachetextconv true $ time git show >/dev/null real 0m14.252s user 0m12.197s sys 0m1.800s [after, subsequent runs] $ time git show >/dev/null real 0m0.352s user 0m0.148s sys 0m0.200s So for a slight (3.8%) cost on the first run, we achieve an almost 40x speed up on subsequent runs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-02 02:12:15 +02:00			`}`

userdiff: require explicitly allowing textconv Diffs that have been produced with textconv almost certainly cannot be applied, so we want to be careful not to generate them in things like format-patch. This introduces a new diff options, ALLOW_TEXTCONV, which controls this behavior. It is off by default, but is explicitly turned on for the "log" family of commands, as well as the "diff" porcelain (but not diff-* plumbing). Because both text conversion and external diffing are controlled by these diff options, we can get rid of the "plumbing versus porcelain" distinction when reading the config. This was an attempt to control the same thing, but suffered from being too coarse-grained. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-10-26 05:45:55 +01:00			`int userdiff_config(const char k, const char v)`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`{`
			`struct userdiff_driver *drv;`
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`const char name, type;`
			`int namelen;`

			`if (parse_config_key(k, "diff", &name, &namelen, &type) \|\| !name)`
			`return 0;`

			`drv = userdiff_find_by_namelen(name, namelen);`
			`if (!drv) {`
			`ALLOC_GROW(drivers, ndrivers+1, drivers_alloc);`
			`drv = &drivers[ndrivers++];`
			`memset(drv, 0, sizeof(*drv));`
			`drv->name = xmemdupz(name, namelen);`
			`drv->binary = -1;`
			`}`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`if (!strcmp(type, "funcname"))`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`return parse_funcname(&drv->funcname, k, v, 0);`
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`if (!strcmp(type, "xfuncname"))`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`return parse_funcname(&drv->funcname, k, v, REG_EXTENDED);`
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`if (!strcmp(type, "binary"))`
diff: introduce diff.<driver>.binary The "diff" gitattribute is somewhat overloaded right now. It can say one of three things: 1. this file is definitely binary, or definitely not (i.e., diff or !diff) 2. this file should use an external diff engine (i.e., diff=foo, diff.foo.command = custom-script) 3. this file should use particular funcname patterns (i.e., diff=foo, diff.foo.(x?)funcname = some-regex) Most of the time, there is no conflict between these uses, since using one implies that the other is irrelevant (e.g., an external diff engine will decide for itself whether the file is binary). However, there is at least one conflicting situation: there is no way to say "use the regular rules to determine whether this file is binary, but if we do diff it textually, use this funcname pattern." That is, currently setting diff=foo indicates that the file is definitely text. This patch introduces a "binary" config option for a diff driver, so that one can explicitly set diff.foo.binary. We default this value to "don't know". That is, setting a diff attribute to "foo" and using "diff.foo.funcname" will have no effect on the binaryness of a file. To get the current behavior, one can set diff.foo.binary to true. This patch also has one additional advantage: it cleans up the interface to the userdiff code a bit. Before, calling code had to know more about whether attributes were false, true, or unset to determine binaryness. Now that binaryness is a property of a driver, we can represent these situations just by passing back a driver struct. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:36 +02:00			`return parse_tristate(&drv->binary, k, v);`
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`if (!strcmp(type, "command"))`
drop odd return value semantics from userdiff_config When the userdiff_config function was introduced in be58e70 (diff: unify external diff and funcname parsing code, 2008-10-05), it used a return value convention unlike any other config callback. Like other callbacks, it used "-1" to signal error. But it returned "1" to indicate that it found something, and "0" otherwise; other callbacks simply returned "0" to indicate that no error occurred. This distinction was necessary at the time, because the userdiff namespace overlapped slightly with the color configuration namespace. So "diff.color.foo" could mean "the 'foo' slot of diff coloring" or "the 'foo' component of the "color" userdiff driver". Because the color-parsing code would die on an unknown color slot, we needed the userdiff code to indicate that it had matched the variable, letting us bypass the color-parsing code entirely. Later, in 8b8e862 (ignore unknown color configuration, 2009-12-12), the color-parsing code learned to silently ignore unknown slots. This means we no longer need to protect userdiff-matched variables from reaching the color-parsing code. We can therefore change the userdiff_config calling convention to a more normal one. This drops some code from each caller, which is nice. But more importantly, it reduces the cognitive load for readers who may wonder why userdiff_config is unlike every other config callback. There's no need to add a new test confirming that this works; t4020 already contains a test that sets diff.color.external. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-07 19:23:02 +01:00			`return git_config_string(&drv->external, k, v);`
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`if (!strcmp(type, "textconv"))`
drop odd return value semantics from userdiff_config When the userdiff_config function was introduced in be58e70 (diff: unify external diff and funcname parsing code, 2008-10-05), it used a return value convention unlike any other config callback. Like other callbacks, it used "-1" to signal error. But it returned "1" to indicate that it found something, and "0" otherwise; other callbacks simply returned "0" to indicate that no error occurred. This distinction was necessary at the time, because the userdiff namespace overlapped slightly with the color configuration namespace. So "diff.color.foo" could mean "the 'foo' slot of diff coloring" or "the 'foo' component of the "color" userdiff driver". Because the color-parsing code would die on an unknown color slot, we needed the userdiff code to indicate that it had matched the variable, letting us bypass the color-parsing code entirely. Later, in 8b8e862 (ignore unknown color configuration, 2009-12-12), the color-parsing code learned to silently ignore unknown slots. This means we no longer need to protect userdiff-matched variables from reaching the color-parsing code. We can therefore change the userdiff_config calling convention to a more normal one. This drops some code from each caller, which is nice. But more importantly, it reduces the cognitive load for readers who may wonder why userdiff_config is unlike every other config callback. There's no need to add a new test confirming that this works; t4020 already contains a test that sets diff.color.external. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-07 19:23:02 +01:00			`return git_config_string(&drv->textconv, k, v);`
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`if (!strcmp(type, "cachetextconv"))`
diff: cache textconv output Running a textconv filter can take a long time. It's particularly bad for a large file which needs to be spooled to disk, but even for small files, the fork+exec overhead can add up for something like "git log -p". This patch uses the notes-cache mechanism to keep a fast cache of textconv output. Caches are stored in refs/notes/textconv/$x, where $x is the userdiff driver defined in gitattributes. Caching is enabled only if diff.$x.cachetextconv is true. In my test repo, on a commit with 45 jpg and avi files changed and a textconv to show their exif tags: [before] $ time git show >/dev/null real 0m13.724s user 0m12.057s sys 0m1.624s [after, first run] $ git config diff.mfo.cachetextconv true $ time git show >/dev/null real 0m14.252s user 0m12.197s sys 0m1.800s [after, subsequent runs] $ time git show >/dev/null real 0m0.352s user 0m0.148s sys 0m0.200s So for a slight (3.8%) cost on the first run, we achieve an almost 40x speed up on subsequent runs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-02 02:12:15 +02:00			`return parse_bool(&drv->textconv_want_cache, k, v);`
userdiff: drop parse_driver function When we parse userdiff config, we generally assume that diff.name.key will affect the "key" value of the "name" driver. However, without confirming that the key is a valid userdiff key, we may accidentally conflict with the ancient "diff.color.*" namespace. The current code is careful not to even create a driver struct if we do not see a key that is known by the diff-driver code. However, this carefulness is unnecessary; the default driver with no keys set behaves exactly the same as having no driver at all. We can simply set up the driver struct as soon as we see we have a config key that looks like a driver. This makes the code a bit more readable. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:25:07 +01:00			`if (!strcmp(type, "wordregex"))`
drop odd return value semantics from userdiff_config When the userdiff_config function was introduced in be58e70 (diff: unify external diff and funcname parsing code, 2008-10-05), it used a return value convention unlike any other config callback. Like other callbacks, it used "-1" to signal error. But it returned "1" to indicate that it found something, and "0" otherwise; other callbacks simply returned "0" to indicate that no error occurred. This distinction was necessary at the time, because the userdiff namespace overlapped slightly with the color configuration namespace. So "diff.color.foo" could mean "the 'foo' slot of diff coloring" or "the 'foo' component of the "color" userdiff driver". Because the color-parsing code would die on an unknown color slot, we needed the userdiff code to indicate that it had matched the variable, letting us bypass the color-parsing code entirely. Later, in 8b8e862 (ignore unknown color configuration, 2009-12-12), the color-parsing code learned to silently ignore unknown slots. This means we no longer need to protect userdiff-matched variables from reaching the color-parsing code. We can therefore change the userdiff_config calling convention to a more normal one. This drops some code from each caller, which is nice. But more importantly, it reduces the cognitive load for readers who may wonder why userdiff_config is unlike every other config callback. There's no need to add a new test confirming that this works; t4020 already contains a test that sets diff.color.external. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-07 19:23:02 +01:00			`return git_config_string(&drv->word_regex, k, v);`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00
			`return 0;`
			`}`

			`struct userdiff_driver userdiff_find_by_name(const char name) {`
			`int len = strlen(name);`
			`return userdiff_find_by_namelen(name, len);`
			`}`

			`struct userdiff_driver userdiff_find_by_path(const char path)`
			`{`
			`static struct git_attr *attr;`
			`struct git_attr_check check;`

			`if (!attr)`
git_attr(): fix function signature The function took (name, namelen) as its arguments, but all the public callers wanted to pass a full string. Demote the counted-string interface to an internal API status, and allow public callers to just pass the string to the function. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-01-17 05:39:59 +01:00			`attr = git_attr("diff");`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`check.attr = attr;`

			`if (!path)`
			`return NULL;`
Rename git_checkattr() to git_check_attr() Suggested by: Junio Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-04 06:36:33 +02:00			`if (git_check_attr(path, 1, &check))`
diff: unify external diff and funcname parsing code Both sets of code assume that one specifies a diff profile as a gitattribute via the "diff=foo" attribute. They then pull information about that profile from the config as diff.foo.*. The code for each is currently completely separate from the other, which has several disadvantages: - there is duplication as we maintain code to create and search the separate lists of external drivers and funcname patterns - it is difficult to add new profile options, since it is unclear where they should go - the code is difficult to follow, as we rely on the "check if this file is binary" code to find the funcname pattern as a side effect. This is the first step in refactoring the binary-checking code. This patch factors out these diff profiles into "userdiff" drivers. A file with "diff=foo" uses the "foo" driver, which is specified by a single struct. Note that one major difference between the two pieces of code is that the funcname patterns are always loaded, whereas external drivers are loaded only for the "git diff" porcelain; the new code takes care to retain that situation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-05 23:43:21 +02:00			`return NULL;`

			`if (ATTR_TRUE(check.value))`
			`return &driver_true;`
			`if (ATTR_FALSE(check.value))`
			`return &driver_false;`
			`if (ATTR_UNSET(check.value))`
			`return NULL;`
			`return userdiff_find_by_name(check.value);`
			`}`
refactor get_textconv to not require diff_filespec This function actually does two things: 1. Load the userdiff driver for the filespec. 2. Decide whether the driver has a textconv component, and initialize the textconv cache if applicable. Only part (1) requires the filespec object, and some callers may not have a filespec at all. So let's split them it into two functions, and put part (2) with the userdiff code, which is a better fit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-23 22:30:14 +02:00
			`struct userdiff_driver userdiff_get_textconv(struct userdiff_driver driver)`
			`{`
			`if (!driver->textconv)`
			`return NULL;`

			`if (driver->textconv_want_cache && !driver->textconv_cache) {`
			`struct notes_cache c = xmalloc(sizeof(c));`
			`struct strbuf name = STRBUF_INIT;`

			`strbuf_addf(&name, "textconv/%s", driver->name);`
			`notes_cache_init(c, name.buf, driver->textconv);`
			`driver->textconv_cache = c;`
			`}`

			`return driver;`
			`}`