mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-01 06:47:52 +01:00

1732 lines

40 KiB

C

Raw Normal View History

Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`#include "cache.h"`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`#include "attr.h"`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`#include "run-command.h"`
convert filter: supply path to external driver Filtering to support keyword expansion may need the name of the file being filtered. In particular, to support p4 keywords like $File: //depot/product/dir/script.sh $ the smudge filter needs to know the name of the file it is smudging. Allow "%f" in the custom filter command line specified in the configuration. This will be substituted by the filename inside a single-quote pair to be passed to the shell. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-22 15:40:13 +01:00			`#include "quote.h"`
Ignore SIGPIPE when running a filter driver If a filter is not defined or if it fails, git should behave as if the filter is a no-op passthru. However, if the filter exits before reading all the content, depending on the timing, git could be killed with SIGPIPE when it tries to write to the pipe connected to the filter. Ignore SIGPIPE while processing the filter to give us a chance to check the return value from a failed write, in order to detect and act on this mode of failure in a more controlled way. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-20 21:53:37 +01:00			`#include "sigchain.h"`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`#include "pkt-line.h"`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`/*`
			`* convert.c - convert a file when checking it out and checking it in.`
			`*`
			`* This should use the pathname to decide on whether it wants to do some`
			`* more interesting conversions (automatic gzip/unzip, general format`
			`* conversions etc etc), but by default it just does automatic CRLF<->LF`
Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-06-04 21:29:08 +02:00			`* translation when the "text" attribute or "auto_crlf" option is set.`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`*/`

ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`/* Stat bits: When BIN is set, the txt bits are unset */`
			`#define CONVERT_STAT_BITS_TXT_LF 0x1`
			`#define CONVERT_STAT_BITS_TXT_CRLF 0x2`
			`#define CONVERT_STAT_BITS_BIN 0x4`

convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`enum crlf_action {`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`CRLF_UNDEFINED,`
			`CRLF_BINARY,`
Add per-repository eol normalization Change the semantics of the "crlf" attribute so that it enables end-of-line normalization when it is set, regardless of "core.autocrlf". Add a new setting for "crlf": "auto", which enables end-of-line conversion but does not override the automatic text file detection. Add a new attribute "eol" with possible values "crlf" and "lf". When set, this attribute enables normalization and forces git to use CRLF or LF line endings in the working directory, respectively. The line ending style to be used for normalized text files in the working directory is set using "core.autocrlf". When it is set to "true", CRLFs are used in the working directory; when set to "input" or "false", LFs are used. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-19 22:43:10 +02:00			`CRLF_TEXT,`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`CRLF_TEXT_INPUT,`
			`CRLF_TEXT_CRLF,`
			`CRLF_AUTO,`
			`CRLF_AUTO_INPUT,`
			`CRLF_AUTO_CRLF`
Add per-repository eol normalization Change the semantics of the "crlf" attribute so that it enables end-of-line normalization when it is set, regardless of "core.autocrlf". Add a new setting for "crlf": "auto", which enables end-of-line conversion but does not override the automatic text file detection. Add a new attribute "eol" with possible values "crlf" and "lf". When set, this attribute enables normalization and forces git to use CRLF or LF line endings in the working directory, respectively. The line ending style to be used for normalized text files in the working directory is set using "core.autocrlf". When it is set to "true", CRLFs are used in the working directory; when set to "input" or "false", LFs are used. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-19 22:43:10 +02:00			`};`
Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`struct text_stat {`
treat any file with NUL as binary There are two heuristics in Git to detect whether a file is binary or text. One in xdiff-interface.c (which is taken from GNU diff) relies on existence of the NUL byte at the beginning. However, convert.c used a different heuristic, which relied on the percent of non-printable symbols (less than 1% for text files). Due to differences in detection whether a file is binary or not, it was possible that a file that diff treats as binary could be treated as text by CRLF conversion. This is very confusing for a user who sees that 'git diff' shows the file as binary expects it to be added as binary. This patch makes is_binary to consider any file that contains at least one NUL character as binary, to ensure that the heuristics used for CRLF conversion is tighter than what is used by diff. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-16 02:59:12 +01:00			`/* NUL, CR, LF and CRLF counts */`
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`unsigned nul, lonecr, lonelf, crlf;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
			`/* These are just approximations! */`
			`unsigned printable, nonprintable;`
			`};`

			`static void gather_stats(const char buf, unsigned long size, struct text_stat stats)`
			`{`
			`unsigned long i;`

			`memset(stats, 0, sizeof(*stats));`

			`for (i = 0; i < size; i++) {`
			`unsigned char c = buf[i];`
			`if (c == '\r') {`
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`if (i+1 < size && buf[i+1] == '\n') {`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`stats->crlf++;`
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`i++;`
			`} else`
			`stats->lonecr++;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`continue;`
			`}`
			`if (c == '\n') {`
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`stats->lonelf++;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`continue;`
			`}`
			`if (c == 127)`
			`/* DEL */`
			`stats->nonprintable++;`
			`else if (c < 32) {`
			`switch (c) {`
			`/* BS, HT, ESC and FF */`
			`case '\b': case '\t': case '\033': case '\014':`
			`stats->printable++;`
			`break;`
treat any file with NUL as binary There are two heuristics in Git to detect whether a file is binary or text. One in xdiff-interface.c (which is taken from GNU diff) relies on existence of the NUL byte at the beginning. However, convert.c used a different heuristic, which relied on the percent of non-printable symbols (less than 1% for text files). Due to differences in detection whether a file is binary or not, it was possible that a file that diff treats as binary could be treated as text by CRLF conversion. This is very confusing for a user who sees that 'git diff' shows the file as binary expects it to be added as binary. This patch makes is_binary to consider any file that contains at least one NUL character as binary, to ensure that the heuristics used for CRLF conversion is tighter than what is used by diff. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-16 02:59:12 +01:00			`case 0:`
			`stats->nul++;`
			`/* fall through */`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`default:`
			`stats->nonprintable++;`
			`}`
			`}`
			`else`
			`stats->printable++;`
			`}`
Fixed text file auto-detection: treat EOF character 032 at the end of file as printable Signed-off-by: Dmitry Kakurin <Dmitry.Kakurin@gmail.com> Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-11 18:48:16 +02:00
			`/* If file ends with EOF then don't count this EOF as non-printable. */`
			`if (size >= 1 && buf[size-1] == '\032')`
			`stats->nonprintable--;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`}`

			`/*`
			`* The same heuristics as diff.c::mmfile_is_binary()`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`* We treat files with bare CR as binary`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`*/`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`static int convert_is_binary(unsigned long size, const struct text_stat *stats)`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`{`
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`if (stats->lonecr)`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`return 1;`
treat any file with NUL as binary There are two heuristics in Git to detect whether a file is binary or text. One in xdiff-interface.c (which is taken from GNU diff) relies on existence of the NUL byte at the beginning. However, convert.c used a different heuristic, which relied on the percent of non-printable symbols (less than 1% for text files). Due to differences in detection whether a file is binary or not, it was possible that a file that diff treats as binary could be treated as text by CRLF conversion. This is very confusing for a user who sees that 'git diff' shows the file as binary expects it to be added as binary. This patch makes is_binary to consider any file that contains at least one NUL character as binary, to ensure that the heuristics used for CRLF conversion is tighter than what is used by diff. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-16 02:59:12 +01:00			`if (stats->nul)`
			`return 1;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`if ((stats->printable >> 7) < stats->nonprintable)`
			`return 1;`
			`return 0;`
			`}`

ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`static unsigned int gather_convert_stats(const char *data, unsigned long size)`
			`{`
			`struct text_stat stats;`
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`int ret = 0;`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`if (!data \|\| !size)`
			`return 0;`
			`gather_stats(data, size, &stats);`
			`if (convert_is_binary(size, &stats))`
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`ret \|= CONVERT_STAT_BITS_BIN;`
			`if (stats.crlf)`
			`ret \|= CONVERT_STAT_BITS_TXT_CRLF;`
			`if (stats.lonelf)`
			`ret \|= CONVERT_STAT_BITS_TXT_LF;`

			`return ret;`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`}`

			`static const char gather_convert_stats_ascii(const char data, unsigned long size)`
			`{`
			`unsigned int convert_stats = gather_convert_stats(data, size);`

			`if (convert_stats & CONVERT_STAT_BITS_BIN)`
			`return "-text";`
			`switch (convert_stats) {`
			`case CONVERT_STAT_BITS_TXT_LF:`
			`return "lf";`
			`case CONVERT_STAT_BITS_TXT_CRLF:`
			`return "crlf";`
			`case CONVERT_STAT_BITS_TXT_LF \| CONVERT_STAT_BITS_TXT_CRLF:`
			`return "mixed";`
			`default:`
			`return "none";`
			`}`
			`}`

			`const char get_cached_convert_stats_ascii(const char path)`
			`{`
			`const char *ret;`
			`unsigned long sz;`
			`void *data = read_blob_data_from_cache(path, &sz);`
			`ret = gather_convert_stats_ascii(data, sz);`
			`free(data);`
			`return ret;`
			`}`

			`const char get_wt_convert_stats_ascii(const char path)`
			`{`
			`const char *ret = "";`
			`struct strbuf sb = STRBUF_INIT;`
			`if (strbuf_read_file(&sb, path, 0) >= 0)`
			`ret = gather_convert_stats_ascii(sb.buf, sb.len);`
			`strbuf_release(&sb);`
			`return ret;`
			`}`

convert.c: use text_eol_is_crlf() Add a helper function to find out, which line endings text files should get at checkout, depending on core.autocrlf and core.eol configuration variables. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:25 +01:00			`static int text_eol_is_crlf(void)`
			`{`
			`if (auto_crlf == AUTO_CRLF_TRUE)`
			`return 1;`
			`else if (auto_crlf == AUTO_CRLF_INPUT)`
			`return 0;`
			`if (core_eol == EOL_CRLF)`
			`return 1;`
			`if (core_eol == EOL_UNSET && EOL_NATIVE == EOL_CRLF)`
			`return 1;`
			`return 0;`
			`}`

convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`static enum eol output_eol(enum crlf_action crlf_action)`
Avoid conflicts when merging branches with mixed normalization Currently, merging across changes in line ending normalization is painful since files containing CRLF will conflict with normalized files, even if the only difference between the two versions is the line endings. Additionally, any "real" merge conflicts that exist are obscured because every line in the file has a conflict. Assume you start out with a repo that has a lot of text files with CRLF checked in (A): o---C / \ A---B---D B: Add "* text=auto" to .gitattributes and normalize all files to LF-only C: Modify some of the text files D: Try to merge C You will get a ridiculous number of LF/CRLF conflicts when trying to merge C into D, since the repository contents for C are "wrong" wrt the new .gitattributes file. Fix ll-merge so that the "base", "theirs" and "ours" stages are passed through convert_to_worktree() and convert_to_git() before a three-way merge. This ensures that all three stages are normalized in the same way, removing from consideration differences that are only due to normalization. This feature is optional for now since it changes a low-level mechanism and is not necessary for the majority of users. The "merge.renormalize" config variable enables it. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:47 +02:00			`{`
convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`switch (crlf_action) {`
Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-06-04 21:29:08 +02:00			`case CRLF_BINARY:`
			`return EOL_UNSET;`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_TEXT_CRLF:`
Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-06-04 21:29:08 +02:00			`return EOL_CRLF;`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_TEXT_INPUT:`
Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-06-04 21:29:08 +02:00			`return EOL_LF;`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_UNDEFINED:`
			`case CRLF_AUTO_CRLF:`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`return EOL_CRLF;`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_AUTO_INPUT:`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`return EOL_LF;`
Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-06-04 21:29:08 +02:00			`case CRLF_TEXT:`
			`case CRLF_AUTO:`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`/* fall through */`
convert.c: use text_eol_is_crlf() Add a helper function to find out, which line endings text files should get at checkout, depending on core.autocrlf and core.eol configuration variables. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:25 +01:00			`return text_eol_is_crlf() ? EOL_CRLF : EOL_LF;`
Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-06-04 21:29:08 +02:00			`}`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`warning("Illegal crlf_action %d\n", (int)crlf_action);`
convert: rename the "eol" global variable to "core_eol" Yes, it is clear that "eol" wants to mean some sort of end-of-line thing, but as the name of a global variable, it is way too short to describe what kind of end-of-line thing it wants to represent. Besides, there are many codepaths that want to use their own local "char *eol" variable to point at the end of the current line they are processing. This global variable holds what we read from core.eol configuration variable. Name it as such. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 21:52:12 +02:00			`return core_eol;`
Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-06-04 21:29:08 +02:00			`}`

convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`static void check_safe_crlf(const char *path, enum crlf_action crlf_action,`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`struct text_stat old_stats, struct text_stat new_stats,`
			`enum safe_crlf checksafe)`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`{`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`if (old_stats->crlf && !new_stats->crlf ) {`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`/*`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`* CRLFs would not be restored by checkout`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`*/`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`if (checksafe == SAFE_CRLF_WARN)`
i18n: convert mark error messages for translation Mark error messages about CRLF for translation. Update test to reflect changes. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 15:15:27 +02:00			`warning(_("CRLF will be replaced by LF in %s.\n"`
			`"The file will have its original line"`
			`" endings in your working directory."), path);`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`else /* i.e. SAFE_CRLF_FAIL */`
i18n: convert mark error messages for translation Mark error messages about CRLF for translation. Update test to reflect changes. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 15:15:27 +02:00			`die(_("CRLF would be replaced by LF in %s."), path);`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`} else if (old_stats->lonelf && !new_stats->lonelf ) {`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`/*`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`* CRLFs would be added by checkout`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`*/`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`if (checksafe == SAFE_CRLF_WARN)`
i18n: convert mark error messages for translation Mark error messages about CRLF for translation. Update test to reflect changes. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 15:15:27 +02:00			`warning(_("LF will be replaced by CRLF in %s.\n"`
			`"The file will have its original line"`
			`" endings in your working directory."), path);`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`else /* i.e. SAFE_CRLF_FAIL */`
i18n: convert mark error messages for translation Mark error messages about CRLF for translation. Update test to reflect changes. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 15:15:27 +02:00			`die(_("LF would be replaced by CRLF in %s"), path);`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`}`
			`}`

autocrlf: Make it work also for un-normalized repositories Previously, autocrlf would only work well for normalized repositories. Any text files that contained CRLF in the repository would cause problems, and would be modified when handled with core.autocrlf set. Change autocrlf to not do any conversions to files that in the repository already contain a CR. git with autocrlf set will never create such a file, or change a LF only file to contain CRs, so the (new) assumption is that if a file contains a CR, it is intentional, and autocrlf should not change that. The following sequence should now always be a NOP even with autocrlf set (assuming a clean working directory): git checkout <something> touch * git add -A . (will add nothing) git commit (nothing to commit) Previously this would break for any text file containing a CR. Some of you may have been folowing Eyvind's excellent thread about trying to make end-of-line translation in git a bit smoother. I decided to attack the problem from a different angle: Is it possible to make autocrlf behave non-destructively for all the previous problem cases? Stealing the problem from Eyvind's initial mail (paraphrased and summarized a bit): 1. Setting autocrlf globally is a pain since autocrlf does not work well with CRLF in the repo 2. Setting it in individual repos is hard since you do it "too late" (the clone will get it wrong) 3. If someone checks in a file with CRLF later, you get into problems again 4. If a repository once has contained CRLF, you can't tell autocrlf at which commit everything is sane again 5. autocrlf does needless work if you know that all your users want the same EOL style. I belive that this patch makes autocrlf a safe (and good) default setting for Windows, and this solves problems 1-4 (it solves 2 by being set by default, which is early enough for clone). I implemented it by looking for CR charactes in the index, and aborting any conversion attempt if this is found. Signed-off-by: Finn Arne Gangstad <finag@pvv.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-12 00:37:57 +02:00			`static int has_cr_in_index(const char *path)`
			`{`
			`unsigned long sz;`
			`void *data;`
			`int has_cr;`

convert.c: remove duplicate code The has_cr_in_index() function is an almost 1:1 copy of read_blob_data_from_index() with some additions. Use the latter instead of using copy-pasted code. Signed-off-by: Lukas Fleischer <git@cryptocrack.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-04-13 15:28:32 +02:00			`data = read_blob_data_from_cache(path, &sz);`
			`if (!data)`
autocrlf: Make it work also for un-normalized repositories Previously, autocrlf would only work well for normalized repositories. Any text files that contained CRLF in the repository would cause problems, and would be modified when handled with core.autocrlf set. Change autocrlf to not do any conversions to files that in the repository already contain a CR. git with autocrlf set will never create such a file, or change a LF only file to contain CRs, so the (new) assumption is that if a file contains a CR, it is intentional, and autocrlf should not change that. The following sequence should now always be a NOP even with autocrlf set (assuming a clean working directory): git checkout <something> touch * git add -A . (will add nothing) git commit (nothing to commit) Previously this would break for any text file containing a CR. Some of you may have been folowing Eyvind's excellent thread about trying to make end-of-line translation in git a bit smoother. I decided to attack the problem from a different angle: Is it possible to make autocrlf behave non-destructively for all the previous problem cases? Stealing the problem from Eyvind's initial mail (paraphrased and summarized a bit): 1. Setting autocrlf globally is a pain since autocrlf does not work well with CRLF in the repo 2. Setting it in individual repos is hard since you do it "too late" (the clone will get it wrong) 3. If someone checks in a file with CRLF later, you get into problems again 4. If a repository once has contained CRLF, you can't tell autocrlf at which commit everything is sane again 5. autocrlf does needless work if you know that all your users want the same EOL style. I belive that this patch makes autocrlf a safe (and good) default setting for Windows, and this solves problems 1-4 (it solves 2 by being set by default, which is early enough for clone). I implemented it by looking for CR charactes in the index, and aborting any conversion attempt if this is found. Signed-off-by: Finn Arne Gangstad <finag@pvv.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-12 00:37:57 +02:00			`return 0;`
			`has_cr = memchr(data, '\r', sz) != NULL;`
			`free(data);`
			`return has_cr;`
			`}`

convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,`
			`enum crlf_action crlf_action)`
			`{`
			`if (output_eol(crlf_action) != EOL_CRLF)`
			`return 0;`
			`/* No "naked" LF? Nothing to convert, regardless. */`
			`if (!stats->lonelf)`
			`return 0;`

			`if (crlf_action == CRLF_AUTO \|\| crlf_action == CRLF_AUTO_INPUT \|\| crlf_action == CRLF_AUTO_CRLF) {`
			`/* If we have any CR or CRLF line endings, we do not touch it */`
			`/* This is the new safer autocrlf-handling */`
			`if (stats->lonecr \|\| stats->crlf)`
			`return 0;`

			`if (convert_is_binary(len, stats))`
			`return 0;`
			`}`
			`return 1;`

			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int crlf_to_git(const char path, const char src, size_t len,`
convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`struct strbuf *buf,`
			`enum crlf_action crlf_action, enum safe_crlf checksafe)`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`{`
			`struct text_stat stats;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`char *dst;`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`int convert_crlf_into_lf;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`if (crlf_action == CRLF_BINARY \|\|`
teach dry-run convert_to_git not to require a src buffer When we call convert_to_git in dry-run mode, it may still want to look at the source buffer, because some CRLF conversion modes depend on analyzing the source to determine whether it is in fact convertible CRLF text. However, the main motivation for convert_to_git's dry-run mode is that we would decide which method to use to acquire the blob's data (streaming versus in-core). Requiring this source analysis creates a chicken-and-egg problem. We are better off simply guessing that anything we can't analyze will end up needing conversion. This patch lets a caller specify a NULL src buffer when using dry-run mode (and only dry-run mode). A non-zero return value goes from "we would convert" to "we might convert"; a zero return value remains "we would definitely not convert". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-24 23:05:03 +01:00			`(src && !len))`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
teach dry-run convert_to_git not to require a src buffer When we call convert_to_git in dry-run mode, it may still want to look at the source buffer, because some CRLF conversion modes depend on analyzing the source to determine whether it is in fact convertible CRLF text. However, the main motivation for convert_to_git's dry-run mode is that we would decide which method to use to acquire the blob's data (streaming versus in-core). Requiring this source analysis creates a chicken-and-egg problem. We are better off simply guessing that anything we can't analyze will end up needing conversion. This patch lets a caller specify a NULL src buffer when using dry-run mode (and only dry-run mode). A non-zero return value goes from "we would convert" to "we might convert"; a zero return value remains "we would definitely not convert". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-24 23:05:03 +01:00			`/*`
			`* If we are doing a dry-run and have no source buffer, there is`
			`* nothing to analyze; we must assume we would convert.`
			`*/`
			`if (!buf && !src)`
			`return 1;`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`gather_stats(src, len, &stats);`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`/* Optimization: No CRLF? Nothing to convert, regardless. */`
			`convert_crlf_into_lf = !!stats.crlf;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`if (crlf_action == CRLF_AUTO \|\| crlf_action == CRLF_AUTO_INPUT \|\| crlf_action == CRLF_AUTO_CRLF) {`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`if (convert_is_binary(len, &stats))`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`/*`
convert: git cherry-pick -Xrenormalize did not work Working with a repo that used to be all CRLF. At some point it was changed to all LF, with `text=auto` in .gitattributes. Trying to cherry-pick a commit from before the switchover fails: $ git cherry-pick -Xrenormalize <commit> fatal: CRLF would be replaced by LF in [path] Commit 65237284 "unify the "auto" handling of CRLF" introduced a regression: Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX, SAFE_CRLF_RENORMALIZE was feed into check_safe_crlf(). This is wrong because here everything else than SAFE_CRLF_WARN is treated as SAFE_CRLF_FAIL. Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or SAFE_CRLF_FAIL. Reported-by: Eevee (Lexy Munroe) <eevee@veekun.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-11-30 18:02:32 +01:00			`* If the file in the index has any CR in it, do not`
			`* convert. This is the new safer autocrlf handling,`
			`* unless we want to renormalize in a merge or`
			`* cherry-pick.`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`*/`
convert: git cherry-pick -Xrenormalize did not work Working with a repo that used to be all CRLF. At some point it was changed to all LF, with `text=auto` in .gitattributes. Trying to cherry-pick a commit from before the switchover fails: $ git cherry-pick -Xrenormalize <commit> fatal: CRLF would be replaced by LF in [path] Commit 65237284 "unify the "auto" handling of CRLF" introduced a regression: Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX, SAFE_CRLF_RENORMALIZE was feed into check_safe_crlf(). This is wrong because here everything else than SAFE_CRLF_WARN is treated as SAFE_CRLF_FAIL. Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or SAFE_CRLF_FAIL. Reported-by: Eevee (Lexy Munroe) <eevee@veekun.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-11-30 18:02:32 +01:00			`if ((checksafe != SAFE_CRLF_RENORMALIZE) && has_cr_in_index(path))`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`convert_crlf_into_lf = 0;`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`}`
convert: git cherry-pick -Xrenormalize did not work Working with a repo that used to be all CRLF. At some point it was changed to all LF, with `text=auto` in .gitattributes. Trying to cherry-pick a commit from before the switchover fails: $ git cherry-pick -Xrenormalize <commit> fatal: CRLF would be replaced by LF in [path] Commit 65237284 "unify the "auto" handling of CRLF" introduced a regression: Whenever crlf_action is CRLF_TEXT_XXX and not CRLF_AUTO_XXX, SAFE_CRLF_RENORMALIZE was feed into check_safe_crlf(). This is wrong because here everything else than SAFE_CRLF_WARN is treated as SAFE_CRLF_FAIL. Call check_safe_crlf() only if checksafe is SAFE_CRLF_WARN or SAFE_CRLF_FAIL. Reported-by: Eevee (Lexy Munroe) <eevee@veekun.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-11-30 18:02:32 +01:00			`if ((checksafe == SAFE_CRLF_WARN \|\|`
			`(checksafe == SAFE_CRLF_FAIL)) && len) {`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`struct text_stat new_stats;`
			`memcpy(&new_stats, &stats, sizeof(new_stats));`
			`/* simulate "git add" */`
			`if (convert_crlf_into_lf) {`
			`new_stats.lonelf += new_stats.crlf;`
			`new_stats.crlf = 0;`
			`}`
			`/* simulate "git checkout" */`
			`if (will_convert_lf_to_crlf(len, &new_stats, crlf_action)) {`
			`new_stats.crlf += new_stats.lonelf;`
			`new_stats.lonelf = 0;`
			`}`
			`check_safe_crlf(path, crlf_action, &stats, &new_stats, checksafe);`
			`}`
			`if (!convert_crlf_into_lf)`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`return 0;`

teach convert_to_git a "dry run" mode Some callers may want to know whether convert_to_git will actually do anything before performing the conversion itself (e.g., to decide whether to stream or handle blobs in-core). This patch lets callers specify the dry run mode by passing a NULL destination buffer. The return value, instead of indicating whether conversion happened, will indicate whether conversion would occur. For readability, we also include a wrapper function which makes it more obvious we are not actually performing the conversion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-24 23:02:37 +01:00			`/*`
			`* At this point all of our source analysis is done, and we are sure we`
			`* would convert. If we are in dry-run mode, we can give an answer.`
			`*/`
			`if (!buf)`
			`return 1;`

Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`/* only grow if not in place */`
			`if (strbuf_avail(buf) + buf->len < len)`
			`strbuf_grow(buf, len - buf->len);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`dst = buf->buf;`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`if (crlf_action == CRLF_AUTO \|\| crlf_action == CRLF_AUTO_INPUT \|\| crlf_action == CRLF_AUTO_CRLF) {`
Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00			`/*`
			`* If we guessed, we already know we rejected a file with`
			`* lone CR, and we can strip a CR without looking at what`
			`* follow it.`
			`*/`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`do {`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`unsigned char c = *src++;`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`if (c != '\r')`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`*dst++ = c;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`} while (--len);`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`} else {`
			`do {`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`unsigned char c = *src++;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (! (c == '\r' && (1 < len && *src == '\n')))`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`*dst++ = c;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`} while (--len);`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`strbuf_setlen(buf, dst - buf->buf);`
			`return 1;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int crlf_to_worktree(const char path, const char src, size_t len,`
convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`struct strbuf *buf, enum crlf_action crlf_action)`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`{`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`char *to_free = NULL;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`struct text_stat stats;`

convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Raname "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:12:57 +02:00			`if (!len \|\| output_eol(crlf_action) != EOL_CRLF)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`gather_stats(src, len, &stats);`
convert: Correct NNO tests and missing `LF will be replaced by CRLF` When a non-reversible CRLF conversion is done in "git add", a warning is printed on stderr (or Git dies, depending on checksafe) The function commit_chk_wrnNNO() in t0027 was written to test this, but did the wrong thing: Instead of looking at the warning from "git add", it looked at the warning from "git commit". This is racy because "git commit" may not have to do CRLF conversion at all if it can use the sha1 value from the index (which depends on whether "add" and "commit" run in a single second). Correct t0027 and replace the commit for each and every file with a commit of all files in one go. The function commit_chk_wrnNNO() should be renamed in a separate commit. Now that t0027 does the right thing, it detects a bug in covert.c: This sequence should generate the warning `LF will be replaced by CRLF`, but does not: $ git init $ git config core.autocrlf false $ printf "Line\r\n" >file $ git add file $ git commit -m "commit with CRLF" $ git config core.autocrlf true $ printf "Line\n" >file $ git add file "git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf(). When has_cr_in_index(path) is true, crlf_to_git() returns too early and check_safe_crlf() is not called at all. Factor out the code which determines if "git checkout" converts LF->CRLF into will_convert_lf_to_crlf(). Update the logic around check_safe_crlf() and "simulate" the possible LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf(). Thanks to Jeff King <peff@peff.net> for analyzing t0027. Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-08-13 23:29:27 +02:00			`if (!will_convert_lf_to_crlf(len, &stats, crlf_action))`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* are we "faking" in place editing ? */`
			`if (src == buf->buf)`
strbuf change: be sure ->buf is never ever NULL. For that purpose, the ->buf is always initialized with a char * buf living in the strbuf module. It is made a char * so that we can sloppily accept things that perform: sb->buf[0] = '\0', and because you can't pass "" as an initializer for ->buf without making gcc unhappy for very good reasons. strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf anymore. as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying ->buf isn't an option anymore, if ->buf is going to escape from the scope, and eventually be free'd. API changes: * strbuf_setlen now always works, so just make strbuf_reset a convenience macro. * strbuf_detatch takes a size_t* optional argument (meaning it can be NULL) to copy the buffer's len, as it was needed for this refactor to make the code more readable, and working like the callers. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-27 12:58:23 +02:00			`to_free = strbuf_detach(buf, NULL);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00
convert.c: simplify text_stat Simplify the statistics: lonecr counts the CR which is not followed by a LF, lonelf counts the LF which is not preceded by a CR, crlf counts CRLF combinations. This simplifies the evaluation of the statistics. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:43 +01:00			`strbuf_grow(buf, len + stats.lonelf);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`for (;;) {`
			`const char *nl = memchr(src, '\n', len);`
			`if (!nl)`
			`break;`
			`if (nl > src && nl[-1] == '\r') {`
			`strbuf_add(buf, src, nl + 1 - src);`
			`} else {`
			`strbuf_add(buf, src, nl - src);`
			`strbuf_addstr(buf, "\r\n");`
			`}`
			`len -= nl + 1 - src;`
			`src = nl + 1;`
			`}`
			`strbuf_add(buf, src, len);`

			`free(to_free);`
			`return 1;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`}`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`struct filter_params {`
			`const char *src;`
			`unsigned long size;`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`int fd;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`const char *cmd;`
convert filter: supply path to external driver Filtering to support keyword expansion may need the name of the file being filtered. In particular, to support p4 keywords like $File: //depot/product/dir/script.sh $ the smudge filter needs to know the name of the file it is smudging. Allow "%f" in the custom filter command line specified in the configuration. This will be substituted by the filename inside a single-quote pair to be passed to the shell. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-22 15:40:13 +01:00			`const char *path;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`};`

convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`static int filter_buffer_or_fd(int in, int out, void *data)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`{`
			`/*`
			`* Spawn cmd and feed the buffer contents through its stdin.`
			`*/`
run-command: introduce CHILD_PROCESS_INIT Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-19 21:09:35 +02:00			`struct child_process child_process = CHILD_PROCESS_INIT;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`struct filter_params params = (struct filter_params )data;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`int write_err, status;`
Rewrite dynamic structure initializations to runtime assignment Unfortunately, there are still plenty of production systems with vendor compilers that choke unless all compound declarations can be determined statically at compile time, for example hpux10.20 (I can provide a comprehensive list of our supported platforms that exhibit this problem if necessary). This patch simply breaks apart any compound declarations with dynamic initialisation expressions, and moves the initialisation until after the last declaration in the same block, in all the places necessary to have the offending compilers accept the code. Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-14 11:31:33 +02:00			`const char *argv[] = { NULL, NULL };`

convert filter: supply path to external driver Filtering to support keyword expansion may need the name of the file being filtered. In particular, to support p4 keywords like $File: //depot/product/dir/script.sh $ the smudge filter needs to know the name of the file it is smudging. Allow "%f" in the custom filter command line specified in the configuration. This will be substituted by the filename inside a single-quote pair to be passed to the shell. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-22 15:40:13 +01:00			`/* apply % substitution to cmd */`
			`struct strbuf cmd = STRBUF_INIT;`
			`struct strbuf path = STRBUF_INIT;`
			`struct strbuf_expand_dict_entry dict[] = {`
			`{ "f", NULL, },`
			`{ NULL, NULL, },`
			`};`

			`/* quote the path to preserve spaces, etc. */`
			`sq_quote_buf(&path, params->path);`
			`dict[0].value = path.buf;`

			`/* expand all %f with the quoted path */`
			`strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);`
			`strbuf_release(&path);`

			`argv[0] = cmd.buf;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:47:55 +02:00			`child_process.argv = argv;`
run-command: convert simple callsites to use_shell Now that we have the use_shell feature, these callsites can all be converted with small changes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2009-12-30 11:53:57 +01:00			`child_process.use_shell = 1;`
Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:47:55 +02:00			`child_process.in = -1;`
run-command: support custom fd-set in async This patch adds the possibility to supply a set of non-0 file descriptors for async process communication instead of the default-created pipe. Additionally, we now support bi-directional communiction with the async procedure, by giving the async function both read and write file descriptors. To retain compatiblity and similar "API feel" with start_command, we require start_async callers to set .out = -1 to get a readable file descriptor. If either of .in or .out is 0, we supply no file descriptor to the async process. [sp: Note: Erik started this patch, and a huge bulk of it is his work. All bugs were introduced later by Shawn.] Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-02-05 21:57:38 +01:00			`child_process.out = out;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:47:55 +02:00			`if (start_command(&child_process))`
convert: quote filter names in error messages Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to read in error messages. Quote them to improve the readability. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:25 +02:00			`return error("cannot fork to run external filter '%s'", params->cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Ignore SIGPIPE when running a filter driver If a filter is not defined or if it fails, git should behave as if the filter is a no-op passthru. However, if the filter exits before reading all the content, depending on the timing, git could be killed with SIGPIPE when it tries to write to the pipe connected to the filter. Ignore SIGPIPE while processing the filter to give us a chance to check the return value from a failed write, in order to detect and act on this mode of failure in a more controlled way. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-20 21:53:37 +01:00			`sigchain_push(SIGPIPE, SIG_IGN);`

convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`if (params->src) {`
filter_buffer_or_fd(): ignore EPIPE We are explicitly ignoring SIGPIPE, as we fully expect that the filter program may not read our output fully. Ignore EPIPE that may come from writing to it as well. A new test was stolen from Jeff's suggestion. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-05-19 20:08:23 +02:00			`write_err = (write_in_full(child_process.in,`
			`params->src, params->size) < 0);`
			`if (errno == EPIPE)`
			`write_err = 0;`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`} else {`
			`write_err = copy_fd(params->fd, child_process.in);`
filter_buffer_or_fd(): ignore EPIPE We are explicitly ignoring SIGPIPE, as we fully expect that the filter program may not read our output fully. Ignore EPIPE that may come from writing to it as well. A new test was stolen from Jeff's suggestion. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-05-19 20:08:23 +02:00			`if (write_err == COPY_WRITE_ERROR && errno == EPIPE)`
			`write_err = 0;`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`}`

Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:47:55 +02:00			`if (close(child_process.in))`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`write_err = 1;`
			`if (write_err)`
convert: quote filter names in error messages Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to read in error messages. Quote them to improve the readability. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:25 +02:00			`error("cannot feed the input to external filter '%s'", params->cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Ignore SIGPIPE when running a filter driver If a filter is not defined or if it fails, git should behave as if the filter is a no-op passthru. However, if the filter exits before reading all the content, depending on the timing, git could be killed with SIGPIPE when it tries to write to the pipe connected to the filter. Ignore SIGPIPE while processing the filter to give us a chance to check the return value from a failed write, in order to detect and act on this mode of failure in a more controlled way. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-20 21:53:37 +01:00			`sigchain_pop(SIGPIPE);`

Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`status = finish_command(&child_process);`
			`if (status)`
convert: quote filter names in error messages Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to read in error messages. Quote them to improve the readability. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:25 +02:00			`error("external filter '%s' failed %d", params->cmd, status);`
convert filter: supply path to external driver Filtering to support keyword expansion may need the name of the file being filtered. In particular, to support p4 keywords like $File: //depot/product/dir/script.sh $ the smudge filter needs to know the name of the file it is smudging. Allow "%f" in the custom filter command line specified in the configuration. This will be substituted by the filename inside a single-quote pair to be passed to the shell. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-22 15:40:13 +01:00
			`strbuf_release(&cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`return (write_err \|\| status);`
			`}`

convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`static int apply_single_file_filter(const char path, const char src, size_t len, int fd,`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`struct strbuf dst, const char cmd)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`{`
			`/*`
			`* Create a pipeline to have the command filter the buffer's`
			`* contents.`
			`*`
			`* (child --> cmd) --> us`
			`*/`
convert: make apply_filter() adhere to standard Git error handling apply_filter() returns a boolean that tells the caller if it "did convert or did not convert". The variable `ret` was used throughout the function to track errors whereas `1` denoted success and `0` failure. This is unusual for the Git source where `0` denotes success. Rename the variable and flip its value to make the function easier readable for Git developers. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:35 +02:00			`int err = 0;`
Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer Many call sites use strbuf_init(&foo, 0) to initialize local strbuf variable "foo" which has not been accessed since its declaration. These can be replaced with a static initialization using the STRBUF_INIT macro which is just as readable, saves a function call, and takes up fewer lines. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-09 21:12:12 +02:00			`struct strbuf nbuf = STRBUF_INIT;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`struct async async;`
			`struct filter_params params;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`memset(&async, 0, sizeof(async));`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`async.proc = filter_buffer_or_fd;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`async.data = &params;`
run-command: support custom fd-set in async This patch adds the possibility to supply a set of non-0 file descriptors for async process communication instead of the default-created pipe. Additionally, we now support bi-directional communiction with the async procedure, by giving the async function both read and write file descriptors. To retain compatiblity and similar "API feel" with start_command, we require start_async callers to set .out = -1 to get a readable file descriptor. If either of .in or .out is 0, we supply no file descriptor to the async process. [sp: Note: Erik started this patch, and a huge bulk of it is his work. All bugs were introduced later by Shawn.] Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-02-05 21:57:38 +01:00			`async.out = -1;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`params.src = src;`
			`params.size = len;`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`params.fd = fd;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`params.cmd = cmd;`
convert filter: supply path to external driver Filtering to support keyword expansion may need the name of the file being filtered. In particular, to support p4 keywords like $File: //depot/product/dir/script.sh $ the smudge filter needs to know the name of the file it is smudging. Allow "%f" in the custom filter command line specified in the configuration. This will be substituted by the filename inside a single-quote pair to be passed to the shell. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-12-22 15:40:13 +01:00			`params.path = path;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
			`fflush(NULL);`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (start_async(&async))`
			`return 0; /* error was already reported */`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (strbuf_read(&nbuf, async.out, len) < 0) {`
convert: make apply_filter() adhere to standard Git error handling apply_filter() returns a boolean that tells the caller if it "did convert or did not convert". The variable `ret` was used throughout the function to track errors whereas `1` denoted success and `0` failure. This is unusual for the Git source where `0` denotes success. Rename the variable and flip its value to make the function easier readable for Git developers. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:35 +02:00			`err = error("read from external filter '%s' failed", cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (close(async.out)) {`
convert: make apply_filter() adhere to standard Git error handling apply_filter() returns a boolean that tells the caller if it "did convert or did not convert". The variable `ret` was used throughout the function to track errors whereas `1` denoted success and `0` failure. This is unusual for the Git source where `0` denotes success. Rename the variable and flip its value to make the function easier readable for Git developers. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:35 +02:00			`err = error("read from external filter '%s' failed", cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (finish_async(&async)) {`
convert: make apply_filter() adhere to standard Git error handling apply_filter() returns a boolean that tells the caller if it "did convert or did not convert". The variable `ret` was used throughout the function to track errors whereas `1` denoted success and `0` failure. This is unusual for the Git source where `0` denotes success. Rename the variable and flip its value to make the function easier readable for Git developers. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:35 +02:00			`err = error("external filter '%s' failed", cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`

convert: make apply_filter() adhere to standard Git error handling apply_filter() returns a boolean that tells the caller if it "did convert or did not convert". The variable `ret` was used throughout the function to track errors whereas `1` denoted success and `0` failure. This is unusual for the Git source where `0` denotes success. Rename the variable and flip its value to make the function easier readable for Git developers. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:35 +02:00			`if (!err) {`
Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`strbuf_swap(dst, &nbuf);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`}`
Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`strbuf_release(&nbuf);`
convert: make apply_filter() adhere to standard Git error handling apply_filter() returns a boolean that tells the caller if it "did convert or did not convert". The variable `ret` was used throughout the function to track errors whereas `1` denoted success and `0` failure. This is unusual for the Git source where `0` denotes success. Rename the variable and flip its value to make the function easier readable for Git developers. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:35 +02:00			`return !err;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`

convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`#define CAP_CLEAN (1u<<0)`
			`#define CAP_SMUDGE (1u<<1)`

convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`struct cmd2process {`
			`struct hashmap_entry ent; /* must be the first member! */`
			`unsigned int supported_capabilities;`
			`const char *cmd;`
			`struct child_process process;`
			`};`

			`static int cmd_process_map_initialized;`
			`static struct hashmap cmd_process_map;`

			`static int cmd2process_cmp(const struct cmd2process *e1,`
			`const struct cmd2process *e2,`
			`const void *unused)`
			`{`
			`return strcmp(e1->cmd, e2->cmd);`
			`}`

			`static struct cmd2process find_multi_file_filter_entry(struct hashmap hashmap, const char *cmd)`
			`{`
			`struct cmd2process key;`
			`hashmap_entry_init(&key, strhash(cmd));`
			`key.cmd = cmd;`
			`return hashmap_get(hashmap, &key, NULL);`
			`}`

			`static int packet_write_list(int fd, const char *line, ...)`
			`{`
			`va_list args;`
			`int err;`
			`va_start(args, line);`
			`for (;;) {`
			`if (!line)`
			`break;`
			`if (strlen(line) > LARGE_PACKET_DATA_MAX)`
			`return -1;`
			`err = packet_write_fmt_gently(fd, "%s\n", line);`
			`if (err)`
			`return err;`
			`line = va_arg(args, const char*);`
			`}`
			`va_end(args);`
			`return packet_flush_gently(fd);`
			`}`

			`static void read_multi_file_filter_status(int fd, struct strbuf *status)`
			`{`
			`struct strbuf **pair;`
			`char *line;`
			`for (;;) {`
			`line = packet_read_line(fd, NULL);`
			`if (!line)`
			`break;`
			`pair = strbuf_split_str(line, '=', 2);`
			`if (pair[0] && pair[0]->len && pair[1]) {`
			`/* the last "status=<foo>" line wins */`
			`if (!strcmp(pair[0]->buf, "status=")) {`
			`strbuf_reset(status);`
			`strbuf_addbuf(status, pair[1]);`
			`}`
			`}`
			`strbuf_list_free(pair);`
			`}`
			`}`

			`static void kill_multi_file_filter(struct hashmap hashmap, struct cmd2process entry)`
			`{`
			`if (!entry)`
			`return;`

			`entry->process.clean_on_exit = 0;`
			`kill(entry->process.pid, SIGTERM);`
			`finish_command(&entry->process);`

			`hashmap_remove(hashmap, entry, NULL);`
			`free(entry);`
			`}`

			`static void stop_multi_file_filter(struct child_process *process)`
			`{`
			`sigchain_push(SIGPIPE, SIG_IGN);`
			`/* Closing the pipe signals the filter to initiate a shutdown. */`
			`close(process->in);`
			`close(process->out);`
			`sigchain_pop(SIGPIPE);`
			`/* Finish command will wait until the shutdown is complete. */`
			`finish_command(process);`
			`}`

			`static struct cmd2process start_multi_file_filter(struct hashmap hashmap, const char *cmd)`
			`{`
			`int err;`
			`struct cmd2process *entry;`
			`struct child_process *process;`
			`const char *argv[] = { cmd, NULL };`
			`struct string_list cap_list = STRING_LIST_INIT_NODUP;`
			`char *cap_buf;`
			`const char *cap_name;`

			`entry = xmalloc(sizeof(*entry));`
			`entry->cmd = cmd;`
			`entry->supported_capabilities = 0;`
			`process = &entry->process;`

			`child_process_init(process);`
			`process->argv = argv;`
			`process->use_shell = 1;`
			`process->in = -1;`
			`process->out = -1;`
			`process->clean_on_exit = 1;`
			`process->clean_on_exit_handler = stop_multi_file_filter;`

			`if (start_command(process)) {`
			`error("cannot fork to run external filter '%s'", cmd);`
			`return NULL;`
			`}`

			`hashmap_entry_init(entry, strhash(cmd));`

			`sigchain_push(SIGPIPE, SIG_IGN);`

			`err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);`
			`if (err)`
			`goto done;`

			`err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");`
			`if (err) {`
			`error("external filter '%s' does not support filter protocol version 2", cmd);`
			`goto done;`
			`}`
			`err = strcmp(packet_read_line(process->out, NULL), "version=2");`
			`if (err)`
			`goto done;`
			`err = packet_read_line(process->out, NULL) != NULL;`
			`if (err)`
			`goto done;`

			`err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);`

			`for (;;) {`
			`cap_buf = packet_read_line(process->out, NULL);`
			`if (!cap_buf)`
			`break;`
			`string_list_split_in_place(&cap_list, cap_buf, '=', 1);`

			`if (cap_list.nr != 2 \|\| strcmp(cap_list.items[0].string, "capability"))`
			`continue;`

			`cap_name = cap_list.items[1].string;`
			`if (!strcmp(cap_name, "clean")) {`
			`entry->supported_capabilities \|= CAP_CLEAN;`
			`} else if (!strcmp(cap_name, "smudge")) {`
			`entry->supported_capabilities \|= CAP_SMUDGE;`
			`} else {`
			`warning(`
			`"external filter '%s' requested unsupported filter capability '%s'",`
			`cmd, cap_name`
			`);`
			`}`

			`string_list_clear(&cap_list, 0);`
			`}`

			`done:`
			`sigchain_pop(SIGPIPE);`

			`if (err \|\| errno == EPIPE) {`
			`error("initialization for external filter '%s' failed", cmd);`
			`kill_multi_file_filter(hashmap, entry);`
			`return NULL;`
			`}`

			`hashmap_add(hashmap, entry);`
			`return entry;`
			`}`

			`static int apply_multi_file_filter(const char path, const char src, size_t len,`
			`int fd, struct strbuf dst, const char cmd,`
			`const unsigned int wanted_capability)`
			`{`
			`int err;`
			`struct cmd2process *entry;`
			`struct child_process *process;`
			`struct strbuf nbuf = STRBUF_INIT;`
			`struct strbuf filter_status = STRBUF_INIT;`
			`const char *filter_type;`

			`if (!cmd_process_map_initialized) {`
			`cmd_process_map_initialized = 1;`
			`hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);`
			`entry = NULL;`
			`} else {`
			`entry = find_multi_file_filter_entry(&cmd_process_map, cmd);`
			`}`

			`fflush(NULL);`

			`if (!entry) {`
			`entry = start_multi_file_filter(&cmd_process_map, cmd);`
			`if (!entry)`
			`return 0;`
			`}`
			`process = &entry->process;`

			`if (!(wanted_capability & entry->supported_capabilities))`
			`return 0;`

			`if (CAP_CLEAN & wanted_capability)`
			`filter_type = "clean";`
			`else if (CAP_SMUDGE & wanted_capability)`
			`filter_type = "smudge";`
			`else`
			`die("unexpected filter type");`

			`sigchain_push(SIGPIPE, SIG_IGN);`

			`assert(strlen(filter_type) < LARGE_PACKET_DATA_MAX - strlen("command=\n"));`
			`err = packet_write_fmt_gently(process->in, "command=%s\n", filter_type);`
			`if (err)`
			`goto done;`

			`err = strlen(path) > LARGE_PACKET_DATA_MAX - strlen("pathname=\n");`
			`if (err) {`
			`error("path name too long for external filter");`
			`goto done;`
			`}`

			`err = packet_write_fmt_gently(process->in, "pathname=%s\n", path);`
			`if (err)`
			`goto done;`

			`err = packet_flush_gently(process->in);`
			`if (err)`
			`goto done;`

			`if (fd >= 0)`
			`err = write_packetized_from_fd(fd, process->in);`
			`else`
			`err = write_packetized_from_buf(src, len, process->in);`
			`if (err)`
			`goto done;`

			`read_multi_file_filter_status(process->out, &filter_status);`
			`err = strcmp(filter_status.buf, "success");`
			`if (err)`
			`goto done;`

			`err = read_packetized_to_strbuf(process->out, &nbuf) < 0;`
			`if (err)`
			`goto done;`

			`read_multi_file_filter_status(process->out, &filter_status);`
			`err = strcmp(filter_status.buf, "success");`

			`done:`
			`sigchain_pop(SIGPIPE);`

			`if (err \|\| errno == EPIPE) {`
			`if (!strcmp(filter_status.buf, "error")) {`
			`/* The filter signaled a problem with the file. */`
			`} else if (!strcmp(filter_status.buf, "abort")) {`
			`/*`
			`* The filter signaled a permanent problem. Don't try to filter`
			`* files with the same command for the lifetime of the current`
			`* Git process.`
			`*/`
			`entry->supported_capabilities &= ~wanted_capability;`
			`} else {`
			`/*`
			`* Something went wrong with the protocol filter.`
			`* Force shutdown and restart if another blob requires filtering.`
			`*/`
			`error("external filter '%s' failed", cmd);`
			`kill_multi_file_filter(&cmd_process_map, entry);`
			`}`
			`} else {`
			`strbuf_swap(dst, &nbuf);`
			`}`
			`strbuf_release(&nbuf);`
			`return !err;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`

			`static struct convert_driver {`
			`const char *name;`
			`struct convert_driver *next;`
convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-05 07:24:42 +02:00			`const char *smudge;`
			`const char *clean;`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`const char *process;`
Add a setting to require a filter to be successful By default, a missing filter driver or a failure from the filter driver is not an error, but merely makes the filter operation a no-op pass through. This is useful to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use, and the content filter mechanism is not used to turn something unusable into usable. However, we could also use of the content filtering mechanism and store the content that cannot be directly used in the repository (e.g. a UUID that refers to the true content stored outside git, or an encrypted content) and turn it into a usable form upon checkout (e.g. download the external content, or decrypt the encrypted content). For such a use case, the content cannot be used when filter driver fails, and we need a way to tell Git to abort the whole operation for such a failing or missing filter driver. Add a new "filter.<driver>.required" configuration variable to mark the second use case. When it is set, git will abort the operation when the filter driver does not exist or exits with a non-zero status code. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 02:19:03 +01:00			`int required;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`} user_convert, *user_convert_tail;`

convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`static int apply_filter(const char path, const char src, size_t len,`
			`int fd, struct strbuf dst, struct convert_driver drv,`
			`const unsigned int wanted_capability)`
			`{`
			`const char *cmd = NULL;`

			`if (!drv)`
			`return 0;`

			`if (!dst)`
			`return 1;`

convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`if ((CAP_CLEAN & wanted_capability) && !drv->process && drv->clean)`
convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`cmd = drv->clean;`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`else if ((CAP_SMUDGE & wanted_capability) && !drv->process && drv->smudge)`
convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`cmd = drv->smudge;`

			`if (cmd && *cmd)`
			`return apply_single_file_filter(path, src, len, fd, dst, cmd);`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`else if (drv->process && *drv->process)`
			`return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);`
convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00
			`return 0;`
			`}`

Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:46:53 +02:00			`static int read_convert_config(const char var, const char value, void *cb)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`{`
convert some config callbacks to parse_config_key These callers can drop some inline pointer arithmetic and magic offset constants, making them more readable and less error-prone (those constants had to match the lengths of strings, but there is no automatic verification of that fact). The "ep" pointer (presumably for "end pointer"), which points to the final key segment of the config variable, is given the more standard name "key" to describe its function rather than its derivation. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:24:23 +01:00			`const char key, name;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`int namelen;`
			`struct convert_driver *drv;`

			`/*`
			`* External conversion drivers are configured using`
			`* "filter.<name>.variable".`
			`*/`
convert some config callbacks to parse_config_key These callers can drop some inline pointer arithmetic and magic offset constants, making them more readable and less error-prone (those constants had to match the lengths of strings, but there is no automatic verification of that fact). The "ep" pointer (presumably for "end pointer"), which points to the final key segment of the config variable, is given the more standard name "key" to describe its function rather than its derivation. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:24:23 +01:00			`if (parse_config_key(var, "filter", &name, &namelen, &key) < 0 \|\| !name)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`return 0;`
			`for (drv = user_convert; drv; drv = drv->next)`
			`if (!strncmp(drv->name, name, namelen) && !drv->name[namelen])`
			`break;`
			`if (!drv) {`
			`drv = xcalloc(1, sizeof(struct convert_driver));`
Use xmemdupz() in many places. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 00:32:36 +02:00			`drv->name = xmemdupz(name, namelen);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`*user_convert_tail = drv;`
			`user_convert_tail = &(drv->next);`
			`}`

			`/*`
			`* filter.<name>.smudge and filter.<name>.clean specifies`
			`* the command line:`
			`*`
			`* command-line`
			`*`
			`* The command-line will not be interpolated in any way.`
			`*/`

convert some config callbacks to parse_config_key These callers can drop some inline pointer arithmetic and magic offset constants, making them more readable and less error-prone (those constants had to match the lengths of strings, but there is no automatic verification of that fact). The "ep" pointer (presumably for "end pointer"), which points to the final key segment of the config variable, is given the more standard name "key" to describe its function rather than its derivation. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:24:23 +01:00			`if (!strcmp("smudge", key))`
convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-05 07:24:42 +02:00			`return git_config_string(&drv->smudge, var, value);`

convert some config callbacks to parse_config_key These callers can drop some inline pointer arithmetic and magic offset constants, making them more readable and less error-prone (those constants had to match the lengths of strings, but there is no automatic verification of that fact). The "ep" pointer (presumably for "end pointer"), which points to the final key segment of the config variable, is given the more standard name "key" to describe its function rather than its derivation. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:24:23 +01:00			`if (!strcmp("clean", key))`
convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-05 07:24:42 +02:00			`return git_config_string(&drv->clean, var, value);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`if (!strcmp("process", key))`
			`return git_config_string(&drv->process, var, value);`

convert some config callbacks to parse_config_key These callers can drop some inline pointer arithmetic and magic offset constants, making them more readable and less error-prone (those constants had to match the lengths of strings, but there is no automatic verification of that fact). The "ep" pointer (presumably for "end pointer"), which points to the final key segment of the config variable, is given the more standard name "key" to describe its function rather than its derivation. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-01-23 07:24:23 +01:00			`if (!strcmp("required", key)) {`
Add a setting to require a filter to be successful By default, a missing filter driver or a failure from the filter driver is not an error, but merely makes the filter operation a no-op pass through. This is useful to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use, and the content filter mechanism is not used to turn something unusable into usable. However, we could also use of the content filtering mechanism and store the content that cannot be directly used in the repository (e.g. a UUID that refers to the true content stored outside git, or an encrypted content) and turn it into a usable form upon checkout (e.g. download the external content, or decrypt the encrypted content). For such a use case, the content cannot be used when filter driver fails, and we need a way to tell Git to abort the whole operation for such a failing or missing filter driver. Add a new "filter.<driver>.required" configuration variable to mark the second use case. When it is set, git will abort the operation when the filter driver does not exist or exits with a non-zero status code. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 02:19:03 +01:00			`drv->required = git_config_bool(var, value);`
			`return 0;`
			`}`

Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`return 0;`
			`}`

Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`static int count_ident(const char *cp, unsigned long size)`
			`{`
			`/*`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`* "$Id: 0000000000000000000000000000000000000000 $" <=> "$Id$"`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`*/`
			`int cnt = 0;`
			`char ch;`

			`while (size) {`
			`ch = *cp++;`
			`size--;`
			`if (ch != '$')`
			`continue;`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`if (size < 3)`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`break;`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`if (memcmp("Id", cp, 2))`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`continue;`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`ch = cp[2];`
			`cp += 3;`
			`size -= 3;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (ch == '$')`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`cnt++; /* $Id$ */`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (ch != ':')`
			`continue;`

			`/*`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`* "$Id: ... "; scan up to the closing dollar sign and discard.`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`*/`
			`while (size) {`
			`ch = *cp++;`
			`size--;`
			`if (ch == '$') {`
			`cnt++;`
			`break;`
			`}`
convert: Safer handling of $Id$ contraction. The code to contract $Id:xxxxx$ strings could eat an arbitrary amount of source text if the terminating $ was lost. It now refuses to contract $Id:xxxxx$ strings spanning multiple lines. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-06 14:46:37 +02:00			`if (ch == '\n')`
			`break;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
			`}`
			`return cnt;`
			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int ident_to_git(const char path, const char src, size_t len,`
			`struct strbuf *buf, int ident)`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`{`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`char dst, dollar;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
teach dry-run convert_to_git not to require a src buffer When we call convert_to_git in dry-run mode, it may still want to look at the source buffer, because some CRLF conversion modes depend on analyzing the source to determine whether it is in fact convertible CRLF text. However, the main motivation for convert_to_git's dry-run mode is that we would decide which method to use to acquire the blob's data (streaming versus in-core). Requiring this source analysis creates a chicken-and-egg problem. We are better off simply guessing that anything we can't analyze will end up needing conversion. This patch lets a caller specify a NULL src buffer when using dry-run mode (and only dry-run mode). A non-zero return value goes from "we would convert" to "we might convert"; a zero return value remains "we would definitely not convert". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-24 23:05:03 +01:00			`if (!ident \|\| (src && !count_ident(src, len)))`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`

teach convert_to_git a "dry run" mode Some callers may want to know whether convert_to_git will actually do anything before performing the conversion itself (e.g., to decide whether to stream or handle blobs in-core). This patch lets callers specify the dry run mode by passing a NULL destination buffer. The return value, instead of indicating whether conversion happened, will indicate whether conversion would occur. For readability, we also include a wrapper function which makes it more obvious we are not actually performing the conversion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-24 23:02:37 +01:00			`if (!buf)`
			`return 1;`

Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`/* only grow if not in place */`
			`if (strbuf_avail(buf) + buf->len < len)`
			`strbuf_grow(buf, len - buf->len);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`dst = buf->buf;`
			`for (;;) {`
			`dollar = memchr(src, '$', len);`
			`if (!dollar)`
			`break;`
Use memmove in ident_to_git convert_to_git sets src=dst->buf if any of the preceding conversions actually did any work. Thus in ident_to_git we have to use memmove instead of memcpy as far as src->dst copying is concerned. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-29 22:06:04 +02:00			`memmove(dst, src, dollar + 1 - src);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`dst += dollar + 1 - src;`
			`len -= dollar + 1 - src;`
			`src = dollar + 1;`

			`if (len > 3 && !memcmp(src, "Id:", 3)) {`
			`dollar = memchr(src + 3, '$', len - 3);`
			`if (!dollar)`
			`break;`
convert: Safer handling of $Id$ contraction. The code to contract $Id:xxxxx$ strings could eat an arbitrary amount of source text if the terminating $ was lost. It now refuses to contract $Id:xxxxx$ strings spanning multiple lines. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-06 14:46:37 +02:00			`if (memchr(src + 3, '\n', dollar - src - 3)) {`
			`/* Line break before the next dollar. */`
			`continue;`
			`}`

Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`memcpy(dst, "Id$", 3);`
			`dst += 3;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`len -= dollar + 1 - src;`
			`src = dollar + 1;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
			`}`
Use memmove in ident_to_git convert_to_git sets src=dst->buf if any of the preceding conversions actually did any work. Thus in ident_to_git we have to use memmove instead of memcpy as far as src->dst copying is concerned. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-29 22:06:04 +02:00			`memmove(dst, src, len);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`strbuf_setlen(buf, dst + len - buf->buf);`
			`return 1;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int ident_to_worktree(const char path, const char src, size_t len,`
			`struct strbuf *buf, int ident)`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`{`
			`unsigned char sha1[20];`
convert: Keep foreign $Id$ on checkout. If there are foreign $Id$ keywords in the repository, they are most likely there for a reason. Let's keep them on checkout (which is also what the documentation indicates). Foreign $Id$ keywords are now recognized by there being multiple space separated fields in $Id:xxxxx$. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-06 14:46:38 +02:00			`char to_free = NULL, dollar, *spc;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`int cnt;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
			`if (!ident)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`cnt = count_ident(src, len);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (!cnt)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* are we "faking" in place editing ? */`
			`if (src == buf->buf)`
strbuf change: be sure ->buf is never ever NULL. For that purpose, the ->buf is always initialized with a char * buf living in the strbuf module. It is made a char * so that we can sloppily accept things that perform: sb->buf[0] = '\0', and because you can't pass "" as an initializer for ->buf without making gcc unhappy for very good reasons. strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf anymore. as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying ->buf isn't an option anymore, if ->buf is going to escape from the scope, and eventually be free'd. API changes: * strbuf_setlen now always works, so just make strbuf_reset a convenience macro. * strbuf_detatch takes a size_t* optional argument (meaning it can be NULL) to copy the buffer's len, as it was needed for this refactor to make the code more readable, and working like the callers. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-27 12:58:23 +02:00			`to_free = strbuf_detach(buf, NULL);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`hash_sha1_file(src, len, "blob", sha1);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`strbuf_grow(buf, len + cnt * 43);`
			`for (;;) {`
			`/* step 1: run to the next '$' */`
			`dollar = memchr(src, '$', len);`
			`if (!dollar)`
			`break;`
			`strbuf_add(buf, src, dollar + 1 - src);`
			`len -= dollar + 1 - src;`
			`src = dollar + 1;`
Fix mishandling of $Id$ expanded in the repository copy in convert.c If the repository contained an expanded ident keyword (i.e. $Id:XXXX$), then the wrong bytes were discarded, and the Id keyword was not expanded. The fault was in convert.c:ident_to_worktree(). Previously, when a "$Id:" was found in the repository version, ident_to_worktree() would search for the next "$" after this, and discarded everything it found until then. That was done with the loop: do { ch = cp++; if (ch == '$') break; rem--; } while (rem); The above loop left cp pointing one character _after_ the final "$" (because of ch = cp++). This was different from the non-expanded case, were cp is left pointing at the "$", and was different from the comment which stated "discard up to but not including the closing $". This patch fixes that by making the loop: do { ch = *cp; if (ch == '$') break; cp++; rem--; } while (rem); That is, cp is tested _then_ incremented. This loop exits if it finds a "$" or if it runs out of bytes in the source. After this loop, if there was no closing "$" the expansion is skipped, and the outer loop is allowed to continue leaving this non-keyword as it was. However, when the "$" is found, size is corrected, before running the expansion: size -= (cp - src); This is wrong; size is going to be corrected anyway after the expansion, so there is no need to do it here. This patch removes that redundant correction. To help find this bug, I heavily commented the routine; those comments are included here as a bonus. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-25 12:50:08 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* step 2: does it looks like a bit like Id:xxx$ or Id$ ? */`
			`if (len < 3 \|\| memcmp("Id", src, 2))`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`continue;`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* step 3: skip over Id$ or Id:xxxxx$ */`
			`if (src[2] == '$') {`
			`src += 3;`
			`len -= 3;`
			`} else if (src[2] == ':') {`
			`/*`
			`* It's possible that an expanded Id has crept its way into the`
convert: Keep foreign $Id$ on checkout. If there are foreign $Id$ keywords in the repository, they are most likely there for a reason. Let's keep them on checkout (which is also what the documentation indicates). Foreign $Id$ keywords are now recognized by there being multiple space separated fields in $Id:xxxxx$. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-06 14:46:38 +02:00			`* repository, we cope with that by stripping the expansion out.`
			`* This is probably not a good idea, since it will cause changes`
			`* on checkout, which won't go away by stash, but let's keep it`
			`* for git-style ids.`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`*/`
			`dollar = memchr(src + 3, '$', len - 3);`
			`if (!dollar) {`
			`/* incomplete keyword, no more '$', so just quit the loop */`
			`break;`
			`}`
Fix mishandling of $Id$ expanded in the repository copy in convert.c If the repository contained an expanded ident keyword (i.e. $Id:XXXX$), then the wrong bytes were discarded, and the Id keyword was not expanded. The fault was in convert.c:ident_to_worktree(). Previously, when a "$Id:" was found in the repository version, ident_to_worktree() would search for the next "$" after this, and discarded everything it found until then. That was done with the loop: do { ch = cp++; if (ch == '$') break; rem--; } while (rem); The above loop left cp pointing one character _after_ the final "$" (because of ch = cp++). This was different from the non-expanded case, were cp is left pointing at the "$", and was different from the comment which stated "discard up to but not including the closing $". This patch fixes that by making the loop: do { ch = *cp; if (ch == '$') break; cp++; rem--; } while (rem); That is, cp is tested _then_ incremented. This loop exits if it finds a "$" or if it runs out of bytes in the source. After this loop, if there was no closing "$" the expansion is skipped, and the outer loop is allowed to continue leaving this non-keyword as it was. However, when the "$" is found, size is corrected, before running the expansion: size -= (cp - src); This is wrong; size is going to be corrected anyway after the expansion, so there is no need to do it here. This patch removes that redundant correction. To help find this bug, I heavily commented the routine; those comments are included here as a bonus. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-25 12:50:08 +02:00
convert: Safer handling of $Id$ contraction. The code to contract $Id:xxxxx$ strings could eat an arbitrary amount of source text if the terminating $ was lost. It now refuses to contract $Id:xxxxx$ strings spanning multiple lines. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-06 14:46:37 +02:00			`if (memchr(src + 3, '\n', dollar - src - 3)) {`
			`/* Line break before the next dollar. */`
			`continue;`
			`}`

convert: Keep foreign $Id$ on checkout. If there are foreign $Id$ keywords in the repository, they are most likely there for a reason. Let's keep them on checkout (which is also what the documentation indicates). Foreign $Id$ keywords are now recognized by there being multiple space separated fields in $Id:xxxxx$. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-04-06 14:46:38 +02:00			`spc = memchr(src + 4, ' ', dollar - src - 4);`
			`if (spc && spc < dollar-1) {`
			`/* There are spaces in unexpected places.`
			`* This is probably an id from some other`
			`* versioning system. Keep it for now.`
			`*/`
			`continue;`
			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`len -= dollar + 1 - src;`
			`src = dollar + 1;`
			`} else {`
			`/* it wasn't a "Id$" or "Id:xxxx$" */`
			`continue;`
			`}`
Fix mishandling of $Id$ expanded in the repository copy in convert.c If the repository contained an expanded ident keyword (i.e. $Id:XXXX$), then the wrong bytes were discarded, and the Id keyword was not expanded. The fault was in convert.c:ident_to_worktree(). Previously, when a "$Id:" was found in the repository version, ident_to_worktree() would search for the next "$" after this, and discarded everything it found until then. That was done with the loop: do { ch = cp++; if (ch == '$') break; rem--; } while (rem); The above loop left cp pointing one character _after_ the final "$" (because of ch = cp++). This was different from the non-expanded case, were cp is left pointing at the "$", and was different from the comment which stated "discard up to but not including the closing $". This patch fixes that by making the loop: do { ch = *cp; if (ch == '$') break; cp++; rem--; } while (rem); That is, cp is tested _then_ incremented. This loop exits if it finds a "$" or if it runs out of bytes in the source. After this loop, if there was no closing "$" the expansion is skipped, and the outer loop is allowed to continue leaving this non-keyword as it was. However, when the "$" is found, size is corrected, before running the expansion: size -= (cp - src); This is wrong; size is going to be corrected anyway after the expansion, so there is no need to do it here. This patch removes that redundant correction. To help find this bug, I heavily commented the routine; those comments are included here as a bonus. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-25 12:50:08 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* step 4: substitute */`
			`strbuf_addstr(buf, "Id: ");`
			`strbuf_add(buf, sha1_to_hex(sha1), 40);`
			`strbuf_addstr(buf, " $");`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`strbuf_add(buf, src, len);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`free(to_free);`
			`return 1;`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`

convert.c: remove unused parameter 'path' Some functions get a parameter path, but don't use it. Remove the unused parameter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:22 +01:00			`static enum crlf_action git_path_check_crlf(struct git_attr_check *check)`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`{`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`const char *value = check->value;`

			`if (ATTR_TRUE(value))`
			`return CRLF_TEXT;`
			`else if (ATTR_FALSE(value))`
			`return CRLF_BINARY;`
			`else if (ATTR_UNSET(value))`
			`;`
			`else if (!strcmp(value, "input"))`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`return CRLF_TEXT_INPUT;`
Add per-repository eol normalization Change the semantics of the "crlf" attribute so that it enables end-of-line normalization when it is set, regardless of "core.autocrlf". Add a new setting for "crlf": "auto", which enables end-of-line conversion but does not override the automatic text file detection. Add a new attribute "eol" with possible values "crlf" and "lf". When set, this attribute enables normalization and forces git to use CRLF or LF line endings in the working directory, respectively. The line ending style to be used for normalized text files in the working directory is set using "core.autocrlf". When it is set to "true", CRLFs are used in the working directory; when set to "input" or "false", LFs are used. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-19 22:43:10 +02:00			`else if (!strcmp(value, "auto"))`
			`return CRLF_AUTO;`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`return CRLF_UNDEFINED;`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`

convert.c: remove unused parameter 'path' Some functions get a parameter path, but don't use it. Remove the unused parameter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:22 +01:00			`static enum eol git_path_check_eol(struct git_attr_check *check)`
Add per-repository eol normalization Change the semantics of the "crlf" attribute so that it enables end-of-line normalization when it is set, regardless of "core.autocrlf". Add a new setting for "crlf": "auto", which enables end-of-line conversion but does not override the automatic text file detection. Add a new attribute "eol" with possible values "crlf" and "lf". When set, this attribute enables normalization and forces git to use CRLF or LF line endings in the working directory, respectively. The line ending style to be used for normalized text files in the working directory is set using "core.autocrlf". When it is set to "true", CRLFs are used in the working directory; when set to "input" or "false", LFs are used. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-05-19 22:43:10 +02:00			`{`
			`const char *value = check->value;`

			`if (ATTR_UNSET(value))`
			`;`
			`else if (!strcmp(value, "lf"))`
			`return EOL_LF;`
			`else if (!strcmp(value, "crlf"))`
			`return EOL_CRLF;`
			`return EOL_UNSET;`
			`}`

convert.c: remove unused parameter 'path' Some functions get a parameter path, but don't use it. Remove the unused parameter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:22 +01:00			`static struct convert_driver git_path_check_convert(struct git_attr_check check)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`{`
			`const char *value = check->value;`
			`struct convert_driver *drv;`

			`if (ATTR_TRUE(value) \|\| ATTR_FALSE(value) \|\| ATTR_UNSET(value))`
			`return NULL;`
			`for (drv = user_convert; drv; drv = drv->next)`
			`if (!strcmp(value, drv->name))`
			`return drv;`
			`return NULL;`
			`}`

convert.c: remove unused parameter 'path' Some functions get a parameter path, but don't use it. Remove the unused parameter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:22 +01:00			`static int git_path_check_ident(struct git_attr_check *check)`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`{`
			`const char *value = check->value;`

			`return !!ATTR_TRUE(value);`
			`}`

convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`struct conv_attrs {`
			`struct convert_driver *drv;`
convert.c: remove input_crlf_action() Integrate the code of input_crlf_action() into convert_attrs(), so that ca.crlf_action is always valid after calling convert_attrs(). Keep a copy of crlf_action in attr_action, this is needed for get_convert_attr_ascii(). Remove eol_attr from struct conv_attrs, as it is now used temporally. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:23 +01:00			`enum crlf_action attr_action; /* What attr says */`
			`enum crlf_action crlf_action; /* When no attr is set, use core.autocrlf */`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`int ident;`
			`};`

convert: make it safer to add conversion attributes The places that need to pass an array of "struct git_attr_check" needed to be careful to pass a large enough array and know what index each element lied. Make it safer and easier to code these. Besides, the hard-coded sequence of initializing various attributes was too ugly after we gained more than a few attributes. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 20:23:04 +02:00			`static const char *conv_attr_name[] = {`
			`"crlf", "ident", "filter", "eol", "text",`
			`};`
			`#define NUM_CONV_ATTRS ARRAY_SIZE(conv_attr_name)`

convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`static void convert_attrs(struct conv_attrs ca, const char path)`
convert: make it safer to add conversion attributes The places that need to pass an array of "struct git_attr_check" needed to be careful to pass a large enough array and know what index each element lied. Make it safer and easier to code these. Besides, the hard-coded sequence of initializing various attributes was too ugly after we gained more than a few attributes. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 20:23:04 +02:00			`{`
			`int i;`
			`static struct git_attr_check ccheck[NUM_CONV_ATTRS];`

			`if (!ccheck[0].attr) {`
			`for (i = 0; i < NUM_CONV_ATTRS; i++)`
			`ccheck[i].attr = git_attr(conv_attr_name[i]);`
			`user_convert_tail = &user_convert;`
			`git_config(read_convert_config, NULL);`
			`}`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00
Rename git_checkattr() to git_check_attr() Suggested by: Junio Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-08-04 06:36:33 +02:00			`if (!git_check_attr(path, NUM_CONV_ATTRS, ccheck)) {`
convert.c: remove input_crlf_action() Integrate the code of input_crlf_action() into convert_attrs(), so that ca.crlf_action is always valid after calling convert_attrs(). Keep a copy of crlf_action in attr_action, this is needed for get_convert_attr_ascii(). Remove eol_attr from struct conv_attrs, as it is now used temporally. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:23 +01:00			`ca->crlf_action = git_path_check_crlf(ccheck + 4);`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`if (ca->crlf_action == CRLF_UNDEFINED)`
convert.c: remove unused parameter 'path' Some functions get a parameter path, but don't use it. Remove the unused parameter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:22 +01:00			`ca->crlf_action = git_path_check_crlf(ccheck + 0);`
convert.c: remove input_crlf_action() Integrate the code of input_crlf_action() into convert_attrs(), so that ca.crlf_action is always valid after calling convert_attrs(). Keep a copy of crlf_action in attr_action, this is needed for get_convert_attr_ascii(). Remove eol_attr from struct conv_attrs, as it is now used temporally. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:23 +01:00			`ca->attr_action = ca->crlf_action;`
convert.c: remove unused parameter 'path' Some functions get a parameter path, but don't use it. Remove the unused parameter. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:22 +01:00			`ca->ident = git_path_check_ident(ccheck + 1);`
			`ca->drv = git_path_check_convert(ccheck + 2);`
convert.c: correct attr_action() df747b81 (convert.c: refactor crlf_action, 2016-02-10) introduced a bug to "git ls-files --eol". The "text" attribute was shown as "text eol=lf" or "text eol=crlf", depending on core.autocrlf or core.eol. Correct this and add test cases in t0027. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-23 18:07:19 +01:00			`if (ca->crlf_action != CRLF_BINARY) {`
			`enum eol eol_attr = git_path_check_eol(ccheck + 3);`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)`
			`ca->crlf_action = CRLF_AUTO_INPUT;`
			`else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF)`
			`ca->crlf_action = CRLF_AUTO_CRLF;`
			`else if (eol_attr == EOL_LF)`
convert.c: correct attr_action() df747b81 (convert.c: refactor crlf_action, 2016-02-10) introduced a bug to "git ls-files --eol". The "text" attribute was shown as "text eol=lf" or "text eol=crlf", depending on core.autocrlf or core.eol. Correct this and add test cases in t0027. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-23 18:07:19 +01:00			`ca->crlf_action = CRLF_TEXT_INPUT;`
			`else if (eol_attr == EOL_CRLF)`
			`ca->crlf_action = CRLF_TEXT_CRLF;`
			`}`
			`ca->attr_action = ca->crlf_action;`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`} else {`
			`ca->drv = NULL;`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`ca->crlf_action = CRLF_UNDEFINED;`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`ca->ident = 0;`
			`}`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`if (ca->crlf_action == CRLF_TEXT)`
			`ca->crlf_action = text_eol_is_crlf() ? CRLF_TEXT_CRLF : CRLF_TEXT_INPUT;`
			`if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_FALSE)`
			`ca->crlf_action = CRLF_BINARY;`
			`if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_TRUE)`
			`ca->crlf_action = CRLF_AUTO_CRLF;`
			`if (ca->crlf_action == CRLF_UNDEFINED && auto_crlf == AUTO_CRLF_INPUT)`
			`ca->crlf_action = CRLF_AUTO_INPUT;`
convert: make it safer to add conversion attributes The places that need to pass an array of "struct git_attr_check" needed to be careful to pass a large enough array and know what index each element lied. Make it safer and easier to code these. Besides, the hard-coded sequence of initializing various attributes was too ugly after we gained more than a few attributes. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 20:23:04 +02:00			`}`

convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`int would_convert_to_git_filter_fd(const char *path)`
			`{`
			`struct conv_attrs ca;`

			`convert_attrs(&ca, path);`
			`if (!ca.drv)`
			`return 0;`

			`/*`
			`* Apply a filter to an fd only if the filter is required to succeed.`
			`* We must die if the filter fails, because the original data before`
			`* filtering is not available.`
			`*/`
			`if (!ca.drv->required)`
			`return 0;`

convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`}`

ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`const char get_convert_attr_ascii(const char path)`
			`{`
			`struct conv_attrs ca;`

			`convert_attrs(&ca, path);`
convert.c: remove input_crlf_action() Integrate the code of input_crlf_action() into convert_attrs(), so that ca.crlf_action is always valid after calling convert_attrs(). Keep a copy of crlf_action in attr_action, this is needed for get_convert_attr_ascii(). Remove eol_attr from struct conv_attrs, as it is now used temporally. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-05 17:13:23 +01:00			`switch (ca.attr_action) {`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_UNDEFINED:`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`return "";`
			`case CRLF_BINARY:`
			`return "-text";`
			`case CRLF_TEXT:`
			`return "text";`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_TEXT_INPUT:`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`return "text eol=lf";`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_TEXT_CRLF:`
			`return "text eol=crlf";`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`case CRLF_AUTO:`
			`return "text=auto";`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_AUTO_CRLF:`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`return "text=auto eol=crlf";`
convert.c: refactor crlf_action Refactor the determination and usage of crlf_action. Today, when no "crlf" attribute are set on a file, crlf_action is set to CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as before. After searching for line ending attributes, save the value in struct conv_attrs.crlf_action attr_action, so that get_convert_attr_ascii() is able report the attributes. Replace the old CRLF_GUESS usage: CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT Save the action in conv_attrs.crlf_action (as before) and change all callers. Make more clear, what is what, by defining: - CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf and core.eol is evaluated and one of CRLF_BINARY, CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected - CRLF_BINARY : No processing of line endings. - CRLF_TEXT : attribute "text" is set, line endings are processed. - CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text. - CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text. - CRLF_AUTO : attribute "auto" is set. - CRLF_AUTO_INPUT: core.autocrlf=input (no attributes) - CRLF_AUTO_CRLF : core.autocrlf=true (no attributes) Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-02-10 17:24:41 +01:00			`case CRLF_AUTO_INPUT:`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`return "text=auto eol=lf";`
ls-files: add eol diagnostics When working in a cross-platform environment, a user may want to check if text files are stored normalized in the repository and if .gitattributes are set appropriately. Make it possible to let Git show the line endings in the index and in the working tree and the effective text/eol attributes. The end of line ("eolinfo") are shown like this: "-text" binary (or with bare CR) file "none" text file without any EOL "lf" text file with LF "crlf" text file with CRLF "mixed" text file with mixed line endings. The effective text/eol attribute is one of these: "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf" git ls-files --eol gives an output like this: i/none w/none attr/text=auto t/t5100/empty i/-text w/-text attr/-text t/test-binary-2.png i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007 i/lf w/crlf attr/text eol=crlf doit.bat i/mixed w/mixed attr/ locale/XX.po to show what eol convention is used in the data in the index ('i'), and in the working tree ('w'), and what attribute is in effect, for each path that is shown. Add test cases in t0027. Helped-By: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-01-16 07:50:02 +01:00			`}`
			`return "";`
			`}`

safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`int convert_to_git(const char path, const char src, size_t len,`
			`struct strbuf *dst, enum safe_crlf checksafe)`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`{`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`int ret = 0;`
			`struct conv_attrs ca;`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`convert_attrs(&ca, path);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`ret \|= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);`
			`if (!ret && ca.drv && ca.drv->required)`
Add a setting to require a filter to be successful By default, a missing filter driver or a failure from the filter driver is not an error, but merely makes the filter operation a no-op pass through. This is useful to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use, and the content filter mechanism is not used to turn something unusable into usable. However, we could also use of the content filtering mechanism and store the content that cannot be directly used in the repository (e.g. a UUID that refers to the true content stored outside git, or an encrypted content) and turn it into a usable form upon checkout (e.g. download the external content, or decrypt the encrypted content). For such a use case, the content cannot be used when filter driver fails, and we need a way to tell Git to abort the whole operation for such a failing or missing filter driver. Add a new "filter.<driver>.required" configuration variable to mark the second use case. When it is set, git will abort the operation when the filter driver does not exist or exits with a non-zero status code. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 02:19:03 +01:00			`die("%s: clean filter '%s' failed", path, ca.drv->name);`

teach convert_to_git a "dry run" mode Some callers may want to know whether convert_to_git will actually do anything before performing the conversion itself (e.g., to decide whether to stream or handle blobs in-core). This patch lets callers specify the dry run mode by passing a NULL destination buffer. The return value, instead of indicating whether conversion happened, will indicate whether conversion would occur. For readability, we also include a wrapper function which makes it more obvious we are not actually performing the conversion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-24 23:02:37 +01:00			`if (ret && dst) {`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`src = dst->buf;`
			`len = dst->len;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`ret \|= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);`
teach convert_to_git a "dry run" mode Some callers may want to know whether convert_to_git will actually do anything before performing the conversion itself (e.g., to decide whether to stream or handle blobs in-core). This patch lets callers specify the dry run mode by passing a NULL destination buffer. The return value, instead of indicating whether conversion happened, will indicate whether conversion would occur. For readability, we also include a wrapper function which makes it more obvious we are not actually performing the conversion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-24 23:02:37 +01:00			`if (ret && dst) {`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`src = dst->buf;`
			`len = dst->len;`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`}`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`return ret \| ident_to_git(path, src, len, dst, ca.ident);`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`

convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`void convert_to_git_filter_fd(const char path, int fd, struct strbuf dst,`
			`enum safe_crlf checksafe)`
			`{`
			`struct conv_attrs ca;`
			`convert_attrs(&ca, path);`

			`assert(ca.drv);`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`assert(ca.drv->clean \|\| ca.drv->process);`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00
convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))`
convert: stream from fd to required clean filter to reduce used address space The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. If it fails, the original data is not immediately available. The condition can easily be handled as a fatal error, which is expected for a required filter anyway. If the filter was not required, the condition would need to be handled in a different way, like seeking to 0 and reading the data. But this would require more restructuring of the code and is probably not worth it. The obvious approach of falling back to reading all data would not help achieving the main purpose of this patch, which is to handle large files with limited address space. If reading all data is an option, we can simply take the old code path right away and mmap the entire file. The environment variable GIT_MMAP_LIMIT, which has been introduced in a previous commit is used to test that the expected code path is taken. A related test that exercises required filters is modified to verify that the data actually has been modified on its way from the file system to the object store. Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-08-26 17:23:25 +02:00			`die("%s: clean filter '%s' failed", path, ca.drv->name);`

			`crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);`
			`ident_to_git(path, dst->buf, dst->len, dst, ca.ident);`
			`}`

Don't expand CRLFs when normalizing text during merge Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:49 +02:00			`static int convert_to_working_tree_internal(const char path, const char src,`
			`size_t len, struct strbuf *dst,`
			`int normalizing)`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`{`
Add a setting to require a filter to be successful By default, a missing filter driver or a failure from the filter driver is not an error, but merely makes the filter operation a no-op pass through. This is useful to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use, and the content filter mechanism is not used to turn something unusable into usable. However, we could also use of the content filtering mechanism and store the content that cannot be directly used in the repository (e.g. a UUID that refers to the true content stored outside git, or an encrypted content) and turn it into a usable form upon checkout (e.g. download the external content, or decrypt the encrypted content). For such a use case, the content cannot be used when filter driver fails, and we need a way to tell Git to abort the whole operation for such a failing or missing filter driver. Add a new "filter.<driver>.required" configuration variable to mark the second use case. When it is set, git will abort the operation when the filter driver does not exist or exits with a non-zero status code. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 02:19:03 +01:00			`int ret = 0, ret_filter = 0;`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`struct conv_attrs ca;`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`convert_attrs(&ca, path);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`ret \|= ident_to_worktree(path, src, len, dst, ca.ident);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (ret) {`
			`src = dst->buf;`
			`len = dst->len;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
Don't expand CRLFs when normalizing text during merge Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:49 +02:00			`/*`
			`* CRLF conversion can be skipped if normalizing, unless there`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`* is a smudge or process filter (even if the process filter doesn't`
			`* support smudge). The filters might expect CRLFs.`
Don't expand CRLFs when normalizing text during merge Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:49 +02:00			`*/`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`if ((ca.drv && (ca.drv->smudge \|\| ca.drv->process)) \|\| !normalizing) {`
convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-09 22:58:31 +02:00			`ret \|= crlf_to_worktree(path, src, len, dst, ca.crlf_action);`
Don't expand CRLFs when normalizing text during merge Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:49 +02:00			`if (ret) {`
			`src = dst->buf;`
			`len = dst->len;`
			`}`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
Add a setting to require a filter to be successful By default, a missing filter driver or a failure from the filter driver is not an error, but merely makes the filter operation a no-op pass through. This is useful to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use, and the content filter mechanism is not used to turn something unusable into usable. However, we could also use of the content filtering mechanism and store the content that cannot be directly used in the repository (e.g. a UUID that refers to the true content stored outside git, or an encrypted content) and turn it into a usable form upon checkout (e.g. download the external content, or decrypt the encrypted content). For such a use case, the content cannot be used when filter driver fails, and we need a way to tell Git to abort the whole operation for such a failing or missing filter driver. Add a new "filter.<driver>.required" configuration variable to mark the second use case. When it is set, git will abort the operation when the filter driver does not exist or exits with a non-zero status code. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 02:19:03 +01:00
convert: prepare filter.<driver>.process option Refactor the existing 'single shot filter mechanism' and prepare the new 'long running filter mechanism'. Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:36 +02:00			`ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);`
			`if (!ret_filter && ca.drv && ca.drv->required)`
Add a setting to require a filter to be successful By default, a missing filter driver or a failure from the filter driver is not an error, but merely makes the filter operation a no-op pass through. This is useful to massage the content into a shape that is more convenient for the platform, filesystem, and the user to use, and the content filter mechanism is not used to turn something unusable into usable. However, we could also use of the content filtering mechanism and store the content that cannot be directly used in the repository (e.g. a UUID that refers to the true content stored outside git, or an encrypted content) and turn it into a usable form upon checkout (e.g. download the external content, or decrypt the encrypted content). For such a use case, the content cannot be used when filter driver fails, and we need a way to tell Git to abort the whole operation for such a failing or missing filter driver. Add a new "filter.<driver>.required" configuration variable to mark the second use case. When it is set, git will abort the operation when the filter driver does not exist or exits with a non-zero status code. Signed-off-by: Jehan Bing <jehan@orb.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2012-02-17 02:19:03 +01:00			`die("%s: smudge filter %s failed", path, ca.drv->name);`

			`return ret \| ret_filter;`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`
Avoid conflicts when merging branches with mixed normalization Currently, merging across changes in line ending normalization is painful since files containing CRLF will conflict with normalized files, even if the only difference between the two versions is the line endings. Additionally, any "real" merge conflicts that exist are obscured because every line in the file has a conflict. Assume you start out with a repo that has a lot of text files with CRLF checked in (A): o---C / \ A---B---D B: Add "* text=auto" to .gitattributes and normalize all files to LF-only C: Modify some of the text files D: Try to merge C You will get a ridiculous number of LF/CRLF conflicts when trying to merge C into D, since the repository contents for C are "wrong" wrt the new .gitattributes file. Fix ll-merge so that the "base", "theirs" and "ours" stages are passed through convert_to_worktree() and convert_to_git() before a three-way merge. This ensures that all three stages are normalized in the same way, removing from consideration differences that are only due to normalization. This feature is optional for now since it changes a low-level mechanism and is not necessary for the majority of users. The "merge.renormalize" config variable enables it. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:47 +02:00
Don't expand CRLFs when normalizing text during merge Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:49 +02:00			`int convert_to_working_tree(const char path, const char src, size_t len, struct strbuf *dst)`
			`{`
			`return convert_to_working_tree_internal(path, src, len, dst, 0);`
			`}`

Avoid conflicts when merging branches with mixed normalization Currently, merging across changes in line ending normalization is painful since files containing CRLF will conflict with normalized files, even if the only difference between the two versions is the line endings. Additionally, any "real" merge conflicts that exist are obscured because every line in the file has a conflict. Assume you start out with a repo that has a lot of text files with CRLF checked in (A): o---C / \ A---B---D B: Add "* text=auto" to .gitattributes and normalize all files to LF-only C: Modify some of the text files D: Try to merge C You will get a ridiculous number of LF/CRLF conflicts when trying to merge C into D, since the repository contents for C are "wrong" wrt the new .gitattributes file. Fix ll-merge so that the "base", "theirs" and "ours" stages are passed through convert_to_worktree() and convert_to_git() before a three-way merge. This ensures that all three stages are normalized in the same way, removing from consideration differences that are only due to normalization. This feature is optional for now since it changes a low-level mechanism and is not necessary for the majority of users. The "merge.renormalize" config variable enables it. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:47 +02:00			`int renormalize_buffer(const char path, const char src, size_t len, struct strbuf *dst)`
			`{`
Don't expand CRLFs when normalizing text during merge Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:49 +02:00			`int ret = convert_to_working_tree_internal(path, src, len, dst, 1);`
Avoid conflicts when merging branches with mixed normalization Currently, merging across changes in line ending normalization is painful since files containing CRLF will conflict with normalized files, even if the only difference between the two versions is the line endings. Additionally, any "real" merge conflicts that exist are obscured because every line in the file has a conflict. Assume you start out with a repo that has a lot of text files with CRLF checked in (A): o---C / \ A---B---D B: Add "* text=auto" to .gitattributes and normalize all files to LF-only C: Modify some of the text files D: Try to merge C You will get a ridiculous number of LF/CRLF conflicts when trying to merge C into D, since the repository contents for C are "wrong" wrt the new .gitattributes file. Fix ll-merge so that the "base", "theirs" and "ours" stages are passed through convert_to_worktree() and convert_to_git() before a three-way merge. This ensures that all three stages are normalized in the same way, removing from consideration differences that are only due to normalization. This feature is optional for now since it changes a low-level mechanism and is not necessary for the majority of users. The "merge.renormalize" config variable enables it. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:47 +02:00			`if (ret) {`
			`src = dst->buf;`
			`len = dst->len;`
			`}`
convert: unify the "auto" handling of CRLF Before this change, $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes would have the same effect as $ echo "* text" >.gitattributes $ git config core.eol crlf Since the 'eol' attribute had higher priority than 'text=auto', this may corrupt binary files and is not what most users expect to happen. Make the 'eol' attribute to obey 'text=auto' and now $ echo "* text=auto" >.gitattributes $ echo "* eol=crlf" >>.gitattributes behaves the same as $ echo "* text=auto" >.gitattributes $ git config core.eol crlf In other words, $ echo "* text=auto eol=crlf" >.gitattributes has the same effect as $ git config core.autocrlf true and $ echo "* text=auto eol=lf" >.gitattributes has the same effect as $ git config core.autocrlf input Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-06-28 10:01:13 +02:00			`return ret \| convert_to_git(path, src, len, dst, SAFE_CRLF_RENORMALIZE);`
Avoid conflicts when merging branches with mixed normalization Currently, merging across changes in line ending normalization is painful since files containing CRLF will conflict with normalized files, even if the only difference between the two versions is the line endings. Additionally, any "real" merge conflicts that exist are obscured because every line in the file has a conflict. Assume you start out with a repo that has a lot of text files with CRLF checked in (A): o---C / \ A---B---D B: Add "* text=auto" to .gitattributes and normalize all files to LF-only C: Modify some of the text files D: Try to merge C You will get a ridiculous number of LF/CRLF conflicts when trying to merge C into D, since the repository contents for C are "wrong" wrt the new .gitattributes file. Fix ll-merge so that the "base", "theirs" and "ours" stages are passed through convert_to_worktree() and convert_to_git() before a three-way merge. This ensures that all three stages are normalized in the same way, removing from consideration differences that are only due to normalization. This feature is optional for now since it changes a low-level mechanism and is not necessary for the majority of users. The "merge.renormalize" config variable enables it. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2010-07-02 21:20:47 +02:00			`}`
streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-12 23:31:08 +02:00
Add streaming filter API This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 23:33:31 +02:00			`/*****************************************************************`
			`*`
typofix: in-code comments Signed-off-by: Ondřej Bílka <neleai@seznam.cz> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2013-07-22 23:02:23 +02:00			`* Streaming conversion support`
Add streaming filter API This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 23:33:31 +02:00			`*`
			`*****************************************************************/`

			`typedef int (filter_fn)(struct stream_filter ,`
			`const char input, size_t isize_p,`
			`char output, size_t osize_p);`
			`typedef void (free_fn)(struct stream_filter );`

			`struct stream_filter_vtbl {`
			`filter_fn filter;`
			`free_fn free;`
			`};`

			`struct stream_filter {`
			`struct stream_filter_vtbl *vtbl;`
			`};`

			`static int null_filter_fn(struct stream_filter *filter,`
			`const char input, size_t isize_p,`
			`char output, size_t osize_p)`
			`{`
stream filter: add "no more input" to the filters Some filters may need to buffer the input and look-ahead inside it to decide what to output, and they may consume more than zero bytes of input and still not produce any output. After feeding all the input, pass NULL as input as keep calling stream_filter() to let such filters know there is no more input coming, and it is time for them to produce the remaining output based on the buffered input. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 23:05:51 +02:00			`size_t count;`

			`if (!input)`
			`return 0; /* we do not keep any states */`
			`count = *isize_p;`
Add streaming filter API This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 23:33:31 +02:00			`if (*osize_p < count)`
			`count = *osize_p;`
			`if (count) {`
			`memmove(output, input, count);`
			`*isize_p -= count;`
			`*osize_p -= count;`
			`}`
			`return 0;`
			`}`

			`static void null_free_fn(struct stream_filter *filter)`
			`{`
			`; /* nothing -- null instances are shared */`
			`}`

			`static struct stream_filter_vtbl null_vtbl = {`
			`null_filter_fn,`
			`null_free_fn,`
			`};`

			`static struct stream_filter null_filter_singleton = {`
			`&null_vtbl,`
			`};`

			`int is_null_stream_filter(struct stream_filter *filter)`
			`{`
			`return filter == &null_filter_singleton;`
			`}`

streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00
			`/*`
			`* LF-to-CRLF filter`
			`*/`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00
			`struct lf_to_crlf_filter {`
			`struct stream_filter filter;`
lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00			`unsigned has_held:1;`
			`char held;`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`};`

Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`static int lf_to_crlf_filter_fn(struct stream_filter *filter,`
			`const char input, size_t isize_p,`
			`char output, size_t osize_p)`
			`{`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`size_t count, o = 0;`
			`struct lf_to_crlf_filter lf_to_crlf = (struct lf_to_crlf_filter )filter;`

lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00			`/*`
			`* We may be holding onto the CR to see if it is followed by a`
			`* LF, in which case we would need to go to the main loop.`
			`* Otherwise, just emit it to the output stream.`
			`*/`
			`if (lf_to_crlf->has_held && (lf_to_crlf->held != '\r' \|\| !input)) {`
			`output[o++] = lf_to_crlf->held;`
			`lf_to_crlf->has_held = 0;`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`}`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00
lf_to_crlf_filter(): tell the caller we added "\n" when draining This can only happen when the input size is multiple of the buffer size of the cascade filter (16k) and ends with an LF, but in such a case, the code forgot to tell the caller that it added the "\n" it could not add during the last round. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-16 23:39:37 +01:00			`/* We are told to drain */`
			`if (!input) {`
			`*osize_p -= o;`
			`return 0;`
			`}`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00
			`count = *isize_p;`
lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00			`if (count \|\| lf_to_crlf->has_held) {`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`size_t i;`
lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00			`int was_cr = 0;`

			`if (lf_to_crlf->has_held) {`
			`was_cr = 1;`
			`lf_to_crlf->has_held = 0;`
			`}`

convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`for (i = 0; o < *osize_p && i < count; i++) {`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`char ch = input[i];`
lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`if (ch == '\n') {`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`output[o++] = '\r';`
lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00			`} else if (was_cr) {`
			`/*`
			`* Previous round saw CR and it is not followed`
			`* by a LF; emit the CR before processing the`
			`* current character.`
			`*/`
			`output[o++] = '\r';`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`}`
lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00
			`/*`
			`* We may have consumed the last output slot,`
			`* in which case we need to break out of this`
			`* loop; hold the current character before`
			`* returning.`
			`*/`
			`if (*osize_p <= o) {`
			`lf_to_crlf->has_held = 1;`
			`lf_to_crlf->held = ch;`
			`continue; /* break but increment i */`
			`}`

			`if (ch == '\r') {`
			`was_cr = 1;`
			`continue;`
			`}`

			`was_cr = 0;`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`output[o++] = ch;`
			`}`

			`*osize_p -= o;`
			`*isize_p -= i;`
lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-17 00:44:18 +01:00
			`if (!lf_to_crlf->has_held && was_cr) {`
			`lf_to_crlf->has_held = 1;`
			`lf_to_crlf->held = '\r';`
			`}`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`}`
			`return 0;`
			`}`

convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`static void lf_to_crlf_free_fn(struct stream_filter *filter)`
			`{`
			`free(filter);`
			`}`

Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`static struct stream_filter_vtbl lf_to_crlf_vtbl = {`
			`lf_to_crlf_filter_fn,`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`lf_to_crlf_free_fn,`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00			`};`

convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`static struct stream_filter *lf_to_crlf_filter(void)`
			`{`
lf_to_crlf_filter(): tell the caller we added "\n" when draining This can only happen when the input size is multiple of the buffer size of the cascade filter (16k) and ends with an LF, but in such a case, the code forgot to tell the caller that it added the "\n" it could not add during the last round. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-12-16 23:39:37 +01:00			`struct lf_to_crlf_filter lf_to_crlf = xcalloc(1, sizeof(lf_to_crlf));`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`lf_to_crlf->filter.vtbl = &lf_to_crlf_vtbl;`
			`return (struct stream_filter *)lf_to_crlf;`
			`}`
streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00
streaming: filter cascading This implements an internal "cascade" filter mechanism that plugs two filters in series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-22 03:28:41 +02:00			`/*`
			`* Cascade filter`
			`*/`
			`#define FILTER_BUFFER 1024`
			`struct cascade_filter {`
			`struct stream_filter filter;`
			`struct stream_filter *one;`
			`struct stream_filter *two;`
			`char buf[FILTER_BUFFER];`
			`int end, ptr;`
			`};`

			`static int cascade_filter_fn(struct stream_filter *filter,`
			`const char input, size_t isize_p,`
			`char output, size_t osize_p)`
			`{`
			`struct cascade_filter cas = (struct cascade_filter ) filter;`
			`size_t filled = 0;`
			`size_t sz = *osize_p;`
			`size_t to_feed, remaining;`

			`/*`
			`* input -- (one) --> buf -- (two) --> output`
			`*/`
			`while (filled < sz) {`
			`remaining = sz - filled;`

			`/* do we already have something to feed two with? */`
			`if (cas->ptr < cas->end) {`
			`to_feed = cas->end - cas->ptr;`
			`if (stream_filter(cas->two,`
			`cas->buf + cas->ptr, &to_feed,`
			`output + filled, &remaining))`
			`return -1;`
			`cas->ptr += (cas->end - cas->ptr) - to_feed;`
			`filled = sz - remaining;`
			`continue;`
			`}`

			`/* feed one from upstream and have it emit into our buffer */`
			`to_feed = input ? *isize_p : 0;`
			`if (input && !to_feed)`
			`break;`
			`remaining = sizeof(cas->buf);`
			`if (stream_filter(cas->one,`
			`input, &to_feed,`
			`cas->buf, &remaining))`
			`return -1;`
			`cas->end = sizeof(cas->buf) - remaining;`
			`cas->ptr = 0;`
			`if (input) {`
			`size_t fed = *isize_p - to_feed;`
			`*isize_p -= fed;`
			`input += fed;`
			`}`

			`/* do we know that we drained one completely? */`
			`if (input \|\| cas->end)`
			`continue;`

			`/* tell two to drain; we have nothing more to give it */`
			`to_feed = 0;`
			`remaining = sz - filled;`
			`if (stream_filter(cas->two,`
			`NULL, &to_feed,`
			`output + filled, &remaining))`
			`return -1;`
			`if (remaining == (sz - filled))`
			`break; /* completely drained two */`
			`filled = sz - remaining;`
			`}`
			`*osize_p -= filled;`
			`return 0;`
			`}`

			`static void cascade_free_fn(struct stream_filter *filter)`
			`{`
			`struct cascade_filter cas = (struct cascade_filter )filter;`
			`free_stream_filter(cas->one);`
			`free_stream_filter(cas->two);`
			`free(filter);`
			`}`

			`static struct stream_filter_vtbl cascade_vtbl = {`
			`cascade_filter_fn,`
			`cascade_free_fn,`
			`};`

			`static struct stream_filter cascade_filter(struct stream_filter one,`
			`struct stream_filter *two)`
			`{`
			`struct cascade_filter *cascade;`

			`if (!one \|\| is_null_stream_filter(one))`
			`return two;`
			`if (!two \|\| is_null_stream_filter(two))`
			`return one;`

			`cascade = xmalloc(sizeof(*cascade));`
			`cascade->one = one;`
			`cascade->two = two;`
			`cascade->end = cascade->ptr = 0;`
			`cascade->filter.vtbl = &cascade_vtbl;`
			`return (struct stream_filter *)cascade;`
			`}`

streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00			`/*`
			`* ident filter`
			`*/`
			`#define IDENT_DRAINING (-1)`
			`#define IDENT_SKIPPING (-2)`
			`struct ident_filter {`
			`struct stream_filter filter;`
			`struct strbuf left;`
			`int state;`
			`char ident[45]; /* ": x40 $" */`
			`};`

			`static int is_foreign_ident(const char *str)`
			`{`
			`int i;`

use skip_prefix to avoid magic numbers It's a common idiom to match a prefix and then skip past it with a magic number, like: if (starts_with(foo, "bar")) foo += 3; This is easy to get wrong, since you have to count the prefix string yourself, and there's no compiler check if the string changes. We can use skip_prefix to avoid the magic numbers here. Note that some of these conversions could be much shorter. For example: if (starts_with(arg, "--foo=")) { bar = arg + 6; continue; } could become: if (skip_prefix(arg, "--foo=", &bar)) continue; However, I have left it as: if (skip_prefix(arg, "--foo=", &v)) { bar = v; continue; } to visually match nearby cases which need to actually process the string. Like: if (skip_prefix(arg, "--foo=", &v)) { bar = atoi(v); continue; } Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-18 21:47:50 +02:00			`if (!skip_prefix(str, "$Id: ", &str))`
streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00			`return 0;`
use skip_prefix to avoid magic numbers It's a common idiom to match a prefix and then skip past it with a magic number, like: if (starts_with(foo, "bar")) foo += 3; This is easy to get wrong, since you have to count the prefix string yourself, and there's no compiler check if the string changes. We can use skip_prefix to avoid the magic numbers here. Note that some of these conversions could be much shorter. For example: if (starts_with(arg, "--foo=")) { bar = arg + 6; continue; } could become: if (skip_prefix(arg, "--foo=", &bar)) continue; However, I have left it as: if (skip_prefix(arg, "--foo=", &v)) { bar = v; continue; } to visually match nearby cases which need to actually process the string. Like: if (skip_prefix(arg, "--foo=", &v)) { bar = atoi(v); continue; } Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2014-06-18 21:47:50 +02:00			`for (i = 0; str[i]; i++) {`
streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00			`if (isspace(str[i]) && str[i+1] != '$')`
			`return 1;`
			`}`
			`return 0;`
			`}`

			`static void ident_drain(struct ident_filter ident, char output_p, size_t osize_p)`
			`{`
			`size_t to_drain = ident->left.len;`

			`if (*osize_p < to_drain)`
			`to_drain = *osize_p;`
			`if (to_drain) {`
			`memcpy(*output_p, ident->left.buf, to_drain);`
			`strbuf_remove(&ident->left, 0, to_drain);`
			`*output_p += to_drain;`
			`*osize_p -= to_drain;`
			`}`
			`if (!ident->left.len)`
			`ident->state = 0;`
			`}`

			`static int ident_filter_fn(struct stream_filter *filter,`
			`const char input, size_t isize_p,`
			`char output, size_t osize_p)`
			`{`
			`struct ident_filter ident = (struct ident_filter )filter;`
			`static const char head[] = "$Id";`

			`if (!input) {`
			`/* drain upon eof */`
			`switch (ident->state) {`
			`default:`
			`strbuf_add(&ident->left, head, ident->state);`
			`case IDENT_SKIPPING:`
			`/* fallthru */`
			`case IDENT_DRAINING:`
			`ident_drain(ident, &output, osize_p);`
			`}`
			`return 0;`
			`}`

			`while (*isize_p \|\| (ident->state == IDENT_DRAINING)) {`
			`int ch;`

			`if (ident->state == IDENT_DRAINING) {`
			`ident_drain(ident, &output, osize_p);`
			`if (!*osize_p)`
			`break;`
			`continue;`
			`}`

			`ch = *(input++);`
			`(*isize_p)--;`

			`if (ident->state == IDENT_SKIPPING) {`
			`/*`
			`* Skipping until '$' or LF, but keeping them`
			`* in case it is a foreign ident.`
			`*/`
			`strbuf_addch(&ident->left, ch);`
			`if (ch != '\n' && ch != '$')`
			`continue;`
			`if (ch == '$' && !is_foreign_ident(ident->left.buf)) {`
			`strbuf_setlen(&ident->left, sizeof(head) - 1);`
			`strbuf_addstr(&ident->left, ident->ident);`
			`}`
			`ident->state = IDENT_DRAINING;`
			`continue;`
			`}`

			`if (ident->state < sizeof(head) &&`
			`head[ident->state] == ch) {`
			`ident->state++;`
			`continue;`
			`}`

			`if (ident->state)`
			`strbuf_add(&ident->left, head, ident->state);`
			`if (ident->state == sizeof(head) - 1) {`
			`if (ch != ':' && ch != '$') {`
			`strbuf_addch(&ident->left, ch);`
			`ident->state = 0;`
			`continue;`
			`}`

			`if (ch == ':') {`
			`strbuf_addch(&ident->left, ch);`
			`ident->state = IDENT_SKIPPING;`
			`} else {`
			`strbuf_addstr(&ident->left, ident->ident);`
			`ident->state = IDENT_DRAINING;`
			`}`
			`continue;`
			`}`

			`strbuf_addch(&ident->left, ch);`
			`ident->state = IDENT_DRAINING;`
			`}`
			`return 0;`
			`}`

			`static void ident_free_fn(struct stream_filter *filter)`
			`{`
			`struct ident_filter ident = (struct ident_filter )filter;`
			`strbuf_release(&ident->left);`
			`free(filter);`
			`}`

			`static struct stream_filter_vtbl ident_vtbl = {`
			`ident_filter_fn,`
			`ident_free_fn,`
			`};`

			`static struct stream_filter ident_filter(const unsigned char sha1)`
			`{`
			`struct ident_filter ident = xmalloc(sizeof(ident));`

convert trivial sprintf / strcpy calls to xsnprintf We sometimes sprintf into fixed-size buffers when we know that the buffer is large enough to fit the input (either because it's a constant, or because it's numeric input that is bounded in size). Likewise with strcpy of constant strings. However, these sites make it hard to audit sprintf and strcpy calls for buffer overflows, as a reader has to cross-reference the size of the array with the input. Let's use xsnprintf instead, which communicates to a reader that we don't expect this to overflow (and catches the mistake in case we do). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2015-09-24 23:06:08 +02:00			`xsnprintf(ident->ident, sizeof(ident->ident),`
			`": %s $", sha1_to_hex(sha1));`
streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00			`strbuf_init(&ident->left, 0);`
			`ident->filter.vtbl = &ident_vtbl;`
			`ident->state = 0;`
			`return (struct stream_filter *)ident;`
			`}`

streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-12 23:31:08 +02:00			`/*`
Add streaming filter API This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 23:33:31 +02:00			`* Return an appropriately constructed filter for the path, or NULL if`
			`* the contents cannot be filtered without reading the whole thing`
			`* in-core.`
			`*`
			`* Note that you would be crazy to set CRLF, smuge/clean or ident to a`
			`* large binary blob you would want us not to slurp into the memory!`
streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-12 23:31:08 +02:00			`*/`
Add streaming filter API This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 23:33:31 +02:00			`struct stream_filter get_stream_filter(const char path, const unsigned char *sha1)`
streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-12 23:31:08 +02:00			`{`
			`struct conv_attrs ca;`
streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00			`struct stream_filter *filter = NULL;`
streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-12 23:31:08 +02:00
			`convert_attrs(&ca, path);`
convert: add filter.<driver>.process option Git's clean/smudge mechanism invokes an external filter process for every single blob that is affected by a filter. If Git filters a lot of blobs then the startup time of the external filter processes can become a significant part of the overall Git execution time. In a preliminary performance test this developer used a clean/smudge filter written in golang to filter 12,000 files. This process took 364s with the existing filter mechanism and 5s with the new mechanism. See details here: https://github.com/github/git-lfs/pull/1382 This patch adds the `filter.<driver>.process` string option which, if used, keeps the external filter process running and processes all blobs with the packet format (pkt-line) based protocol over standard input and standard output. The full protocol is explained in detail in `Documentation/gitattributes.txt`. A few key decisions: * The long running filter process is referred to as filter protocol version 2 because the existing single shot filter invocation is considered version 1. * Git sends a welcome message and expects a response right after the external filter process has started. This ensures that Git will not hang if a version 1 filter is incorrectly used with the filter.<driver>.process option for version 2 filters. In addition, Git can detect this kind of error and warn the user. * The status of a filter operation (e.g. "success" or "error) is set before the actual response and (if necessary!) re-set after the response. The advantage of this two step status response is that if the filter detects an error early, then the filter can communicate this and Git does not even need to create structures to read the response. * All status responses are pkt-line lists terminated with a flush packet. This allows us to send other status fields with the same protocol in the future. Helped-by: Martin-Louis Bright <mlbright@gmail.com> Reviewed-by: Jakub Narebski <jnareb@gmail.com> Signed-off-by: Lars Schneider <larsxschneider@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-10-17 01:20:37 +02:00			`if (ca.drv && (ca.drv->process \|\| ca.drv->smudge \|\| ca.drv->clean))`
convert.c: ident + core.autocrlf didn't work When the ident attributes is set, get_stream_filter() did not obey core.autocrlf=true, and the file was checked out with LF. Change the rule when a streaming filter can be used: - if an external filter is specified, don't use a stream filter. - if the worktree eol is CRLF and "auto" is active, don't use a stream filter. - Otherwise the stream filter can be used. Add test cases in t0027. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-04-25 18:56:32 +02:00			`return NULL;`

			`if (ca.crlf_action == CRLF_AUTO \|\| ca.crlf_action == CRLF_AUTO_CRLF)`
			`return NULL;`
streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00
			`if (ca.ident)`
			`filter = ident_filter(sha1);`
streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-12 23:31:08 +02:00
convert.c: ident + core.autocrlf didn't work When the ident attributes is set, get_stream_filter() did not obey core.autocrlf=true, and the file was checked out with LF. Change the rule when a streaming filter can be used: - if an external filter is specified, don't use a stream filter. - if the worktree eol is CRLF and "auto" is active, don't use a stream filter. - Otherwise the stream filter can be used. Add test cases in t0027. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-04-25 18:56:32 +02:00			`if (output_eol(ca.crlf_action) == EOL_CRLF)`
convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-11-28 11:48:12 +01:00			`filter = cascade_filter(filter, lf_to_crlf_filter());`
convert.c: ident + core.autocrlf didn't work When the ident attributes is set, get_stream_filter() did not obey core.autocrlf=true, and the file was checked out with LF. Change the rule when a streaming filter can be used: - if an external filter is specified, don't use a stream filter. - if the worktree eol is CRLF and "auto" is active, don't use a stream filter. - Otherwise the stream filter can be used. Add test cases in t0027. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-04-25 18:56:32 +02:00			`else`
			`filter = cascade_filter(filter, &null_filter_singleton);`
Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 01:47:56 +02:00
streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-21 03:28:00 +02:00			`return filter;`
Add streaming filter API This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-20 23:33:31 +02:00			`}`

			`void free_stream_filter(struct stream_filter *filter)`
			`{`
			`filter->vtbl->free(filter);`
			`}`

			`int stream_filter(struct stream_filter *filter,`
			`const char input, size_t isize_p,`
			`char output, size_t osize_p)`
			`{`
			`return filter->vtbl->filter(filter, input, isize_p, output, osize_p);`
streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com> 2011-05-12 23:31:08 +02:00			`}`