mirrors/git - Incest Forge: Beyond sex. We incest.

mirrors/git

mirror of https://github.com/git/git.git synced 2024-11-18 23:14:51 +01:00

629 lines

14 KiB

C

Raw Normal View History

Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`#include "cache.h"`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`#include "attr.h"`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`#include "run-command.h"`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`/*`
			`* convert.c - convert a file when checking it out and checking it in.`
			`*`
			`* This should use the pathname to decide on whether it wants to do some`
			`* more interesting conversions (automatic gzip/unzip, general format`
			`* conversions etc etc), but by default it just does automatic CRLF<->LF`
			`* translation when the "auto_crlf" option is set.`
			`*/`

Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00			`#define CRLF_GUESS (-1)`
			`#define CRLF_BINARY 0`
			`#define CRLF_TEXT 1`
			`#define CRLF_INPUT 2`

Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`struct text_stat {`
treat any file with NUL as binary There are two heuristics in Git to detect whether a file is binary or text. One in xdiff-interface.c (which is taken from GNU diff) relies on existence of the NUL byte at the beginning. However, convert.c used a different heuristic, which relied on the percent of non-printable symbols (less than 1% for text files). Due to differences in detection whether a file is binary or not, it was possible that a file that diff treats as binary could be treated as text by CRLF conversion. This is very confusing for a user who sees that 'git diff' shows the file as binary expects it to be added as binary. This patch makes is_binary to consider any file that contains at least one NUL character as binary, to ensure that the heuristics used for CRLF conversion is tighter than what is used by diff. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-16 02:59:12 +01:00			`/* NUL, CR, LF and CRLF counts */`
			`unsigned nul, cr, lf, crlf;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
			`/* These are just approximations! */`
			`unsigned printable, nonprintable;`
			`};`

			`static void gather_stats(const char buf, unsigned long size, struct text_stat stats)`
			`{`
			`unsigned long i;`

			`memset(stats, 0, sizeof(*stats));`

			`for (i = 0; i < size; i++) {`
			`unsigned char c = buf[i];`
			`if (c == '\r') {`
			`stats->cr++;`
			`if (i+1 < size && buf[i+1] == '\n')`
			`stats->crlf++;`
			`continue;`
			`}`
			`if (c == '\n') {`
			`stats->lf++;`
			`continue;`
			`}`
			`if (c == 127)`
			`/* DEL */`
			`stats->nonprintable++;`
			`else if (c < 32) {`
			`switch (c) {`
			`/* BS, HT, ESC and FF */`
			`case '\b': case '\t': case '\033': case '\014':`
			`stats->printable++;`
			`break;`
treat any file with NUL as binary There are two heuristics in Git to detect whether a file is binary or text. One in xdiff-interface.c (which is taken from GNU diff) relies on existence of the NUL byte at the beginning. However, convert.c used a different heuristic, which relied on the percent of non-printable symbols (less than 1% for text files). Due to differences in detection whether a file is binary or not, it was possible that a file that diff treats as binary could be treated as text by CRLF conversion. This is very confusing for a user who sees that 'git diff' shows the file as binary expects it to be added as binary. This patch makes is_binary to consider any file that contains at least one NUL character as binary, to ensure that the heuristics used for CRLF conversion is tighter than what is used by diff. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-16 02:59:12 +01:00			`case 0:`
			`stats->nul++;`
			`/* fall through */`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`default:`
			`stats->nonprintable++;`
			`}`
			`}`
			`else`
			`stats->printable++;`
			`}`
Fixed text file auto-detection: treat EOF character 032 at the end of file as printable Signed-off-by: Dmitry Kakurin <Dmitry.Kakurin@gmail.com> Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-11 18:48:16 +02:00
			`/* If file ends with EOF then don't count this EOF as non-printable. */`
			`if (size >= 1 && buf[size-1] == '\032')`
			`stats->nonprintable--;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`}`

			`/*`
			`* The same heuristics as diff.c::mmfile_is_binary()`
			`*/`
			`static int is_binary(unsigned long size, struct text_stat *stats)`
			`{`

treat any file with NUL as binary There are two heuristics in Git to detect whether a file is binary or text. One in xdiff-interface.c (which is taken from GNU diff) relies on existence of the NUL byte at the beginning. However, convert.c used a different heuristic, which relied on the percent of non-printable symbols (less than 1% for text files). Due to differences in detection whether a file is binary or not, it was possible that a file that diff treats as binary could be treated as text by CRLF conversion. This is very confusing for a user who sees that 'git diff' shows the file as binary expects it to be added as binary. This patch makes is_binary to consider any file that contains at least one NUL character as binary, to ensure that the heuristics used for CRLF conversion is tighter than what is used by diff. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-01-16 02:59:12 +01:00			`if (stats->nul)`
			`return 1;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`if ((stats->printable >> 7) < stats->nonprintable)`
			`return 1;`
			`/*`
			`* Other heuristics? Average line length might be relevant,`
			`* as might LF vs CR vs CRLF counts..`
			`*`
			`* NOTE! It might be normal to have a low ratio of CRLF to LF`
			`* (somebody starts with a LF-only file and edits it with an editor`
			`* that adds CRLF only to lines that are added..). But do we`
			`* want to support CR-only? Probably not.`
			`*/`
			`return 0;`
			`}`

safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`static void check_safe_crlf(const char *path, int action,`
			`struct text_stat *stats, enum safe_crlf checksafe)`
			`{`
			`if (!checksafe)`
			`return;`

			`if (action == CRLF_INPUT \|\| auto_crlf <= 0) {`
			`/*`
			`* CRLFs would not be restored by checkout:`
			`* check if we'd remove CRLFs`
			`*/`
			`if (stats->crlf) {`
			`if (checksafe == SAFE_CRLF_WARN)`
			`warning("CRLF will be replaced by LF in %s.", path);`
			`else /* i.e. SAFE_CRLF_FAIL */`
			`die("CRLF would be replaced by LF in %s.", path);`
			`}`
			`} else if (auto_crlf > 0) {`
			`/*`
			`* CRLFs would be added by checkout:`
			`* check if we have "naked" LFs`
			`*/`
			`if (stats->lf != stats->crlf) {`
			`if (checksafe == SAFE_CRLF_WARN)`
			`warning("LF will be replaced by CRLF in %s", path);`
			`else /* i.e. SAFE_CRLF_FAIL */`
			`die("LF would be replaced by CRLF in %s", path);`
			`}`
			`}`
			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int crlf_to_git(const char path, const char src, size_t len,`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`struct strbuf *buf, int action, enum safe_crlf checksafe)`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`{`
			`struct text_stat stats;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`char *dst;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if ((action == CRLF_BINARY) \|\| !auto_crlf \|\| !len)`
			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`gather_stats(src, len, &stats);`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00			`if (action == CRLF_GUESS) {`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`/*`
			`* We're currently not going to even try to convert stuff`
			`* that has bare CR characters. Does anybody do that crazy`
			`* stuff?`
			`*/`
			`if (stats.cr != stats.crlf)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00
			`/*`
			`* And add some heuristics for binary vs text, of course...`
			`*/`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (is_binary(len, &stats))`
			`return 0;`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`}`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`check_safe_crlf(path, action, &stats, checksafe);`

			`/* Optimization: No CR? Nothing to convert, regardless. */`
			`if (!stats.cr)`
			`return 0;`

Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`/* only grow if not in place */`
			`if (strbuf_avail(buf) + buf->len < len)`
			`strbuf_grow(buf, len - buf->len);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`dst = buf->buf;`
Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00			`if (action == CRLF_GUESS) {`
			`/*`
			`* If we guessed, we already know we rejected a file with`
			`* lone CR, and we can strip a CR without looking at what`
			`* follow it.`
			`*/`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`do {`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`unsigned char c = *src++;`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`if (c != '\r')`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`*dst++ = c;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`} while (--len);`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`} else {`
			`do {`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`unsigned char c = *src++;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (! (c == '\r' && (1 < len && *src == '\n')))`
Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-19 02:05:03 +02:00			`*dst++ = c;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`} while (--len);`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`strbuf_setlen(buf, dst - buf->buf);`
			`return 1;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int crlf_to_worktree(const char path, const char src, size_t len,`
			`struct strbuf *buf, int action)`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`{`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`char *to_free = NULL;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`struct text_stat stats;`

Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00			`if ((action == CRLF_BINARY) \|\| (action == CRLF_INPUT) \|\|`
Fix crlf attribute handling to match documentation gitattributes.txt says, of the crlf attribute: Set:: Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. That is to say that the crlf attribute does not force the file to have CRLF line endings, instead it removes the autocrlf guesswork and forces the file to be treated as text. Then, whatever line ending is defined by the autocrlf setting is applied. However, that is not what convert.c was doing. The conversion to CRLF was being skipped in crlf_to_worktree() when the following condition was true: action == CRLF_GUESS && auto_crlf <= 0 That is to say conversion took place when not in guess mode (crlf attribute not specified) or core.autocrlf set to true. This was wrong. It meant that the crlf attribute being on for a given file _forced_ CRLF conversion, when actually it should force the file to be treated as text, and converted accordingly. The real test should simply be auto_crlf <= 0 That is to say, if core.autocrlf is falsei (or input), conversion from LF to CRLF is never done. When core.autocrlf is true, conversion from LF to CRLF is done only when in CRLF_GUESS (and the guess is "text"), or CRLF_TEXT mode. Similarly for crlf_to_worktree(), if core.autocrlf is false, no conversion should _ever_ take place. In reality it was only not taking place if core.autocrlf was false _and_ the crlf attribute was unspecified. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-18 14:33:32 +02:00			`auto_crlf <= 0)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (!len)`
			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`gather_stats(src, len, &stats);`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
			`/* No LF? Nothing to convert, regardless. */`
			`if (!stats.lf)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
			`/* Was it already in CRLF format? */`
			`if (stats.lf == stats.crlf)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00			`if (action == CRLF_GUESS) {`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`/* If we have any bare CR characters, we're not going to touch it */`
			`if (stats.cr != stats.crlf)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (is_binary(len, &stats))`
			`return 0;`
Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-15 22:35:45 +02:00			`}`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* are we "faking" in place editing ? */`
			`if (src == buf->buf)`
strbuf change: be sure ->buf is never ever NULL. For that purpose, the ->buf is always initialized with a char * buf living in the strbuf module. It is made a char * so that we can sloppily accept things that perform: sb->buf[0] = '\0', and because you can't pass "" as an initializer for ->buf without making gcc unhappy for very good reasons. strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf anymore. as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying ->buf isn't an option anymore, if ->buf is going to escape from the scope, and eventually be free'd. API changes: * strbuf_setlen now always works, so just make strbuf_reset a convenience macro. * strbuf_detatch takes a size_t* optional argument (meaning it can be NULL) to copy the buffer's len, as it was needed for this refactor to make the code more readable, and working like the callers. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-27 12:58:23 +02:00			`to_free = strbuf_detach(buf, NULL);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00
			`strbuf_grow(buf, len + stats.lf - stats.crlf);`
			`for (;;) {`
			`const char *nl = memchr(src, '\n', len);`
			`if (!nl)`
			`break;`
			`if (nl > src && nl[-1] == '\r') {`
			`strbuf_add(buf, src, nl + 1 - src);`
			`} else {`
			`strbuf_add(buf, src, nl - src);`
			`strbuf_addstr(buf, "\r\n");`
			`}`
			`len -= nl + 1 - src;`
			`src = nl + 1;`
			`}`
			`strbuf_add(buf, src, len);`

			`free(to_free);`
			`return 1;`
Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-02-13 20:07:23 +01:00			`}`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`struct filter_params {`
			`const char *src;`
			`unsigned long size;`
			`const char *cmd;`
			`};`

			`static int filter_buffer(int fd, void *data)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`{`
			`/*`
			`* Spawn cmd and feed the buffer contents through its stdin.`
			`*/`
			`struct child_process child_process;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`struct filter_params params = (struct filter_params )data;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`int write_err, status;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`const char *argv[] = { "sh", "-c", params->cmd, NULL };`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
			`memset(&child_process, 0, sizeof(child_process));`
Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:47:55 +02:00			`child_process.argv = argv;`
			`child_process.in = -1;`
Avoid a dup2(2) in apply_filter() - start_command() can do it for us. When apply_filter() runs the external (clean or smudge) filter program, it needs to pass the writable end of a pipe as its stdout. For this purpose, it used to dup2(2) the file descriptor explicitly to stdout. Now we use the facilities of start_command() to do it for us. Furthermore, the path argument of a subordinate function, filter_buffer(), was not used, so here we replace it to pass the fd instead. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:05 +02:00			`child_process.out = fd;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:47:55 +02:00			`if (start_command(&child_process))`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`return error("cannot fork to run external filter %s", params->cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`write_err = (write_in_full(child_process.in, params->src, params->size) < 0);`
Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:47:55 +02:00			`if (close(child_process.in))`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`write_err = 1;`
			`if (write_err)`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`error("cannot feed the input to external filter %s", params->cmd);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
			`status = finish_command(&child_process);`
			`if (status)`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`error("external filter %s failed %d", params->cmd, -status);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`return (write_err \|\| status);`
			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int apply_filter(const char path, const char src, size_t len,`
			`struct strbuf dst, const char cmd)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`{`
			`/*`
			`* Create a pipeline to have the command filter the buffer's`
			`* contents.`
			`*`
			`* (child --> cmd) --> us`
			`*/`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`int ret = 1;`
Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer Many call sites use strbuf_init(&foo, 0) to initialize local strbuf variable "foo" which has not been accessed since its declaration. These can be replaced with a static initialization using the STRBUF_INIT macro which is just as readable, saves a function call, and takes up fewer lines. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2008-10-09 21:12:12 +02:00			`struct strbuf nbuf = STRBUF_INIT;`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`struct async async;`
			`struct filter_params params;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
			`if (!cmd)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`memset(&async, 0, sizeof(async));`
			`async.proc = filter_buffer;`
			`async.data = &params;`
			`params.src = src;`
			`params.size = len;`
			`params.cmd = cmd;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
			`fflush(NULL);`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (start_async(&async))`
			`return 0; /* error was already reported */`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (strbuf_read(&nbuf, async.out, len) < 0) {`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`error("read from external filter %s failed", cmd);`
			`ret = 0;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (close(async.out)) {`
Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`error("read from external filter %s failed", cmd);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`ret = 0;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
Use the asyncronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-19 21:48:06 +02:00			`if (finish_async(&async)) {`
			`error("external filter %s failed", cmd);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`ret = 0;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (ret) {`
Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`strbuf_swap(dst, &nbuf);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`}`
Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`strbuf_release(&nbuf);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return ret;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`

			`static struct convert_driver {`
			`const char *name;`
			`struct convert_driver *next;`
convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-05 07:24:42 +02:00			`const char *smudge;`
			`const char *clean;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`} user_convert, *user_convert_tail;`

Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:46:53 +02:00			`static int read_convert_config(const char var, const char value, void *cb)`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`{`
			`const char ep, name;`
			`int namelen;`
			`struct convert_driver *drv;`

			`/*`
			`* External conversion drivers are configured using`
			`* "filter.<name>.variable".`
			`*/`
			`if (prefixcmp(var, "filter.") \|\| (ep = strrchr(var, '.')) == var + 6)`
			`return 0;`
			`name = var + 7;`
			`namelen = ep - name;`
			`for (drv = user_convert; drv; drv = drv->next)`
			`if (!strncmp(drv->name, name, namelen) && !drv->name[namelen])`
			`break;`
			`if (!drv) {`
			`drv = xcalloc(1, sizeof(struct convert_driver));`
Use xmemdupz() in many places. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 00:32:36 +02:00			`drv->name = xmemdupz(name, namelen);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`*user_convert_tail = drv;`
			`user_convert_tail = &(drv->next);`
			`}`

			`ep++;`

			`/*`
			`* filter.<name>.smudge and filter.<name>.clean specifies`
			`* the command line:`
			`*`
			`* command-line`
			`*`
			`* The command-line will not be interpolated in any way.`
			`*/`

convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-05 07:24:42 +02:00			`if (!strcmp("smudge", ep))`
			`return git_config_string(&drv->smudge, var, value);`

			`if (!strcmp("clean", ep))`
			`return git_config_string(&drv->clean, var, value);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00
			`return 0;`
			`}`

convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`static void setup_convert_check(struct git_attr_check *check)`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`{`
			`static struct git_attr *attr_crlf;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`static struct git_attr *attr_ident;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`static struct git_attr *attr_filter;`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (!attr_crlf) {`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`attr_crlf = git_attr("crlf", 4);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`attr_ident = git_attr("ident", 5);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`attr_filter = git_attr("filter", 6);`
			`user_convert_tail = &user_convert;`
Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-05-14 19:46:53 +02:00			`git_config(read_convert_config, NULL);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
			`check[0].attr = attr_crlf;`
			`check[1].attr = attr_ident;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`check[2].attr = attr_filter;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`

			`static int count_ident(const char *cp, unsigned long size)`
			`{`
			`/*`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`* "$Id: 0000000000000000000000000000000000000000 $" <=> "$Id$"`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`*/`
			`int cnt = 0;`
			`char ch;`

			`while (size) {`
			`ch = *cp++;`
			`size--;`
			`if (ch != '$')`
			`continue;`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`if (size < 3)`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`break;`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`if (memcmp("Id", cp, 2))`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`continue;`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`ch = cp[2];`
			`cp += 3;`
			`size -= 3;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (ch == '$')`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`cnt++; /* $Id$ */`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (ch != ':')`
			`continue;`

			`/*`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`* "$Id: ... "; scan up to the closing dollar sign and discard.`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`*/`
			`while (size) {`
			`ch = *cp++;`
			`size--;`
			`if (ch == '$') {`
			`cnt++;`
			`break;`
			`}`
			`}`
			`}`
			`return cnt;`
			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int ident_to_git(const char path, const char src, size_t len,`
			`struct strbuf *buf, int ident)`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`{`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`char dst, dollar;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (!ident \|\| !count_ident(src, len))`
			`return 0;`

Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't changes, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collected the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> 2007-10-05 10:11:59 +02:00			`/* only grow if not in place */`
			`if (strbuf_avail(buf) + buf->len < len)`
			`strbuf_grow(buf, len - buf->len);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`dst = buf->buf;`
			`for (;;) {`
			`dollar = memchr(src, '$', len);`
			`if (!dollar)`
			`break;`
			`memcpy(dst, src, dollar + 1 - src);`
			`dst += dollar + 1 - src;`
			`len -= dollar + 1 - src;`
			`src = dollar + 1;`

			`if (len > 3 && !memcmp(src, "Id:", 3)) {`
			`dollar = memchr(src + 3, '$', len - 3);`
			`if (!dollar)`
			`break;`
Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has examples hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-14 15:37:25 +02:00			`memcpy(dst, "Id$", 3);`
			`dst += 3;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`len -= dollar + 1 - src;`
			`src = dollar + 1;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`memcpy(dst, src, len);`
			`strbuf_setlen(buf, dst + len - buf->buf);`
			`return 1;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`static int ident_to_worktree(const char path, const char src, size_t len,`
			`struct strbuf *buf, int ident)`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`{`
			`unsigned char sha1[20];`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`char to_free = NULL, dollar;`
			`int cnt;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
			`if (!ident)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`cnt = count_ident(src, len);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (!cnt)`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return 0;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* are we "faking" in place editing ? */`
			`if (src == buf->buf)`
strbuf change: be sure ->buf is never ever NULL. For that purpose, the ->buf is always initialized with a char * buf living in the strbuf module. It is made a char * so that we can sloppily accept things that perform: sb->buf[0] = '\0', and because you can't pass "" as an initializer for ->buf without making gcc unhappy for very good reasons. strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf anymore. as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying ->buf isn't an option anymore, if ->buf is going to escape from the scope, and eventually be free'd. API changes: * strbuf_setlen now always works, so just make strbuf_reset a convenience macro. * strbuf_detatch takes a size_t* optional argument (meaning it can be NULL) to copy the buffer's len, as it was needed for this refactor to make the code more readable, and working like the callers. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-27 12:58:23 +02:00			`to_free = strbuf_detach(buf, NULL);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`hash_sha1_file(src, len, "blob", sha1);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`strbuf_grow(buf, len + cnt * 43);`
			`for (;;) {`
			`/* step 1: run to the next '$' */`
			`dollar = memchr(src, '$', len);`
			`if (!dollar)`
			`break;`
			`strbuf_add(buf, src, dollar + 1 - src);`
			`len -= dollar + 1 - src;`
			`src = dollar + 1;`
Fix mishandling of $Id$ expanded in the repository copy in convert.c If the repository contained an expanded ident keyword (i.e. $Id:XXXX$), then the wrong bytes were discarded, and the Id keyword was not expanded. The fault was in convert.c:ident_to_worktree(). Previously, when a "$Id:" was found in the repository version, ident_to_worktree() would search for the next "$" after this, and discarded everything it found until then. That was done with the loop: do { ch = cp++; if (ch == '$') break; rem--; } while (rem); The above loop left cp pointing one character _after_ the final "$" (because of ch = cp++). This was different from the non-expanded case, were cp is left pointing at the "$", and was different from the comment which stated "discard up to but not including the closing $". This patch fixes that by making the loop: do { ch = *cp; if (ch == '$') break; cp++; rem--; } while (rem); That is, cp is tested _then_ incremented. This loop exits if it finds a "$" or if it runs out of bytes in the source. After this loop, if there was no closing "$" the expansion is skipped, and the outer loop is allowed to continue leaving this non-keyword as it was. However, when the "$" is found, size is corrected, before running the expansion: size -= (cp - src); This is wrong; size is going to be corrected anyway after the expansion, so there is no need to do it here. This patch removes that redundant correction. To help find this bug, I heavily commented the routine; those comments are included here as a bonus. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-25 12:50:08 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* step 2: does it looks like a bit like Id:xxx$ or Id$ ? */`
			`if (len < 3 \|\| memcmp("Id", src, 2))`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`continue;`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* step 3: skip over Id$ or Id:xxxxx$ */`
			`if (src[2] == '$') {`
			`src += 3;`
			`len -= 3;`
			`} else if (src[2] == ':') {`
			`/*`
			`* It's possible that an expanded Id has crept its way into the`
			`* repository, we cope with that by stripping the expansion out`
			`*/`
			`dollar = memchr(src + 3, '$', len - 3);`
			`if (!dollar) {`
			`/* incomplete keyword, no more '$', so just quit the loop */`
			`break;`
			`}`
Fix mishandling of $Id$ expanded in the repository copy in convert.c If the repository contained an expanded ident keyword (i.e. $Id:XXXX$), then the wrong bytes were discarded, and the Id keyword was not expanded. The fault was in convert.c:ident_to_worktree(). Previously, when a "$Id:" was found in the repository version, ident_to_worktree() would search for the next "$" after this, and discarded everything it found until then. That was done with the loop: do { ch = cp++; if (ch == '$') break; rem--; } while (rem); The above loop left cp pointing one character _after_ the final "$" (because of ch = cp++). This was different from the non-expanded case, were cp is left pointing at the "$", and was different from the comment which stated "discard up to but not including the closing $". This patch fixes that by making the loop: do { ch = *cp; if (ch == '$') break; cp++; rem--; } while (rem); That is, cp is tested _then_ incremented. This loop exits if it finds a "$" or if it runs out of bytes in the source. After this loop, if there was no closing "$" the expansion is skipped, and the outer loop is allowed to continue leaving this non-keyword as it was. However, when the "$" is found, size is corrected, before running the expansion: size -= (cp - src); This is wrong; size is going to be corrected anyway after the expansion, so there is no need to do it here. This patch removes that redundant correction. To help find this bug, I heavily commented the routine; those comments are included here as a bonus. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-25 12:50:08 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`len -= dollar + 1 - src;`
			`src = dollar + 1;`
			`} else {`
			`/* it wasn't a "Id$" or "Id:xxxx$" */`
			`continue;`
			`}`
Fix mishandling of $Id$ expanded in the repository copy in convert.c If the repository contained an expanded ident keyword (i.e. $Id:XXXX$), then the wrong bytes were discarded, and the Id keyword was not expanded. The fault was in convert.c:ident_to_worktree(). Previously, when a "$Id:" was found in the repository version, ident_to_worktree() would search for the next "$" after this, and discarded everything it found until then. That was done with the loop: do { ch = cp++; if (ch == '$') break; rem--; } while (rem); The above loop left cp pointing one character _after_ the final "$" (because of ch = cp++). This was different from the non-expanded case, were cp is left pointing at the "$", and was different from the comment which stated "discard up to but not including the closing $". This patch fixes that by making the loop: do { ch = *cp; if (ch == '$') break; cp++; rem--; } while (rem); That is, cp is tested _then_ incremented. This loop exits if it finds a "$" or if it runs out of bytes in the source. After this loop, if there was no closing "$" the expansion is skipped, and the outer loop is allowed to continue leaving this non-keyword as it was. However, when the "$" is found, size is corrected, before running the expansion: size -= (cp - src); This is wrong; size is going to be corrected anyway after the expansion, so there is no need to do it here. This patch removes that redundant correction. To help find this bug, I heavily commented the routine; those comments are included here as a bonus. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-05-25 12:50:08 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`/* step 4: substitute */`
			`strbuf_addstr(buf, "Id: ");`
			`strbuf_add(buf, sha1_to_hex(sha1), 40);`
			`strbuf_addstr(buf, " $");`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`strbuf_add(buf, src, len);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`free(to_free);`
			`return 1;`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`

convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`static int git_path_check_crlf(const char path, struct git_attr_check check)`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`{`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`const char *value = check->value;`

			`if (ATTR_TRUE(value))`
			`return CRLF_TEXT;`
			`else if (ATTR_FALSE(value))`
			`return CRLF_BINARY;`
			`else if (ATTR_UNSET(value))`
			`;`
			`else if (!strcmp(value, "input"))`
			`return CRLF_INPUT;`
Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribut to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-20 07:37:19 +02:00			`return CRLF_GUESS;`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`

Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`static struct convert_driver git_path_check_convert(const char path,`
			`struct git_attr_check *check)`
			`{`
			`const char *value = check->value;`
			`struct convert_driver *drv;`

			`if (ATTR_TRUE(value) \|\| ATTR_FALSE(value) \|\| ATTR_UNSET(value))`
			`return NULL;`
			`for (drv = user_convert; drv; drv = drv->next)`
			`if (!strcmp(value, drv->name))`
			`return drv;`
			`return NULL;`
			`}`

Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`static int git_path_check_ident(const char path, struct git_attr_check check)`
			`{`
			`const char *value = check->value;`

			`return !!ATTR_TRUE(value);`
			`}`

safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`int convert_to_git(const char path, const char src, size_t len,`
			`struct strbuf *dst, enum safe_crlf checksafe)`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`{`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`struct git_attr_check check[3];`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`int crlf = CRLF_GUESS;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`int ident = 0, ret = 0;`
convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-05 07:24:42 +02:00			`const char *filter = NULL;`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00
			`setup_convert_check(check);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (!git_checkattr(path, ARRAY_SIZE(check), check)) {`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`struct convert_driver *drv;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`crlf = git_path_check_crlf(path, check + 0);`
			`ident = git_path_check_ident(path, check + 1);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`drv = git_path_check_convert(path, check + 2);`
			`if (drv && drv->clean)`
			`filter = drv->clean;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`ret \|= apply_filter(path, src, len, dst, filter);`
			`if (ret) {`
			`src = dst->buf;`
			`len = dst->len;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can modified in an irreversible way without giving the user a chance to backup the original file. - for read-only operations that do not modify files in the work tree we do not not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dimitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de> 2008-02-06 12:25:58 +01:00			`ret \|= crlf_to_git(path, src, len, dst, crlf, checksafe);`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`if (ret) {`
			`src = dst->buf;`
			`len = dst->len;`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return ret \| ident_to_git(path, src, len, dst, ident);`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`

Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`int convert_to_working_tree(const char path, const char src, size_t len, struct strbuf *dst)`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`{`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`struct git_attr_check check[3];`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`int crlf = CRLF_GUESS;`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`int ident = 0, ret = 0;`
convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2008-07-05 07:24:42 +02:00			`const char *filter = NULL;`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00
			`setup_convert_check(check);`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`if (!git_checkattr(path, ARRAY_SIZE(check), check)) {`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`struct convert_driver *drv;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`crlf = git_path_check_crlf(path, check + 0);`
			`ident = git_path_check_ident(path, check + 1);`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`drv = git_path_check_convert(path, check + 2);`
			`if (drv && drv->smudge)`
			`filter = drv->smudge;`
convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attributes, and then interprete what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 08:44:02 +02:00			`}`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`ret \|= ident_to_worktree(path, src, len, dst, ident);`
			`if (ret) {`
			`src = dst->buf;`
			`len = dst->len;`
Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollor sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-22 04:09:02 +02:00			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`ret \|= crlf_to_worktree(path, src, len, dst, crlf);`
			`if (ret) {`
			`src = dst->buf;`
			`len = dst->len;`
Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) .c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-21 12:14:13 +02:00			`}`
Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparentely for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char src, size_t len, struct strbuf sb) { int ret = 0; ret \|= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret \|= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret \| filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2007-09-16 15:51:04 +02:00			`return ret \| apply_filter(path, src, len, dst, filter);`
Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users to build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net> 2007-04-13 07:30:05 +02:00			`}`