#!/bin/sh
test_description='blob conversion via gitattributes'
. ./test-lib.sh
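# Create a rot13 filter script.  rot13 is its own inverse, so using the
# same script as both the clean and the smudge filter must round-trip
# file contents unchanged.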
cat <<EOF >rot13.sh
#!$SHELL_PATH
tr \
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
EOF
chmod +x rot13.sh
test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
{
echo "*.t filter=rot13"
echo "*.i ident"
} >.gitattributes &&
{
echo a b c d e f g h i j k l m
echo n o p q r s t u v w x y z
echo '\''$Id$'\''
} >test &&
cat test >test.t &&
cat test >test.o &&
cat test >test.i &&
git add test test.t test.i &&
rm -f test test.t test.i &&
git checkout -- test test.t test.i
'
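# Extract the object name that the ident attribute embedded into the
# checked-out file as "$Id: <blob sha1> $".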
script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
test_expect_success check '
cmp test.o test &&
cmp test.o test.t &&
# ident should be stripped in the repository
git diff --raw --exit-code :test :test.i &&
id=$(git rev-parse --verify :test) &&
embedded=$(sed -ne "$script" test.i) &&
test "z$id" = "z$embedded" &&
git cat-file blob :test.t > test.r &&
./rot13.sh < test.o > test.t &&
cmp test.r test.t
'
# If an expanded ident ever gets into the repository, we want to make sure that
# it is collapsed before being expanded again on checkout
test_expect_success expanded_in_repo '
{
echo "File with expanded keywords"
echo "\$Id\$"
echo "\$Id:\$"
echo "\$Id: 0000000000000000000000000000000000000000 \$"
echo "\$Id: NoSpaceAtEnd\$"
echo "\$Id:NoSpaceAtFront \$"
echo "\$Id:NoSpaceAtEitherEnd\$"
echo "\$Id: NoTerminatingSymbol"
echo "\$Id: Foreign Commit With Spaces \$"
} >expanded-keywords.0 &&
{
cat expanded-keywords.0 &&
printf "\$Id: NoTerminatingSymbolAtEOF"
} >expanded-keywords &&
cat expanded-keywords >expanded-keywords-crlf &&
git add expanded-keywords expanded-keywords-crlf &&
git commit -m "File with keywords expanded" &&
id=$(git rev-parse --verify :expanded-keywords) &&
{
echo "File with expanded keywords"
echo "\$Id: $id \$"
echo "\$Id: $id \$"
echo "\$Id: $id \$"
echo "\$Id: $id \$"
echo "\$Id: $id \$"
echo "\$Id: $id \$"
echo "\$Id: NoTerminatingSymbol"
echo "\$Id: Foreign Commit With Spaces \$"
} >expected-output.0 &&
{
cat expected-output.0 &&
printf "\$Id: NoTerminatingSymbolAtEOF"
} >expected-output &&
{
append_cr <expected-output.0 &&
printf "\$Id: NoTerminatingSymbolAtEOF"
} >expected-output-crlf &&
{
echo "expanded-keywords ident"
echo "expanded-keywords-crlf ident text eol=crlf"
} >>.gitattributes &&
rm -f expanded-keywords expanded-keywords-crlf &&
git checkout -- expanded-keywords &&
test_cmp expanded-keywords expected-output &&
git checkout -- expanded-keywords-crlf &&
test_cmp expanded-keywords-crlf expected-output-crlf
'
# The use of %f in a filter definition is expanded to the path of the
# file being smudged or cleaned.  It must be shell escaped.  First, set
# up some interesting file names and list them in .gitattributes.
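# For example (illustrative, not part of the test): with
#   filter.argc.smudge = "sh ./argc.sh %f"
# a file named "name with spaces" is smudged with a command line roughly
# equivalent to
#   sh ./argc.sh 'name with spaces'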
test_expect_success 'filter shell-escaped filenames' '
cat >argc.sh <<-EOF &&
#!$SHELL_PATH
cat >/dev/null
echo argc: \$# "\$@"
EOF
normal=name-no-magic &&
special="name with '\''sq'\'' and \$x" &&
echo some test text >"$normal" &&
echo some test text >"$special" &&
git add "$normal" "$special" &&
git commit -q -m "add files" &&
echo "name* filter=argc" >.gitattributes &&
# delete the files and check them out again, using a smudge filter
# that will count the args and echo the command-line back to us
git config filter.argc.smudge "sh ./argc.sh %f" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
# make sure argc.sh counted the right number of args
echo "argc: 1 $normal" >expect &&
test_cmp expect "$normal" &&
echo "argc: 1 $special" >expect &&
test_cmp expect "$special" &&
# do the same thing, but with more args in the filter expression
git config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
# make sure argc.sh counted the right number of args
echo "argc: 2 $normal --my-extra-arg" >expect &&
test_cmp expect "$normal" &&
echo "argc: 2 $special --my-extra-arg" >expect &&
test_cmp expect "$special" &&
:
'
# A required clean filter is fed its input streamed straight from the
# file descriptor, so files larger than the available address space can
# be filtered as long as the filter output stays small.  The next test
# also verifies that the data really is transformed on its way from the
# file system to the object store.
test_expect_success 'required filter should filter data' '
git config filter.required.smudge ./rot13.sh &&
git config filter.required.clean ./rot13.sh &&
git config filter.required.required true &&
echo "*.r filter=required" >.gitattributes &&
cat test.o >test.r &&
git add test.r &&
rm -f test.r &&
git checkout -- test.r &&
cmp test.o test.r &&
./rot13.sh <test.o >expected &&
git cat-file blob :test.r >actual &&
cmp expected actual
'
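# A required filter whose smudge command fails must make the checkout
# fail instead of silently checking out the unfiltered blob.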
test_expect_success 'required filter smudge failure' '
git config filter.failsmudge.smudge false &&
git config filter.failsmudge.clean cat &&
git config filter.failsmudge.required true &&
echo "*.fs filter=failsmudge" >.gitattributes &&
echo test >test.fs &&
git add test.fs &&
rm -f test.fs &&
test_must_fail git checkout -- test.fs
'
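# Likewise, a failing required clean filter must make "git add" fail.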
test_expect_success 'required filter clean failure' '
git config filter.failclean.smudge cat &&
git config filter.failclean.clean false &&
git config filter.failclean.required true &&
echo "*.fc filter=failclean" >.gitattributes &&
echo test >test.fc &&
test_must_fail git add test.fc
'
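# GIT_MMAP_LIMIT and GIT_ALLOC_LIMIT cap how much git may mmap or
# allocate at once, so adding the 30MB file only succeeds if the clean
# filter is fed by streaming rather than from an in-core copy.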
test_expect_success 'filtering large input to small output should use little memory' '
git config filter.devnull.clean "cat >/dev/null" &&
git config filter.devnull.required true &&
for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
echo "30MB filter=devnull" >.gitattributes &&
GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
'
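# A clean filter that exits without reading its input makes git see
# EPIPE/SIGPIPE while feeding it; that is not an error, and the filter
# output is used as the blob content.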
test_expect_success 'filter that does not read is fine' '
test-genrandom foo $((128 * 1024 + 1)) >big &&
echo "big filter=epipe" >.gitattributes &&
git config filter.epipe.clean "echo xyzzy" &&
git add big &&
git cat-file blob :big >actual &&
echo xyzzy >expect &&
test_cmp expect actual
'
# Checking out 2GB or more through an external filter used to fail on
# Mac OS X because read() and write() cannot handle chunks larger than
# 2GB; xread() and xwrite() therefore cap IO at 8MB per call.  Since
# "git add" exits with 0 even when it reports filtering errors, the
# test checks stderr instead of the exit code.
test_expect_success EXPENSIVE 'filter large file' '
git config filter.largefile.smudge cat &&
git config filter.largefile.clean cat &&
for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
echo "2GB filter=largefile" >.gitattributes &&
git add 2GB 2>err &&
! test -s err &&
rm -f 2GB &&
git checkout -- 2GB 2>err &&
! test -s err
'
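# An empty file in the worktree must still be run through the clean
# filter; here the filter prepends an in-repo header line.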
test_expect_success "filter: clean empty file" '
git config filter.in-repo-header.clean "echo cleaned && cat" &&
git config filter.in-repo-header.smudge "sed 1d" &&
echo "empty-in-worktree filter=in-repo-header" >>.gitattributes &&
>empty-in-worktree &&
echo cleaned >expected &&
git add empty-in-worktree &&
git show :empty-in-worktree >actual &&
test_cmp expected actual
'
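# Conversely, content that is empty in the repository must still be run
# through the smudge filter on checkout.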
test_expect_success "filter: smudge empty file" '
git config filter.empty-in-repo.clean "cat >/dev/null" &&
git config filter.empty-in-repo.smudge "echo smudged && cat" &&
echo "empty-in-repo filter=empty-in-repo" >>.gitattributes &&
echo dead data walking >empty-in-repo &&
git add empty-in-repo &&
echo smudged >expected &&
git checkout-index --prefix=filtered- empty-in-repo &&
test_cmp expected filtered-empty-in-repo
'
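# An empty value on the command line overrides the configured clean or
# smudge command and disables the filter.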
test_expect_success 'disable filter with empty override' '
test_config_global filter.disable.smudge false &&
test_config_global filter.disable.clean false &&
test_config filter.disable.smudge false &&
test_config filter.disable.clean false &&
echo "*.disable filter=disable" >.gitattributes &&
echo test >test.disable &&
git -c filter.disable.clean= add test.disable 2>err &&
test_must_be_empty err &&
rm -f test.disable &&
git -c filter.disable.smudge= checkout -- test.disable 2>err &&
test_must_be_empty err
'
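# The clean filter below records every invocation in "count"; diff must
# read the blobs from the object database rather than reuse the worktree
# file, so the filter should never be run here.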
test_expect_success 'diff does not reuse worktree files that need cleaning' '
test_config filter.counter.clean "echo . >>count; sed s/^/clean:/" &&
echo "file filter=counter" >.gitattributes &&
test_commit one file &&
test_commit two file &&
>count &&
git diff-tree -p HEAD &&
test_line_count = 0 count
'
test_done