1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-05 08:47:56 +01:00
Commit graph

103 commits

Author SHA1 Message Date
Torsten Bögershausen
caa47adc5a convert.c: ident + core.autocrlf didn't work
When the ident attributes is set, get_stream_filter() did not obey
core.autocrlf=true, and the file was checked out with LF.

Change the rule when a streaming filter can be used:
- if an external filter is specified, don't use a stream filter.
- if the worktree eol is CRLF and "auto" is active, don't use a stream filter.
- Otherwise the stream filter can be used.

Add test cases in t0027.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 12:12:03 -07:00
Junio C Hamano
c6b94eb009 Merge branch 'tb/conversion'
Code simplification.

* tb/conversion:
  convert.c: correct attr_action()
  convert.c: simplify text_stat
  convert.c: refactor crlf_action
  convert.c: use text_eol_is_crlf()
  convert.c: remove input_crlf_action()
  convert.c: remove unused parameter 'path'
  t0027: add tests for get_stream_filter()
2016-02-26 13:37:23 -08:00
Torsten Bögershausen
817a0c7968 convert.c: correct attr_action()
df747b81 (convert.c: refactor crlf_action, 2016-02-10) introduced a
bug to "git ls-files --eol".

The "text" attribute was shown as "text eol=lf" or "text eol=crlf",
depending on core.autocrlf or core.eol.

Correct this and add test cases in t0027.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-23 12:53:15 -08:00
Torsten Bögershausen
6e336a530b convert.c: simplify text_stat
Simplify the statistics:
lonecr counts the CR which is not followed by a LF,
lonelf counts the LF which is not preceded by a CR,
crlf counts CRLF combinations.
This simplifies the evaluation of the statistics.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10 15:54:20 -08:00
Torsten Bögershausen
df747b818f convert.c: refactor crlf_action
Refactor the determination and usage of crlf_action.
Today, when no "crlf" attribute are set on a file, crlf_action is set to
CRLF_GUESS. Use CRLF_UNDEFINED instead, and search for "text" or "eol" as
before.
After searching for line ending attributes, save the value in
struct conv_attrs.crlf_action attr_action,
so that get_convert_attr_ascii() is able report the attributes.

Replace the old CRLF_GUESS usage:
CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF
CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY
CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT

Save the action in conv_attrs.crlf_action (as before) and change
all callers.

Make more clear, what is what, by defining:

- CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf
                   and core.eol is evaluated and one of CRLF_BINARY,
                   CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected
- CRLF_BINARY    : No processing of line endings.
- CRLF_TEXT      : attribute "text" is set, line endings are processed.
- CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text.
- CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text.
- CRLF_AUTO      : attribute "auto" is set.
- CRLF_AUTO_INPUT: core.autocrlf=input (no attributes)
- CRLF_AUTO_CRLF : core.autocrlf=true  (no attributes)

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-10 15:53:35 -08:00
Junio C Hamano
a3764e7da7 Merge branch 'ls/clean-smudge-override-in-config'
Clean/smudge filters defined in a configuration file of lower
precedence can now be overridden to be a pass-through no-op by
setting the variable to an empty string.

* ls/clean-smudge-override-in-config:
  convert: treat an empty string for clean/smudge filters as "cat"
2016-02-10 14:20:07 -08:00
Torsten Bögershausen
4b4024f5dd convert.c: use text_eol_is_crlf()
Add a helper function to find out, which line endings text files
should get at checkout, depending on core.autocrlf and core.eol
configuration variables.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-08 10:02:12 -08:00
Torsten Bögershausen
bb211b4de8 convert.c: remove input_crlf_action()
Integrate the code of input_crlf_action() into convert_attrs(),
so that ca.crlf_action is always valid after calling convert_attrs().
Keep a copy of crlf_action in attr_action, this is needed for
get_convert_attr_ascii().

Remove eol_attr from struct conv_attrs, as it is now used temporally.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-08 10:01:40 -08:00
Torsten Bögershausen
92cce1355e convert.c: remove unused parameter 'path'
Some functions get a parameter path, but don't use it.
Remove the unused parameter.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-08 09:59:43 -08:00
Lars Schneider
1a8630dc3b convert: treat an empty string for clean/smudge filters as "cat"
Once a lower-priority configuration file defines a clean or smudge
filter, there is no convenient way to override it to produce as-is
output.  Even though the configuration mechanism implements "the
last one wins" semantics, you cannot set them to an empty string and
expect them to work, as apply_filter() would try to run the empty
string as an external command and fail.  The conversion is not done,
but the function would still report a failure to convert.

Even though resetting the variable to "cat" (i.e. pass the data back
as-is and report success) is an obvious and a viable way to solve
this, it is wasteful to spawn an external process just as a
workaround.

Instead, teach apply_filter() to treat an empty string as a no-op
filter that always returns successfully its input as-is without
conversion.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-29 11:04:27 -08:00
Torsten Bögershausen
a7630bd427 ls-files: add eol diagnostics
When working in a cross-platform environment, a user may want to
check if text files are stored normalized in the repository and
if .gitattributes are set appropriately.

Make it possible to let Git show the line endings in the index and
in the working tree and the effective text/eol attributes.

The end of line ("eolinfo") are shown like this:

    "-text"        binary (or with bare CR) file
    "none"         text file without any EOL
    "lf"           text file with LF
    "crlf"         text file with CRLF
    "mixed"        text file with mixed line endings.

The effective text/eol attribute is one of these:

    "", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf"

git ls-files --eol gives an output like this:

    i/none   w/none   attr/text=auto      t/t5100/empty
    i/-text  w/-text  attr/-text          t/test-binary-2.png
    i/lf     w/lf     attr/text eol=lf    t/t5100/rfc2047-info-0007
    i/lf     w/crlf   attr/text eol=crlf  doit.bat
    i/mixed  w/mixed  attr/               locale/XX.po

to show what eol convention is used in the data in the index ('i'),
and in the working tree ('w'), and what attribute is in effect,
for each path that is shown.

Add test cases in t0027.

Helped-By: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-18 19:48:43 -08:00
Jeff King
5096d4909f convert trivial sprintf / strcpy calls to xsnprintf
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.

However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Junio C Hamano
0c4dd67a04 filter_buffer_or_fd(): ignore EPIPE
We are explicitly ignoring SIGPIPE, as we fully expect that the
filter program may not read our output fully.  Ignore EPIPE that
may come from writing to it as well.

A new test was stolen from Jeff's suggestion.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 10:19:12 -07:00
Junio C Hamano
f0d8900175 Merge branch 'sp/stream-clean-filter'
When running a required clean filter, we do not have to mmap the
original before feeding the filter.  Instead, stream the file
contents directly to the filter and process its output.

* sp/stream-clean-filter:
  sha1_file: don't convert off_t to size_t too early to avoid potential die()
  convert: stream from fd to required clean filter to reduce used address space
  copy_fd(): do not close the input file descriptor
  mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
  config.c: add git_env_ulong() to parse environment variable
  convert: drop arguments other than 'path' from would_convert_to_git()
2014-10-08 13:05:32 -07:00
Steffen Prohaska
9035d75a2b convert: stream from fd to required clean filter to reduce used address space
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  If it fails, the original data is
not immediately available.  The condition can easily be handled as
a fatal error, which is expected for a required filter anyway.

If the filter was not required, the condition would need to be handled
in a different way, like seeking to 0 and reading the data.  But this
would require more restructuring of the code and is probably not worth
it.  The obvious approach of falling back to reading all data would not
help achieving the main purpose of this patch, which is to handle large
files with limited address space.  If reading all data is an option, we
can simply take the old code path right away and mmap the entire file.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
a previous commit is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data actually has been modified on its way from the file system
to the object store.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-28 10:25:15 -07:00
René Scharfe
d318027932 run-command: introduce CHILD_PROCESS_INIT
Most struct child_process variables are cleared using memset first after
declaration.  Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead.  That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-20 09:53:37 -07:00
Jeff King
ae021d8791 use skip_prefix to avoid magic numbers
It's a common idiom to match a prefix and then skip past it
with a magic number, like:

  if (starts_with(foo, "bar"))
	  foo += 3;

This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes.  We can use skip_prefix to avoid the magic
numbers here.

Note that some of these conversions could be much shorter.
For example:

  if (starts_with(arg, "--foo=")) {
	  bar = arg + 6;
	  continue;
  }

could become:

  if (skip_prefix(arg, "--foo=", &bar))
	  continue;

However, I have left it as:

  if (skip_prefix(arg, "--foo=", &v)) {
	  bar = v;
	  continue;
  }

to visually match nearby cases which need to actually
process the string. Like:

  if (skip_prefix(arg, "--foo=", &v)) {
	  bar = atoi(v);
	  continue;
  }

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:45 -07:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Ondřej Bílka
749f763dbb typofix: in-code comments
Signed-off-by: Ondřej Bílka <neleai@seznam.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-22 16:06:49 -07:00
Junio C Hamano
4b35b007a6 Merge branch 'lf/read-blob-data-from-index'
Reduce duplicated code between convert.c and attr.c.

* lf/read-blob-data-from-index:
  convert.c: remove duplicate code
  read_blob_data_from_index(): optionally return the size of blob data
  attr.c: extract read_index_data() as read_blob_data_from_index()
2013-04-21 18:39:45 -07:00
Lukas Fleischer
4982fd78f6 convert.c: remove duplicate code
The has_cr_in_index() function is an almost 1:1 copy of
read_blob_data_from_index() with some additions.  Use the
latter instead of using copy-pasted code.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:52:33 -07:00
Jeff King
d731f0ade1 convert some config callbacks to parse_config_key
These callers can drop some inline pointer arithmetic and
magic offset constants, making them more readable and less
error-prone (those constants had to match the lengths of
strings, but there is no automatic verification of that
fact).

The "ep" pointer (presumably for "end pointer"), which
points to the final key segment of the config variable, is
given the more standard name "key" to describe its function
rather than its derivation.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-23 08:41:50 -08:00
Junio C Hamano
524ee675a3 Merge branch 'jb/required-filter'
* jb/required-filter:
  Add a setting to require a filter to be successful

Conflicts:
	convert.c
2012-02-28 13:25:57 -08:00
Junio C Hamano
31e3d834b3 Merge branch 'jk/maint-avoid-streaming-filtered-contents'
* jk/maint-avoid-streaming-filtered-contents:
  do not stream large files to pack when filters are in use
  teach dry-run convert_to_git not to require a src buffer
  teach convert_to_git a "dry run" mode
2012-02-26 23:05:38 -08:00
Jeff King
4c3b57b98b teach dry-run convert_to_git not to require a src buffer
When we call convert_to_git in dry-run mode, it may still
want to look at the source buffer, because some CRLF
conversion modes depend on analyzing the source to determine
whether it is in fact convertible CRLF text.

However, the main motivation for convert_to_git's dry-run
mode is that we would decide which method to use to acquire
the blob's data (streaming versus in-core). Requiring this
source analysis creates a chicken-and-egg problem. We are
better off simply guessing that anything we can't analyze
will end up needing conversion.

This patch lets a caller specify a NULL src buffer when
using dry-run mode (and only dry-run mode). A non-zero
return value goes from "we would convert" to "we might
convert"; a zero return value remains "we would definitely
not convert".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-24 14:12:19 -08:00
Jeff King
92ac3197e4 teach convert_to_git a "dry run" mode
Some callers may want to know whether convert_to_git will
actually do anything before performing the conversion
itself (e.g., to decide whether to stream or handle blobs
in-core). This patch lets callers specify the dry run mode
by passing a NULL destination buffer. The return value,
instead of indicating whether conversion happened, will
indicate whether conversion would occur.

For readability, we also include a wrapper function which
makes it more obvious we are not actually performing the
conversion.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-24 14:11:27 -08:00
Jehan Bing
6424c2ad12 Ignore SIGPIPE when running a filter driver
If a filter is not defined or if it fails, git should behave as if the
filter is a no-op passthru.

However, if the filter exits before reading all the content, depending on
the timing, git could be killed with SIGPIPE when it tries to write to the
pipe connected to the filter.

Ignore SIGPIPE while processing the filter to give us a chance to check
the return value from a failed write, in order to detect and act on this
mode of failure in a more controlled way.

Signed-off-by: Jehan Bing <jehan@orb.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-21 12:48:09 -08:00
Jehan Bing
36daaaca00 Add a setting to require a filter to be successful
By default, a missing filter driver or a failure from the filter driver is
not an error, but merely makes the filter operation a no-op pass through.
This is useful to massage the content into a shape that is more convenient
for the platform, filesystem, and the user to use, and the content filter
mechanism is not used to turn something unusable into usable.

However, we could also use of the content filtering mechanism and store
the content that cannot be directly used in the repository (e.g. a UUID
that refers to the true content stored outside git, or an encrypted
content) and turn it into a usable form upon checkout (e.g. download the
external content, or decrypt the encrypted content).  For such a use case,
the content cannot be used when filter driver fails, and we need a way to
tell Git to abort the whole operation for such a failing or missing filter
driver.

Add a new "filter.<driver>.required" configuration variable to mark the
second use case.  When it is set, git will abort the operation when the
filter driver does not exist or exits with a non-zero status code.

Signed-off-by: Jehan Bing <jehan@orb.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-17 07:37:08 -08:00
Junio C Hamano
23838b8a15 Merge branch 'jc/maint-lf-to-crlf-keep-crlf' into maint
* jc/maint-lf-to-crlf-keep-crlf:
  lf_to_crlf_filter(): resurrect CRLF->CRLF hack
2011-12-28 11:42:27 -08:00
Junio C Hamano
339aff0846 Merge branch 'jc/maint-lf-to-crlf-keep-crlf'
* jc/maint-lf-to-crlf-keep-crlf:
  lf_to_crlf_filter(): resurrect CRLF->CRLF hack
2011-12-22 11:27:29 -08:00
Junio C Hamano
3bb8d69cdd Merge branch 'cn/maint-lf-to-crlf-filter' into maint
* cn/maint-lf-to-crlf-filter:
  lf_to_crlf_filter(): tell the caller we added "\n" when draining
  convert: track state in LF-to-CRLF filter
2011-12-21 11:42:44 -08:00
Junio C Hamano
8496f56873 lf_to_crlf_filter(): resurrect CRLF->CRLF hack
The non-streaming version of the filter counts CRLF and LF in the whole
buffer, and returns without doing anything when they match (i.e. what is
recorded in the object store already uses CRLF). This was done to help
people who added files from the DOS world before realizing they want to go
cross platform and adding .gitattributes to tell Git that they only want
CRLF in their working tree.

The streaming version of the filter does not want to read the whole thing
before starting to work, as that defeats the whole point of streaming. So
we instead check what byte follows CR whenever we see one, and add CR
before LF only when the LF does not immediately follow CR already to keep
CRLF as is.

Reported-and-tested-by: Ralf Thielow
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-18 20:40:41 -08:00
Junio C Hamano
87afe9a5ed lf_to_crlf_filter(): tell the caller we added "\n" when draining
This can only happen when the input size is multiple of the
buffer size of the cascade filter (16k) and ends with an LF,
but in such a case, the code forgot to tell the caller that
it added the "\n" it could not add during the last round.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16 14:39:37 -08:00
Carlos Martín Nieto
284e3d280e convert: track state in LF-to-CRLF filter
There may not be enough space to store CRLF in the output. If we don't
fill the buffer, then the filter will keep getting called with the same
short buffer and will loop forever.

Instead, always store the CR and record whether there's a missing LF
if so we store it in the output buffer the next time the function gets
called.

Reported-by: Henrik Grubbström <grubba@roxen.com>
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-28 11:30:34 -08:00
Ramsay Jones
ef563de6dd convert.c: Fix return type of git_path_check_eol()
The git_path_check_eol() function converts a string value to the
corresponding 'enum eol' value. However, the function is currently
declared to return an 'enum crlf_action', which causes sparse to
complain thus:

        SP convert.c
    convert.c:736:50: warning: mixing different enum types
    convert.c:736:50:     int enum crlf_action  versus
    convert.c:736:50:     int enum eol

In order to suppress the warning, we simply correct the return type
in the function declaration.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-21 11:00:57 -08:00
Ramkumar Ramachandra
7356b51e4b convert: don't mix enum with int
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-15 16:09:02 -08:00
Junio C Hamano
8a72864426 Merge branch 'tr/maint-ident-to-git-memmove'
* tr/maint-ident-to-git-memmove:
  Use memmove in ident_to_git
2011-09-02 13:18:25 -07:00
Thomas Rast
7732118438 Use memmove in ident_to_git
convert_to_git sets src=dst->buf if any of the preceding conversions
actually did any work.  Thus in ident_to_git we have to use memmove
instead of memcpy as far as src->dst copying is concerned.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-29 15:23:22 -07:00
Michael Haggerty
d932f4eb9f Rename git_checkattr() to git_check_attr()
Suggested by: Junio Hamano <gitster@pobox.com>

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-04 15:53:21 -07:00
Junio C Hamano
a265a7f95e streaming: filter cascading
This implements an internal "cascade" filter mechanism that plugs
two filters in series.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Junio C Hamano
b84c783917 streaming filter: ident filter
Add support for "ident" filter on the output codepath. This does not work
with lf-to-crlf filter together (yet).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Junio C Hamano
e322ee38ad Add LF-to-CRLF streaming conversion
If we do not have to guess or validate by scanning the input, we can
just stream this through.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Junio C Hamano
4ae6670444 stream filter: add "no more input" to the filters
Some filters may need to buffer the input and look-ahead inside it
to decide what to output, and they may consume more than zero bytes
of input and still not produce any output. After feeding all the
input, pass NULL as input as keep calling stream_filter() to let
such filters know there is no more input coming, and it is time for
them to produce the remaining output based on the buffered input.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Junio C Hamano
b6691092d7 Add streaming filter API
This introduces an API to plug custom filters to an input stream.

The caller gets get_stream_filter("path") to obtain an appropriate
filter for the path, and then uses it when opening an input stream
via open_istream().  After that, the caller can read from the stream
with read_istream(), and close it with close_istream(), just like an
unfiltered stream.

This only adds a "null" filter that is a pass-thru filter, but later
changes can add LF-to-CRLF and other filters, and the callers of the
streaming API do not have to change.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 16:47:15 -07:00
Junio C Hamano
b0d9c69f5e convert: CRLF_INPUT is a no-op in the output codepath
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano
dd8e912190 streaming_write_entry(): use streaming API in write_entry()
When the output to a path does not have to be converted, we can read from
the object database from the streaming API and write to the file in the
working tree, without having to hold everything in the memory.

The ident, auto- and safe- crlf conversions inherently require you to read
the whole thing before deciding what to do, so while it is technically
possible to support them by using a buffer of an unbound size or rewinding
and reading the stream twice, it is less practical than the traditional
"read the whole thing in core and convert" approach.

Adding streaming filters for the other conversions on top of this should
be doable by tweaking the can_bypass_conversion() function (it should be
renamed to can_filter_stream() when it happens). Then the streaming API
can be extended to wrap the git_istream streaming_write_entry() opens on
the underlying object in another git_istream that reads from it, filters
what is read, and let the streaming_write_entry() read the filtered
result. But that is outside the scope of this series.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:46:58 -07:00
Junio C Hamano
3bfba20dae convert: make it harder to screw up adding a conversion attribute
The current internal API requires the callers of setup_convert_check() to
supply the git_attr_check structures (hence they need to know how many to
allocate), but they grab the same set of attributes for given path.

Define a new convert_attrs() API that fills a higher level information that
the callers (convert_to_git and convert_to_working_tree) really want, and
move the common code to interact with the attributes system to it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 14:59:09 -07:00
Junio C Hamano
83295964b3 convert: make it safer to add conversion attributes
The places that need to pass an array of "struct git_attr_check" needed to
be careful to pass a large enough array and know what index each element
lied.  Make it safer and easier to code these.

Besides, the hard-coded sequence of initializing various attributes was
too ugly after we gained more than a few attributes.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 14:59:09 -07:00
Junio C Hamano
c61dcff9d6 convert: give saner names to crlf/eol variables, types and functions
Back when the conversion was only about the end-of-line convention, it
might have made sense to call what we do upon seeing CR/LF simply an
"action", but these days the conversion routines do a lot more than just
tweaking the line ending.  Raname "action" to "crlf_action".

The function that decides what end of line conversion to use on the output
codepath was called "determine_output_conversion", as if there is no other
kind of output conversion.  Rename it to "output_eol"; it is a function
that returns what EOL convention is to be used.

A function that decides what "crlf_action" needs to be used on the input
codepath, given what conversion attribute is set to the path and global
end-of-line convention, was called "determine_action".  Rename it to
"input_crlf_action".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 14:59:09 -07:00
Junio C Hamano
ec70f52f6f convert: rename the "eol" global variable to "core_eol"
Yes, it is clear that "eol" wants to mean some sort of end-of-line thing,
but as the name of a global variable, it is way too short to describe what
kind of end-of-line thing it wants to represent. Besides, there are many
codepaths that want to use their own local "char *eol" variable to point
at the end of the current line they are processing.

This global variable holds what we read from core.eol configuration
variable. Name it as such.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 14:58:52 -07:00