1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-02 07:17:58 +01:00
Commit graph

40869 commits

Author SHA1 Message Date
Junio C Hamano
669b963af2 mailinfo: handle charset conversion errors in the caller
Instead of dying in convert_to_utf8(), just report an error and let
the callers handle it.  Between the two callers:

 - decode_header() silently punts when it cannot parse a broken
   RFC2047 encoded text (e.g. when it sees anything other than B or
   Q after it sees "=?<charset>") by jumping to release_return,
   returning the string it successfully parsed out so far, to the
   caller.  A piece of string that convert_to_utf8() cannot handle
   can be treated the same way.

 - handle_commit_msg() doesn't cope with a malformed line well, so
   die there for now.  We'll lift this even higher in later changes
   in this series.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:59:34 -07:00
Junio C Hamano
c6905e45f0 mailinfo: libify
Move the bulk of the code from builtin/mailinfo.c to mailinfo.c
so that new callers can start calling mailinfo() directly.

Note that a few calls to exit() and die() need to be cleaned up
for the API to be truly useful, which will come in later steps.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:59:34 -07:00
Junio C Hamano
05e625e5bf mailinfo: keep the parsed log message in a strbuf
When mailinfo() is eventually libified, the calling "git am" still
will have to write out the log message in the "msg" file for hooks
and other users of the information, but it does not have to reopen
and reread what it wrote earlier if the function kept it in a strbuf.

This also removes the need for seeking and truncating the output
file when we see a scissors mark in the input, which in turn allows
us to lose two callsites of die_errno().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:57:17 -07:00
Junio C Hamano
4933910ab7 mailinfo: handle_commit_msg() shouldn't be called after finding patchbreak
There is a strange "if (!mi->cmitmsg) return 0" at the very beginning
of handle_commit_msg(), but the condition should never trigger, because:

 * The only place cmitmsg is set to NULL is after this function sees
   a patch break, closes the FILE * to write the commit log message
   and returns 1.  This function returns non-zero only from that
   codepath.

 * The caller of this function, upon seeing a non-zero return,
   increments filter_stage, starts treating the input as patch text
   and will never call handle_commit_msg() again.

Replace it with an assert(!mi->filter_stage) to ensure the above
observation will stay to be true.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:57:17 -07:00
Junio C Hamano
8e919277e0 mailinfo: move content/content_top to struct mailinfo
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:57:17 -07:00
Junio C Hamano
d895bf0f57 mailinfo: move [ps]_hdr_data to struct mailinfo
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:56:17 -07:00
Junio C Hamano
8f63588a6e mailinfo: move cmitmsg and patchfile to struct mailinfo
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:55:01 -07:00
Junio C Hamano
f1e037b9af mailinfo: move charset to struct mailinfo
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:55:01 -07:00
Junio C Hamano
ab50e38b5d mailinfo: move transfer_encoding to struct mailinfo
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:53:25 -07:00
Junio C Hamano
28c6bfe94c mailinfo: move check for metainfo_charset to convert_to_utf8()
All callers of this function refrain from calling it when
mi->metainfo_charset is NULL; move the check to the callee,
as it already has a few conditions at its beginning to turn
it into a no-op.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:50:17 -07:00
Junio C Hamano
28be2d083c mailinfo: move metainfo_charset to struct mailinfo
This requires us to pass the struct down to decode_header() and
convert_to_utf8() callchain.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:50:17 -07:00
Junio C Hamano
ad57ef9da9 mailinfo: move use_scissors and use_inbody_headers to struct mailinfo
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:42:57 -07:00
Junio C Hamano
6200b751bb mailinfo: move add_message_id and message_id to struct mailinfo
This requires us to pass the structure into check_header() codepath.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:42:57 -07:00
Junio C Hamano
43550efa71 mailinfo: move patch_lines to struct mailinfo
This one is trivial thanks to previous steps that started passing
the structure throughout the input codepaths.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:39:01 -07:00
Junio C Hamano
13c6df2642 mailinfo: move filter/header stage to struct mailinfo
Earlier we got rid of two function-scope static variables that kept
track of the states of helper functions by making them extra arguments
that are passed throughout the callchain.  Now we have a convenient
place to store and pass them around in the form of "struct mailinfo",
change them into two fields in the struct.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:39:01 -07:00
Junio C Hamano
173aef7c2e mailinfo: move global "FILE *fin, *fout" to struct mailinfo
This requires us to pass "struct mailinfo" to more functions
throughout the codepath that read input lines.  Incidentally,
later steps are helped by this patch passing the struct to
more callchains.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:39:01 -07:00
Junio C Hamano
849106d511 mailinfo: move keep_subject & keep_non_patch_bracket to struct mailinfo
These two are the only easy ones that do not require passing the
structure around to deep corners of the callchain.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:37:53 -07:00
Junio C Hamano
c69f2395ba mailinfo: introduce "struct mailinfo" to hold globals
In this first step, move only 'email' and 'name' fields in there and
remove the corresponding globals.  In subsequent patches, more
globals will be moved to this and the structure will be passed
around as a new parameter to more functions.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:37:52 -07:00
Junio C Hamano
6e21b5089f mailinfo: move global "line" into mailinfo() function
With the previous steps, it becomes clear that the mailinfo()
function is the only one that wants the "line" to be directly
touchable.  Move it to the function scope of this function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:37:52 -07:00
Junio C Hamano
fbbcafd060 mailinfo: do not let find_boundary() touch global "line" directly
With the previous two commits, we established that the local
variable "line" in handle_body() and handle_boundary() functions
always refer to the global "line" that is used as the common and
shared "current line from the input".  They are the only callers of
the last function that refers to the global line directly, i.e.
find_boundary().  Pass "line" as a parameter to this leaf function
to complete the clean-up.  Now the only function that directly refers
to the global "line" is the caller of handle_body() at the very
beginning of this whole callchain.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:37:50 -07:00
Junio C Hamano
69e24defd6 mailinfo: do not let handle_boundary() touch global "line" directly
This function has a single caller, and called with the global "line"
holding the multi-part boundary line the caller saw while processing
the e-mail body.  The function then goes into a loop to process each
line of the input, and fills the same global "line" variable from
the input as it needs to read more lines to process the multi-part
headers.

Let the caller explicitly pass a pointer to this global "line"
variable as an argument, and have the function itself use that
strbuf throughout, instead of referring to the global "line" itself.

There still is a helper function that this function calls that still
touches the global directly; it will be updated as the series progresses.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:36:37 -07:00
Junio C Hamano
fde00d50f6 mailinfo: do not let handle_body() touch global "line" directly
This function has a single caller, and called with the global "line"
holding the first line of the e-mail body after the caller finished
processing the e-mail headers.  The function then goes into a loop
to process each line of the input, starting from what was given by
its caller, and fills the same global "line" variable from the input
as it needs to process more lines.

Let the caller explicitly pass a pointer to this global "line"
variable as an argument, and have the function itself use that
strbuf throughout, instead of referring to the global "line" itself.

There are helper functions that this function calls that still touch
the global directly; they will be updated as the series progresses.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:36:37 -07:00
Junio C Hamano
269e239c48 mailinfo: get rid of function-local static states
Two helper functions use "static int" in their scope to keep track
of the state while repeatedly getting called once for each input
line.  Move these state variables to their ultimate caller and pass
down pointers to them along the callchain, as a small step in
preparation for making this entire callchain more reentrant.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:36:37 -07:00
Junio C Hamano
c1b40bd7b6 mailinfo: move definition of MAX_HDR_PARSED closer to its use
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:34:49 -07:00
Junio C Hamano
30f50c3426 mailinfo: move cleanup_space() before its users
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:33:39 -07:00
Junio C Hamano
4f0f9d46c7 mailinfo: move check_header() after the helpers it uses
This way, we can lose a forward decl for decode_header().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:32:43 -07:00
Junio C Hamano
9cc243f7a9 mailinfo: move read_one_header_line() closer to its callers
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:30:15 -07:00
Junio C Hamano
39afcd3819 mailinfo: move handle_boundary() lower
This function wants to call find_boundary() and is called only from
one place without any recursing, so it becomes easier to read if it
appears after the called function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:20:49 -07:00
Junio C Hamano
12d19e80b0 mailinfo: plug strbuf leak during continuation line handling
Whether this loop is left via EOF/break or upon finding a
non-continuation line, the storage used for the contination line
handling is left behind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21 15:18:50 -07:00
Junio C Hamano
e38ee06e99 mailinfo: explicitly close file handle to the patch output
This does not make a difference within the context of "git mailinfo"
that runs once and exits, as flushing and closing would happen upon
process termination.  It however will matter when we eventually make
it callable as an API function.

Besides, cleaning after yourself once you are done is a good hygiene.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-18 22:13:27 -07:00
Junio C Hamano
b6af8ed13a mailinfo: fix an off-by-one error in the boundary stack
We pre-increment the pointer that we will use to store something at,
so the pointer is already beyond the end of the array if it points
at content[MAX_BOUNDARIES].

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-18 22:13:27 -07:00
Junio C Hamano
3a8fcdaf84 mailinfo: fold decode_header_bq() into decode_header()
In olden days we might have wanted to behave differently in
decode_header() if the header line was encoded with RFC2047, but we
apparently do not do so, hence this helper function can go, together
with its return value.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-18 22:13:27 -07:00
Junio C Hamano
2a5ce7cf0d mailinfo: remove a no-op call convert_to_utf8(it, "")
The called function checks if the second parameter is either a NULL
or an empty string at the very beginning and returns without doing
anything.  Remove the useless call.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-18 22:13:27 -07:00
Junio C Hamano
22f698cb18 Git 2.6.1
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 19:19:34 -07:00
Junio C Hamano
3adc4ec7b9 Sync with v2.5.4 2015-09-28 19:16:54 -07:00
Junio C Hamano
24358560c3 Git 2.5.4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 15:34:28 -07:00
Junio C Hamano
11a458befc Sync with 2.4.10 2015-09-28 15:33:56 -07:00
Junio C Hamano
a2558fb8e1 Git 2.4.10
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 15:30:30 -07:00
Junio C Hamano
6343e2f6f2 Sync with 2.3.10 2015-09-28 15:28:31 -07:00
Junio C Hamano
18b58f707f Git 2.3.10
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 15:26:52 -07:00
Junio C Hamano
92cdfd2131 Merge branch 'jk/xdiff-memory-limits' into maint-2.3 2015-09-28 14:59:28 -07:00
Jeff King
83c4d38017 merge-file: enforce MAX_XDIFF_SIZE on incoming files
The previous commit enforces MAX_XDIFF_SIZE at the
interfaces to xdiff: xdi_diff (which calls xdl_diff) and
ll_xdl_merge (which calls xdl_merge).

But we have another direct call to xdl_merge in
merge-file.c. If it were written today, this probably would
just use the ll_merge machinery. But it predates that code,
and uses slightly different options to xdl_merge (e.g.,
ZEALOUS_ALNUM).

We could try to abstract out an xdi_merge to match the
existing xdi_diff, but even that is difficult. Rather than
simply report error, we try to treat large files as binary,
and that distinction would happen outside of xdi_merge.

The simplest fix is to just replicate the MAX_XDIFF_SIZE
check in merge-file.c.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 14:58:13 -07:00
Jeff King
dcd1742e56 xdiff: reject files larger than ~1GB
The xdiff code is not prepared to handle extremely large
files. It uses "int" in many places, which can overflow if
we have a very large number of lines or even bytes in our
input files. This can cause us to produce incorrect diffs,
with no indication that the output is wrong. Or worse, we
may even underallocate a buffer whose size is the result of
an overflowing addition.

We're much better off to tell the user that we cannot diff
or merge such a large file. This patch covers both cases,
but in slightly different ways:

  1. For merging, we notice the large file and cleanly fall
     back to a binary merge (which is effectively "we cannot
     merge this").

  2. For diffing, we make the binary/text distinction much
     earlier, and in many different places. For this case,
     we'll use the xdi_diff as our choke point, and reject
     any diff there before it hits the xdiff code.

     This means in most cases we'll die() immediately after.
     That's not ideal, but in practice we shouldn't
     generally hit this code path unless the user is trying
     to do something tricky. We already consider files
     larger than core.bigfilethreshold to be binary, so this
     code would only kick in when that is circumvented
     (either by bumping that value, or by using a
     .gitattribute to mark a file as diffable).

     In other words, we can avoid being "nice" here, because
     there is already nice code that tries to do the right
     thing. We are adding the suspenders to the nice code's
     belt, so notice when it has been worked around (both to
     protect the user from malicious inputs, and because it
     is better to die() than generate bogus output).

The maximum size was chosen after experimenting with feeding
large files to the xdiff code. It's just under a gigabyte,
which leaves room for two obvious cases:

  - a diff3 merge conflict result on files of maximum size X
    could be 3*X plus the size of the markers, which would
    still be only about 3G, which fits in a 32-bit int.

  - some of the diff code allocates arrays of one int per
    record. Even if each file consists only of blank lines,
    then a file smaller than 1G will have fewer than 1G
    records, and therefore the int array will fit in 4G.

Since the limit is arbitrary anyway, I chose to go under a
gigabyte, to leave a safety margin (e.g., we would not want
to overflow by allocating "(records + 1) * sizeof(int)" or
similar.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 14:57:23 -07:00
Jeff King
3efb988098 react to errors in xdi_diff
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later.  This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).

In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper).  But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.

Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.

We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).

There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 14:57:10 -07:00
Junio C Hamano
f2df3104ce Merge branch 'jk/transfer-limit-redirection' into maint-2.3 2015-09-28 14:46:05 -07:00
Junio C Hamano
df37727a65 Merge branch 'jk/transfer-limit-protocol' into maint-2.3 2015-09-28 14:33:27 -07:00
Junio C Hamano
be08dee973 Git 2.6
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 13:18:19 -07:00
Blake Burkhart
b258116462 http: limit redirection depth
By default, libcurl will follow circular http redirects
forever. Let's put a cap on this so that somebody who can
trigger an automated fetch of an arbitrary repository (e.g.,
for CI) cannot convince git to loop infinitely.

The value chosen is 20, which is the same default that
Firefox uses.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 15:32:28 -07:00
Blake Burkhart
f4113cac0c http: limit redirection to protocol-whitelist
Previously, libcurl would follow redirection to any protocol
it was compiled for support with. This is desirable to allow
redirection from HTTP to HTTPS. However, it would even
successfully allow redirection from HTTP to SFTP, a protocol
that git does not otherwise support at all. Furthermore
git's new protocol-whitelisting could be bypassed by
following a redirect within the remote helper, as it was
only enforced at transport selection time.

This patch limits redirects within libcurl to HTTP, HTTPS,
FTP and FTPS. If there is a protocol-whitelist present, this
list is limited to those also allowed by the whitelist. As
redirection happens from within libcurl, it is impossible
for an HTTP redirect to a protocol implemented within
another remote helper.

When the curl version git was compiled with is too old to
support restrictions on protocol redirection, we warn the
user if GIT_ALLOW_PROTOCOL restrictions were requested. This
is a little inaccurate, as even without that variable in the
environment, we would still restrict SFTP, etc, and we do
not warn in that case. But anything else means we would
literally warn every time git accesses an http remote.

This commit includes a test, but it is not as robust as we
would hope. It redirects an http request to ftp, and checks
that curl complained about the protocol, which means that we
are relying on curl's specific error message to know what
happened. Ideally we would redirect to a working ftp server
and confirm that we can clone without protocol restrictions,
and not with them. But we do not have a portable way of
providing an ftp server, nor any other protocol that curl
supports (https is the closest, but we would have to deal
with certificates).

[jk: added test and version warning]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 15:30:39 -07:00
Jeff King
5088d3b387 transport: refactor protocol whitelist code
The current callers only want to die when their transport is
prohibited. But future callers want to query the mechanism
without dying.

Let's break out a few query functions, and also save the
results in a static list so we don't have to re-parse for
each query.

Based-on-a-patch-by: Blake Burkhart <bburky@bburky.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 15:28:36 -07:00