1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-11-05 16:52:59 +01:00
Commit graph

183 commits

Author SHA1 Message Date
Junio C Hamano
8061ae8b46 Merge branch 'jk/commit-buffer-length'
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths.  Use this to optimize the code paths to
validate GPG signatures in commit objects.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02 12:53:02 -07:00
Junio C Hamano
64d845477b Merge branch 'maint'
* maint:
  t7300: repair filesystem permissions with test_when_finished
  enums: remove trailing ',' after last item in enum
2014-07-02 12:52:46 -07:00
Junio C Hamano
c2f7b1026e Merge branch 'maint-1.8.5' into maint
* maint-1.8.5:
  t7300: repair filesystem permissions with test_when_finished
  enums: remove trailing ',' after last item in enum
2014-07-02 12:51:50 -07:00
Ronnie Sahlberg
782735203c enums: remove trailing ',' after last item in enum
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-02 12:37:05 -07:00
Junio C Hamano
09e13ad5b0 Merge branch 'as/pretty-truncate'
* as/pretty-truncate:
  pretty.c: format string with truncate respects logOutputEncoding
  t4205, t6006: add tests that fail with i18n.logOutputEncoding set
  t4205 (log-pretty-format): use `tformat` rather than `format`
  t4041, t4205, t6006, t7102: don't hardcode tested encoding value
  t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs
2014-06-16 10:07:12 -07:00
Jeff King
8597ea3afe commit: record buffer length in cache
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:09:38 -07:00
Jeff King
b66103c3ba convert logmsg_reencode to get_commit_buffer
Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.

We return either:

  1. The original commit->buffer

  2. A newly allocated buffer from read_sha1_file

  3. A reencoded buffer (based on either 1 or 2 above).

while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Jeff King
b000c59b0c logmsg_reencode: return const buffer
The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const.  We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-12 10:29:43 -07:00
Alexey Shumkin
7d509878b8 pretty.c: format string with truncate respects logOutputEncoding
Pretty format string %<(N,[ml]trunc)>%s truncates subject to a given
length with an appropriate padding. This works for non-ASCII texts when
i18n.logOutputEncoding is UTF-8 only (independently of a printed commit
message encoding) but does not work when i18n.logOutputEncoding is NOT
UTF-8.

In 7e77df3 (pretty: two phase conversion for non utf-8 commits, 2013-04-19)
'format_commit_item' function assumes commit message to be in UTF-8.
And that was so until ecaee80 (pretty: --format output should honor
logOutputEncoding, 2013-06-26) where conversion to logOutputEncoding was
added before calling 'format_commit_message'.

Correct this by converting a commit message to UTF-8 first (as it
assumed in 7e77df3 (pretty: two phase conversion for non utf-8 commits,
2013-04-19)). Only after that convert a commit message to an actual
logOutputEncoding.

Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-21 11:13:30 -07:00
Jeff King
d105324655 pretty: make show_ident_date public
We use this function internally to format "Date" lines in
commit logs, but other parts of the code will want it, too.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-02 14:13:00 -07:00
Junio C Hamano
6f75e48323 Merge branch 'rm/strchrnul-not-strlen'
* rm/strchrnul-not-strlen:
  use strchrnul() in place of strchr() and strlen()
2014-03-18 13:51:18 -07:00
Junio C Hamano
3c83b080e4 Merge branch 'jk/commit-dates-parsing-fix'
Tighten codepaths that parse timestamps in commit objects.

* jk/commit-dates-parsing-fix:
  show_ident_date: fix tz range check
  log: do not segfault on gmtime errors
  log: handle integer overflow in timestamps
  date: check date overflow against time_t
  fsck: report integer overflow in author timestamps
  t4212: test bogus timestamps with git-log
2014-03-14 14:25:44 -07:00
Rohit Mani
2c5495f7b6 use strchrnul() in place of strchr() and strlen()
Avoid scanning strings twice, once with strchr() and then with
strlen(), by using strchrnul().

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Mani <rohit.mani@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-10 08:35:30 -07:00
Jeff King
3f419d45ef show_ident_date: fix tz range check
Commit 1dca155fe3 (log: handle integer overflow in
timestamps, 2014-02-24) tried to catch integer overflow
coming from strtol() on the timezone field by comparing against
LONG_MIN/LONG_MAX. However, the intermediate "tz" variable
is an "int", which means it can never be LONG_MAX on LP64
systems; we would truncate the output from strtol before the
comparison.

Clang's -Wtautological-constant-out-of-range-compare notices
this and rightly complains.

Let's instead store the result of strtol in a long, and then
compare it against INT_MIN/INT_MAX. This will catch overflow
from strtol, and also overflow when we pass the result as an
int to show_date.

Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-07 11:53:29 -08:00
Jeff King
1dca155fe3 log: handle integer overflow in timestamps
If an ident line has a ridiculous date value like (2^64)+1,
we currently just pass ULONG_MAX along to the date code,
which can produce nonsensical dates.

On systems with a signed long time_t (e.g., 64-bit glibc
systems), this actually doesn't end up too bad. The
ULONG_MAX is converted to -1, we apply the timezone field to
that, and the result ends up somewhere between Dec 31, 1969
and Jan 1, 1970.

However, there is still a few good reasons to detect the
overflow explicitly:

  1. On systems where "unsigned long" is smaller than
     time_t, we get a nonsensical date in the future.

  2. Even where it would produce "Dec 31, 1969", it's easier
     to recognize "midnight Jan 1" as a consistent sentinel
     value for "we could not parse this".

  3.  Values which do not overflow strtoul but do overflow a
      signed time_t produce nonsensical values in the past.
      For example, on a 64-bit system with a signed long
      time_t, a timestamp of 18446744073000000000 produces a
      date in 1947.

We also recognize overflow in the timezone field, which
could produce nonsensical results. In this case we show the
parsed date, but in UTC.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 10:12:58 -08:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Felipe Contreras
35b2fa5ba3 pretty: trivial style fix
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-31 13:47:41 -07:00
Jeff King
662cc30cd0 format-patch: print in-body "From" only when needed
Commit a908047 taught format-patch the "--from" option,
which places the author ident into an in-body from header,
and uses the committer ident in the rfc822 from header.  The
documentation claims that it will omit the in-body header
when it is the same as the rfc822 header, but the code never
implemented that behavior.

This patch completes the feature by comparing the two idents
and doing nothing when they are the same (this is the same
as simply omitting the in-body header, as the two are by
definition indistinguishable in this case). This makes it
reasonable to turn on "--from" all the time (if it matches
your particular workflow), rather than only using it when
exporting other people's patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-20 11:09:51 -07:00
Jeff King
a90804752f teach format-patch to place other authors into in-body "From"
Format-patch generates emails with the "From" address set to the
author of each patch. If you are going to send the emails, however,
you would want to replace the author identity with yours (if they
are not the same), and bump the author identity to an in-body
header.

Normally this is handled by git-send-email, which does the
transformation before sending out the emails. However, some
workflows may not use send-email (e.g., imap-send, or a custom
script which feeds the mbox to a non-git MUA). They could each
implement this feature themselves, but getting it right is
non-trivial (one must canonicalize the identities by reversing any
RFC2047 encoding or RFC822 quoting of the headers, which has caused
many bugs in send-email over the years).

This patch takes a different approach: it teaches format-patch a
"--from" option which handles the ident check and in-body header
while it is writing out the email.  It's much simpler to do at this
level (because we haven't done any quoting yet), and any workflow
based on format-patch can easily turn it on.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-03 12:11:04 -07:00
Jeff King
10f2fbff68 pretty.c: drop const-ness from pretty_print_context
In the current code, callers are expected to fill in the
pretty_print_context, and then the pretty.c functions simply
read from it. This leaves no room for the pretty.c functions
to communicate with each other by manipulating the context
(e.g., data seen while printing the header may impact how we
print the body).

Rather than introduce a new struct to hold modifiable data,
let's just drop the const-ness of the existing context
struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-03 12:10:57 -07:00
Junio C Hamano
d9291ecf4f Merge branch 'rs/pp-user-info-without-extra-allocation'
* rs/pp-user-info-without-extra-allocation:
  pretty: remove intermediate strbufs from pp_user_info()
  pretty: simplify output line length calculation in pp_user_info()
  pretty: simplify input line length calculation in pp_user_info()
2013-05-01 15:24:08 -07:00
René Scharfe
a0511b3934 pretty: remove intermediate strbufs from pp_user_info()
Use namebuf/namelen and mailbuf/maillen directly instead of copying
their contents into strbufs first.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-25 15:02:54 -07:00
René Scharfe
97a17e7721 pretty: simplify output line length calculation in pp_user_info()
Keep namelen unchanged and don't use it to hold a value that we're not
interested in anyway -- we can use maillen and the constant part
directly instead.  This simplifies the code slightly and prepares for
the next patch that makes use of the original value of namelen.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-25 15:02:53 -07:00
René Scharfe
30e77bcb50 pretty: simplify input line length calculation in pp_user_info()
Instead of searching for LF and NUL with two strchr() calls use a single
strchrnul() call.  We don't need to check if the returned pointer is NULL
because either we'll find the NUL at the end of line, or the caller
forgot to NUL-terminate the string and we'll overrun the buffer in any
case.  Also we don't need to pass LF or NUL to split_ident_line() as it
ignores it anyway.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-25 15:02:51 -07:00
Junio C Hamano
e52e6f79cc Merge branch 'nd/pretty-formats'
pretty-printing body of the commit that is stored in non UTF-8
encoding did not work well.  The early part of this series fixes
it.  And then it adds %C(auto) specifier that turns the coloring on
when we are emitting to the terminal, and adds column-aligning
format directives.

* nd/pretty-formats:
  pretty: support %>> that steal trailing spaces
  pretty: support truncating in %>, %< and %><
  pretty: support padding placeholders, %< %> and %><
  pretty: add %C(auto) for auto-coloring
  pretty: split color parsing into a separate function
  pretty: two phase conversion for non utf-8 commits
  utf8.c: add reencode_string_len() that can handle NULs in string
  utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
  utf8.c: move display_mode_esc_sequence_len() for use by other functions
  pretty: share code between format_decoration and show_decorations
  pretty-formats.txt: wrap long lines
  pretty: get the correct encoding for --pretty:format=%e
  pretty: save commit encoding from logmsg_reencode if the caller needs it
2013-04-23 11:22:48 -07:00
Junio C Hamano
703319313f Merge branch 'jk/chopped-ident'
A commit object whose author or committer ident are malformed
crashed some code that trusted that a name, an email and an
timestamp can always be found in it.

* jk/chopped-ident:
  blame: handle broken commit headers gracefully
  pretty: handle broken commit headers gracefully
  cat-file: print tags raw for "cat-file -p"
2013-04-22 11:11:36 -07:00
Nguyễn Thái Ngọc Duy
1640632b4f pretty: support %>> that steal trailing spaces
This is pretty useful in `%<(100)%s%Cred%>(20)% an' where %s does not
use up all 100 columns and %an needs more than 20 columns. By
replacing %>(20) with %>>(20), %an can steal spaces from %s.

%>> understands escape sequences, so %Cred does not stop it from
stealing spaces in %<(100).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:29 -07:00
Nguyễn Thái Ngọc Duy
a7f01c6b4d pretty: support truncating in %>, %< and %><
%>(N,trunc) truncates the right part after N columns and replace the
last two letters with "..". ltrunc does the same on the left. mtrunc
cuts the middle out.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:29 -07:00
Nguyễn Thái Ngọc Duy
a57523428b pretty: support padding placeholders, %< %> and %><
Either %<, %> or %>< standing before a placeholder specifies how many
columns (at least as the placeholder can exceed it) it takes. Each
differs on how spaces are padded:

  %< pads on the right (aka left alignment)
  %> pads on the left (aka right alignment)
  %>< pads both ways equally (aka centered)

The (<N>) follows them, e.g. `%<(100)', to specify the number of
columns the next placeholder takes.

However, if '|' stands before (<N>), e.g. `%>|(100)', then the number
of columns is calculated so that it reaches the Nth column on screen.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:29 -07:00
Nguyễn Thái Ngọc Duy
a95f067e3f pretty: add %C(auto) for auto-coloring
This is not simply convenient over %C(auto,xxx). Some placeholders
(actually only one, %d) do multi coloring and we can't emit a multiple
colors with %C(auto,xxx).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:28 -07:00
Nguyễn Thái Ngọc Duy
fcabc2d91c pretty: split color parsing into a separate function
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:28 -07:00
Nguyễn Thái Ngọc Duy
7e77df39bf pretty: two phase conversion for non utf-8 commits
Always assume format_commit_item() takes an utf-8 string for string
handling simplicity (we can handle utf-8 strings, but can't with other
encodings).

If commit message is in non-utf8, or output encoding is not, then the
commit is first converted to utf-8, processed, then output converted
to output encoding. This of course only works with encodings that are
compatible with Unicode.

This also fixes the iso8859-1 test in t6006. It's supposed to create
an iso8859-1 commit, but the commit content in t6006 is in UTF-8.
t6006 is now converted back in UTF-8 (the downside is we can't put
utf-8 strings there anymore).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:28 -07:00
Nguyễn Thái Ngọc Duy
9d3f002f21 pretty: share code between format_decoration and show_decorations
This also adds color support to format_decorations()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
Nguyễn Thái Ngọc Duy
0940a76db6 pretty: get the correct encoding for --pretty:format=%e
parse_commit_header() provides the commit encoding for '%e' and it
reads it from the re-encoded message, which contains the new encoding,
not the original one in the commit object. This never happens because
--pretty=format:xxx never respects i18n.logoutputencoding. But that's
a different story.

Get the commit encoding from logmsg_reencode() instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
Nguyễn Thái Ngọc Duy
5a10d23658 pretty: save commit encoding from logmsg_reencode if the caller needs it
The commit encoding is parsed by logmsg_reencode, there's no need for
the caller to re-parse it again. The reencoded message now has the new
encoding, not the original one. The caller would need to read commit
object again before parsing.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
René Scharfe
9dbe7c3d7f pretty: handle broken commit headers gracefully
Centralize the parsing of the date and time zone strings in the new
helper function show_ident_date() and make sure it checks the pointers
provided by split_ident_line() for NULL before use.

Reported-by: Ivan Lyapunov <dront78@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 14:50:36 -07:00
Junio C Hamano
b771d8d7cf Merge branch 'mg/gpg-interface-using-status' into maint
Verification of signed tags were not done correctly when not in C
or en/US locale.

* mg/gpg-interface-using-status:
  pretty: make %GK output the signing key for signed commits
  pretty: parse the gpg status lines rather than the output
  gpg_interface: allow to request status return
  log-tree: rely upon the check in the gpg_interface
  gpg-interface: check good signature in a reliable way
2013-04-03 09:26:27 -07:00
Junio C Hamano
e6658b9d69 Merge branch 'ks/rfc2047-one-char-at-a-time' into maint
When "format-patch" quoted a non-ascii strings on the header files,
it incorrectly applied rfc2047 and chopped a single character in the
middle of it.

* ks/rfc2047-one-char-at-a-time:
  format-patch: RFC 2047 says multi-octet character may not be split
2013-04-03 09:25:29 -07:00
Sebastian Götte
e290c4b944 pretty printing: extend %G? to include 'N' and 'U'
Expand %G? in pretty format strings to 'N' in case of no GPG signature
and 'U' in case of a good but untrusted GPG signature in addition to
the previous 'G'ood and 'B'ad. This eases writing anyting parsing
git-log output.

Signed-off-by: Sebastian Götte <jaseg@physik-pool.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-31 22:38:53 -07:00
Sebastian Götte
ffb6d7d5c9 Move commit GPG signature verification to commit.c
Signed-off-by: Sebastian Götte <jaseg@physik-pool.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-31 19:15:11 -07:00
Junio C Hamano
573f1a9cf1 Merge branch 'ks/rfc2047-one-char-at-a-time'
When "format-patch" quoted a non-ascii strings on the header files,
it incorrectly applied rfc2047 and chopped a single character in
the middle of it.

* ks/rfc2047-one-char-at-a-time:
  format-patch: RFC 2047 says multi-octet character may not be split
2013-03-25 14:00:46 -07:00
Junio C Hamano
0f6875dbe2 Merge branch 'mg/gpg-interface-using-status'
Call "gpg" using the right API when validating the signature on
tags.

* mg/gpg-interface-using-status:
  pretty: make %GK output the signing key for signed commits
  pretty: parse the gpg status lines rather than the output
  gpg_interface: allow to request status return
  log-tree: rely upon the check in the gpg_interface
  gpg-interface: check good signature in a reliable way
2013-03-21 14:02:55 -07:00
Kirill Smelkov
6cd3c05327 format-patch: RFC 2047 says multi-octet character may not be split
Even though an earlier attempt (bafc478..41dd00bad) cleaned
up RFC 2047 encoding, pretty.c::add_rfc2047() still decides
where to split the output line by going through the input
one byte at a time, and potentially splits a character in
the middle.  A subject line may end up showing like this:

     ".... fö?? bar".   (instead of  ".... föö bar".)

if split incorrectly.

RFC 2047, section 5 (3) explicitly forbids such beaviour

    Each 'encoded-word' MUST represent an integral number of
    characters.  A multi-octet character may not be split across
    adjacent 'encoded- word's.

that means that e.g. for

    Subject: .... föö bar

encoding

    Subject: =?UTF-8?q?....=20f=C3=B6=C3=B6?=
     =?UTF-8?q?=20bar?=

is correct, and

    Subject: =?UTF-8?q?....=20f=C3=B6=C3?=      <-- NOTE ö is broken here
     =?UTF-8?q?=B6=20bar?=

is not, because "ö" character UTF-8 encoding C3 B6 is split here across
adjacent encoded words.

To fix the problem, make the loop grab one _character_ at a time and
determine its output length to see where to break the output line.  Note
that this version only knows about UTF-8, but the logic to grab one
character is abstracted out in mbs_chrlen() function to make it possible
to extend it to other encodings with the help of iconv in the future.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-09 11:11:19 -08:00
Michael J Gruber
0174eeaa73 pretty: make %GK output the signing key for signed commits
In order to employ signed keys in an automated way it is absolutely
necessary to check which keys the signatures come from.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-14 09:30:36 -08:00
Michael J Gruber
4a868fd655 pretty: parse the gpg status lines rather than the output
Currently, parse_signature_lines() parses the gpg output for strings
which depend on LANG so it fails to recognize good commit signatures
(and thus does not fill in %G? and the like) in most locales.

Make it parse the status lines from gpg instead, which are the proper
machine interface. This fixes the problem described above.

There is a change in behavior for "%GS" which we intentionally do not
work around: "%GS" used to put quotes around the signer's uid (or
rather: it inherited from the gpg user output). We output the uid
without quotes now, just like author and committer names.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-14 09:30:22 -08:00
Michael J Gruber
9cc4ac8ff1 gpg_interface: allow to request status return
Currently, verify_signed_buffer() returns the user facing output only.

Allow callers to request the status output also.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-14 09:30:04 -08:00
Jeff King
be5c9fb904 logmsg_reencode: lazily load missing commit buffers
Usually a commit that makes it to logmsg_reencode will have
been parsed, and the commit->buffer struct member will be
valid. However, some code paths will free commit buffers
after having used them (for example, the log traversal
machinery will do so to keep memory usage down).

Most of the time this is fine; log should only show a commit
once, and then exits. However, there are some code paths
where this does not work. At least two are known:

  1. A commit may be shown as part of a regular ref, and
     then it may be shown again as part of a submodule diff
     (e.g., if a repo contains refs to both the superproject
     and subproject).

  2. A notes-cache commit may be shown during "log --all",
     and then later used to access a textconv cache during a
     diff.

Lazily loading in logmsg_reencode does not necessarily catch
all such cases, but it should catch most of them. Users of
the commit buffer tend to be either parsing for structure
(in which they will call parse_commit, and either we will
already have parsed, or we will load commit->buffer lazily
there), or outputting (either to the user, or fetching a
part of the commit message via format_commit_message). In
the latter case, we should always be using logmsg_reencode
anyway (and typically we do so via the pretty-print
machinery).

If there are any cases that this misses, we can fix them up
to use logmsg_reencode (or handle them on a case-by-case
basis if that is inappropriate).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-26 13:28:22 -08:00
Jeff King
dd0d388c44 logmsg_reencode: never return NULL
The logmsg_reencode function will return the reencoded
commit buffer, or NULL if reencoding failed or no reencoding
was necessary. Since every caller then ends up checking for NULL
and just using the commit's original buffer, anyway, we can
be a bit more helpful and just return that buffer when we
would have returned NULL.

Since the resulting string may or may not need to be freed,
we introduce a logmsg_free, which checks whether the buffer
came from the commit object or not (callers either
implemented the same check already, or kept two separate
pointers, one to mark the buffer to be used, and one for the
to-be-freed string).

Pushing this logic into logmsg_* simplifies the callers, and
will let future patches lazily load the commit buffer in a
single place.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-26 13:28:21 -08:00
Junio C Hamano
577f63e781 Merge branch 'ap/log-mailmap'
Teach commands in the "log" family to optionally pay attention to
the mailmap.

* ap/log-mailmap:
  log --use-mailmap: optimize for cases without --author/--committer search
  log: add log.mailmap configuration option
  log: grep author/committer using mailmap
  test: add test for --use-mailmap option
  log: add --use-mailmap option
  pretty: use mailmap to display username and email
  mailmap: add mailmap structure to rev_info and pp
  mailmap: simplify map_user() interface
  mailmap: remove email copy and length limitation
  Use split_ident_line to parse author and committer
  string-list: allow case-insensitive string list
2013-01-20 17:06:53 -08:00
Junio C Hamano
3ab4c543e3 Merge branch 'rs/pretty-use-prefixcmp'
* rs/pretty-use-prefixcmp:
  pretty: use prefixcmp instead of memcmp on NUL-terminated strings
2013-01-18 11:20:08 -08:00