mirror of https://github.com/git/git.git (synced 2024-10-31 14:27:54 +01:00)
Merge branch 'kb/i18n-doc'

* kb/i18n-doc:
  Documentation/i18n.txt: clarify character encoding support
commit 81bc521af2
1 changed file with 23 additions and 10 deletions
--- a/Documentation/i18n.txt
+++ b/Documentation/i18n.txt
@@ -1,18 +1,31 @@
-At the core level, Git is character encoding agnostic.
-
- - The pathnames recorded in the index and in the tree objects
-   are treated as uninterpreted sequences of non-NUL bytes.
-   What readdir(2) returns are what are recorded and compared
-   with the data Git keeps track of, which in turn are expected
-   to be what lstat(2) and creat(2) accepts. There is no such
-   thing as pathname encoding translation.
+Git is to some extent character encoding agnostic.
 
  - The contents of the blob objects are uninterpreted sequences
    of bytes. There is no encoding translation at the core
    level.
 
- - The commit log messages are uninterpreted sequences of non-NUL
-   bytes.
+ - Path names are encoded in UTF-8 normalization form C. This
+   applies to tree objects, the index file, ref names, as well as
+   path names in command line arguments, environment variables
+   and config files (`.git/config` (see linkgit:git-config[1]),
+   linkgit:gitignore[5], linkgit:gitattributes[5] and
+   linkgit:gitmodules[5]).
++
+Note that Git at the core level treats path names simply as
+sequences of non-NUL bytes, there are no path name encoding
+conversions (except on Mac and Windows). Therefore, using
+non-ASCII path names will mostly work even on platforms and file
+systems that use legacy extended ASCII encodings. However,
+repositories created on such systems will not work properly on
+UTF-8-based systems (e.g. Linux, Mac, Windows) and vice versa.
+Additionally, many Git-based tools simply assume path names to
+be UTF-8 and will fail to display other encodings correctly.
+
+ - Commit log messages are typically encoded in UTF-8, but other
+   extended ASCII encodings are also supported. This includes
+   ISO-8859-x, CP125x and many others, but _not_ UTF-16/32,
+   EBCDIC and CJK multi-byte encodings (GBK, Shift-JIS, Big5,
+   EUC-x, CP9xx etc.).
 
 Although we encourage that the commit log messages are encoded
 in UTF-8, both the core and Git Porcelain are designed not to
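
The two byte-level points in the new text can be illustrated with a small Python sketch. This is not Git code. The helper names `utf8_bytes` and `contains_nul` and the sample file name are hypothetical, chosen only for illustration. The sketch shows that the same visible path name yields different byte sequences under normalization forms C and D, so a byte-wise comparison treats them as different names. It also shows that UTF-16 text contains NUL bytes even for plain ASCII characters, which is at least consistent with UTF-16 being excluded from the supported commit message encodings.

```python
import unicodedata


def utf8_bytes(name: str, form: str) -> bytes:
    """Return the UTF-8 bytes of `name` after normalizing it to `form`."""
    return unicodedata.normalize(form, name).encode("utf-8")


def contains_nul(text: str, encoding: str) -> bool:
    """Report whether encoding `text` produces at least one NUL byte."""
    return b"\x00" in text.encode(encoding)


# Hypothetical path name containing U+00E9 (e with acute accent).
name = "caf\u00e9.txt"

nfc = utf8_bytes(name, "NFC")  # precomposed: one code point for the accented e
nfd = utf8_bytes(name, "NFD")  # decomposed: "e" followed by combining U+0301

print(nfc)         # b'caf\xc3\xa9.txt'
print(nfd)         # b'cafe\xcc\x81.txt'
print(nfc == nfd)  # False: byte-wise comparison sees two different names

# An extended ASCII encoding keeps ASCII text free of NUL bytes; UTF-16 does not.
print(contains_nul("fix typo", "iso-8859-1"))  # False
print(contains_nul("fix typo", "utf-16-le"))   # True
```

Because the core compares path names as raw bytes, a tool that wants both spellings to match would have to normalize them itself before comparing.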