utf8

Source: Free On-Line Dictionary of Computing

UTF-8
     
         (UCS transformation format 8) An
        {ASCII}-compatible multibyte {Unicode} and {UCS} encoding,
        used by {Java} and {Plan 9}.
     
        The {Unicode character} set occupies a 16-bit code space.  The
        most obvious Unicode encoding (known as UCS-2) consists of a
        sequence of 16-bit words.  Such strings can contain bytes like
        '\0' or '/' which have a special meaning in filenames and
        other {C} library function parameters.  In addition, the
        majority of {Unix} tools expect ASCII files and can't read
        16-bit words as characters without major modifications.  For
        these reasons, UCS-2 is not a suitable external encoding of
        Unicode in filenames, text files, environment variables, etc.
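
        The byte-level problem described above can be illustrated with
        a minimal sketch.  It assumes Python's "utf-16-be" codec as a
        stand-in for UCS-2 (the two produce the same bytes for
        characters in the 16-bit range); the helper names are
        hypothetical and chosen only for this example.

```python
# Sketch: show that 16-bit encoded text can contain NUL and '/' bytes.
# Assumption: "utf-16-be" stands in for UCS-2, which matches it for
# characters that fit in 16 bits.

def ucs2_bytes(text: str) -> bytes:
    """Encode text as big-endian 16-bit words."""
    return text.encode("utf-16-be")

def has_problem_bytes(data: bytes) -> bool:
    """True if data contains a NUL or '/' byte, both special in filenames."""
    return 0x00 in data or 0x2F in data

plain = ucs2_bytes("A")        # ASCII characters gain a leading NUL byte
ogonek = ucs2_bytes("\u012f")  # U+012F has 0x2F ('/') as its low byte

print(plain, has_problem_bytes(plain))
print(ogonek, has_problem_bytes(ogonek))
```

        A C library routine that treats its argument as a
        NUL-terminated string would stop at the first zero byte of
        such data, and a path parser would see a directory separator
        inside U+012F.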
     
        The {ISO 10646} {Universal Character Set} (UCS), a superset of
        Unicode, occupies a 31-bit code space and the obvious UCS-4
        encoding for it (a sequence of 32-bit words) has the same
        problems.
     
        The UTF-8 encoding of Unicode and UCS avoids the problems of
        fixed-length Unicode encodings because an ASCII file encoded
        in UTF-8 is exactly the same as the original ASCII file, and
        every byte of a non-ASCII character is guaranteed to have the
        most significant bit set (bit 0x80).  This means that normal
        tools for text searching etc. work as expected.
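
        The two properties can be checked with a short sketch using
        Python's built-in "utf-8" and "ascii" codecs.  The function
        names and the sample string are assumptions made for
        illustration, not part of any specification.

```python
# Sketch: check ASCII transparency and the high-bit property of UTF-8.

def is_ascii_transparent(text: str) -> bool:
    """True if pure-ASCII text has identical ASCII and UTF-8 encodings."""
    return text.encode("utf-8") == text.encode("ascii")

def non_ascii_bytes_high(text: str) -> bool:
    """True if every byte of every non-ASCII character has bit 0x80 set."""
    return all(
        byte & 0x80
        for ch in text if ord(ch) > 0x7F
        for byte in ch.encode("utf-8")
    )

sample = "na\u00efve/caf\u00e9"   # two non-ASCII letters around a real '/'
encoded = sample.encode("utf-8")

# A byte-oriented search finds only the genuine '/' character.
slash_positions = [i for i, b in enumerate(encoded) if b == 0x2F]

print(is_ascii_transparent("hello.txt"))
print(non_ascii_bytes_high(sample))
print(slash_positions)
```

        Because no byte of a multibyte sequence falls in the ASCII
        range, a byte-wise search for '/' or '\0' cannot produce a
        false match inside a non-ASCII character.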
     
        UTF-8 is defined in {RFC 2279} (since obsoleted by RFC 3629).
     
        ["File System Safe UCS Transformation Format (FSS_UTF)",
        X/Open Preliminary Specification, X/Open Company Ltd.,
        Document Number: P316.  This information also appears in
        ISO/IEC 10646, Annex P].
     
        {Plan 9 UTF manual entry
        (ftp://ftp.uu.net/doc/obi/Bell.Labs/plan9pm/09utf.ps.Z)}.
     
        (1998-07-29)
