Author:
Source
The GNU libiconv package provides the basis for character set conversion of text, for systems that don’t use glibc.
It contains an implementation of the iconv() POSIX:2024 API and of the ‘iconv’ program, in a way that is mostly glibc compatible.
New in this release:
- Many more transliterations, in particular also of Emoji characters.
- The iconv_open function is now POSIX:2024 compliant: it recognizes a suffix //NON_IDENTICAL_DISCARD in the ‘tocode’ argument, with the effect that characters that cannot be represented in the target character set will be silently discarded. Whereas the suffix //IGNORE in the ‘tocode’ argument has the effect of discarding not only characters that cannot be represented in the target character set, but also invalid multibyte sequences in the input. Accordingly, the iconvctl function accepts requests ICONV_GET_DISCARD_INVALID, ICONV_SET_DISCARD_INVALID, ICONV_GET_DISCARD_NON_IDENTICAL, ICONV_SET_DISCARD_NON_IDENTICAL.
- The iconv_open function and the iconv program now support multiple suffixes, such as //TRANSLIT//IGNORE, not only one.
- GB18030 is now an alias for GB18030:2005. A new converter for GB18030:2022 is added. Since this encoding merely cleans up a few private-use-area mappings, you can continue to use the GB18030 converter, for backward compatibility. Its Unicode to GB18030 conversion direction has been enhanced, to help transitioning away from PUA code points.
- When converting from/to an EBCDIC encoding, a non-standard way of converting newlines can be requested
- at the C level, by calling iconvctl with argument ICONV_SET_FROM_SURFACE or ICONV_SET_TO_SURFACE, or
- from the iconv program, by setting the environment variable ICONV_EBCDIC_ZOS_UNIX to a non-empty value.
- Special support for z/OS: The iconv program adds a charset metadata tag to its output file. (Contributed by Mike Fulton.)
- For conversions from UCS-2, UCS-4, UTF-16, UTF-32, invoking iconv(cd,NULL,NULL,…) now preserves the byte order state.