.\" $OpenBSD: c16rtomb.3,v 1.1 2023/08/20 15:02:51 schwarze Exp $
.\"
.\" Copyright (c) 2023 Ingo Schwarze
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: August 20 2023 $
.Dt C16RTOMB 3
.Os
.Sh NAME
.Nm c16rtomb
.Nd convert one UTF-16 encoded character to UTF-8
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char * restrict s"
.Fa "char16_t c16"
.Fa "mbstate_t * restrict mbs"
.Fc
.Sh DESCRIPTION
This function converts one UTF-16 encoded character to UTF-8.
In some cases, it is necessary to call the function twice
to convert a single character.
.Pp
First, call
.Fn c16rtomb
passing the first 16-bit code unit of the UTF-16 encoded character in
.Fa c16 .
If the return value is greater than 0, the character is part of the UCS-2
range, the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes has been written to the storage starting at
.Fa s ,
and the function does not need to be called again.
.Pp
If the return value is 0, the first 16-bit code unit is a UTF-16 high
surrogate and the function needs to be called a second time,
this time passing the second 16-bit code unit
of the UTF-16 encoded character in
.Fa c16
and passing the same
.Fa mbs
again that was also passed to the first call.
If the second 16-bit code unit is a UTF-16 low surrogate,
the second call returns a value greater than 0,
the surrogate pair represents a Unicode code point
beyond the basic multilingual plane,
and the complete UTF-8 encoding consisting of at most
.Dv MB_CUR_MAX
bytes is written to the storage starting at
.Fa s .
.Pp
The output encoding that
.Fn c16rtomb
uses in
.Fa s
is determined by the
.Dv LC_CTYPE
category of the current locale.
.Ox
only supports UTF-8 and ASCII output,
and this function is only useful for UTF-8.
.Pp
The following arguments cause special processing:
.Bl -tag -width 012345678901
.It Fa c16 No == 0
A NUL byte is stored to
.Pf * Fa s
and the state object pointed to by
.Fa mbs
is reset to the initial state.
On operating systems other than
.Ox
that support state-dependent multibyte encodings,
a special byte sequence
.Pq Dq shift sequence
is written before the NUL byte to return to the initial state
if that is required by the output encoding
and by the current output encoding state.
.It Fa mbs No == Dv NULL
An internal
.Vt mbstate_t
object specific to the
.Fn c16rtomb
function is used instead of the
.Fa mbs
argument.
This internal object is automatically initialized at program startup
and never changed by any
.Em libc
function except
.Fn c16rtomb .
.It Fa s No == Dv NULL
The object pointed to by
.Fa mbs ,
or the internal object if
.Fa mbs
is a
.Dv NULL
pointer, is reset to its initial state,
.Fa c16
is ignored, and 1 is returned.
.El
.Sh RETURN VALUES
.Fn c16rtomb
returns the number of bytes written to
.Fa s
on success or
.Po Vt size_t Pc Ns \-1
on failure, specifically:
.Bl -tag -width 10n
.It 0
The first 16-bit code unit was successfully decoded
as a UTF-16 high surrogate.
Nothing was written to
.Fa s
yet.
.It 1
The first 16-bit code unit was successfully decoded
as a character in the range U+0000 to U+007F, or
.Fa s
is
.Dv NULL .
.It 2
The first 16-bit code unit was successfully decoded
as a character in the range U+0080 to U+07FF.
.It 3
The first 16-bit code unit was successfully decoded
as a character in the range U+0800 to U+D7FF or U+E000 to U+FFFF.
.It 4
The second 16-bit code unit was successfully decoded
as a UTF-16 low surrogate, resulting in a character
in the range U+10000 to U+10FFFF.
.It greater
Return values greater than 4 may occur on operating systems other than
.Ox
for output encodings other than UTF-8,
in particular when a shift sequence was written.
.It Po Vt size_t Pc Ns \-1
UTF-16 input decoding or
.Dv LC_CTYPE
output encoding failed, or
.Fa mbs
is invalid.
Nothing was written to
.Fa s ,
and
.Va errno
has been set.
.El
.Sh ERRORS
.Fn c16rtomb
causes an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
UTF-16 input decoding failed because the first 16-bit code unit
is neither a UCS-2 character nor a UTF-16 high surrogate,
or because the second 16-bit code unit is not a UTF-16 low surrogate;
or output encoding failed because the resulting character
cannot be represented in the output encoding selected with
.Dv LC_CTYPE .
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized
.Vt mbstate_t
object.
.El
.Sh SEE ALSO
.Xr mbrtoc16 3 ,
.Xr setlocale 3 ,
.Xr wcrtomb 3
.Sh STANDARDS
.Fn c16rtomb
conforms to
.St -isoC-2011 .
.Sh HISTORY
.Fn c16rtomb
has been available since
.Ox 7.4 .
.Sh CAVEATS
The C11 standard only requires the
.Fa c16
argument to be interpreted according to UTF-16
if the predefined environment macro
.Dv __STDC_UTF_16__
is defined with a value of 1.
On
.Ox ,
.In uchar.h
provides this definition.
Other operating systems which do not define
.Dv __STDC_UTF_16__
could theoretically use a different,
implementation-defined input encoding for
.Fa c16
instead of UTF-16.
Using UTF-16 becomes mandatory in C23.
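.Sh EXAMPLES
The calling convention from the DESCRIPTION, where a return value of 0
means that a high surrogate was stored in
.Fa mbs
and a second call is needed, can be sketched as a loop
over a NUL-terminated UTF-16 string.
This is a minimal sketch rather than a definitive implementation.
The helpers
.Fn use_utf8_locale
and
.Fn utf16_to_mb
are hypothetical names invented for this example,
and the locale names that are tried are an assumption
that may not match every system.
.Bd -literal -offset indent
```c
#include <langinfo.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: try a few commonly available locale names
 * (an assumption) and report whether LC_CTYPE now selects UTF-8.
 */
int
use_utf8_locale(void)
{
	static const char *const names[] = { "en_US.UTF-8", "C.UTF-8", "" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (setlocale(LC_CTYPE, names[i]) != NULL &&
		    strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
			return 1;
	}
	return 0;
}

/*
 * Hypothetical helper: convert the NUL-terminated UTF-16 string src
 * to the multibyte encoding of the current locale.  At most dstsz
 * bytes are stored to dst, including the terminating NUL byte.
 * Returns the number of bytes stored, not counting the NUL byte,
 * or (size_t)-1 if conversion fails or dst is too small.
 */
size_t
utf16_to_mb(char *dst, size_t dstsz, const char16_t *src)
{
	char buf[MB_LEN_MAX];
	mbstate_t mbs;
	size_t n, used = 0;

	if (dstsz == 0)
		return (size_t)-1;
	memset(&mbs, 0, sizeof(mbs));	/* initial conversion state */

	for (; *src != 0; src++) {
		n = c16rtomb(buf, *src, &mbs);
		if (n == (size_t)-1)
			return (size_t)-1;	/* errno has been set */
		/*
		 * n is 0 after a high surrogate: nothing was written,
		 * and the next iteration passes the same mbs again.
		 */
		if (n >= dstsz - used)
			return (size_t)-1;	/* no room for n bytes + NUL */
		memcpy(dst + used, buf, n);
		used += n;
	}
	/* A trailing high surrogate leaves mbs in a non-initial state. */
	if (!mbsinit(&mbs))
		return (size_t)-1;
	dst[used] = 0;
	return used;
}
```
.Ed
.Pp
Assuming
.Fn use_utf8_locale
succeeded, passing the code units 0xD83D, 0xDE00, and 0 to
.Fn utf16_to_mb
stores the four bytes F0 9F 98 80, the UTF-8 encoding of U+1F600.
The first call to
.Fn c16rtomb
contributes no bytes because it returns 0 for the high surrogate.
Using a local
.Vt mbstate_t
object instead of passing
.Dv NULL
keeps the sketch independent of the internal state object.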