opennurbs_std_string.h
/* $NoKeywords: $ */
/*
//
// Copyright (c) 1993-2013 Robert McNeel & Associates. All rights reserved.
// OpenNURBS, Rhinoceros, and Rhino3D are registered trademarks of Robert
// McNeel & Associates.
//
// THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY.
// ALL IMPLIED WARRANTIES OF FITNESS FOR ANY PARTICULAR PURPOSE AND OF
// MERCHANTABILITY ARE HEREBY DISCLAIMED.
//
// For complete openNURBS copyright information see <http://www.opennurbs.org>.
//
////////////////////////////////////////////////////////////////
*/

#if !defined(ON_STD_STRING_INC_)
#define ON_STD_STRING_INC_

/*
When the predecessor of opennurbs was released in 1995, there was
no robust cross-platform support for dynamic string classes.
In order to provide robust dynamic string support, openNURBS
had to implement ON_String and ON_wString.

It's now 2013 and current C++ compilers from the
GNU Project (gcc 4.7), Microsoft (Visual C++ 11 (2012)),
Google (Android NDK r8e) and Apple (LLVM 4.2) provide
reasonable support for much of the C++11 standard and provide
working implementations of the std::basic_string, std::string and
std::wstring classes.

Over time, opennurbs will transition from using ON_String and
ON_wString to using std::string and std::wstring.

The tools in the opennurbs_std_string*.* files provide support
for string formatting and UTF conversion that is not available
from the standard C++ string classes.

These implementations assume the compiler has solid support for
std::basic_string, std::string, std::wstring, std::u16string,
std::u32string and for using rvalue references to efficiently
return dynamic strings.
*/

ON_DECL
std::string ON_VARGS_FUNC_CDECL ON_std_string_format(
  const char* format,
  ...
  ) ON_NOEXCEPT;
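
/*
Example:
  A minimal sketch of how a printf-style function that returns a
  std::string can be built on std::vsnprintf. It assumes that
  ON_std_string_format follows the usual printf conventions;
  sketch_std_string_format is a hypothetical name and this is not
  the openNURBS implementation.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, not part of openNURBS.
std::string sketch_std_string_format(const char* format, ...)
{
  if (nullptr == format)
    return std::string();

  va_list args;
  va_start(args, format);
  // A va_list cannot be reused once consumed, so copy it for the second pass.
  va_list args_copy;
  va_copy(args_copy, args);

  // First pass: measure the formatted length (terminator not included).
  const int length = std::vsnprintf(nullptr, 0, format, args);
  va_end(args);

  std::string result;
  if (length > 0)
  {
    // Second pass: format into a buffer with room for the terminator.
    std::vector<char> buffer(static_cast<std::size_t>(length) + 1);
    std::vsnprintf(buffer.data(), buffer.size(), format, args_copy);
    result.assign(buffer.data(), static_cast<std::size_t>(length));
  }
  va_end(args_copy);
  return result;
}
```

  Measuring first means the buffer is allocated once at the exact size.
  A negative return value from std::vsnprintf (an encoding error)
  produces an empty string in this sketch.
*/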

ON_DECL
std::wstring ON_VARGS_FUNC_CDECL ON_std_wstring_format(
  const wchar_t* format,
  ...
  ) ON_NOEXCEPT;
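
/*
Example:
  A minimal sketch of a wide character counterpart built on
  std::vswprintf, assuming ON_std_wstring_format follows the usual
  wprintf conventions. Unlike std::vsnprintf, std::vswprintf cannot
  report the required length, so this sketch grows the buffer and
  retries. sketch_std_wstring_format is a hypothetical name and this
  is not the openNURBS implementation.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper, not part of openNURBS.
std::wstring sketch_std_wstring_format(const wchar_t* format, ...)
{
  std::wstring result;
  if (nullptr == format)
    return result;

  va_list args;
  va_start(args, format);
  std::vector<wchar_t> buffer(64);
  // vswprintf returns a negative value when the buffer is too small, so
  // retry with a larger buffer. The attempt limit also ends the loop when
  // the failure is an encoding error rather than a short buffer.
  for (int attempt = 0; attempt < 16; ++attempt)
  {
    va_list args_copy;
    va_copy(args_copy, args);
    const int length = std::vswprintf(buffer.data(), buffer.size(), format, args_copy);
    va_end(args_copy);
    if (length >= 0)
    {
      result.assign(buffer.data(), static_cast<std::size_t>(length));
      break;
    }
    buffer.resize(buffer.size() * 2);
  }
  va_end(args);
  return result;
}
```

  In wide format strings, %ls is the portable conversion for a
  wchar_t string argument.
*/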

/*
Description:
  Convert a UTF-8 encoded char string to a UTF-8 encoded std::string.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
    - If the size of an input buffer element is 1 byte and the
      values of the first three input elements are a UTF-8 BOM
      (0xEF, 0xBB, 0xBF), then the first three input elements are
      ignored and decoding begins at the fourth input element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a UTF-16 BOM (0xFEFF), then the first
      element is ignored and decoding begins with the second element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a byte swapped UTF-16 BOM (0xFFFE),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a UTF-32 BOM (0x0000FEFF), then the
      first element is ignored and decoding begins with the second
      element.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - In all other cases the first element of the input buffer is
      decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-8 encoded char string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (the Unicode replacement character) is a popular choice
    for the error_code_point value.

  sEndElement - [out]
    If sEndElement is not null, then *sEndElement points to the
    element of sInputUTF[] where conversion stopped.

    If an error occurred and was not masked, then *sEndElement points
    to the element of sInputUTF[] where the conversion failed.
    If no errors occurred or all errors were masked, then
    *sEndElement = sInputUTF + sInputUTF_count or points to
    the zero terminator in sInputUTF[], depending on the input
    value of sInputUTF_count.

Returns:
  A UTF-8 encoded std::string.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::string ON_UTF8_to_std_string(
  int bTestByteOrder,
  const char* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  const char** sEndElement
  ) ON_NOEXCEPT;
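
/*
Example:
  A minimal sketch of two checks described above for UTF-8 input:
  skipping the 3 byte BOM when bTestByteOrder is true, and recognizing
  an overlong sequence (a type 8 error) once a sequence has been
  decoded. The helper names are hypothetical and this is not the
  openNURBS implementation.

```cpp
// Hypothetical helpers, not part of openNURBS.

// Returns 3 if s begins with the UTF-8 BOM (0xEF, 0xBB, 0xBF), otherwise 0.
// count follows the sInputUTF_count convention: -1 means zero terminated.
inline int sketch_utf8_bom_length(const char* s, int count)
{
  if (nullptr == s || (-1 != count && count < 3))
    return 0;
  const unsigned char* u = reinterpret_cast<const unsigned char*>(s);
  // && short-circuits, so when count == -1 a zero terminator stops the
  // test before any byte past it is read.
  return (0xEF == u[0] && 0xBB == u[1] && 0xBF == u[2]) ? 3 : 0;
}

// True if value, decoded from a UTF-8 sequence of sequence_length bytes,
// could have been encoded with fewer bytes (an overlong sequence).
inline bool sketch_is_overlong_utf8(unsigned int value, int sequence_length)
{
  switch (sequence_length)
  {
  case 2: return value < 0x80;
  case 3: return value < 0x800;
  case 4: return value < 0x10000;
  default: return false;
  }
}
```

  For example, the two byte sequence 0xC0 0xAF decodes to 0x2F ('/'),
  which fits in one byte, so the sequence is overlong.
*/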

/*
Description:
  Convert a UTF-16 encoded ON__UINT16 string to a UTF-8 encoded std::string.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
    - If the size of an input buffer element is 1 byte and the
      values of the first three input elements are a UTF-8 BOM
      (0xEF, 0xBB, 0xBF), then the first three input elements are
      ignored and decoding begins at the fourth input element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a UTF-16 BOM (0xFEFF), then the first
      element is ignored and decoding begins with the second element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a byte swapped UTF-16 BOM (0xFFFE),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a UTF-32 BOM (0x0000FEFF), then the
      first element is ignored and decoding begins with the second
      element.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - In all other cases the first element of the input buffer is
      decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-16 encoded ON__UINT16 string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (the Unicode replacement character) is a popular choice
    for the error_code_point value.

  sEndElement - [out]
    If sEndElement is not null, then *sEndElement points to the
    element of sInputUTF[] where conversion stopped.

    If an error occurred and was not masked, then *sEndElement points
    to the element of sInputUTF[] where the conversion failed.
    If no errors occurred or all errors were masked, then
    *sEndElement = sInputUTF + sInputUTF_count or points to
    the zero terminator in sInputUTF[], depending on the input
    value of sInputUTF_count.

Returns:
  A UTF-8 encoded std::string.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::string ON_UTF16_to_std_string(
  int bTestByteOrder,
  const ON__UINT16* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  const ON__UINT16** sEndElement
  ) ON_NOEXCEPT;
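
/*
Example:
  A minimal sketch of the UTF-16 decoding steps described above: the
  bTestByteOrder check for 0xFEFF and 0xFFFE, byte swapping of the
  remaining elements, and combining surrogate pairs into code points.
  std::uint16_t stands in for ON__UINT16, the output is a std::u32string
  of code points rather than UTF-8, and unpaired surrogates are always
  replaced, as if type 16 errors were masked.
  sketch_utf16_to_code_points is a hypothetical name and this is not
  the openNURBS implementation.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of openNURBS. count == -1 means the
// input is zero terminated, following the sInputUTF_count convention.
std::u32string sketch_utf16_to_code_points(
  const std::uint16_t* s, int count, bool test_byte_order,
  char32_t error_code_point = 0xFFFD)
{
  std::u32string out;
  if (nullptr == s || count < -1)
    return out;

  // True if element i is part of the input.
  auto has = [&](int i) { return (-1 == count) ? (0 != s[i]) : (i < count); };

  int i = 0;
  bool byte_swap = false;
  if (test_byte_order && has(0))
  {
    if (0xFEFF == s[0])
      i = 1; // native order BOM: skip it
    else if (0xFFFE == s[0])
    {
      i = 1; // byte swapped BOM: skip it and swap the elements that follow
      byte_swap = true;
    }
  }

  // Reads element k, swapping its two bytes when required.
  auto get = [&](int k) -> char32_t {
    const std::uint16_t v = s[k];
    return byte_swap ? static_cast<std::uint16_t>((v >> 8) | (v << 8)) : v;
  };

  for (; has(i); ++i)
  {
    const char32_t u = get(i);
    if (u >= 0xD800 && u <= 0xDBFF && has(i + 1))
    {
      const char32_t low = get(i + 1);
      if (low >= 0xDC00 && low <= 0xDFFF)
      {
        // A high/low surrogate pair encodes one code point >= 0x10000.
        out.push_back(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00));
        ++i;
        continue;
      }
    }
    // An unpaired surrogate is an encoding error; anything else is a code point.
    out.push_back((u >= 0xD800 && u <= 0xDFFF) ? error_code_point : u);
  }
  return out;
}
```

  For example, the pair 0xD83D 0xDE00 decodes to the single code
  point 0x1F600.
*/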

/*
Description:
  Convert a UTF-32 encoded ON__UINT32 string to a UTF-8 encoded std::string.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
    - If the size of an input buffer element is 1 byte and the
      values of the first three input elements are a UTF-8 BOM
      (0xEF, 0xBB, 0xBF), then the first three input elements are
      ignored and decoding begins at the fourth input element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a UTF-16 BOM (0xFEFF), then the first
      element is ignored and decoding begins with the second element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a byte swapped UTF-16 BOM (0xFFFE),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a UTF-32 BOM (0x0000FEFF), then the
      first element is ignored and decoding begins with the second
      element.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - In all other cases the first element of the input buffer is
      decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-32 encoded ON__UINT32 string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (the Unicode replacement character) is a popular choice
    for the error_code_point value.

  sEndElement - [out]
    If sEndElement is not null, then *sEndElement points to the
    element of sInputUTF[] where conversion stopped.

    If an error occurred and was not masked, then *sEndElement points
    to the element of sInputUTF[] where the conversion failed.
    If no errors occurred or all errors were masked, then
    *sEndElement = sInputUTF + sInputUTF_count or points to
    the zero terminator in sInputUTF[], depending on the input
    value of sInputUTF_count.

Returns:
  A UTF-8 encoded std::string.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::string ON_UTF32_to_std_string(
  int bTestByteOrder,
  const ON__UINT32* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  const ON__UINT32** sEndElement
  ) ON_NOEXCEPT;
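
/*
Example:
  A minimal sketch of the final encoding step: writing code points as
  UTF-8. Values that are surrogates or greater than 0x10FFFF are
  replaced by error_code_point, which mirrors a masked type 16 error;
  if error_code_point is itself invalid, this sketch drops the bad
  value. BOM detection and byte swapping are omitted. Both function
  names are hypothetical and this is not the openNURBS implementation.

```cpp
#include <string>

// Hypothetical helper, not part of openNURBS. Appends the UTF-8 encoding
// of code_point to out. Returns false (and appends nothing) when
// code_point is a surrogate or greater than 0x10FFFF.
bool sketch_append_utf8(char32_t code_point, std::string& out)
{
  if (code_point > 0x10FFFF || (code_point >= 0xD800 && code_point <= 0xDFFF))
    return false;
  if (code_point < 0x80)
  {
    out.push_back(static_cast<char>(code_point));
  }
  else if (code_point < 0x800)
  {
    out.push_back(static_cast<char>(0xC0 | (code_point >> 6)));
    out.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
  }
  else if (code_point < 0x10000)
  {
    out.push_back(static_cast<char>(0xE0 | (code_point >> 12)));
    out.push_back(static_cast<char>(0x80 | ((code_point >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
  }
  else
  {
    out.push_back(static_cast<char>(0xF0 | (code_point >> 18)));
    out.push_back(static_cast<char>(0x80 | ((code_point >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((code_point >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (code_point & 0x3F)));
  }
  return true;
}

// Hypothetical helper: converts UTF-32 to UTF-8, substituting
// error_code_point for values that are not valid code points.
std::string sketch_utf32_to_utf8(const std::u32string& input, char32_t error_code_point = 0xFFFD)
{
  std::string out;
  for (char32_t c : input)
  {
    if (!sketch_append_utf8(c, out))
      sketch_append_utf8(error_code_point, out);
  }
  return out;
}
```

  The number of output bytes depends only on the code point's range:
  1 byte below 0x80, 2 below 0x800, 3 below 0x10000, otherwise 4.
*/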

/*
Description:
  Convert a UTF-XX encoded wchar_t string to a UTF-8 encoded std::string.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

  The value of sizeof(wchar_t) determines which UTF-XX encoding is used.
    sizeof(wchar_t)  UTF-XX
      1              UTF-8
      2              UTF-16
      4              UTF-32

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
    - If the size of an input buffer element is 1 byte and the
      values of the first three input elements are a UTF-8 BOM
      (0xEF, 0xBB, 0xBF), then the first three input elements are
      ignored and decoding begins at the fourth input element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a UTF-16 BOM (0xFEFF), then the first
      element is ignored and decoding begins with the second element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a byte swapped UTF-16 BOM (0xFFFE),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a UTF-32 BOM (0x0000FEFF), then the
      first element is ignored and decoding begins with the second
      element.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - In all other cases the first element of the input buffer is
      decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-XX encoded wchar_t string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (the Unicode replacement character) is a popular choice
    for the error_code_point value.

  end_element_index - [out]
    If end_element_index is not null, then *end_element_index is the
    index of the first element in sInputUTF that was not converted.

    If an error occurred and was not masked, then *end_element_index
    is the index of the element of sInputUTF[] where the conversion
    failed.
    If no errors occurred or all errors were masked, then
    *end_element_index is the number of elements in sInputUTF[] that
    were converted.

Returns:
  A UTF-8 encoded std::string.
  The returned string does not have a byte order mark (BOM).
*/
std::string ON_UTF_WideChar_to_std_string(
  int bTestByteOrder,
  const wchar_t* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  int* end_element_index
  ) ON_NOEXCEPT;
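
/*
Example:
  A compile-time sketch of the sizeof(wchar_t) table: it selects the
  fixed-width element type that has the same size as wchar_t, and the
  UTF encoding follows from that size. sketch_wchar_element and
  sketch_wchar_utf_bits are hypothetical names, not part of openNURBS.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias, not part of openNURBS: the fixed-width element type
// with the same size as wchar_t (1 byte -> UTF-8, 2 bytes -> UTF-16,
// 4 bytes -> UTF-32).
using sketch_wchar_element = std::conditional<
  sizeof(wchar_t) == 4, std::uint32_t,
  std::conditional<sizeof(wchar_t) == 2, std::uint16_t, std::uint8_t>::type
>::type;

static_assert(sizeof(sketch_wchar_element) == sizeof(wchar_t),
  "wchar_t is expected to be 1, 2 or 4 bytes");

// Number of bits per code unit of the UTF encoding used for wchar_t strings.
constexpr int sketch_wchar_utf_bits()
{
  return 8 * static_cast<int>(sizeof(sketch_wchar_element));
}
```

  With Microsoft's compilers sizeof(wchar_t) is 2, so wchar_t strings
  are treated as UTF-16; gcc and clang on Linux and macOS normally use
  a 4 byte wchar_t, so the same strings are treated as UTF-32.
*/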

/*
Description:
  Convert a UTF-XX encoded std::wstring to a UTF-8 encoded std::string.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

  The value of sizeof(wchar_t) determines which UTF-XX encoding is used.
    sizeof(wchar_t)  UTF-XX
      1              UTF-8
      2              UTF-16
      4              UTF-32

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
    - If the size of an input buffer element is 1 byte and the
      values of the first three input elements are a UTF-8 BOM
      (0xEF, 0xBB, 0xBF), then the first three input elements are
      ignored and decoding begins at the fourth input element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a UTF-16 BOM (0xFEFF), then the first
      element is ignored and decoding begins with the second element.
    - If the size of an input buffer element is 2 bytes and the value
      of the first element is a byte swapped UTF-16 BOM (0xFFFE),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a UTF-32 BOM (0x0000FEFF), then the
      first element is ignored and decoding begins with the second
      element.
    - If the size of an input buffer element is 4 bytes and the value
      of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
      then the first element is ignored, decoding begins with the
      second element, and input element bytes are swapped before
      being decoded.
    - In all other cases the first element of the input buffer is
      decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-XX encoded std::wstring to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (the Unicode replacement character) is a popular choice
    for the error_code_point value.

  end_element_index - [out]
    If end_element_index is not null, then *end_element_index is the
    index of the first element in sInputUTF that was not converted.

    If an error occurred and was not masked, then *end_element_index
    is the index of the element of sInputUTF[] where the conversion
    failed.
    If no errors occurred or all errors were masked, then
    *end_element_index is the number of elements in sInputUTF[] that
    were converted.

Returns:
  A UTF-8 encoded std::string.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::string ON_UTF_std_wstring_to_std_string(
  int bTestByteOrder,
  const std::wstring& sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  int* end_element_index
  ) ON_NOEXCEPT;
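
/*
Example:
  A minimal sketch of the error_mask rules documented above, applied
  to one error bit at a time. It assumes that a "valid Unicode code
  point" excludes the surrogate range 0xD800 to 0xDFFF and values
  above 0x10FFFF. Both function names are hypothetical and are not
  part of openNURBS.

```cpp
// Hypothetical helpers, not part of openNURBS.

// True if code_point is a Unicode scalar value (not a surrogate, not
// above 0x10FFFF).
constexpr bool sketch_is_valid_code_point(unsigned int code_point)
{
  return code_point <= 0x10FFFF && !(code_point >= 0xD800 && code_point <= 0xDFFF);
}

// Bits 1 and 2 can never be masked. Bits 4 and 8 are masked when the
// same bit is set in error_mask. Bit 16 additionally requires a valid
// error_code_point.
constexpr bool sketch_is_masked(unsigned int error_bit, unsigned int error_mask,
                                unsigned int error_code_point)
{
  return (4 == error_bit || 8 == error_bit)
           ? (0 != (error_mask & error_bit))
           : (16 == error_bit)
               ? (0 != (error_mask & 16) && sketch_is_valid_code_point(error_code_point))
               : false;
}
```

  For example, error_mask = 4 | 8 | 16 with error_code_point = 0xFFFD
  masks every maskable error, leaving only bits 1 and 2 as hard failures.
*/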
678 
679 /*
680 Description:
681  Convert a UTF-8 encoded char string to a UTF-XX encoded std::wstring.
682  This function removes byte order marks (BOM) and can repair encoding
683  errors.
684 
685  The value of sizeof(wchar_t) determines which UTF-XX encoding is used.
686  sizeof(wchar_t) UTF-XX
687  1 UTF-8
688  2 UTF-16
689  4 UTF-32
690 
691 Parameters:
692  bTestByteOrder - [in]
693  If bTestByteOrder is true and the the input buffer is a
694  byte order mark (BOM), then the BOM is skipped. It the value
695  of the BOM is byte swapped, then subsequent input elements are
696  byte swapped before being decoded. Specifically:
697  - If the size of an input buffer element is 1 byte and the
698  values of the first three input elements are a UTF-8 BOM
699  (0xEF, 0xBB, 0xBF), then the first three input elements are
700  ignored and decoding begins at the forth input element.
701  - If the size of an input buffer element is 2 bytes and the value
702  of the first element is a UTF-16 BOM (0xFEFF), then the first
703  element is ignored and decoding begins with the second element.
704  - If the size of an input buffer element is 2 bytes and the value
705  of the first element is a byte swapped UTF-16 BOM (0xFFFE),
706  then the first element is ignored, decoding begins with the
707  second element, and input element bytes are swapped before
708  being decoded.
709  - If the size of an input buffer element is 4 bytes and the value
710  of the first element is a UTF-32 BOM (0x0000FEFF), then the
711  first element is ignored and decoding begins with the second
712  element.
713  - If the size of an input buffer element is 4 bytes and the value
714  of the first element is bytes swapped UTF-32 BOM (0xFFFE0000),
715  then the first element is ignored, decoding begins with the
716  second element, and input element bytes are swapped before
717  being decoded.
718  - In all other cases the first element of the input buffer is
719  decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-8 encoded char string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (U+FFFD REPLACEMENT CHARACTER) is a popular choice for the
    error_code_point value.

  sEndElement - [out]
    If sEndElement is not null, then *sEndElement points to the
    element of sInputUTF[] where conversion stopped.

    If an error occurred and was not masked, then *sEndElement points
    to the element of sInputUTF[] where the conversion failed.
    If no errors occurred or all errors were masked, then
    *sEndElement = sInputUTF + sInputUTF_count or points to
    the zero terminator in sInputUTF[], depending on the input
    value of sInputUTF_count.

Returns:
  A UTF-XX encoded std::wstring.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::wstring ON_UTF8_to_std_wstring(
  int bTestByteOrder,
  const char* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  const char** sEndElement
  ) ON_NOEXCEPT;
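/*
Example:
  This is a standalone sketch of the UTF-8 rules described above. It
  uses only the C++ standard library and does not call opennurbs.
  DecodeUtf8Sketch is a hypothetical helper that returns Unicode code
  points in a std::u32string. It skips a leading UTF-8 BOM when
  bTestByteOrder is true. It behaves as if type 16 errors were masked,
  replacing each illegal sequence with error_code_point. As a
  simplifying assumption, overlong sequences are treated as illegal
  (bit 16) instead of being reported as type 8 errors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of opennurbs. Decodes UTF-8 bytes into
// Unicode code points. Illegal, truncated, or overlong sequences are
// replaced by error_code_point and bit 16 is set in *error_status.
std::u32string DecodeUtf8Sketch(const std::string& s, bool bTestByteOrder,
                                char32_t error_code_point,
                                unsigned int* error_status)
{
  std::u32string out;
  unsigned int status = 0;
  std::size_t i = 0;

  // Skip a leading UTF-8 BOM (0xEF, 0xBB, 0xBF) when requested.
  if (bTestByteOrder && s.size() >= 3 &&
      static_cast<unsigned char>(s[0]) == 0xEF &&
      static_cast<unsigned char>(s[1]) == 0xBB &&
      static_cast<unsigned char>(s[2]) == 0xBF)
    i = 3;

  while (i < s.size())
  {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;  // sequence length implied by the lead byte
    char32_t cp = 0;      // code point being assembled
    char32_t min_cp = 0;  // smallest value allowed for this length
    if (c < 0x80)                { len = 1; cp = c; }
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min_cp = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }

    // Well formed: known lead byte, enough bytes, continuation bytes 10xxxxxx.
    bool well_formed = len > 0 && i + len <= s.size();
    for (std::size_t k = 1; well_formed && k < len; ++k)
    {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80)
        well_formed = false;
      else
        cp = (cp << 6) | (cc & 0x3F);
    }

    // Reject overlong forms, surrogate values, and values above U+10FFFF.
    const bool valid = well_formed && cp >= min_cp &&
                       !(cp >= 0xD800 && cp <= 0xDFFF) && cp <= 0x10FFFF;
    if (valid)
    {
      out.push_back(cp);
      i += len;
    }
    else
    {
      out.push_back(error_code_point);
      status |= 16;
      // Skip a complete but invalid sequence; otherwise resync one byte later.
      i += well_formed ? len : 1;
    }
  }

  if (error_status)
    *error_status = status;
  return out;
}
```

  For example, the bytes EF BB BF 41 C3 A9 decode to U+0041 U+00E9
  when bTestByteOrder is true. A stray 0xFF byte becomes
  error_code_point, and bit 16 is set in *error_status.
*/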

/*
Description:
  Convert a UTF-16 encoded ON__UINT16 string to a UTF-XX encoded std::wstring.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

  The value of sizeof(wchar_t) determines which UTF-XX encoding is used.
    sizeof(wchar_t)  UTF-XX
    1                UTF-8
    2                UTF-16
    4                UTF-32

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
      - If the size of an input buffer element is 1 byte and the
        values of the first three input elements are a UTF-8 BOM
        (0xEF, 0xBB, 0xBF), then the first three input elements are
        ignored and decoding begins at the fourth input element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a UTF-16 BOM (0xFEFF), then the first
        element is ignored and decoding begins with the second element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a byte swapped UTF-16 BOM (0xFFFE),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a UTF-32 BOM (0x0000FEFF), then the
        first element is ignored and decoding begins with the second
        element.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - In all other cases the first element of the input buffer is
        decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-16 encoded ON__UINT16 string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (U+FFFD REPLACEMENT CHARACTER) is a popular choice for the
    error_code_point value.

  sEndElement - [out]
    If sEndElement is not null, then *sEndElement points to the
    element of sInputUTF[] where conversion stopped.

    If an error occurred and was not masked, then *sEndElement points
    to the element of sInputUTF[] where the conversion failed.
    If no errors occurred or all errors were masked, then
    *sEndElement = sInputUTF + sInputUTF_count or points to
    the zero terminator in sInputUTF[], depending on the input
    value of sInputUTF_count.

Returns:
  A UTF-XX encoded std::wstring.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::wstring ON_UTF16_to_std_wstring(
  int bTestByteOrder,
  const ON__UINT16* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  const ON__UINT16** sEndElement
  ) ON_NOEXCEPT;
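/*
Example:
  This is a standalone sketch of the UTF-16 byte order handling
  described above. It does not use opennurbs. DecodeUtf16Sketch and
  SwapBytes16 are hypothetical helpers. A leading 0xFEFF is skipped.
  A leading 0xFFFE is also skipped, and it turns on byte swapping for
  the rest of the input. Surrogate pairs are combined into one code
  point. Unpaired surrogates are replaced by error_code_point, which is
  assumed here to model masked type 16 errors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: swap the two bytes of a UTF-16 code unit.
std::uint16_t SwapBytes16(std::uint16_t u)
{
  return static_cast<std::uint16_t>((u >> 8) | (u << 8));
}

// Hypothetical sketch, not the opennurbs implementation.
std::u32string DecodeUtf16Sketch(const std::vector<std::uint16_t>& in,
                                 bool bTestByteOrder,
                                 char32_t error_code_point)
{
  std::size_t i = 0;
  bool swap = false;
  if (bTestByteOrder && !in.empty())
  {
    if (in[0] == 0xFEFF)
      i = 1;  // native byte order BOM: skip it
    else if (in[0] == 0xFFFE)
    {
      i = 1;  // byte swapped BOM: skip it and swap everything after it
      swap = true;
    }
  }

  // Read element k in native byte order.
  auto unit = [&](std::size_t k) -> std::uint16_t {
    return swap ? SwapBytes16(in[k]) : in[k];
  };

  std::u32string out;
  while (i < in.size())
  {
    const std::uint16_t u = unit(i);
    if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size())
    {
      const std::uint16_t next = unit(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF)
      {
        // High surrogate followed by low surrogate: one supplementary code point.
        out.push_back(0x10000 + ((char32_t(u) - 0xD800) << 10) +
                      (char32_t(next) - 0xDC00));
        i += 2;
        continue;
      }
    }
    // A lone surrogate is illegal; anything else is a BMP code point.
    const bool lone_surrogate = u >= 0xD800 && u <= 0xDFFF;
    out.push_back(lone_surrogate ? error_code_point : char32_t(u));
    ++i;
  }
  return out;
}
```

  With this sketch, {0xFEFF, 0x0041, 0xD83D, 0xDE00} decodes to
  U+0041 U+1F600. Its byte swapped form, {0xFFFE, 0x4100, 0x3DD8,
  0x00DE}, decodes to the same two code points.
*/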

/*
Description:
  Convert a UTF-32 encoded ON__UINT32 string to a UTF-XX encoded std::wstring.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

  The value of sizeof(wchar_t) determines which UTF-XX encoding is used.
    sizeof(wchar_t)  UTF-XX
    1                UTF-8
    2                UTF-16
    4                UTF-32

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
      - If the size of an input buffer element is 1 byte and the
        values of the first three input elements are a UTF-8 BOM
        (0xEF, 0xBB, 0xBF), then the first three input elements are
        ignored and decoding begins at the fourth input element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a UTF-16 BOM (0xFEFF), then the first
        element is ignored and decoding begins with the second element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a byte swapped UTF-16 BOM (0xFFFE),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a UTF-32 BOM (0x0000FEFF), then the
        first element is ignored and decoding begins with the second
        element.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - In all other cases the first element of the input buffer is
        decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-32 encoded ON__UINT32 string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (U+FFFD REPLACEMENT CHARACTER) is a popular choice for the
    error_code_point value.

  sEndElement - [out]
    If sEndElement is not null, then *sEndElement points to the
    element of sInputUTF[] where conversion stopped.

    If an error occurred and was not masked, then *sEndElement points
    to the element of sInputUTF[] where the conversion failed.
    If no errors occurred or all errors were masked, then
    *sEndElement = sInputUTF + sInputUTF_count or points to
    the zero terminator in sInputUTF[], depending on the input
    value of sInputUTF_count.

Returns:
  A UTF-XX encoded std::wstring.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::wstring ON_UTF32_to_std_wstring(
  int bTestByteOrder,
  const ON__UINT32* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  const ON__UINT32** sEndElement
  ) ON_NOEXCEPT;
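/*
Example:
  This is a standalone sketch of the UTF-32 BOM and validation rules
  described above. It does not use opennurbs. DecodeUtf32Sketch and
  SwapBytes32 are hypothetical helpers. When bTestByteOrder is true, a
  leading 0x0000FEFF is skipped. A leading 0xFFFE0000 is also skipped,
  and every later element is byte reversed. Values that are surrogates
  or lie above 0x10FFFF are replaced by error_code_point. The type 4
  rule (consecutive UTF-32 elements forming a UTF-16 surrogate pair) is
  not implemented here; each surrogate value is simply replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: reverse the byte order of a 32-bit element.
std::uint32_t SwapBytes32(std::uint32_t u)
{
  return ((u & 0x000000FFu) << 24) | ((u & 0x0000FF00u) << 8) |
         ((u & 0x00FF0000u) >> 8) | ((u & 0xFF000000u) >> 24);
}

// Hypothetical sketch, not the opennurbs implementation.
std::u32string DecodeUtf32Sketch(const std::vector<std::uint32_t>& in,
                                 bool bTestByteOrder,
                                 char32_t error_code_point)
{
  std::size_t i = 0;
  bool swap = false;
  if (bTestByteOrder && !in.empty())
  {
    if (in[0] == 0x0000FEFFu)
      i = 1;  // native byte order BOM
    else if (in[0] == 0xFFFE0000u)
    {
      i = 1;  // byte swapped BOM
      swap = true;
    }
  }

  std::u32string out;
  for (; i < in.size(); ++i)
  {
    const std::uint32_t cp = swap ? SwapBytes32(in[i]) : in[i];
    // Valid values: at most 0x10FFFF and outside the surrogate range.
    const bool valid = cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
    out.push_back(valid ? static_cast<char32_t>(cp) : error_code_point);
  }
  return out;
}
```

  In this sketch, {0xFFFE0000, 0x41000000, 0x00F60100} decodes to
  U+0041 U+1F600. Every element after the swapped BOM is byte reversed
  before it is validated.
*/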

/*
Description:
  Convert a UTF-XX encoded wchar_t string to a UTF-XX encoded std::wstring.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

  The value of sizeof(wchar_t) determines which UTF-XX encoding is used.
    sizeof(wchar_t)  UTF-XX
    1                UTF-8
    2                UTF-16
    4                UTF-32

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
      - If the size of an input buffer element is 1 byte and the
        values of the first three input elements are a UTF-8 BOM
        (0xEF, 0xBB, 0xBF), then the first three input elements are
        ignored and decoding begins at the fourth input element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a UTF-16 BOM (0xFEFF), then the first
        element is ignored and decoding begins with the second element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a byte swapped UTF-16 BOM (0xFFFE),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a UTF-32 BOM (0x0000FEFF), then the
        first element is ignored and decoding begins with the second
        element.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - In all other cases the first element of the input buffer is
        decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-XX encoded wchar_t string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (U+FFFD REPLACEMENT CHARACTER) is a popular choice for the
    error_code_point value.

  end_element_index - [out]
    If end_element_index is not null, then *end_element_index is the
    index of the first element in sInputUTF that was not converted.

    If an error occurred and was not masked, then *end_element_index
    is the index of the element of sInputUTF[] where the conversion
    failed.
    If no errors occurred or all errors were masked, then
    *end_element_index is the number of elements in sInputUTF[] that
    were converted.

Returns:
  A UTF-XX encoded std::wstring.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::wstring ON_UTF_WideChar_to_std_wstring(
  int bTestByteOrder,
  const wchar_t* sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  int* end_element_index
  ) ON_NOEXCEPT;
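/*
Example:
  The output encoding follows sizeof(wchar_t). As a result, the same
  code point can occupy a different number of std::wstring elements on
  different platforms. wchar_t is typically 2 bytes with Visual C++ on
  Windows and 4 bytes with gcc or clang on Linux and macOS.
  WideCharUtfBits and WideCharElementsPerCodePoint are hypothetical
  helpers that restate the sizeof(wchar_t) table above in code. This is
  a sketch, not part of opennurbs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the UTF-XX width implied by sizeof(wchar_t),
// following the table above (0 means an unsupported size).
constexpr int WideCharUtfBits()
{
  return sizeof(wchar_t) == 1 ? 8
       : sizeof(wchar_t) == 2 ? 16
       : sizeof(wchar_t) == 4 ? 32
       : 0;
}

// Hypothetical helper: number of wchar_t elements one Unicode code point
// occupies when it is encoded in a std::wstring on this platform.
std::size_t WideCharElementsPerCodePoint(char32_t cp)
{
  switch (WideCharUtfBits())
  {
  case 32:
    return 1;                      // UTF-32: one element per code point
  case 16:
    return cp >= 0x10000 ? 2 : 1;  // UTF-16: surrogate pair above U+FFFF
  case 8:                          // UTF-8: one to four bytes
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  default:
    return 0;                      // unsupported wchar_t size
  }
}
```

  end_element_index counts input elements, not code points. When
  wchar_t is 2 bytes, a character above U+FFFF therefore advances
  *end_element_index by two.
*/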

/*
Description:
  Convert a UTF-8 encoded std::string to a UTF-XX encoded std::wstring.
  This function removes byte order marks (BOM) and can repair encoding
  errors.

  The value of sizeof(wchar_t) determines which UTF-XX encoding is used.
    sizeof(wchar_t)  UTF-XX
    1                UTF-8
    2                UTF-16
    4                UTF-32

Parameters:
  bTestByteOrder - [in]
    If bTestByteOrder is true and the input buffer begins with a
    byte order mark (BOM), then the BOM is skipped. If the value
    of the BOM is byte swapped, then subsequent input elements are
    byte swapped before being decoded. Specifically:
      - If the size of an input buffer element is 1 byte and the
        values of the first three input elements are a UTF-8 BOM
        (0xEF, 0xBB, 0xBF), then the first three input elements are
        ignored and decoding begins at the fourth input element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a UTF-16 BOM (0xFEFF), then the first
        element is ignored and decoding begins with the second element.
      - If the size of an input buffer element is 2 bytes and the value
        of the first element is a byte swapped UTF-16 BOM (0xFFFE),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a UTF-32 BOM (0x0000FEFF), then the
        first element is ignored and decoding begins with the second
        element.
      - If the size of an input buffer element is 4 bytes and the value
        of the first element is a byte swapped UTF-32 BOM (0xFFFE0000),
        then the first element is ignored, decoding begins with the
        second element, and input element bytes are swapped before
        being decoded.
      - In all other cases the first element of the input buffer is
        decoded and no byte swapping is performed.

  sInputUTF - [in]
    UTF-8 encoded std::string to convert.

  sInputUTF_count - [in]
    If sInputUTF_count >= 0, then it specifies the number of
    elements in sInputUTF[] to convert.

    If sInputUTF_count == -1, then sInputUTF must be a zero
    terminated array and all the elements up to the first zero
    element are converted.

  error_status - [out]
    If error_status is not null, then bits of *error_status are
    set to indicate the success or failure of the conversion.
    When the error_mask parameter is used to mask some
    conversion errors, multiple bits may be set.
       0: Successful conversion with no errors.
       1: The input parameters were invalid.
          This error cannot be masked.
       2: The output buffer was not large enough to hold the converted
          string. As much conversion as possible is performed in this
          case and the error cannot be masked.
       4: When parsing a UTF-8 or UTF-32 string, the values of two
          consecutive encoding sequences formed a valid UTF-16
          surrogate pair.
          This error is masked if 0 != (4 & error_mask).
          If the error is masked, then the surrogate pair is
          decoded, the value of the resulting Unicode code point
          is used, and parsing continues.
       8: An overlong UTF-8 encoding sequence was encountered and
          the value of the overlong UTF-8 sequence was a valid
          Unicode code point.
          This error is masked if 0 != (8 & error_mask).
          If the error is masked, then the Unicode code point
          is used and parsing continues.
      16: An illegal UTF-8, UTF-16 or UTF-32 encoding sequence occurred
          or an invalid Unicode code point value resulted from decoding
          a UTF-8 sequence.
          This error is masked if 0 != (16 & error_mask).
          If the error is masked and the value of error_code_point is
          a valid Unicode code point, then error_code_point is encoded
          in the output string and parsing continues.

  error_mask - [in]
    If 0 != (error_mask & 4), then type 4 errors are masked.
    If 0 != (error_mask & 8), then type 8 errors are masked.
    If 0 != (error_mask & 16) and error_code_point is a valid Unicode
    code point value, then type 16 errors are masked.

  error_code_point - [in]
    Unicode code point value to use when masking type 16 errors.
    If 0 == (error_mask & 16), then this parameter is ignored.
    0xFFFD (U+FFFD REPLACEMENT CHARACTER) is a popular choice for the
    error_code_point value.

  end_element_index - [out]
    If end_element_index is not null, then *end_element_index is the
    index of the first element in sInputUTF that was not converted.

    If an error occurred and was not masked, then *end_element_index
    is the index of the element of sInputUTF[] where the conversion
    failed.
    If no errors occurred or all errors were masked, then
    *end_element_index is the number of elements in sInputUTF[] that
    were converted.

Returns:
  A UTF-XX encoded std::wstring.
  The returned string does not have a byte order mark (BOM).
*/
ON_DECL
std::wstring ON_UTF_std_string_to_std_wstring(
  int bTestByteOrder,
  const std::string& sInputUTF,
  int sInputUTF_count,
  unsigned int* error_status,
  unsigned int error_mask,
  ON__UINT32 error_code_point,
  int* end_element_index
  ) ON_NOEXCEPT;
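/*
Example:
  The error_mask rules above are shared by every converter in this
  file. They can be restated as a small predicate.
  IsValidCodePointSketch and IsErrorMaskedSketch are hypothetical
  helpers, not opennurbs functions. The sketch assumes that a "valid
  Unicode code point" is a value no larger than 0x10FFFF that is not a
  surrogate (0xD800 to 0xDFFF).

```cpp
#include <cassert>

// Hypothetical helper: at most 0x10FFFF and not a surrogate value.
bool IsValidCodePointSketch(unsigned int cp)
{
  return cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
}

// Hypothetical helper restating the documented masking rules for one
// error type (1, 2, 4, 8, or 16).
bool IsErrorMaskedSketch(unsigned int error_type, unsigned int error_mask,
                         unsigned int error_code_point)
{
  switch (error_type)
  {
  case 4:
  case 8:
    // Masked when the matching bit is set in error_mask.
    return 0 != (error_mask & error_type);
  case 16:
    // Also requires a usable replacement code point.
    return 0 != (error_mask & 16) && IsValidCodePointSketch(error_code_point);
  default:
    // Type 1 (invalid parameters) and type 2 (output too small) cannot be
    // masked; unknown values are treated the same way.
    return false;
  }
}
```

  With opennurbs linked, a call that masks all three maskable error
  types and substitutes U+FFFD might look like this:
  ON_UTF_std_string_to_std_wstring(1, s, static_cast<int>(s.size()),
  &error_status, 4 | 8 | 16, 0xFFFD, &end_element_index).
  That call is untested.
*/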

#endif