JavaScript strings – UTF-16 vs UCS-2?

JavaScript (strictly speaking, ECMAScript) pre-dates Unicode 2.0, so in some cases you may find references to UCS-2 simply because that was correct at the time the reference was written. Can you point us to specific citations of JavaScript being “UCS-2”? Specifications for ECMAScript versions 3 and 5 at least both explicitly declare a String to …

How to solve “unable to switch the encoding” error when inserting XML into SQL Server

This question is a near-duplicate of 2 others, and surprisingly – while this one is the most recent – I believe it is missing the best answer. The duplicates, and what I believe to be their best answers, are: Using StringWriter for XML Serialization (2009-10-14) https://stackoverflow.com/a/1566154/751158 Trying to store XML content into SQL Server 2005 …

Writing utf16 to file in binary mode

Here we run into the little-used locale properties. If you output your string as a string (rather than raw data), you can get the locale to do the appropriate conversion auto-magically. N.B. This code does not take into account the endianness of the wchar_t character. #include <locale> #include <fstream> #include <iostream> // See Below for the …

JavaScript strings outside of the BMP

Depends what you mean by ‘support’. You can certainly put non-UCS-2 characters in a JS string using surrogates, and browsers will display them if they can. But each item in a JS string is a separate UTF-16 code unit. There is no language-level support for handling full characters: all the standard String members (length, split, …
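
The code-unit behaviour described in this excerpt can be seen in a short sketch. The sample string and variable names are illustrative assumptions, not taken from the original answer. The last two lines rely on ES2015 additions (string iteration via Array.from, and codePointAt), which are newer than the "no language-level support" situation the excerpt describes and are only available in engines that implement ES2015 or later.

```javascript
// U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
const text = "a\u{1F600}b";

// length, split("") and charCodeAt work on 16-bit code units, not full characters.
const unitCount = text.length;                          // counts code units
const pieces = text.split("");                          // splits the pair in two
const highSurrogate = text.charCodeAt(1).toString(16);  // first half of the pair
const lowSurrogate = text.charCodeAt(2).toString(16);   // second half of the pair

// ES2015 and later: string iteration and codePointAt work on code points.
const codePoints = Array.from(text);
const scalar = text.codePointAt(1).toString(16);

console.log(unitCount, pieces.length, highSurrogate, lowSurrogate);
// prints: 4 4 d83d de00
console.log(codePoints.length, scalar);
// prints: 3 1f600
```

The string has three visible characters but a length of 4, because the character outside the BMP occupies two code units.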

UTF-8, UTF-16, and UTF-32

UTF-8 has an advantage in the case where ASCII characters represent the majority of characters in a block of text, because UTF-8 encodes these into 8 bits (like ASCII). It is also advantageous in that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file. UTF-16 is better where ASCII …
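
To make the size trade-off concrete, here is a rough sketch that counts the bytes one string would occupy in each encoding, ignoring any byte order mark. The helper name encodedSizes and the sample strings are hypothetical, chosen only for illustration. It assumes a runtime that provides the standard TextEncoder global (current browsers and Node.js do).

```javascript
// Hypothetical helper: byte counts for one string in three encodings (no BOM).
function encodedSizes(text) {
  const utf8 = new TextEncoder().encode(text).length; // TextEncoder always emits UTF-8
  const utf16 = text.length * 2;                      // 2 bytes per UTF-16 code unit
  const utf32 = Array.from(text).length * 4;          // 4 bytes per code point
  return { utf8, utf16, utf32 };
}

const ascii = encodedSizes("hello");  // ASCII-only text
const cjk = encodedSizes("日本語");    // three BMP characters above U+07FF

console.log(ascii); // { utf8: 5, utf16: 10, utf32: 20 }
console.log(cjk);   // { utf8: 9, utf16: 6, utf32: 12 }
```

For the ASCII-only string, UTF-8 is the smallest. For the second string, each character takes three bytes in UTF-8 but two in UTF-16, so UTF-16 comes out smaller. UTF-32 is the largest in both cases.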

What is Java’s internal representation for String? Modified UTF-8? UTF-16?

Java uses UTF-16 for the internal text representation. The representation for String, StringBuilder, etc. in Java is UTF-16: https://docs.oracle.com/javase/8/docs/technotes/guides/intl/overview.html How is text represented in the Java platform? The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. The primitive data type char in the Java programming …