Encoding issues with iOS and SQLiteSupport

I’ve posted in a few places about this over the Thanksgiving weekend. I’ve been struggling with the SQLiteSupport module, which uses the UTF-16 sqlite3 functions. On Windows, everything works as expected: you can create an FString holding the path to your database, pass it to the Database.Open function, and it will create a database. The exact same code fails on iOS.

Specifically, sqlite returns error code 14 (SQLITE_CANTOPEN), which means it cannot open the database file. At first this sent me down a rabbit hole, believing it had to do with file permissions and the path. Finally, in a desperate attempt, I tried the UTF-8 functions, and opening a database worked. That is when I realized that encoding was the likely issue.

First I tried converting the TCHAR string to an NSString and using that to encode it as UTF-16LE, like this:

#if PLATFORM_IOS
const TCHAR* EncodedString = (const TCHAR*)([[[NSString stringWithFString : UTF16EncodedString] dataUsingEncoding:NSUTF16LittleEndianStringEncoding] bytes]);
#else
const TCHAR* EncodedString = *UTF16EncodedString;
#endif
return EncodedString;

(The discussions started here)

This partially worked. I could now pass the resulting string to sqlite3_open16 and it would create a database at the right path. The only issue is that it would append a bunch of ‘junk’ characters to the end of the path name, usually question marks and the occasional Japanese or Chinese character. So if I ran the code repeatedly, say 4 times, I would end up with 4 different databases despite pointing it at the same place. Sometimes it would even produce the correct string, but it was random and therefore unreliable.

Querying the database was also possible using these same conversion functions, but the data coming back was nonsense, just '???'s and other random characters, and I couldn’t find a way to convert it back into readable form (though, as I said before, all of this works on Windows with zero issues).

Someone suggested there might not be a null terminator. I tried adding one, but it made no difference.

Lastly, stephenwhittle, with whom I had been discussing the issue in the pull request on GitHub, found that the internal representation of TCHAR is perhaps UTF-32 on iOS (don’t quote him on that; I’m paraphrasing my understanding of his understanding, so look at the link below to see what he actually said on GitHub). I haven’t looked too far into this, and I’m not sure what the best way to go from UTF-32 to UTF-16LE would be, or whether Unreal’s libraries offer a way to do it. If there is a way, I would need to be able to do the reverse too.

Any suggestions or ideas from people with deeper knowledge of the UE4 engine?

My last-ditch effort would be to convert everything to UTF-8 on iOS and use the UTF-8 functions, but I would really like a cross-platform UTF-16 solution, so any help I can get on this would go a long way.

Thank you!

TCHAR is 4 bytes on iOS and 2 bytes in Windows, because those are the native OS wide character sizes.

However, they aren’t UTF-16 or UTF-32; they are UCS-2 and UCS-4 (I think that’s what it’s called). UTF-n implies that a character may take more than one code unit, so the string length does not equal the number of bytes divided by the char size.

TCHAR is always a uniform size, so the length of a string equals the number of bytes divided by the char size. Hope that makes sense.
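To make the size mismatch concrete, here is a minimal standalone sketch in plain C++ (no Unreal headers) of what a UTF-16 consumer such as sqlite3_open16 would see if it were handed a buffer of 4-byte characters unconverted. The helper names are hypothetical, and the little-endian byte layout is written out explicitly as an assumption about the target hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Lay out each 4-byte character as little-endian bytes, followed by a
// 4-byte null terminator. (Assumption: little-endian target.)
std::vector<std::uint8_t> ToLittleEndianBytes(const std::u32string& Text)
{
    std::vector<std::uint8_t> Bytes;
    for (char32_t Ch : Text)
    {
        for (int Shift = 0; Shift < 32; Shift += 8)
        {
            Bytes.push_back(static_cast<std::uint8_t>((Ch >> Shift) & 0xFF));
        }
    }
    Bytes.insert(Bytes.end(), 4, 0);
    return Bytes;
}

// Count how many 2-byte units a UTF-16 consumer reads before it hits
// a 16-bit zero, which it treats as the end of the string.
std::size_t Utf16LengthSeenByConsumer(const std::vector<std::uint8_t>& Bytes)
{
    std::size_t Units = 0;
    for (std::size_t i = 0; i + 1 < Bytes.size(); i += 2)
    {
        const std::uint16_t Unit =
            static_cast<std::uint16_t>(Bytes[i] | (Bytes[i + 1] << 8));
        if (Unit == 0)
        {
            break;
        }
        ++Units;
    }
    return Units;
}
```

For an ASCII path, every character is followed by two zero bytes, so the consumer stops after the first character. The byte count is still the character count (plus terminator) times four, which matches the uniform-size point above.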

You cannot ever use UTF-16 as a TCHAR. It’s not a thing Unreal understands. Use UTF-8 for all platforms with your library. Unreal has conversion routines to/from UTF-8 (TCHAR_TO_UTF8, etc.).
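As a rough illustration of what such a conversion involves, here is a standalone sketch in plain C++ that encodes 4-byte code points as UTF-8. It is not Unreal's implementation, and EncodeUtf8 is a hypothetical name. Inside the engine you would use the macro instead, for example something like `sqlite3_open(TCHAR_TO_UTF8(*DatabasePath), &Db)`, where DatabasePath and Db are assumed names. Note that the pointer the macro yields is only valid for the duration of that statement.

```cpp
#include <string>

// Encode a sequence of Unicode code points (one per 4-byte character)
// as UTF-8. Invalid values (surrogates, or anything above U+10FFFF)
// are replaced with U+FFFD.
std::string EncodeUtf8(const std::u32string& Text)
{
    std::string Out;
    for (char32_t Cp : Text)
    {
        if (Cp > 0x10FFFF || (Cp >= 0xD800 && Cp <= 0xDFFF))
        {
            Cp = 0xFFFD;
        }
        if (Cp < 0x80)
        {
            // 1 byte: plain ASCII.
            Out += static_cast<char>(Cp);
        }
        else if (Cp < 0x800)
        {
            // 2 bytes: 110xxxxx 10xxxxxx
            Out += static_cast<char>(0xC0 | (Cp >> 6));
            Out += static_cast<char>(0x80 | (Cp & 0x3F));
        }
        else if (Cp < 0x10000)
        {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            Out += static_cast<char>(0xE0 | (Cp >> 12));
            Out += static_cast<char>(0x80 | ((Cp >> 6) & 0x3F));
            Out += static_cast<char>(0x80 | (Cp & 0x3F));
        }
        else
        {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            Out += static_cast<char>(0xF0 | (Cp >> 18));
            Out += static_cast<char>(0x80 | ((Cp >> 12) & 0x3F));
            Out += static_cast<char>(0x80 | ((Cp >> 6) & 0x3F));
            Out += static_cast<char>(0x80 | (Cp & 0x3F));
        }
    }
    return Out;
}
```

Because ASCII characters encode to themselves, a typical database path comes out byte-for-byte the same on every platform, which is what makes the UTF-8 sqlite3 functions a convenient common denominator.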

Josh

Hi Josh, I implemented the library/PR for the SQLite support.

Thanks for the helpful response - I’ll see about modifying the module to use the UTF8 API.

Any chance we can get this information added to https://docs.unrealengine.com/latest/INT/Programming/UnrealArchitecture/StringHandling/CharacterEncoding/index.html ? It claims ‘All strings in Unreal Engine 4 are stored in memory in UTF-16 format as FStrings or TCHAR arrays.’

Howdy Twiddle,

I have placed a documentation report, UEDOC-693, into our database so that it may be resolved at a later time.

Thanks and have a great day!

It’s confusingly worded. Further down that page, it does say that “Unreal’s internal encoding is more correctly described as UCS-2.” (and in fact UCS-4 [if there is such a thing] on Apple platforms, since TCHAR is 4 bytes there).

Sorry for the confusion. As noted above, it’s in the queue to be cleaned up by the doc folks :)

Thanks for the quick support!

Just a heads up - This pull request implements the switch to UTF8.