DynamoDB HTTP multibyte character problem

I’ve recently integrated AWS DynamoDB into my code using HTTP requests to put/get/update data in tables.

This seemed to work pretty well until I added a multibyte character to the content data section. At that point DynamoDB stops understanding the data.

Here’s my AWS thread.

The code below shows the final step of preparing the JSON (which is well formed and works for single-byte characters). body is an FString containing the JSON-formatted DynamoDB UpdateItem request.
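For reference, a minimal sketch of the kind of UpdateItem body I mean; the table, key, and attribute names here are made up for illustration, and "José" carries a two-byte UTF-8 character:

	FString body =
		_T("{\"TableName\": \"Players\",")
		_T(" \"Key\": {\"PlayerId\": {\"S\": \"42\"}},")
		_T(" \"UpdateExpression\": \"SET DisplayName = :n\",")
		_T(" \"ExpressionAttributeValues\": {\":n\": {\"S\": \"José\"}}}");  // "é" is two bytes in UTF-8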

When I add a single two-byte UTF-8 character, the output shows the UTF-8 data to be one byte larger than body.Len() (e.g. a 10-character string containing one "é" is 10 TCHARs but 11 UTF-8 bytes).

		FBufferArchive BinaryArrayArchive;
		// Convert the TCHAR string to UTF-8; Length() is the byte count of the
		// converted data, which exceeds body.Len() when multibyte characters are present.
		FTCHARToUTF8 bodyStrUtf8(*body);
		BinaryArrayArchive.Serialize((UTF8CHAR*)bodyStrUtf8.Get(), bodyStrUtf8.Length() * sizeof(UTF8CHAR));

		// Prints {character count, UTF-8 byte count, serialized byte count}.
		Outputf(_T("{%d,%d,%d}"), body.Len(), bodyStrUtf8.Length(), BinaryArrayArchive.Num());
		request->SetContent(BinaryArrayArchive);

Looking inside BinaryArrayArchive in the debugger I see the characters with the appropriate encoding. However the response is a rejection (and not a very informative one; it seems to be DynamoDB’s default “your syntax is broken” response).

I was wondering if the HTTP code does anything to the content string under the hood before sending it. That seems unlikely to me, but I can’t think of what else could be causing the problem on my side.

You shouldn’t send binary payloads, as that can cause endianness issues. You should use JSON, Base64, or some other string format so that the OS/CPU isn’t trying to interpret things natively.

Actually, that wasn’t the problem in this case, though all my prior networking experience has been over very binary, heavily compressed UDP connections, so I’ll happily take instruction on how best to send UTF-8 formatted strings as content in the HTTP request system.

There’s more detail in the AWS thread, but body is a JSON string built according to the AWS DynamoDB format for the command in question (PutItem and UpdateItem were the ones causing me problems).

The problem here was that I was calculating Content-Length from the string length, not from the UTF-8 encoded data. AWS was then rejecting my request, presumably because the payload wasn’t valid JSON within the length I was declaring: it would have been truncated by N characters at the end, where N is the number of extra bytes the UTF-8 encoding requires.
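A minimal sketch of the mismatch (the string is illustrative): body.Len() counts TCHARs, while Content-Length must count the bytes of the encoded payload.

	FString body = _T("{\"S\": \"é\"}");   // 10 characters
	FTCHARToUTF8 bodyStrUtf8(*body);
	// body.Len() == 10, but bodyStrUtf8.Length() == 11: "é" occupies one
	// TCHAR but two bytes in UTF-8, so declaring Content-Length as
	// body.Len() cuts the payload one byte short of valid JSON.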

If you set the content with “ContentAsString”, does it automatically convert to UTF-8, and if you then get its length, is it the UTF-8 encoded length? If not, I don’t think that works in this case.
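If it does handle the conversion (an assumption worth verifying against the HTTP module source for your engine version), the manual serialization would reduce to something like:

	// Assumes SetContentAsString performs the UTF-8 conversion internally and
	// the module derives Content-Length from the converted bytes -- verify before relying on it.
	request->SetContentAsString(body);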

Extra credit:

For the AWS request I changed my code to:

	FBufferArchive BinaryArrayArchive;
	FTCHARToUTF8 bodyStrUtf8(*body);
	// Serialize the UTF-8 bytes, not the TCHAR string.
	BinaryArrayArchive.Serialize((UTF8CHAR*)bodyStrUtf8.Get(), bodyStrUtf8.Length() * sizeof(UTF8CHAR));

	request->SetContent(BinaryArrayArchive);
	// Content-Length is now the UTF-8 byte count rather than the character count.
	request->SetHeader(_T("content-length"), FString::Printf(_T("%d"), BinaryArrayArchive.Num()));

Hey Joe, the reason I was serializing my own (JSON) buffer was to get around httprequest->GetContent().Num() not returning a valid value. It seems that GetContentLength works, though:

    Outputf(_T("%d vs %d"), request->GetContentLength(), request->GetContent().Num());

Output on iOS is:
LogTemp:Display: 50 vs 0

50 was correct in this case.

Funnily enough, I was only doing this work to set Content-Length, which I now realize is set internally (the only header that is set automatically, according to the code comments on SetHeader), so the manual SetHeader call above is redundant.

On top of that I had some problems in my AWS signature generation (in the way the payload hash was generated) that meant the hash was incorrect on iOS, so it was a double whammy that I suspect sent me off in the wrong direction.
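For anyone hitting the same thing: the SigV4 payload hash has to be computed over the same UTF-8 bytes that actually go on the wire, not over the TCHAR string. A sketch, with Sha256Hex standing in for whatever SHA-256 helper you use (the name is hypothetical; the engine doesn’t ship a general-purpose one):

	// Hash the exact bytes that SetContent will send; hashing the TCHAR string
	// (or a re-encoded copy of a different length) breaks the signature.
	FTCHARToUTF8 bodyStrUtf8(*body);
	FString PayloadHash = Sha256Hex((const uint8*)bodyStrUtf8.Get(), bodyStrUtf8.Length());  // Sha256Hex is hypothetical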

Seems to be working for now…

Ah, I see. Glad you sorted it out. We should figure out how to make the Content-Length part of this issue clearer. Thanks for updating this.