tester_abc

Hi,

I'm having difficulty printing strings of TCHAR[] arrays into a file using CreateFile & WriteFile.

When I try to write TEXT("1234567") to a file using WriteFile, the result looks like this in the file when opened with notepad or wordpad:

1 2 3 4 5 6 7

There is a space between every character, which makes sense since I'm using the UNICODE character set, but I don't want the file to look like this. Is there a way to convert between the two character sets?

I've been trying to come up with a snippet of code to convert TCHAR into char, but no luck so far: VC++ won't let me declare a variable-length array, and I can't modify a TCHAR[] array by index. For example, the first statement below does not compile, and the second gives a run-time error saying "Access violation writing location":

myConvertFunction(TCHAR * str, int lengthOfStr)
{
char tempStr[lengthOfStr]; // <--------- this causes a compiler error

str[1] = str[2]; //<-------------this causes a run-time Access violation error
}

Does anyone have any ideas for me? I just want to be able to write TCHAR strings in the UNICODE set to a file so that, when viewed in wordpad or notepad, there are no spaces between the characters.


Re: Visual C++ Language Printing strings of TCHAR[] into a file

tester_abc

I found out that if you cast each character in the TCHAR array to a char and copy it into a newly declared char array of allocated size, the wide (UTF-16) string is successfully converted to a single-byte string.




Re: Visual C++ Language Printing strings of TCHAR[] into a file

Holger Grund

When Windows documentation says "Unicode" (which, strictly speaking, is not an encoding) without further qualification, it actually means little-endian UTF-16 (even though there's no real support for characters outside the BMP).

With _UNICODE and UNICODE defined, TCHAR translates to wchar_t and TEXT("xy") translates to L"xy".

Obviously, a wchar_t, being 16 bits wide, can represent more characters than a char. Unlike wchar_t, the interpretation of a char depends on the code page it is represented in.
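
To see where the "spaces" in the original question come from, here is a minimal sketch that copies out the bytes of a UTF-16 string as they sit in memory. It uses the standard char16_t type rather than wchar_t so that it behaves the same on every compiler (an assumption made for portability; with VC++ wchar_t is also 16 bits wide). The helper name utf16_bytes is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Returns the raw bytes of a UTF-16 string exactly as they sit in memory,
// i.e. what ends up in the file when the buffer is written unconverted.
// (utf16_bytes is a hypothetical helper name, not a Windows or CRT function.)
std::vector<unsigned char> utf16_bytes(const std::u16string& text)
{
    std::vector<unsigned char> bytes(text.size() * sizeof(char16_t));
    if (!bytes.empty())
        std::memcpy(bytes.data(), text.data(), bytes.size());
    return bytes;
}
```

On a little-endian machine, utf16_bytes(u"1234567") yields 14 bytes: 31 00 32 00 33 00 and so on. A viewer that treats the file as single-byte text shows each 00 byte as a gap between the digits.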

Ultimately, there are the CRT functions wctomb(s)[_l] and mbtowc(s)[_l] and the corresponding OS APIs WideCharToMultiByte and MultiByteToWideChar. Both families convert between wide-character (wchar_t) and multibyte (char*) encodings for a given code page (a code page in the OS, a locale in the CRT). Many conversion functions assume the currently active thread code page (OS) or the currently active locale (CRT) for the interpretation of char(s).

With all that being said, there are several ways to convert things. Some of the C++ string wrappers come with built-in conversion; then there are the ATL conversion macros (CT2CA[EX] etc.) and the functions mentioned above. Also, Windows APIs with strings in the signature generally come in two variants, e.g.:

CreateFileA: takes const char*, assumed to be in the format of the current thread ACP

CreateFileW: takes const wchar_t*

CreateFile: macro defined as CreateFileW in UNICODE builds, CreateFileA otherwise.

It's perfectly valid to use CreateFileA (if you have a const char* which is to be interpreted with the thread's ACP) or CreateFileW (if you have a const wchar_t*) explicitly.

Generally, the XxxA variants are just wrappers around the XxxW variants that do the conversion of string parameters and return values.
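
The A/W naming scheme described above can be mimicked in a few lines. This is only a sketch of the pattern: DescribeA, DescribeW, Describe and MY_TEXT are hypothetical names invented for illustration and are not part of the Windows headers.

```cpp
#include <string>

// Hypothetical stand-ins for a Windows-style API pair. These are NOT real
// Windows functions; they only mimic the XxxA / XxxW naming pattern.
std::string DescribeA(const char* s)     { return std::string("A:") + s; }
std::wstring DescribeW(const wchar_t* s) { return std::wstring(L"W:") + s; }

// One generic name that the preprocessor maps to the wide or the narrow
// variant, depending on whether UNICODE is defined for the build.
#ifdef UNICODE
  #define Describe   DescribeW
  #define MY_TEXT(x) L##x
#else
  #define Describe   DescribeA
  #define MY_TEXT(x) x
#endif
```

Code written as Describe(MY_TEXT("xy")) compiles in both build flavours, while DescribeA("xy") and DescribeW(L"xy") can always be called explicitly, just like CreateFileA and CreateFileW.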

What you are doing, simply casting element by element, works for standard (ASCII) characters because that is the way Unicode is designed: its first 128 code points match ASCII. It will, however, silently do the wrong thing for some special characters, even if a mapping is present in the target code page.
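
A small sketch of that failure mode, again using char16_t for portability (an assumption; the thread itself uses TCHAR/wchar_t). The helper name narrow_by_cast is hypothetical.

```cpp
#include <string>

// Narrows a UTF-16 string by casting each code unit to char, as described
// in the thread. Only the low 8 bits of every code unit survive.
// (narrow_by_cast is a hypothetical helper name.)
std::string narrow_by_cast(const std::u16string& wide)
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (char16_t unit : wide)
        narrow.push_back(static_cast<char>(unit));
    return narrow;
}
```

For u"1234567" the result is the expected "1234567". For the euro sign (U+20AC) the cast keeps only the low byte 0xAC, whereas code page 1252 encodes the euro sign as 0x80, so the character is silently replaced by a different one even though the code page could have represented it.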

Lastly, array declarations need a fixed size in the form of a constant expression in C++ and C90. You can use alloca or dynamic allocation for that. Or, better, just use one of the many string wrapper classes VC++ provides (std::basic_string, CString, ...). Also, in C++ string literals convert implicitly to char* (or wchar_t*) for compatibility with C. However, their contents are really immutable, and trying to alter them will result in an access violation with the default compiler settings.
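
Both points can be sketched with standard containers. The helper names make_buffer and overwrite_second are hypothetical and only serve to illustrate the two fixes for the code in the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A buffer whose length is only known at run time: std::vector takes the
// place of the rejected declaration `char tempStr[lengthOfStr];`.
std::vector<char> make_buffer(std::size_t length)
{
    return std::vector<char>(length, '\0');
}

// Copies the text into a std::wstring first. The copy is writable even
// when the argument is a string literal, which itself must not be changed.
// (overwrite_second is a hypothetical helper name.)
std::wstring overwrite_second(const wchar_t* text)
{
    std::wstring copy(text);
    if (copy.size() >= 3)
        copy[1] = copy[2];   // same assignment as str[1] = str[2] in the question
    return copy;
}
```

Calling overwrite_second(L"1234567") returns L"1334567" without touching the literal, so no access violation occurs.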

-hg





Re: Visual C++ Language Printing strings of TCHAR[] into a file

Giovanni Dicanio (C++ MVP)

Hi tester_abc,

If you want to write Unicode text data to a file, you should choose an encoding for Unicode (there are some options, like UTF-8, UTF-16LE, etc.).

Personally, I like using UTF-8 as the encoding outside the boundaries of the application, and UTF-16 inside the app.

This is because UTF-16 is Windows' "natural" default Unicode encoding format. So, I think that it makes sense to use this Unicode format in Windows apps.

But UTF-16 has some problems, like the endianness of the host machine (UTF-8 is just UTF-8, there is no UTF-8 big-endian or UTF-8 little-endian; on the other side, there are UTF-16LE and UTF-16BE...), etc. So, I prefer using UTF-8 *outside* the app, e.g. when saving text data to an external file, or when sending that data across the net.

Note that UTF-8 is kind of the Unicode de facto standard for the Web.

So, I like storing or sending Unicode text data as UTF-8 outside the app, and this data is converted to and from UTF-16 at the app's boundaries.
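
As a rough illustration of converting at the boundary, here is a portable, hand-written UTF-16 to UTF-8 encoder. It is only a sketch and not the code behind the link below: the name utf16_to_utf8 is hypothetical, char16_t stands in for wchar_t, and replacing unpaired surrogates with U+FFFD is a design choice of this example. On Windows, WideCharToMultiByte with the CP_UTF8 code page does the same job.

```cpp
#include <cstddef>
#include <string>

// Converts UTF-16 to UTF-8. Unpaired surrogates are replaced with U+FFFD.
// (utf16_to_utf8 is a hypothetical helper name.)
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() &&
                in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine the surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;                           // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // stray low surrogate
        }

        if (cp < 0x80) {                               // 1 byte (ASCII)
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {                       // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {                     // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                                       // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

For pure ASCII input such as u"1234567" the output is byte-for-byte "1234567", so the file shows no gaps between the digits, and non-ASCII text survives as well.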

You might find the following link interesting:

http://www.geocities.com/giovanni.dicanio/vc/index.htm#utf8conv

There you can find sample C++ code that implements what I've tried to explain in this post.

HTH,

Giovanni