Adah

For the following program (encoded in Latin1):

#include <stdio.h>

#ifdef _MSC_VER
#define POUND 0xA3
#else
#define POUND '£'
#endif

int main()
{
    printf("%c\n", POUND);
    return 0;
}


I get the following message under Chinese Windows:

test.c : warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss

The point is that the offending line is not designed to work with MSVC. I do not think it is right to give such a warning.

I found this when compiling a cross-platform project. It has code like the above for other platforms. Earlier versions of MSVC do not report this warning.

The suggestion is useless in my case (it won't work). What is more, even for MSVC, the suggestion seems problematic: when I commented out the preprocessor lines except the offending one and saved the file in UTF-16, the character literal passed to printf became '?'. Should the build environment affect the code in such a significant way while providing no controlling options (I have found no encoding or charset options)? Unless I set the locale to English and reboot, it seems simply not possible to work without making significant changes to the code (like wprintf(L"%c\n", L'£');).

Any suggestions/work-arounds/fixes?



Re: Visual C++ Express Edition Bug?: Unwanted C4819

orcmid

Well, you know, it is a warning. I don't think VC++ 2005 Express Edition has any way to tell that you are using Latin1. I think that is why it checks the default code page you are running under.

There's probably a #pragma clause that you could add to your #if material to suppress that particular warning.

I think we are seeing the difficulties of moving to Unicode while our systems don't support Unicode at the code-page and console-application level. It is not possible to tell which non-Unicode format is being used on a computer except by checking the default code-page setting. Sometimes the programmer has to guess, and will guess wrong, in attempting to have a behavior that works most of the time.

- Dennis

MORE ANALYSIS:

Because the Forum web page is in UTF-8, I was able to paste your program into a blank C Language file using VC++ 2005 Express Edition on my code-page 437 configuration. I see the pound-Sterling symbol without any trouble, and I don't see any warning when I compile. But the program shows me a single "u" symbol when I compile it and run it in a console window. This is exactly what I deserve for character code 0xA3, of course, because that's what code-page 437 has to offer (its code for the pound-Sterling symbol being 0x9C).

So what do you see in your program's output when you compile and run it with VC++ under default codepage 936? Surely not '£'?

This article digs into some of the same problems you are reporting: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=306497&SiteID=1

I think one of the reasons for the warning is that the character literal '£' (with 0xA3 followed by 0x27) is not a valid code-page 936 sequence, and creating '£' == 0xA1EA is apparently a step too far (and won't produce the desired result anyhow), so it creates 0xFFFFFFA3 instead and then printf uses only the low-order byte.

I don't think the '?' comes from the compiler. It comes from the run-time system when it doesn't recognize a code for the code-page you have installed. (I ran into all of this fussing with codepage 932 recently.)

The message is misleading. The advice about saving the file (that is, your test.c file) in Unicode format (UTF-8) is fine, but not so easy to do. You can use notepad to do it, and if the file is pasted into a blank page the way I did with your sample, it appears that the file is indeed saved in UTF-8 (the pound-Sterling symbol saves and restores just fine). The compiler can't tell you are using Latin1 because there are no markers that indicate the intended code as there are for UTF-8 and UTF-16 files on Windows.






Re: Visual C++ Express Edition Bug?: Unwanted C4819

orcmid

Addendum: You can get '?' substitutions from the compiler; I've seen them when printing out strings in hex to see what is going on. I lose track of all the ways things can come unstuck when working with different code pages. I also don't think the compiler behavior is all that consistent, so expect more tweaking as we move into the Vista era.

The warning exists because there is an effort to deal with character codes more carefully in international settings. This is definitely one of those things where whatever they do, it won't be right for everyone.

- Dennis






Re: Visual C++ Express Edition Bug?: Unwanted C4819

Adah

Well, you know, it is a warning. I don't think VC++ 2005 Express Edition has any way to tell that you are using Latin1. I think that is why it checks the default code page you are running under.

There's probably a #pragma clause that you could add to your #if material to suppress that particular warning.

Two points:

1) The warning seems to be from the preprocessor: after preprocessing the code is removed.

2) Probably because of the above, it cannot be disabled by #pragma, which works after preprocessing.

So what do you see in your program's output when you compile and run it with VC++ under default codepage 936? Surely not '£'?

Of course not. This is just a contrived example, not the real code. In real code it could be converted to Unicode and output by TextOutW. And my point is that the build environment need not be the same as the run-time environment (which might really use the English locale).

I don't think the '?' comes from the compiler. It comes from the run-time system when it doesn't recognize a code for the code-page you have installed.

It is from the compiler. I checked the assembly output by /Fa :-).

The message is misleading. The advice about saving the file (that is, your test.c file) in Unicode format (UTF-8) is fine, but not so easy to do. You can use notepad to do it, and if the file is pasted into a blank page the way I did with your sample, it appears that the file is indeed saved in UTF-8 (the pound-Sterling symbol saves and restores just fine). The compiler can't tell you are using Latin1 because there are no markers that indicate the intended code as there are for UTF-8 and UTF-16 files on Windows.

Exactly. The advice is really useless. Saving the file as Unicode eliminates the warning, but produces different code. And it is not feasible at all for cross-platform coding.

Do notice that in my original case (and example) the character literal is not used on Windows. It is really for EBCDIC systems! Since it is not used, one does not expect anything from the compiler, let alone an overwhelming (my original case occurs in a header file), unsuppressable warning!





Re: Visual C++ Express Edition Bug?: Unwanted C4819

orcmid

 Adah wrote:
Exactly. The advice is really useless. Saving the file as Unicode eliminates the warning, but produces different code. And it is not feasible at all for cross-platform coding.

Do notice that in my original case (and example) the character literal is not used on Windows. It is really for EBCDIC systems! Since it is not used, one does not expect anything from the compiler, let alone an overwhelming (my original case occurs in a header file), unsuppressable warning!

Two points:

1. #pragma warning(disable: 4003) applies to a Preprocessor warning and it works.  I think you should try to disable the warning you don't like.  (I can't verify it myself because I don't have anything that produces the 4819 warning.)

2. VC++ 2005 Express Edition is not intended for cross-platform program creation.  In fact, it won't work if you use the standard libraries.  The product is intended for targeting the Microsoft Windows platform only, and that is even one of the terms of use in the EULA.

I think maybe you meant that your source code is intended for compiling on different platforms.  I write code like that too.  But problems about different interpretations of the program-file character encodings (unless you store your program file in UTF-16 or UTF-8 and those are recognized properly by different platforms and compilers) are not something anyone can fix easily cross-platform. 

I, for one, favor the effort Microsoft is making to overcome the problems of character-set encoding/interpretation in interchange of source code and also provide warnings when there may be problems preserving the user's intention.  I don't think it is easy to get right (or necessarily possible to "get right"), but I applaud the effort.

 - Dennis

[Afterthought: There probably needs to be careful annotation of presumed character-set encodings in source files so that others have a chance of understanding when they may not be seeing the file properly on their machine and that it might not be compiling the character codes properly either.  I don't know how to deal with the dependencies between the VC++ IDE, the VC++ compiler, and the compiling machine's default codepage when they conflict with the codepage of the machine where the program is run.  It looks like one must use the Windows-specific, code-page-sensitive input-output operations, properly isolated for portability of the code.  My brain hurts thinking about navigating that morass properly, with proper verification of code execution on systems with different default code pages.  It strikes me that the ultimate work-around is using Unicode as a reference for the intended characters, and then adjusting as needed for the command-line application's actual setting.  And GUI WinApps and .NET/CLI applications can just use the Unicode, praise be.]

[After-Afterthought.  It is probably no consolation, but I explore the roots of all of this in a blog post made in March: http://orcmid.com/BlunderDome/clueless/2006/03/what-we-see-is-not-what-we-get.asp I did not address the specific issue of dealing with code pages.]






Re: Visual C++ Express Edition Bug?: Unwanted C4819

Adah

1. #pragma warning(disable: 4003) applies to a Preprocessor warning and it works. I think you should try to disable the warning you don't like. (I can't verify it myself because I don't have anything that produces the 4819 warning.)

No, #pragma warning(disable: 4819) does not work. If disabling 4003 works for a preprocessor warning, then this looks even more like a bug.

I think maybe you meant that your source code is intended for compiling on different platforms. I write code like that too. But problems about different interpretations of the program-file character encodings (unless you store your program file in UTF-16 or UTF-8 and those are recognized properly by different platforms and compilers) are not something anyone can fix easily cross-platform.

Yes, that is what I intended. The trick that used to work is careful use of #ifdef. It does not work now.

BTW, I've found another #pragma that should help here but does not. I added

#pragma setlocale("english_usa.1252")

to the beginning of the test program, and it still does not work.




Re: Visual C++ Express Edition Bug?: Unwanted C4819

orcmid

That's disappointing.

Here's something I just stumbled on, though.

With a program open in the VC++ IDE text editor, click on menu selection File | Advanced Save Options ...

- Dennis






Re: Visual C++ Express Edition Bug?: Unwanted C4819

I already have a name

> 2. VC++ 2005 Express Edition is not intended for cross-platform program creation. In fact, it won't work if you use the

I am experiencing this problem on a full copy of "Visual Studio 2005 Professional Edition" and it is being caused by a file supplied by Microsoft in their platform SDK ("C:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\include\uuids.h"). So no cross-platform compiling is involved there.

I am also seeing the problem in the QuickTime SDK headers, but in this case we want to preserve the exact character value, not convert it to anything else (they are being embedded in strings that will be used with binary files). The worst case so far is a character with hex value "0xbd" in a string literal somehow being converted into a new line, but this only happens sometimes, which makes me think the compiler is picking random values (uninitialised variable, anyone?).

Also the copyright character in header comments seems to upset the compiler even though it is discarding it since it is a comment, but lots of third parties seem to like putting this character in their header.




Re: Visual C++ Express Edition Bug?: Unwanted C4819

Soskywalkr

I am also having the same issue with MS VS 2005 Professional Edition. Specifically, we use a lot of GNU code that does have copyright symbols (in addition to a few in-house pages that are shared with our clients).

That said, when I add

#pragma warning( disable: 4819 )

in a file __before__ the offending file is even referenced, the warning is no longer issued. Perhaps this is the trouble? The warning is issued by the tokenizing step in the preprocessor, so it hasn't even gotten to reading the pragma statement in the same file.

Not sure. This is an open issue for us & the long term solution does not involve including a header file every time I want to use a copyright symbol.

On a side note, I do think that warnings of this nature can somehow be directly disabled in the compiler properties - & I know you can also pass them as command line parameters.

- Matt




Re: Visual C++ Express Edition Bug?: Unwanted C4819

orcmid

I already have a name wrote:
> 2. VC++ 2005 Express Edition is not intended for cross-platform program creation. In fact, it won't work if you use the

I am experiencing this problem on a full copy of "Visual Studio 2005 Professional Edition" and it is being caused by a file supplied by Microsoft in their platform SDK ("C:\Program Files\Microsoft Visual Studio 8\VC\PlatformSDK\include\uuids.h"). So no cross-platform compiling is involved there.

These are code-page problems for all of the VC++ and VS editions. The 2005 editions provide more diagnostics and warnings. The 2005 editions (and presumably the 2008 editions) also default to compiling for Unicode, in contrast to the earlier editions.

Any time a character code outside the printable ASCII range (0x20 to 0x7e) is found, it will be treated differently based on code-page settings of the system and the compiler. (I have no idea why uuids.h has such a character, but the invalid characters in the comments on lines 483, 487, and 491 are safely replaced by spaces. I never use uuids.h because it breaks version protection in addition to the fact that "This is a real pain in the neck" - line 1047.)

Conflicting code-page assumptions will impact the pre-processor and the compiler and also the handling of character literals. The recommended approach is to use UTF-8 (or UTF-16 if you are brave) in source files and use Unicode (wchar_t) strings in the running program. Otherwise, you will always have to deal with code-page dependencies of different compiler configurations and the machine they are installed on. This is not a problem that Visual Studio created, it is one that Visual Studio has made an effort to solve and for which there are tighter filters in the 2005 editions.

There is no automatic solution for different 8-bit (single-byte and double-byte) code-page dependencies from one compilation to another. You will need to repair that in the project configuration settings.

- Dennis






Re: Visual C++ Express Edition Bug?: Unwanted C4819

Adah

Conflicting code-page assumptions will impact the pre-processor and the compiler and also the handling of character literals. The recommended approach is to use UTF-8 (or UTF-16 if you are brave) in source files and use Unicode (wchar_t) strings in the running program.

Does VS 2005 support UTF-8? You meant it requires a BOM character, right?

Otherwise, you will always have to deal with code-page dependencies of different compiler configurations and the machine they are installed on. This is not a problem that Visual Studio created, it is one that Visual Studio has made an effort to solve and for which there are tighter filters in the 2005 editions.

I doubt you can say this. The obvious solution is to allow the encoding to be set in configuration, on the command line, or as a pragma. Microsoft has a pragma that can be used for this purpose (setlocale).

BTW, Java has allowed setting the source encoding on the command line for many years. The world is flat; one can no longer assume that the machine building some code has the same locale setting as the code's author.





Re: Visual C++ Express Edition Bug?: Unwanted C4819

orcmid

Adah wrote:

Does VS 2005 support UTF-8? You meant it requires a BOM character, right?

The VC++ 2005 Express Edition IDE supports source-code text files in UTF-8. Here are settings to review in the IDE Tools | Options ... selection.

  1. Under Environment | Documents check the option to "Save documents as Unicode when data cannot be saved in codepage"
  2. Under Text Editor | General check the option to "Auto-detect UTF-8 encoding without signature" (if you might be using UTF-8 files from systems and applications that do not use a Byte Order Mark [BOM]).

When you have a file open in the VC++ IDE editing window, select the File | Advanced Save Options ... menu item. You will see the ANSI codepage default that is used when new files are saved (e.g., on my system it is "Western European (Windows) - Codepage 1252"). You can change that to another encoding. My recommendation is "Unicode (UTF-8 with signature) - Codepage 65001". You can use this to force a file to be resaved in UTF-8 also.

- Dennis

I have only addressed your question with regard to the IDE and the compiler system. This is different than creating programs that use Unicode, that have their char literals ('a') and string literals ("abc") converted to Unicode in the compiled code, etc., and that use API functions that accept and deliver Unicode data (e.g., in wchar_t arrays rather than char arrays).

PS: If a source file is already in UTF-8 encoding (preferably with the BOM to avoid ambiguity) or UTF-16, it is recognized as such and the default save encoding will be the same. I author much of my C/C++ code outside of the VC++ IDE in an editor that defaults to UTF-8 for me, so I hadn't noticed this. It just worked. (My text files use the Unicode Copyright symbol, not the code page 1252 one.) I also tend to name the encoding in the top comment line of a file so that someone can correct their software to view/process it properly if necessary.






Re: Visual C++ Express Edition Bug?: Unwanted C4819

Dion Campbell

To disable specific warnings, enter a semi-colon delimited list of warning numbers under 'Disable Specific Warnings' under 'Tools->[Project Name] Properties->Configuration Properties->C/C++->Advanced'. I have had the C4819 problem with files containing kanji characters, and disabling the warning in this fashion correctly suppresses the warning.

- Dion.