ifeel

Hi guys!
I'm trying to open a UTF-8 encoded file in a C program. My code is the following:

Code Snippet

#include <stdio.h>
#include <stdlib.h>
#include <tchar.h>
#include <windows.h>

int _tmain(int argc, TCHAR *argv[], TCHAR *envp[])
{
    FILE *yyin;
    wchar_t *buffer = (wchar_t *)malloc(sizeof(wchar_t) * 100);
    const wchar_t *filepath = L"\\Release\\unicode.txt";

    _wfopen_s(&yyin, filepath, L"r");
    fgetws(buffer, 100, yyin);
    wprintf(L"%s", buffer);
    fclose(yyin);
    SuspendThread(GetCurrentThread());
    return 0;
}


My file was saved with Notepad.
I want my program to print some rows of the file on the screen, even if there are Unicode characters in it.
My file contains only this row:

Code Snippet

Questo è il mio testo...


(It's Italian.) But when I execute the program I get:

Code Snippet

i Questo AšE il mio testo...


What's the matter?
I'd like to do it in C, not C++.
Can you help me?
Thanks!

Andrea


Re: Visual C++ Language reading utf-8 file with c

crescens2k

The fact that it is outputting the byte order mark tells me that the file isn't being opened for reading in UTF-8 mode. Have you tried opening the file with the mode string "r, ccs=UTF-8"?
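
If you end up reading the raw bytes yourself instead of relying on the ccs=UTF-8 mode, the byte order mark has to be skipped by hand. Here is a minimal sketch in portable C, assuming the line is already in a char buffer; the function name skip_utf8_bom is made up for illustration and is not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first byte after a leading UTF-8 byte order
   mark (EF BB BF), or `text` itself if no mark is present.
   Hypothetical helper, not a CRT function. */
const char *skip_utf8_bom(const char *text)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    assert(text != NULL);
    /* The length check keeps memcmp from reading past a short string. */
    if (strlen(text) >= 3 && memcmp(text, bom, 3) == 0)
        return text + 3;
    return text;
}
```

You would call it on the first line only, e.g. on the buffer filled by fgets, since Notepad writes the mark just once at the start of the file.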






Re: Visual C++ Language reading utf-8 file with c

ifeel

When I use "r, ccs=UTF-8" in a Win32 project, the BOM is handled correctly, but the character "è" becomes "T".




Re: Visual C++ Language reading utf-8 file with c

Holger Grund

Chances are, things are read correctly, but output won't work.

The reasonable assumption is that standard streams are opened in text mode with the default codepage. Therefore wide output will need to convert characters into multibyte with wctomb, which ultimately depends on the selected locale.

You may want to try setting the locale to anything but "C" before calling the output function, e.g. setlocale(LC_ALL, "Italian"); (declared in <locale.h>).
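
To make the locale dependency concrete, here is a rough sketch (not tested against your project). It selects a locale and checks whether the wide string can be converted to multibyte at all before printing it. The helper names multibyte_length and print_wide_line are invented for this example; the locale name you pass is up to you, with "" meaning the user's default.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Number of bytes `text` needs in the current locale's multibyte
   encoding, or (size_t)-1 if some character cannot be converted.
   Wide output has to perform the same kind of conversion. */
size_t multibyte_length(const wchar_t *text)
{
    mbstate_t state;
    const wchar_t *src = text;

    assert(text != NULL);
    memset(&state, 0, sizeof state);
    /* A NULL destination asks wcsrtombs only for the required length. */
    return wcsrtombs(NULL, &src, 0, &state);
}

/* Selects `locale_name`, then prints `text` followed by a newline.
   Returns 0 on success, -1 if the locale is unavailable or the text
   cannot be represented in it. */
int print_wide_line(const char *locale_name, const wchar_t *text)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;
    if (multibyte_length(text) == (size_t)-1)
        return -1;
    return wprintf(L"%ls\n", text) < 0 ? -1 : 0;
}
```

If print_wide_line returns -1 for a string containing "è", the selected locale most likely cannot represent that character, which would point at the output side rather than the reading side.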

-hg





Re: Visual C++ Language reading utf-8 file with c

Giovanni Dicanio (C++ MVP)

Hi ifeel,

the algorithm is quite simple...

You just need to read the UTF-8 encoded text from the file and convert the string from UTF-8 to Windows Unicode (UTF-16).

You might consider the following code I developed, which seems OK (the input file is saved by Notepad with UTF-8 encoding, and the project must be compiled in Unicode mode, i.e. TCHAR = WCHAR):

Code Snippet

// *** READ UTF-8 ENCODED TEXT ***

#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <tchar.h>
#include <atlbase.h>
#include <atlstr.h>
#include <stdio.h>

// See my MVP web site for this:
// http://www.geocities.com/giovanni.dicanio/vc/index.htm#utf8conv
#include "utf8conv.h"

int _tmain(int argc, TCHAR *argv[], TCHAR *envp[])
{
    // Open the input file
    FILE * fileHandle = _tfopen( _T("testo.txt"), _T("r") );

    // Check for errors ...
    ATLASSERT( fileHandle != NULL );

    // Buffer for a line (data read as Unicode UTF-8)
    static const int lineChars = 100;
    char line[ lineChars ];

    // Read the line into UTF-8 buffer
    fgets( line, lineChars, fileHandle );

    // Close the input file
    fclose( fileHandle );
    fileHandle = NULL;

    // Convert UTF-8 line to Windows Unicode UTF-16
    Utf8Conversion::CU2W unicodeString( line );

    // Print the UTF-16 string
    MessageBox( NULL, unicodeString, _T("Reading UTF-8..."), MB_OK );

    // All right
    return 0;
}
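
Since you asked for plain C, here is a minimal hand-written sketch of the UTF-8 to UTF-16 conversion step itself. It is not the code from utf8conv.h; the function name utf8_to_utf16 and its error convention are assumptions made for this example. On Windows the more usual route would be the MultiByteToWideChar API with the CP_UTF8 code page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts the NUL-terminated UTF-8 string `src` to UTF-16 in `dst`
   (capacity `dst_len` units, terminator included). Returns the number
   of units written, not counting the terminator, or (size_t)-1 on
   malformed input or insufficient space. Hypothetical helper. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)src;
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*p != 0) {
        uint32_t cp;
        int extra;  /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return (size_t)-1;    /* truncated or bad continuation */
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong forms, surrogates, and out-of-range values. */
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (out + 1 >= dst_len) return (size_t)-1;
            dst[out++] = (uint16_t)cp;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (out + 2 >= dst_len) return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (out >= dst_len) return (size_t)-1;
    dst[out] = 0;
    return out;
}
```

Note that a leading byte order mark in the file is converted like any other character and comes out as U+FEFF, so you may want to skip the first three bytes of the first line before converting.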

HTH,

Giovanni