RickP

I'm storing data in an XML file that is UTF-8 encoded. When I try to load that file using XMLDocument it gives an error saying it doesn't like one of the characters in the XML file. The character it doesn't like is:

At the top of the xml file I have

<?xml version="1.0" encoding="UTF-8"?>

The XML file is saved as UTF-8 in Windows. So why doesn't the XMLDocument class give this error when I call the Load() function?

Thanks



Re: XML and the .NET Framework XMLDocument and UTF-8

Martin Honnen

First you say "When I try to load that file using XMLDocument it gives an error", then you ask "why doesn't the XMLDocument class give this error when I call the Load() function". That does not make sense to me. Do you get an error from the Load method of the XmlDocument object or do you not get one?

How do you create the XML document that you think is UTF-8 encoded?

What happens when you load your XML document in an IE browser window? Do you get an error about an incorrectly encoded character there too?






Re: XML and the .NET Framework XMLDocument and UTF-8

RickP

Sorry, I meant "does", not "doesn't", in the second part.

Yes, it gives the same error in IE.

What I'm doing is taking some encrypted bytes, converting them to UTF-8 (I even tried ASCII, with the same error), storing the result in the XML document, and trying to read it back.

When I created the document I typed it by hand, then copied and pasted in the bytes that I had converted to UTF-8 or ASCII using VB.NET.

Let me explain my understanding of all this, and maybe you can correct me. UTF-8 has a bunch of characters in it. If I place those characters in an XML tag, I expect that I should be able to read those characters back from that tag using an XML reader (in this case the XmlDocument class from .NET 2.0).





Re: XML and the .NET Framework XMLDocument and UTF-8

RickP

Can't I do it this way? The bytes are just text. I use RijndaelManaged to encrypt some text. It returns a byte array. I convert that array to UTF-8 and store the result in a field. Then reading it back gives the error. Why wouldn't that work?



Re: XML and the .NET Framework XMLDocument and UTF-8

Martin Honnen

Well, you have tried it that way and, as you have found, it did not work. I am not familiar with the encryption method you mention, but it sounds as if it creates byte sequences that do not fit into an XML document, so you have to encode them as suggested.
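To illustrate why treating cipher output as UTF-8 text is a problem, here is a minimal sketch. It is written in Python purely for illustration (the thread itself is about VB.NET), and the sample bytes are made up rather than real RijndaelManaged output. It shows that arbitrary bytes generally do not survive a decode/encode round trip through UTF-8, and that the decoded text can contain control characters.

```python
# Hypothetical sample standing in for encrypted output; real cipher bytes
# are effectively random, so invalid UTF-8 sequences are to be expected.
cipher_like = bytes([0x8F, 0x01, 0xFF, 0x41])


def utf8_round_trip(data: bytes) -> bytes:
    """Decode bytes as UTF-8 (replacing invalid sequences), then re-encode."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def has_control_chars(data: bytes) -> bool:
    """True if the decoded text contains C0 control characters other than tab, LF, CR."""
    text = data.decode("utf-8", errors="replace")
    return any(ord(ch) < 0x20 and ch not in "\t\n\r" for ch in text)


if __name__ == "__main__":
    print(utf8_round_trip(cipher_like) == cipher_like)  # the original bytes are not recovered
    print(has_control_chars(cipher_like))
```

Plain text such as b"hello" round-trips unchanged, which is why the approach appears to work until the data stops being text.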




Re: XML and the .NET Framework XMLDocument and UTF-8

RickP

So a UTF-8 encoded document doesn't recognize all the characters in the UTF-8 character set? That seems odd.



Re: XML and the .NET Framework XMLDocument and UTF-8

Martin Honnen

XML does not allow all Unicode characters; certain control characters, for instance, are not allowed. But as I said, I don't know that encryption method. I can't help any further. My suggestion still is to use base64 or binhex encoding to properly encode those bytes in the XML document.
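As a rough illustration of that restriction, the sketch below checks a character against the Char production of XML 1.0 (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF) and then confirms that a parser rejects a document containing a disallowed control character. It uses Python's standard library parser rather than XmlDocument, and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(ch: str) -> bool:
    """True if the character is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(xml_text: str) -> bool:
    """True if the text is accepted by the parser as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_xml10_char("A"), is_xml10_char("\x01"))
    print(parses("<data>ok</data>"), parses("<data>\x01</data>"))
```

So the file can be perfectly valid UTF-8 and still fail to load, because the encoding and the set of characters XML permits are two separate checks.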




Re: XML and the .NET Framework XMLDocument and UTF-8

RickP

OK, thanks. The knowledge is more or less what I was looking for. I wasn't aware that XML doesn't allow some characters. Thanks for explaining that. I'll look into the encoding you suggested.



Re: XML and the .NET Framework XMLDocument and UTF-8

RickP

It looks like Convert.ToBase64String() is my friend here. I can take the bytes returned from the encryption function and pass them into the Convert function to get valid XML characters that I can store in an XML tag and easily read back. I can then convert the string back to a byte array using Convert.FromBase64String().
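The round trip described above can be sketched as follows. This is a Python illustration of the same idea, not the .NET code itself: base64.b64encode and base64.b64decode play the roles of Convert.ToBase64String and Convert.FromBase64String, and the element names "settings" and "secret" are invented for the example.

```python
import base64
import xml.etree.ElementTree as ET


def store_bytes(data: bytes) -> str:
    """Return an XML string holding the bytes as base64 text in a <secret> element."""
    root = ET.Element("settings")  # hypothetical element names
    secret = ET.SubElement(root, "secret")
    secret.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def load_bytes(xml_text: str) -> bytes:
    """Parse the XML and decode the base64 text back into the original bytes."""
    root = ET.fromstring(xml_text)
    return base64.b64decode(root.findtext("secret"))


if __name__ == "__main__":
    original = bytes(range(256))  # every possible byte value, including control bytes
    xml_text = store_bytes(original)
    print(load_bytes(xml_text) == original)
```

Base64 output uses only letters, digits, "+", "/" and "=", all of which are legal XML characters, so any byte array survives the trip at the cost of roughly a third more space.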