gabrielh

Hi!

I have a seemingly easy task that got extremely hairy.
As input I have XML files that may or may not define a default namespace. The files look like this:

<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<controlfield tag="001">1020311</controlfield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">PHOTOLAB</subfield>
</datafield>
<datafield tag="960" ind1=" " ind2=" ">
<subfield code="a">81</subfield>

</datafield>
...
</record>
</collection>

I looked at the information on XML namespaces given here:
http://sinan.ussakli.net/code/xml-namespaces-on-xpath-queries-and-xslt

and tried their hack:

resultPage.Load(@"C:\SoftwareProjects\eBulletinSubmission\MultimediaGallery\test.xml");
XPathNavigator navigator = resultPage.CreateNavigator();
XPathNavigator context = navigator.Clone();
context.MoveToFirstChild();
XPathNodeIterator iterator = navigator.Select("//record", context);

The only difference in my example is that I do not use a prefix, since I don't have one. The result is that it never finds anything. I tried running the DocumentNsResolver class from the same link above and got the obvious result that my prefix is "". How can I query for this? I tried a whole lot of things without much success for this simple case. I guess the next thing I will do is just cut the namespace out of the file, which is extremely ugly, but it seems to be the only thing that really works...

If you have any ideas on this, please let me know.
Thanks!


Re: XML and the .NET Framework prefix for the default namespace

Martin Honnen

Use an XmlNamespaceManager that binds a prefix to the default namespace URI, then use that prefix in your XPath expressions, like this:

Code Snippet

XPathDocument doc = new XPathDocument(@"..\..\XMLFile1.xml");
XPathNavigator navigator = doc.CreateNavigator();
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(navigator.NameTable);
namespaceManager.AddNamespace("pf", navigator.SelectSingleNode("*").NamespaceURI);
XPathNodeIterator nodeIterator = navigator.Select("//pf:record", namespaceManager);
while (nodeIterator.MoveNext())
{
    // access nodeIterator.Current here
}






Re: XML and the .NET Framework prefix for the default namespace

gabrielh

Thanks a lot! This solved my initial problem.
Unfortunately I just noticed a new problem.
The xmlns on the collection root element may or may not be set. Either way your code works. But sometimes the xmlns is not set on the collection but is set at the record level, in which case I don't get that record.
The one thing that always stays the same is the actual URI of the namespace, and there will always be only one. But it can be placed basically anywhere...
Example:
<?xml version="1.0" encoding="UTF-8"?>
<collection>
<record>
...
</record>
<record xmlns="http://www.loc.gov/MARC21/slim">
</record>
</collection>

I think a solution might be to always add the xmlns (http://www.loc.gov/MARC21/slim) to the root element and add a prefix like you did. Unfortunately it seems to be read-only. Do you know how I could get around that?

Thanks!




Re: XML and the .NET Framework prefix for the default namespace

Martin Honnen

If you know the namespace URI in advance (before parsing the XML), then it is easy: just hard-code it in your .NET program, e.g.

Code Snippet

XPathDocument doc = new XPathDocument(@"..\..\XMLFile1.xml");
XPathNavigator navigator = doc.CreateNavigator();
XmlNamespaceManager namespaceManager = new XmlNamespaceManager(navigator.NameTable);
namespaceManager.AddNamespace("pf", "http://www.loc.gov/MARC21/slim");
XPathNodeIterator nodeIterator = navigator.Select("//pf:record", namespaceManager);






Re: XML and the .NET Framework prefix for the default namespace

gabrielh

Unfortunately this doesn't work. The code above, with the example from my second post, gets only the one record tag that has the xmlns defined on the tag itself. As long as there is no namespace on the root element, everything else is ignored.
What is the difference with this (extremely easy) solution?

XmlNodeList list = resultPage.GetElementsByTagName("record");

Do I have a big performance penalty?

Thanks!




Re: XML and the .NET Framework prefix for the default namespace

Martin Honnen

I overlooked that you have different record elements in the same document, some in a namespace and some in no namespace. In that case using GetElementsByTagName("record") or SelectNodes("//*[local-name() = 'record']") should do. I think the .NET Framework 1.x has a flaw in its GetElementsByTagName implementation (http://support.microsoft.com/kb/823928/en-us), but that should not hurt you with .NET 2.0.
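
To make the difference between these queries concrete, here is a minimal sketch that runs them against a cut-down version of the mixed document from the second post. The class name MixedNamespaceDemo, its helper methods, and the inline sample string are assumptions made up for illustration; only the namespace URI and the "pf" prefix come from the posts above.

```csharp
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class MixedNamespaceDemo
{
    // Cut-down sample: one record in no namespace, one record that
    // declares the MARC21 default namespace on itself.
    public const string SampleXml =
        "<collection>" +
        "<record/>" +
        "<record xmlns=\"http://www.loc.gov/MARC21/slim\"/>" +
        "</collection>";

    private static XmlDocument LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc;
    }

    // Counts the nodes an XPath expression selects, with the prefix
    // "pf" bound to the MARC21 namespace URI.
    public static int Count(string xpath)
    {
        XmlDocument doc = LoadSample();
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("pf", "http://www.loc.gov/MARC21/slim");
        return doc.SelectNodes(xpath, ns).Count;
    }

    // Counts the elements whose qualified name is "record",
    // regardless of the namespace they are in.
    public static int CountByTagName()
    {
        return LoadSample().GetElementsByTagName("record").Count;
    }
}
```

With this sample, Count("//pf:record") and Count("//record") should each select a single record, because an XPath name test matches on namespace URI plus local name. Count("//*[local-name() = 'record']"), the union Count("//record | //pf:record"), and CountByTagName() should each find both.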




Re: XML and the .NET Framework prefix for the default namespace

gabrielh

Thanks a lot! I chose to combine both solutions like this:

XmlNodeList recordList = resultPage.GetElementsByTagName("record");

foreach (XmlElement record in recordList)
{
    XmlNode titleNode = record.SelectSingleNode("*[local-name() = 'datafield'][@tag='245']/*[local-name() = 'subfield'][@code='a']");
}


It's especially nice because, with the local-name() function, getting the title boils down to a one-liner, which would take quite some code if I had to do it with regular XML parsing.
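
For anyone who wants to try the combined approach in isolation, the following is a rough, self-contained sketch. The class TitleExtractor, the method GetTitles, and the sample titles "Title A" and "Title B" are invented for illustration and are not taken from the real data; the XPath expression is the one from the loop above.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class TitleExtractor
{
    // Made-up sample: the first record is in no namespace, the second
    // declares the MARC21 default namespace on the record element.
    public const string SampleXml =
        "<collection>" +
        "<record>" +
        "<datafield tag=\"245\" ind1=\" \" ind2=\" \">" +
        "<subfield code=\"a\">Title A</subfield></datafield>" +
        "</record>" +
        "<record xmlns=\"http://www.loc.gov/MARC21/slim\">" +
        "<datafield tag=\"245\" ind1=\" \" ind2=\" \">" +
        "<subfield code=\"a\">Title B</subfield></datafield>" +
        "</record>" +
        "</collection>";

    // Returns the text of subfield "a" of datafield 245 for every record
    // that has one, ignoring which namespace the elements are in.
    public static List<string> GetTitles(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List<string> titles = new List<string>();
        foreach (XmlElement record in doc.GetElementsByTagName("record"))
        {
            XmlNode titleNode = record.SelectSingleNode(
                "*[local-name() = 'datafield'][@tag='245']" +
                "/*[local-name() = 'subfield'][@code='a']");
            if (titleNode != null)
            {
                titles.Add(titleNode.InnerText);
            }
        }
        return titles;
    }
}
```

Calling TitleExtractor.GetTitles(TitleExtractor.SampleXml) should return both sample titles in document order. The null check covers records that have no 245 datafield. The attribute tests need no local-name() workaround, because unprefixed attributes are never in a namespace.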