Monday, 4 April 2016

Groovy XML traverse

Given the XML below, we need to print the code and text of each subfield under the datafield with tag 852. For this sample, the expected output is [b PIC, h test2].

 class XmlTraverse {  
   def String xml = """  
     <response>  
       <marcRecord>  
         <leader>00167nx a22000854 4500</leader>  
         <controlfield tag="001">4000089</controlfield>  
         <controlfield tag="004">3569260</controlfield>  
         <controlfield tag="005">20160330130804.0</controlfield>  
         <controlfield tag="008">1603300u  0  4000uueng0000000</controlfield>  
         <datafield ind2=" " ind1="8" tag="852">  
           <subfield code="b">PIC</subfield>  
           <subfield code="h">test2</subfield>  
         </datafield>  
         <datafield tag="954" ind1="" ind2="">  
           <subfield code="a">NLA</subfield>  
         </datafield>  
       </marcRecord>  
     </response>  
   """ 
 }  

First attempt: find the datafield with tag 852, find all the subfields under it, and use collect to turn them into a List.

 import groovy.util.XmlSlurper  
 class XmlTraverse {  
   def test(){  
     def response = new XmlSlurper().parseText(xml)  
     def datafield852 = response.marcRecord.'*'.find { node->  
       node.name() == 'datafield' && node.@tag == '852'  
     }  
     def subfields = datafield852.'*'.findAll { node ->  
       node.name() == 'subfield'  
     }  
     def subfieldsCodeAndValue = subfields.collect { node ->  
       "" + node.@code + " " + node.text()  
     }  
     println subfieldsCodeAndValue  
   }  
 }  

This is bad for two reasons: (a) it's too long, and (b) if the 852 tag doesn't exist, it throws a ClassCastException.

Here comes the second attempt: find all the subfields whose parent's tag value equals 852. Now even if tag 852 doesn't exist, nothing breaks; it just prints an empty list.

   def test2(){  
     def response = new XmlSlurper().parseText(xml)  
     def List subfieldsValue = response.marcRecord.datafield.subfield.findAll { node->  
       node.parent().@tag == '852'  
     }.collect{"" + it.@code + " " + it.text()}  
     println subfieldsValue  
   }  
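
A variant of the same idea, as a minimal sketch rather than the only way to do it: filter the datafield nodes by tag first, then step down to their subfield children. The helper name subfieldsFor and the trimmed sample XML are made up for illustration, and the import assumes Groovy 3 or later (on Groovy 2.x the class is groovy.util.XmlSlurper, as in the code above).

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x use groovy.util.XmlSlurper

// Hypothetical helper: returns "code text" for every subfield under the
// datafield(s) carrying the given tag.
List<String> subfieldsFor(String xml, String tag) {
    def response = new XmlSlurper().parseText(xml)
    response.marcRecord.datafield
        .findAll { it.@tag == tag }   // keep only datafields with the wanted tag
        .subfield                     // step down to their subfield children
        .collect { it.@code.text() + ' ' + it.text() }
}

// Trimmed copy of the sample record (assumption: only the datafields matter here).
def sample = '''
<response>
  <marcRecord>
    <datafield ind2=" " ind1="8" tag="852">
      <subfield code="b">PIC</subfield>
      <subfield code="h">test2</subfield>
    </datafield>
    <datafield tag="954" ind1="" ind2="">
      <subfield code="a">NLA</subfield>
    </datafield>
  </marcRecord>
</response>
'''

println subfieldsFor(sample, '852')   // [b PIC, h test2]
println subfieldsFor(sample, '999')   // no such tag: []
```

Because findAll returns an empty result when nothing matches, the missing-tag case falls through to an empty list instead of an exception.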

If we just want to print the text, we can take advantage of the spread operator (*.).

   def test3(){  
     def response = new XmlSlurper().parseText(xml)  
     def List subfieldsValue = response.marcRecord.datafield.subfield.findAll { node->  
       node.parent().@tag == '852'  
     }*.text()  
     println subfieldsValue  
   }  
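
The *. used above is Groovy's spread-dot operator: it calls the method on each element and gathers the results into a list, much like collect. A small sketch putting the two side by side; the cut-down XML and the variable names are assumptions for illustration, and the import again assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x use groovy.util.XmlSlurper

// Cut-down sample: a single datafield as the document root.
def datafield = new XmlSlurper().parseText('''
<datafield tag="852">
  <subfield code="b">PIC</subfield>
  <subfield code="h">test2</subfield>
</datafield>
''')

// Spread-dot: call text() on every subfield node and gather the results.
def viaSpread = datafield.subfield*.text()

// The same thing spelled out with collect.
def viaCollect = datafield.subfield.collect { it.text() }

assert viaSpread == viaCollect
println viaSpread   // [PIC, test2]
```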

Reference: Processing XML