This took me a while to find so I've posted an article below detailing how to parse or extract values from a string containing XML code.
Why?
I'm working with a system which stores XML strings in a database and rather than a separate file, it stores these in a row.
How?
Assuming the following data exists in the column "Event_XML" value of the table "XML_EVENTS":
<STAFF xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<EMPLOYEE_NUMBER>123456</EMPLOYEE_NUMBER>
<TITLE>Mr</TITLE>
<INITIALS>J</INITIALS>
<FORENAME>JOEL</FORENAME>
<SURNAME>LIPMAN</SURNAME>
<GENDER>MALE</GENDER>
<DATE_OF_BIRTH>1990-01-01</DATE_OF_BIRTH>
<DEPARTMENT_DESC>SUPER SUPPORT TEAM</DEPARTMENT_DESC>
<JOB_TITLE_DESC>GENERAL DOGSBODY</JOB_TITLE_DESC>
<ORIGINAL_DATE_JOINED>2014-03-01</ORIGINAL_DATE_JOINED>
<CURRENT_EMPLOYEE financialyear="2014">Y</CURRENT_EMPLOYEE>
<INTERNAL_EMAIL xsi:nil="true" />
<CHANGE_TYPE>INSERT</CHANGE_TYPE>
<CHANGE_DATE>2014-03-27</CHANGE_DATE>
</STAFF>
Extracting the Gender:
We're going to extract the value in between the opening and closing XML tags:
-- Using .query() SELECT CAST(CAST(Event_XML AS XML).query('data(/STAFF/GENDER)') AS VARCHAR(10)) AS Gender FROM [dbo].[XML_EVENTS] -- Returns column GENDER for every record/row (eg. "Male")
- -- Using .query()
- SELECT
- CAST(CAST(Event_XML AS XML).query('data(/STAFF/GENDER)') AS VARCHAR(10)) AS Gender
- FROM
- [dbo].[XML_EVENTS]
- -- Returns column GENDER for every record/row (eg. "Male")
-- Using .value() SELECT CAST(Event_XML AS XML).value('data(/STAFF/GENDER)[1]','VARCHAR(10)') AS Gender FROM [dbo].[XML_EVENTS] -- Returns column GENDER for every record/row (eg. "Male")
- -- Using .value()
- SELECT
- CAST(Event_XML AS XML).value('data(/STAFF/GENDER)[1]','VARCHAR(10)') AS Gender
- FROM
- [dbo].[XML_EVENTS]
- -- Returns column GENDER for every record/row (eg. "Male")
Extracting the attribute:
Here we want to extract an attribute value
SELECT CAST(CAST(Event_XML AS XML).query('data(/STAFF/CURRENT_EMPLOYEE/@financialyear)') AS INT(4)) AS FinancialYear FROM [dbo].[XML_EVENTS] -- Returns attribute "financialyear" of CURRENT_EMPLOYEE (eg. "2014")
- SELECT
- CAST(CAST(Event_XML AS XML).query('data(/STAFF/CURRENT_EMPLOYEE/@financialyear)') AS INT(4)) AS FinancialYear
- FROM
- [dbo].[XML_EVENTS]
- -- Returns attribute "financialyear" of CURRENT_EMPLOYEE (eg. "2014")
In a WHERE clause
To select records matching a value for the XML field:
SELECT * FROM [dbo].[XML_EVENTS] WHERE CAST(Event_XML AS XML).value('(/STAFF/GENDER="MALE")[1]','VARCHAR(10)')='TRUE' AS Gender -- Returns records where GENDER = "Male"
- SELECT
- *
- FROM
- [dbo].[XML_EVENTS]
- WHERE
- CAST(Event_XML AS XML).value('(/STAFF/GENDER="MALE")[1]','VARCHAR(10)')='TRUE' AS Gender
- -- Returns records where GENDER = "Male"
More Examples (Note that Event_XML is of datatype XML in the following examples):
-- exist() returns 1 or 0 if node exists or not respectively SELECT Event_XML.exist('/STAFF/EMPLOYEE_NUMBER') FROM [dbo].[XML_EVENTS] -- query() returns typed XML SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER') FROM [dbo].[XML_EVENTS] -- value() returns single value, specify data type as second parameter SELECT Event_XML.value('(/STAFF/EMPLOYEE_NUMBER/@name)[1]','varchar(20)') FROM [dbo].[XML_EVENTS]
- -- exist() returns 1 or 0 if node exists or not respectively
- SELECT Event_XML.exist('/STAFF/EMPLOYEE_NUMBER')
- FROM [dbo].[XML_EVENTS]
- -- query() returns typed XML
- SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER')
- FROM [dbo].[XML_EVENTS]
- -- value() returns single value, specify data type as second parameter
- SELECT Event_XML.value('(/STAFF/EMPLOYEE_NUMBER/@name)[1]','varchar(20)')
- FROM [dbo].[XML_EVENTS]
Issues:
- XQuery [value()]: 'value()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *' This is because more than one record was returned so you need to surround with brackets and append a [1] as per the above example.
Retrieving nodes
-- to get all below nodes from root node - STAFF SELECT Event_XML.query('/STAFF') FROM [dbo].[XML_EVENTS] -- to get all element nodes named 'EMPLOYEE_NUMBER' SELECT Event_XML.query('//EMPLOYEE_NUMBER') FROM [dbo].[XML_EVENTS] -- to get employee node with ID = 1 SELECT Event_XML.query('//EMPLOYEE_NUMBER[@ID = 1]') FROM [dbo].[XML_EVENTS] -- to get employee node at position = 1 from root node SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER[position()=1]') FROM [dbo].[XML_EVENTS] -- to get employee node at last position from root node SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER[last()]') FROM [dbo].[XML_EVENTS] -- to return unknown nodes from root node - STAFF SELECT Event_XML.query('/STAFF/*') FROM [dbo].[XML_EVENTS] -- to return unknown nodes from XML column - Event_XML SELECT Event_XML.query('//* ') FROM [dbo].[XML_EVENTS]
- -- to get all below nodes from root node - STAFF
- SELECT Event_XML.query('/STAFF')
- FROM [dbo].[XML_EVENTS]
- -- to get all element nodes named 'EMPLOYEE_NUMBER'
- SELECT Event_XML.query('//EMPLOYEE_NUMBER')
- FROM [dbo].[XML_EVENTS]
- -- to get employee node with ID = 1
- SELECT Event_XML.query('//EMPLOYEE_NUMBER[@ID = 1]')
- FROM [dbo].[XML_EVENTS]
- -- to get employee node at position = 1 from root node
- SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER[position()=1]')
- FROM [dbo].[XML_EVENTS]
- -- to get employee node at last position from root node
- SELECT Event_XML.query('/STAFF/EMPLOYEE_NUMBER[last()]')
- FROM [dbo].[XML_EVENTS]
- -- to return unknown nodes from root node - STAFF
- SELECT Event_XML.query('/STAFF/*')
- FROM [dbo].[XML_EVENTS]
- -- to return unknown nodes from XML column - Event_XML
- SELECT Event_XML.query('//* ')
- FROM [dbo].[XML_EVENTS]
Parent/Child Node Names
-- to get name of parent node -- returns STAFF SELECT ( SELECT c.value('local-name(.)', 'nvarchar(50)') FROM Event_XML.nodes('/*') AS r(c) ) AS RootParent FROM [dbo].[XML_EVENTS] -- to get name of immediate child node -- one level down SELECT ( SELECT c.value('local-name(.)', 'nvarchar(50)') FROM Event_XML.nodes('/*/*') AS r(c) ) AS RootChild FROM [dbo].[XML_EVENTS] -- to get name of child node containing string "SURNAME" SELECT ( SELECT c.value('local-name(.)', 'nvarchar(50)') FROM Event_XML.nodes('/*/*[contains(local-name(.),"SURNAME")][1]') AS r(c) -- added [1] to only get 1 result returned ) AS Node_Surname FROM [dbo].[XML_EVENTS]
- -- to get name of parent node -- returns STAFF
- SELECT
- (
- SELECT
- c.value('local-name(.)', 'nvarchar(50)')
- FROM
- Event_XML.nodes('/*') AS r(c)
- ) AS RootParent
- FROM [dbo].[XML_EVENTS]
- -- to get name of immediate child node -- one level down
- SELECT
- (
- SELECT
- c.value('local-name(.)', 'nvarchar(50)')
- FROM
- Event_XML.nodes('/*/*') AS r(c)
- ) AS RootChild
- FROM [dbo].[XML_EVENTS]
- -- to get name of child node containing string "SURNAME"
- SELECT
- (
- SELECT
- c.value('local-name(.)', 'nvarchar(50)')
- FROM
- Event_XML.nodes('/*/*[contains(local-name(.),"SURNAME")][1]') AS r(c)
- -- added [1] to only get 1 result returned
- ) AS Node_Surname
- FROM [dbo].[XML_EVENTS]
Search if Node or value contains string:
-- exist() returns 1 or 0 if a node name exists containing "NAME" or not respectively SELECT Event_XML.exist('(/*/*[1][contains(local-name(.),"NAME")])') FROM [dbo].[XML_EVENTS] -- query() returns XML branch if the node name contains string "NAME" -- returns <SURNAME>LIPMAN</SURNAME> SELECT Event_XML.query('(/*/*[1][contains(local-name(.),"NAME")])') FROM [dbo].[XML_EVENTS] -- query() returns XML branch if the node value contains string "MAN" -- returns <SURNAME>LIPMAN</SURNAME> SELECT Event_XML.query('(/*/*[1][contains(.,"MAN")])') FROM [dbo].[XML_EVENTS] -- value() returns value of node if the node name contains string "NAME" -- returns LIPMAN SELECT Event_XML.value('(/*/*[1][contains(local-name(.),"NAME")])[1]','NVARCHAR(50)') FROM [dbo].[XML_EVENTS] -- value() returns value of node if the node value contains string "MAN" -- returns LIPMAN SELECT Event_XML.value('(/*/*[1][contains(.,"MAN")])[1]','NVARCHAR(50)') FROM [dbo].[XML_EVENTS]
- -- exist() returns 1 or 0 if a node name exists containing "NAME" or not respectively
- SELECT Event_XML.exist('(/*/*[1][contains(local-name(.),"NAME")])')
- FROM [dbo].[XML_EVENTS]
- -- query() returns XML branch if the node name contains string "NAME" -- returns <SURNAME>LIPMAN</SURNAME>
- SELECT Event_XML.query('(/*/*[1][contains(local-name(.),"NAME")])')
- FROM [dbo].[XML_EVENTS]
- -- query() returns XML branch if the node value contains string "MAN" -- returns <SURNAME>LIPMAN</SURNAME>
- SELECT Event_XML.query('(/*/*[1][contains(.,"MAN")])')
- FROM [dbo].[XML_EVENTS]
- -- value() returns value of node if the node name contains string "NAME" -- returns LIPMAN
- SELECT Event_XML.value('(/*/*[1][contains(local-name(.),"NAME")])[1]','NVARCHAR(50)')
- FROM [dbo].[XML_EVENTS]
- -- value() returns value of node if the node value contains string "MAN" -- returns LIPMAN
- SELECT Event_XML.value('(/*/*[1][contains(.,"MAN")])[1]','NVARCHAR(50)')
- FROM [dbo].[XML_EVENTS]
Looping
-- From resultset of $i ,assigning it to variable $j using let clause, -- Getting top two employees by using where & order by clause. SELECT Event_XML.query('for $i in /STAFF/student let $j:= $i where ($j/@ID)
- -- From resultset of $i ,assigning it to variable $j using let clause,
- -- Getting top two employees by using where & order by clause.
- SELECT Event_XML.query('for $i in /STAFF/student
- let $j:= $i
- where ($j/@ID)