Introduction
This tutorial introduces the conceptual data model used in XPath. XPath uses a tree of nodes to represent an XML document. A node can have a string-value, which is distinct from the node object itself. XPath defines seven types of node, and each is introduced in this tutorial.
XPath has a data model to represent XML documents. This model consists of seven types of node, each representing part of an XML document. Some node types have an expanded-name, which pairs a local name with a namespace URI; others do not. Every node also has a string-value, which is either stored with the node or computed from its descendants. I list the node types and these properties in this tutorial.
These nodes have an intuitive meaning, but the nodes used in XPath are not the same as the DOM nodes, although there are parallels. You need to understand XML documents before reading this tutorial.
XPath operates on an XML document after entity references have been expanded, character references resolved, and CDATA sections converted to plain text. By the time XPath is applied, the document is therefore in its final state, with no further inclusions needed.
The DTD is not addressable via XPath, but default attribute values declared in the DTD are applied. xmlns attributes are represented as namespace nodes in XPath. Note that they are not represented as attribute nodes.
Here is a simple example to show the tree representation used by XPath.
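To make this "final state" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. It is not an XPath engine, but it resolves references before handing over the tree in much the same way. The note document and its company entity are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one internal entity, one character reference,
# and one CDATA section, all inside the same element.
DOC = (
    '<!DOCTYPE note [<!ENTITY company "Example Corp">]>'
    "<note>&company; &#38; <![CDATA[<partners>]]></note>"
)

note = ET.fromstring(DOC)

# The parser delivers plain character data: the entity is expanded,
# the character reference is resolved, and the CDATA markers are gone.
resolved_text = note.text
print(resolved_text)
```

Nothing in resolved_text records which characters came from an entity, a character reference, or a CDATA section. That is the sense in which the document is already resolved.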
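Both points can be observed with a small sketch. I am again assuming Python's xml.etree.ElementTree as a stand-in for an XPath processor; the employee document, the dept default, and the example.com namespace URI are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: dept is never written on the element, but the
# internal DTD subset declares a default for it. xmlns:hr is a namespace
# declaration, not an ordinary attribute.
DOC = (
    '<!DOCTYPE employee [<!ATTLIST employee dept CDATA "unassigned">]>'
    '<employee xmlns:hr="http://example.com/hr" name="Ann"/>'
)

employee = ET.fromstring(DOC)

# Copy the attribute mapping so it can be inspected as a plain dict.
attributes = dict(employee.attrib)
print(attributes)
```

The mapping should contain name and the defaulted dept, and no entry for xmlns:hr. Defaults declared in an external DTD may be missed by a parser that does not read external DTDs, so this sketch keeps the declaration in the internal subset.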
Consider:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="employees.xsl"?>
<!DOCTYPE employees [
<!ELEMENT employee (project*)>
<!ATTLIST employee
name CDATA #REQUIRED
dept CDATA #REQUIRED>
<!ELEMENT project (name+)>
<!ATTLIST project
name CDATA #REQUIRED>
]>
The tree looks as follows:
The root node and element nodes have an ordered list of child nodes. Nodes never share children. Every node other than the root node has exactly one parent, which is either an element node or the root node. A root node or an element node is the parent of each of its child nodes. The descendants of a node are its children together with the descendants of those children.
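The difference between children and descendants can be sketched with the limited XPath support in Python's xml.etree.ElementTree. The instance document below is hypothetical; only its element names are borrowed from the DTD above. Note that ET.fromstring returns the document element, not the XPath root node.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document, nested three levels deep.
DOC = (
    "<employees><employee><project>"
    "<name>Apollo</name>"
    "</project></employee></employees>"
)
employees = ET.fromstring(DOC)

# "*" follows the child axis: direct element children only.
child_names = [e.tag for e in employees.findall("*")]

# ".//*" follows the descendant axis: children, their children, and so on,
# in document order.
descendant_names = [e.tag for e in employees.findall(".//*")]

print(child_names)
print(descendant_names)
```

Every child appears among the descendants, but project and name are descendants of employees without being its children.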
Node type               Expanded-name   String-value
root                    No              concatenation of all descendant text nodes
element                 Yes             concatenation of all descendant text nodes
attribute               Yes             attribute value
namespace               Yes             namespace URI
processing instruction  Yes             content following the target and any whitespace
comment                 No              content of the comment
text                    No              character data of the text node
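The string-value of an element is the least obvious entry, so here is a minimal sketch of how it can be computed. The helper string_value is my own name, not part of any library, and it assumes Python's xml.etree.ElementTree, whose itertext method walks the character data in document order.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Concatenate all descendant character data, in document order."""
    return "".join(element.itertext())


# Hypothetical mixed-content element.
employee = ET.fromstring("<employee>Ann <project>Apollo</project> lead</employee>")

employee_value = string_value(employee)
project_value = string_value(employee.find("project"))

print(employee_value)
print(project_value)
```

The value for employee includes the text of the nested project element, while the value for project stops at its own end tag.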
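The root node is the root of the tree and the parent of the document element. There is exactly one root node per document. The following nodes are children of the root node:
- processing instruction nodes, for processing instructions in the prolog and after the document element
- comment nodes, for comments in the prolog and after the document element
- the document element
Note that the root node does not correspond to the root element in XML. The root node in XPath contains the entire XML document, including the processing instructions and comments that lie outside the document element.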
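The DOM Document object plays a role close to that of the XPath root node, so a rough sketch with Python's xml.dom.minidom can show the idea. This is a DOM tree rather than an XPath tree, and the comment text is invented, but the top-level layout is comparable.

```python
from xml.dom import minidom

# Prolog with an XML declaration, a processing instruction and a comment,
# followed by an empty document element.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="employees.xsl"?>'
    "<!-- staff list -->"
    "<employees/>"
)
document = minidom.parseString(DOC)

# Names of the nodes sitting directly under the document.
top_level = [node.nodeName for node in document.childNodes]
document_element = document.documentElement.nodeName

print(top_level)
print(document_element)
```

The document element is only one of three top-level nodes. The XML declaration does not show up at all, because it is not a processing instruction.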
Element nodes represent XML elements. Since an XML element acts as a container, it maps naturally onto a node. Entities are expanded, and character references are resolved.
Element nodes can have child nodes as follows:
- comment
- processing instruction
- element
- text
The character content of an element is held in text nodes.
An element node may have a unique identifier (ID). This is the value of the attribute that the DTD declares to be of type ID for that element. IDs are optional in XML, but no two elements in a document may have the same ID.
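As a quick check of which kinds of child an element can carry, the sketch below lists the children of one mixed-content element. It assumes Python's xml.dom.minidom, so these are DOM nodes standing in for XPath nodes. The element content and the audit processing instruction are hypothetical.

```python
from xml.dom import minidom

# Hypothetical element with text, a comment, a processing instruction,
# a child element, and trailing text.
DOC = "<employee>Ann<!-- on leave --><?audit skip?><project/>returns in May</employee>"
employee = minidom.parseString(DOC).documentElement

# nodeName is "#text" or "#comment" for those kinds, the target for a
# processing instruction, and the tag name for an element.
child_kinds = [node.nodeName for node in employee.childNodes]

print(child_kinds)
```

The order of the list follows document order, which matches the ordered list of children described for the XPath tree.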
Attribute nodes represent attributes. Attributes are contained in an element, so each element node has an associated set of attribute nodes. An element with no attributes has no attribute nodes. An element with empty content can still have attributes.
An attribute node therefore has a parent node, which is the element node for the element carrying the attribute. However, the attribute node is not considered a child of that element node; the relationship holds in one direction only.
This seems like a contradiction, but it is how XPath is specified.
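One way to see that there is no real contradiction is to model the two relationships separately. The classes below are a toy sketch of my own, not taken from any XPath implementation: an element keeps its attribute nodes in a list apart from its children, while each attribute node still points back to its parent.

```python
class AttributeNode:
    def __init__(self, name, value, parent):
        self.name = name
        self.value = value
        self.parent = parent        # the element carrying the attribute


class ElementNode:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []          # ordered child nodes
        self.attributes = []        # attribute nodes, kept apart from children

    def append_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def set_attribute(self, name, value):
        attribute = AttributeNode(name, value, parent=self)
        self.attributes.append(attribute)
        return attribute


employee = ElementNode("employee")
dept = employee.set_attribute("dept", "Research")
project = employee.append_child(ElementNode("project"))

parent_is_element = dept.parent is employee      # attribute -> parent
listed_as_child = dept in employee.children      # element -> children
print(parent_is_element, listed_as_child)
```

The attribute finds its element through parent, but a walk over employee.children reaches only project. This mirrors why the child axis in XPath never returns attributes.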
Namespaces are optional. They ensure that different elements with the same name, or different attributes with the same local name on the same element, can be distinguished. Each namespace that is in scope for an element is represented by a namespace node associated with that element. The element is the parent of each of these namespace nodes, but they are not its children.
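The way a namespace distinguishes names can be sketched with Python's xml.etree.ElementTree, which reports each element under its expanded-name in the form {namespace URI}local-name. The staff document and the example.com URIs are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Three elements share the local name "name". Prefixes a and b are bound
# to the same URI; prefix c is bound to a different one.
DOC = (
    '<staff xmlns:a="http://example.com/hr" xmlns:b="http://example.com/hr"'
    ' xmlns:c="http://example.com/projects">'
    "<a:name/><b:name/><c:name/>"
    "</staff>"
)
staff = ET.fromstring(DOC)

# The prefix is gone; only the namespace URI and the local part remain.
expanded_names = [child.tag for child in staff]

print(expanded_names)
```

The first two elements end up with identical expanded-names even though their prefixes differ, and the third is distinct. It is the namespace URI, not the prefix, that tells names apart.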
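Note that an element has a namespace node:
- for every attribute on the element whose name starts with xmlns:
- for every attribute on an ancestor element whose name starts with xmlns:, unless the element itself or a nearer ancestor redeclares the prefix
- for an xmlns attribute, if the element or some ancestor has one and its value on the nearest such element is non-empty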
There is a processing instruction node for every processing instruction, except for any processing instruction that occurs within the document type declaration. Note that the XML declaration is not a processing instruction.
There is a comment node for every comment, except for any comment that occurs within the document type declaration.
Character data is grouped into text nodes, with as much adjacent character data as possible in each one. The string-value of a text node is its character data. A text node has no expanded-name.
I introduced the data model used in XPath to represent XML documents. It uses seven types of node, which I described. Note that XPath operates on a logical document, after all entities have been resolved.