Last updated: Mon, 26 Nov 2007

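XML Parser Functions

Introduction

XML (eXtensible Markup Language) is a data format for structured document interchange on the Web. It is a standard defined by the World Wide Web Consortium (W3C). More information about XML and related technologies can be found at » http://www.w3.org/XML/.

This extension offers support for James Clark's expat. This toolkit lets you parse, but not validate, XML documents. It supports three source character encodings, all of which are also provided by PHP: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not supported.

This extension lets you create XML parsers and then define handlers for different XML events. Each XML parser also has a few parameters you can adjust.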

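Requirements

This extension uses an expat compat layer by default. It can also use expat itself, which is available at » http://www.jclark.com/xml/expat.html. The Makefile that comes with expat does not build a library by default; you can use the following make rule for that: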

libexpat.a: $(OBJS)
    ar -rc $@ $(OBJS)
    ranlib $@

Source RPM packages of expat can be found at » http://sourceforge.net/projects/expat/.

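Installation

These functions are enabled by default and use the bundled expat library. You can disable XML support with --disable-xml. If you compile PHP as a module for Apache 1.3.9 or later, PHP automatically uses the expat library bundled with Apache. If you do not want to use that bundled expat library, run PHP's configure script with --with-expat-dir=DIR, where DIR points to the base installation directory of expat.

The Windows version of PHP has built-in support for this extension. You do not need to load any additional extensions in order to use these functions.

Runtime Configuration

This extension has no configuration directives defined in php.ini.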

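Resource Types

xml

The xml resource returned by xml_parser_create() and xml_parser_create_ns() references an XML parser instance and is used with the functions provided by this extension.

Predefined Constants

The constants below are defined by this extension, and are only available when the extension has either been compiled into PHP or dynamically loaded at runtime.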

XML_ERROR_NONE (integer)
XML_ERROR_NO_MEMORY (integer)
XML_ERROR_SYNTAX (integer)
XML_ERROR_NO_ELEMENTS (integer)
XML_ERROR_INVALID_TOKEN (integer)
XML_ERROR_UNCLOSED_TOKEN (integer)
XML_ERROR_PARTIAL_CHAR (integer)
XML_ERROR_TAG_MISMATCH (integer)
XML_ERROR_DUPLICATE_ATTRIBUTE (integer)
XML_ERROR_JUNK_AFTER_DOC_ELEMENT (integer)
XML_ERROR_PARAM_ENTITY_REF (integer)
XML_ERROR_UNDEFINED_ENTITY (integer)
XML_ERROR_RECURSIVE_ENTITY_REF (integer)
XML_ERROR_ASYNC_ENTITY (integer)
XML_ERROR_BAD_CHAR_REF (integer)
XML_ERROR_BINARY_ENTITY_REF (integer)
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF (integer)
XML_ERROR_MISPLACED_XML_PI (integer)
XML_ERROR_UNKNOWN_ENCODING (integer)
XML_ERROR_INCORRECT_ENCODING (integer)
XML_ERROR_UNCLOSED_CDATA_SECTION (integer)
XML_ERROR_EXTERNAL_ENTITY_HANDLING (integer)
XML_OPTION_CASE_FOLDING (integer)
XML_OPTION_TARGET_ENCODING (integer)
XML_OPTION_SKIP_TAGSTART (integer)
XML_OPTION_SKIP_WHITE (integer)

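Event Handlers

The XML event handlers defined are:

Supported XML handlers
PHP function to set handler: Event description
xml_set_element_handler(): Element events are issued whenever the XML parser encounters start or end tags. There are separate handlers for start tags and end tags.
xml_set_character_data_handler(): Character data is roughly all the non-markup contents of XML documents, including whitespace between tags. Note that the XML parser does not add or remove any whitespace; it is up to the application (you) to decide whether whitespace is significant.
xml_set_processing_instruction_handler(): PHP programmers should already be familiar with processing instructions (PIs). <?php ?> is a processing instruction, where php is called the "PI target". The handling of these is application-specific, except that all PI targets starting with "XML" are reserved.
xml_set_default_handler(): Whatever is not passed to another handler goes to the default handler. This includes things like the XML and document type declarations.
xml_set_unparsed_entity_decl_handler(): This handler is called for the declaration of an unparsed (NDATA) entity.
xml_set_notation_decl_handler(): This handler is called for the declaration of a notation.
xml_set_external_entity_ref_handler(): This handler is called when the XML parser finds a reference to an external parsed general entity. This can be a reference to a file or a URL, for instance. See the external entity example for a demonstration.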
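
As a rough illustration of how these handlers fit together, the sketch below registers three of them on one parser and records the order in which they fire. The closure callbacks, the $events log and the sample document are assumptions made for this example; they are not part of the extension.

```php
<?php
// Hypothetical event log; each handler appends one entry to it.
$events = array();

$parser = xml_parser_create();

// Element handler: one callback for start tags, one for end tags.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";
    }
);

// Character data handler: skip runs that are only whitespace.
xml_set_character_data_handler($parser, function ($parser, $data) use (&$events) {
    if (trim($data) !== '') {
        $events[] = "text " . trim($data);
    }
});

// Processing instruction handler: receives the PI target and its data.
xml_set_processing_instruction_handler($parser, function ($parser, $target, $data) use (&$events) {
    $events[] = "pi $target";
});

xml_parse($parser, '<doc><?render now?><item>hello</item></doc>', true);

echo implode("\n", $events), "\n";
```

Element names appear in upper case in the log because case folding is on by default.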

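Case Folding

The element handler functions may get their element names case-folded. Case-folding is defined by the XML standard as "a process applied to a sequence of characters, in which those identified as non-uppercase are replaced by their uppercase equivalents". In other words, when it comes to XML, case-folding simply means uppercasing.

By default, all the element names that are passed to the handler functions are case-folded. This behaviour can be queried and controlled per XML parser with the xml_parser_get_option() and xml_parser_set_option() functions, respectively.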
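
A minimal sketch of the effect, assuming a hypothetical helper named seenNames() that returns the element names the start-tag handler receives with case folding switched on or off:

```php
<?php
// Collect the element names seen by the start-tag handler of a fresh parser.
// seenNames() is an illustrative helper, not a function of the extension.
function seenNames($xml, $folding)
{
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $folding);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // End tags are not needed for this illustration.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<Chapter><para/></Chapter>';

print_r(seenNames($xml, true));   // folding on: names arrive uppercased
print_r(seenNames($xml, false));  // folding off: names arrive as written
```

Turning folding off is usually the safer choice when handler code compares element names against the spelling used in the document.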

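Error Codes

The following constants are defined for XML error codes. xml_parse() itself only reports success or failure; after a failed call, the code is retrieved with xml_get_error_code():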

  • XML_ERROR_NONE
  • XML_ERROR_NO_MEMORY
  • XML_ERROR_SYNTAX
  • XML_ERROR_NO_ELEMENTS
  • XML_ERROR_INVALID_TOKEN
  • XML_ERROR_UNCLOSED_TOKEN
  • XML_ERROR_PARTIAL_CHAR
  • XML_ERROR_TAG_MISMATCH
  • XML_ERROR_DUPLICATE_ATTRIBUTE
  • XML_ERROR_JUNK_AFTER_DOC_ELEMENT
  • XML_ERROR_PARAM_ENTITY_REF
  • XML_ERROR_UNDEFINED_ENTITY
  • XML_ERROR_RECURSIVE_ENTITY_REF
  • XML_ERROR_ASYNC_ENTITY
  • XML_ERROR_BAD_CHAR_REF
  • XML_ERROR_BINARY_ENTITY_REF
  • XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
  • XML_ERROR_MISPLACED_XML_PI
  • XML_ERROR_UNKNOWN_ENCODING
  • XML_ERROR_INCORRECT_ENCODING
  • XML_ERROR_UNCLOSED_CDATA_SECTION
  • XML_ERROR_EXTERNAL_ENTITY_HANDLING
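
The usual pattern for reporting such an error can be sketched as follows. describeParseError() is a hypothetical helper written for this example. The numeric code depends on the underlying parser library, so the sketch prints it and does not compare it against one particular constant.

```php
<?php
// Parse a string; return null on success or a description of the error.
function describeParseError($xml)
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    $code = xml_get_error_code($parser);
    return sprintf(
        "XML error %d (%s) at line %d",
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser)
    );
}

var_dump(describeParseError('<a><b></b></a>'));  // well-formed input
echo describeParseError("<a>\n<b></a>"), "\n";   // mismatched tags
```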

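Character Encoding

PHP's XML extension supports the Unicode character set through different character encodings. There are two types of character encodings, source encoding and target encoding. PHP's internal representation of the document is always encoded with UTF-8.

Source encoding is done when an XML document is parsed. The source encoding can be specified when an XML parser is created, and it cannot be changed later in the parser's lifetime. The supported source encodings are ISO-8859-1, US-ASCII and UTF-8. The former two are single-byte encodings, which means that each character is represented by a single byte. UTF-8 can encode characters composed of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.

Target encoding is done when PHP passes data to XML handler functions. When an XML parser is created, the target encoding is set to the same as the source encoding, but this may be changed at any point. The target encoding affects character data as well as tag names and processing instruction targets.

If the XML parser encounters characters outside the range that its source encoding is capable of representing, it returns an error.

If PHP encounters characters in the parsed XML document that cannot be represented in the chosen target encoding, the problem characters are "demoted". In short, such characters are replaced by a question mark.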
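
The sketch below illustrates the difference between the two encodings, assuming a hypothetical helper textInEncoding() that parses a UTF-8 document and returns its character data in a chosen target encoding. The output is printed as hex so the individual bytes are visible.

```php
<?php
// Parse a UTF-8 document and return its character data in the target encoding.
// textInEncoding() is an illustrative helper, not a function of the extension.
function textInEncoding($xml, $targetEncoding)
{
    $text = '';
    $parser = xml_parser_create('UTF-8');  // source encoding, fixed at creation
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
        $text .= $data;  // the handler may fire several times; join the pieces
    });
    xml_parse($parser, $xml, true);
    return $text;
}

// "caf" followed by U+00E9, which takes two bytes in UTF-8.
$xml = "<w>caf\xC3\xA9</w>";

echo bin2hex(textInEncoding($xml, 'UTF-8')), "\n";       // passed through as UTF-8
echo bin2hex(textInEncoding($xml, 'ISO-8859-1')), "\n";  // U+00E9 fits in one byte
echo bin2hex(textInEncoding($xml, 'US-ASCII')), "\n";    // U+00E9 is demoted to "?"
```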

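Examples

Here are some example PHP scripts parsing XML documents.

XML Element Structure Example

This first example displays the structure of the start elements in a document with indentation.

Example#1 Show XML Element Structure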

<?php
$file = "data.xml";
$depth = array();

function startElement($parser, $name, $attrs)
{
    global $depth;
    for ($i = 0; $i < $depth[$parser]; $i++) {
        echo "  ";
    }
    echo "$name\n";
    $depth[$parser]++;
}

function endElement($parser, $name)
{
    global $depth;
    $depth[$parser]--;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>

XML Tag Mapping Example

Example#2 Map XML to HTML

This example maps tags in an XML document directly to HTML tags. Elements not found in the "map array" are ignored. Of course, this example will only work with a specific XML document type.

<?php
$file = "data.xml";
$map_array = array(
    "BOLD"     => "B",
    "EMPHASIS" => "I",
    "LITERAL"  => "TT"
);

function startElement($parser, $name, $attrs)
{
    global $map_array;
    if (isset($map_array[$name])) {
        echo "<$map_array[$name]>";
    }
}

function endElement($parser, $name)
{
    global $map_array;
    if (isset($map_array[$name])) {
        echo "</$map_array[$name]>";
    }
}

function characterData($parser, $data)
{
    echo $data;
}

$xml_parser = xml_parser_create();
// use case-folding so we are sure to find the tag in $map_array
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>

XML External Entity Example

This example highlights XML code. It illustrates how to use an external entity reference handler to include and parse other documents, how PIs can be processed, and one way of determining "trust" for PIs that contain code.

The XML documents that can be used with this example (xmltest.xml and xmltest2.xml) are listed after the example.

Example#3 External Entity Example

<?php
$file = "xmltest.xml";

function trustedFile($file)
{
    // only trust local files owned by ourselves
    // (preg_match() stands in for eregi(), which no longer exists in current PHP)
    if (!preg_match('#^([a-z]+)://#i', $file)
        && fileowner($file) == getmyuid()) {
            return true;
    }
    return false;
}

function startElement($parser, $name, $attribs)
{
    echo "&lt;<font color=\"#0000cc\">$name</font>";
    if (count($attribs)) {
        foreach ($attribs as $k => $v) {
            echo " <font color=\"#009900\">$k</font>=\"<font
                   color=\"#990000\">$v</font>\"";
        }
    }
    echo "&gt;";
}

function endElement($parser, $name)
{
    echo "&lt;/<font color=\"#0000cc\">$name</font>&gt;";
}

function characterData($parser, $data)
{
    echo "<b>$data</b>";
}

function PIHandler($parser, $target, $data)
{
    switch (strtolower($target)) {
        case "php":
            global $parser_file;
            // If the parsed document is "trusted", we say it is safe
            // to execute PHP code inside it.  If not, display the code
            // instead.
            if (trustedFile($parser_file[$parser])) {
                eval($data);
            } else {
                printf("Untrusted PHP code: <i>%s</i>",
                        htmlspecialchars($data));
            }
            break;
    }
}

function defaultHandler($parser, $data)
{
    if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
        printf('<font color="#aa00aa">%s</font>',
                htmlspecialchars($data));
    } else {
        printf('<font size="-1">%s</font>',
                htmlspecialchars($data));
    }
}

function externalEntityRefHandler($parser, $openEntityNames, $base, $systemId,
                                  $publicId) {
    if ($systemId) {
        if (!list($parser, $fp) = new_xml_parser($systemId)) {
            printf("Could not open entity %s at %s\n", $openEntityNames,
                   $systemId);
            return false;
        }
        while ($data = fread($fp, 4096)) {
            if (!xml_parse($parser, $data, feof($fp))) {
                printf("XML error: %s at line %d while parsing entity %s\n",
                       xml_error_string(xml_get_error_code($parser)),
                       xml_get_current_line_number($parser), $openEntityNames);
                xml_parser_free($parser);
                return false;
            }
        }
        xml_parser_free($parser);
        return true;
    }
    return false;
}

function new_xml_parser($file)
{
    global $parser_file;

    $xml_parser = xml_parser_create();
    xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    xml_set_processing_instruction_handler($xml_parser, "PIHandler");
    xml_set_default_handler($xml_parser, "defaultHandler");
    xml_set_external_entity_ref_handler($xml_parser, "externalEntityRefHandler");

    if (!($fp = @fopen($file, "r"))) {
        return false;
    }
    if (!is_array($parser_file)) {
        settype($parser_file, "array");
    }
    $parser_file[$xml_parser] = $file;
    return array($xml_parser, $fp);
}

if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
    die("could not open XML input");
}

echo "<pre>";
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d\n",
                    xml_error_string(xml_get_error_code($xml_parser)),
                    xml_get_current_line_number($xml_parser)));
    }
}
echo "</pre>";
echo "parse complete\n";
xml_parser_free($xml_parser);

?>

Example#4 xmltest.xml

<?xml version='1.0'?>
<!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
<!ENTITY plainEntity "FOO entity">
<!ENTITY systemEntity SYSTEM "xmltest2.xml">
]>
<chapter>
 <TITLE>Title &plainEntity;</TITLE>
 <para>
  <informaltable>
   <tgroup cols="3">
    <tbody>
     <row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
     <row><entry>a2</entry><entry>c2</entry></row>
     <row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
    </tbody>
   </tgroup>
  </informaltable>
 </para>
 &systemEntity;
 <section xml:id="about">
  <title>About this Document</title>
  <para>
   <!-- this is a comment -->
   <?php echo 'Hi!  This is PHP version ' . phpversion(); ?>
  </para>
 </section>
</chapter>

The following file is included from xmltest.xml:

Example#5 xmltest2.xml

<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<foo>
   <element attrib="value"/>
   &testEnt;
   <?php echo "This is some more PHP code being executed."; ?>
</foo>

User Contributed Notes
roopa
01-Jul-2008 07:11
hi.

how to parse a remote xml file??

is there any settings we need to do????

thanks,
Anonymous
19-May-2008 03:18
This is a piece of code that edits an XML file.
<?php
$songs = Array();
function start_element($parser, $name, $attrs){
    global $songs;
    if($name == "song"){
        array_push($songs, $attrs);
    }
}
function end_element ($parser, $name){}
$playlist_string = file_get_contents("test.xml");
$parser = xml_parser_create();
xml_set_element_handler($parser, "start_element", "end_element");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse($parser, $playlist_string) or die("Error parsing XML document.");
print "<br />";
if($_POST['action'] == "ins"){
    array_push($songs, Array(
                "title" => $_POST['title'],
                "artist" => $_POST['artist'],
                "path" => $_POST['path']));
    $songs_final = $songs;
}else if($_POST['action'] == "del"){
    $songs_final = Array();
    foreach($songs as $song){
        if($song['title'] != $_POST['title']){
            array_push($songs_final, $song);
        }
    }
}
$write_string = "<songs>";
foreach($songs_final as $song){

    $write_string .= "<song>";
    $write_string .= "<title>".$song['title']."</title>";
    $write_string .= "<artist>".$song['artist']."</artist>";
    $write_string .= "<path>".$song['path']."</path>";
    $write_string .= "</song>";

}
$write_string .= "</songs>";
$fp = fopen("test.xml", "w+");
fwrite($fp, $write_string) or die("Error writing to file");
fclose($fp);
print "<em>Song inserted or deleted successfully :)</em><br />";
print "<a href=\"index.php\" title=\"return\">Return</a>";
?>
galen dot senogles at gmail dot com
30-Apr-2008 04:48
An update to the function below.  Fixes a bug where the data of the first tag, would occasionally get appended to the beginning of the tag data of the second tag.

<?php

    foreach($dom['child_nodes'][0]['child_nodes'] as $key => $value) {
        $tagname = $value['tag_name'];
        if(isset($value['child_nodes'][0])) {
            $numarrays = count($value['child_nodes']);
            if($numarrays > 1) {
                $contents = "";
                foreach($value['child_nodes'] as $key => $value2) {
                    $contents .= $value2;
                }
            }else {
                $contents = $value['child_nodes'][0];
            }
        }else {
            $contents = 'isempty';
        }

        $artmp = array($tagname => $contents);
        array_push_associative($xmlarray, $artmp);
        unset($artmp);
    }

?>
galen dot senogles at gmail dot com
26-Apr-2008 05:28
If anyone else is having issues figuring out how to utilize the xml class that people have created and  modified, don't worry as you are not alone.  It took me a bit to come up with a solution that I liked, but I feel this does the job quite nice.

I read through the entire structure of the xml file and create an associative array based on the tag names.

I didn't worry about tag attributes as I didn't need to use them; so remember that if you use this method, you are only getting the tag name and the data inside the tag...that is all, no attributes!!

I am not going to include the xml class as it has been copy pasted multiple times already on this thread. Just scroll down for the xml class.

First let me show just an example of the EXTREMELY simple xml structure I was working with. Again, you will need to make modifications depending on the structure of the xml file you are working with! (I know I could use simplexml but I have php4 and not 5).

<?xml version="1.0"?>
<menuitems>
  <menutype>1</menutype>
  <product>Just some product info</product>
  <shipping>some stuff</shipping>
</menuitems>

The custom associative array push function taken from:
http://us.php.net/manual/en/function.array-push.php#58705

The xml class file is located here:
http://us.php.net/manual/en/ref.xml.php#81910

<?php
    // Obtain the exact path to the xml file
    $xmlfile = "mydata.xml";
    $fp = fopen($xmlfile, "r");             // open the xml file
    $xml = fread($fp, filesize($xmlfile));  // read the whole file into the variable xml
    fclose($fp);                            // close the stream

    $xml_parser = new xml();   // create a new xml class instance
    $xml_parser->parse($xml);  // parse the variable xml which contains our xml data
    $dom = $xml_parser->dom;   // make a variable that holds the entire dom

/*
      This part extracts the xml nodes from the dom and places them into an associative array.
      The associative array key is the name of the tag; the value is the tag contents.
      We simply create an array on the fly using the name and contents, and hit that array
      with our original array using the array_push_associative function. We then check if
      isset to prevent errors from being displayed.  If the tag contents are empty,
      I put the string isempty inside so I can easily check to see later if there is contents or not.
*/

    $xmlarray = array();  // the array we are going to store the information within the tag
    $contents = "";

    foreach($dom['child_nodes'][0]['child_nodes'] as $key => $value) {
        $tagname = $value['tag_name'];
        if(isset($value['child_nodes'][0])) {
            $numarrays = count($value['child_nodes']);
            if($numarrays > 1) {
                foreach($value['child_nodes'] as $key => $value2) {
                    $contents .= $value2;
                }
            }else {
                $contents = $value['child_nodes'][0];
            }
        }else {
            $contents = 'isempty';
        }

        $artmp = array($tagname => $contents);
        array_push_associative($xmlarray, $artmp);
        unset($artmp);
    }

    unset($xml);         // free up resources
    unset($xml_parser);  // free up resources
    unset($dom);         // free up resources
?>

You may be wondering why there is a nested count and foreach loop inside the main foreach loop.  The reason that exists is that the xml class that I am using in this example, the one that is four posts down from this one, has the wonderful behavior in that when something hits the length of 1024 characters, it creates a new element in the array and puts the next 1024 characters into that next element etc.  This caused me massive confusion as to why some of my data was getting cut off.
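
One likely explanation for this splitting is that the parser may call the character data handler several times for a single run of text, so a handler has to append and not overwrite. The sketch below shows that pattern. The variable names and the 1024-byte slice size are assumptions chosen to mirror the description above.

```php
<?php
// Accumulate character data instead of assuming one handler call per text run.
$buffer = '';
$calls = 0;

$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($parser, $data) use (&$buffer, &$calls) {
    $buffer .= $data;  // append: a long text node may arrive in several pieces
    $calls++;
});

$text = str_repeat('x', 5000);
$xml = "<product>$text</product>";

// Feed the document in 1024-byte slices, as a fread() loop would.
$chunks = str_split($xml, 1024);
foreach ($chunks as $i => $chunk) {
    xml_parse($parser, $chunk, $i === count($chunks) - 1);
}

printf("handler calls: %d, total length: %d\n", $calls, strlen($buffer));
```

The number of handler calls depends on the parser library and on how the input is sliced, but the joined buffer always holds the complete text.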

So say I wanted to display the data inside the product tag, all I would need to do is this:

<?php
    echo $xmlarray['product'];
?>

I sincerely hope this helps people figure out how to utilize the xml class quicker than I did!

If anyone has suggestions, modifications, or whatever, please post it here!

Thanks
galen dot senogles at gmail dot com
20-Apr-2008 03:10
I used shawn's code that is an ongoing fix/update of a very nice php 4 & 5 compatible class.

It works great, only it kept giving me errors when the array isn't set, (I have errors set to show all).
<?php
// Here is the old function that gave errors:
    function makeChildNode() {
        if (!is_array($this->pointer['child_nodes'])){
            $this->pointer['child_nodes'] = array();
        }
        return count($this->pointer['child_nodes']);
    }

// Here is the new function that does not spit errors:
    function makeChildNode() {
        if (!isset($this->pointer['child_nodes'])){
            $this->pointer['child_nodes'] = array();
        }
        return count($this->pointer['child_nodes']);
    }

?>
shawn dot rapp at gmail dot com
03-Apr-2008 03:24
Well I posted my script with an example fread($fp, 4096) meaning that it will only read 4k.  It was just for a quick example.  If you used that to input data from a really long XML file to the parser that would be the problem.
you could replace the 4096 with filesize("file.xml") or try replacing that example test code part with:

$xml = implode('',file("http://localhost/test.xml"));
$xml_parser = new XML_Class();
$xml_parser->parse($xml);
print_r($xml_parser->dom);

I've tried to recreate your problems by posting entire howto of installing LDAP into character data space of a node and can't get it to fail.  Please email with more info if the above isn't the problem.
But on that routine you posted from that website.  The problem with that one is it seems to be padding with unnecessary arrays.  It will overwrite different nodes with the same name if they are within the same parent.  And the number one biggest issue for me is that it drops attributes.  That is totally bogus.  It's a lot cleaner to store most values in attributes than making a zillion nodes and storing the data for something small like a integer or a float as character data.
Example: 
<coords x="1.53234" y="56.287" z="4.32" />
VS
<coords><x>1.53234</x><y>56.287</y><z>4.32</z></coords>

To me the top is very readable, whereas the latter makes my eyes bleed.
Anyway, what is good about the linked code is the error checking, and that it isolates all the code in the parse method instead of the constructor, so the object is recyclable and then releases the XML parser.
I'm definitely going to be putting that stuff into my class after I post this note.
 
But let me know if it's still truncating.
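
For the attribute-based form of the coords example above, the values can be read directly in the start-tag handler. This is a minimal sketch, independent of the xml class discussed in these notes; the $coords variable is an assumption made for the example.

```php
<?php
// Read the attributes of the <coords> element from the example above.
$coords = array();

$parser = xml_parser_create();
// Case folding also uppercases attribute names, so switch it off here.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$coords) {
        if ($name === 'coords') {
            // Attribute values arrive as strings; cast them to floats.
            $coords = array_map('floatval', $attrs);
        }
    },
    function ($parser, $name) {
        // Nothing to do on end tags.
    }
);
xml_parse($parser, '<coords x="1.53234" y="56.287" z="4.32" />', true);

print_r($coords);
```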
shawn dot rapp at gmail dot com
19-Mar-2008 10:52
The reason you would want to make your own simplistic DOM parser is the lack of compatibility between PHP 4's domxml and PHP 5's dom.
So it is for portability without having to wrap the two different DOMs.
If you need a simple lightweight XML parser that is portable, this is the best way.  If you are writing applications for a particular server and are more concerned with functionality and speed, go with a compiled-in DOM.
Here is the fix to Emmett's code...

<?PHP
$fp = fopen("test.xml","r");
$xml = fread($fp, 4096);
fclose($fp);
$xml_parser = new xml();
$xml_parser->parse($xml);
$dom = $xml_parser->dom;
print_r($dom);

class xml  {
    var $parser;
    var $pointer;
    var $dom;
    function xml() {
        $this->pointer =& $this->dom;
        $this->parser = xml_parser_create();
        xml_set_object($this->parser, $this);
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($this->parser, "tag_open", "tag_close");
        xml_set_character_data_handler($this->parser, "cdata");
    }

    function parse($data) {
        xml_parse($this->parser, $data);
    }

    function makeChildNode() {
        if (!is_array($this->pointer['child_nodes'])){
            $this->pointer['child_nodes'] = array();
        }
        return count($this->pointer['child_nodes']);
    }

    function tag_open($parser, $tag, $attributes) {
        $idx = $this->makeChildNode();
        $this->pointer['child_nodes'][$idx] = Array(
            '_idx' => $idx,
            '_parent' => &$this->pointer,
            'tag_name' => $tag,
            'attributes' => $attributes,
        );
        $this->pointer =& $this->pointer['child_nodes'][$idx];
    }

    function cdata($parser, $cdata) {
        //drop text nodes that are just white space formatting characters
        if (trim($cdata) != "") {
            $idx = $this->makeChildNode();
            $this->pointer['child_nodes'][$idx] = $cdata;
        }
    }

    function tag_close($parser, $tag) {
        $idx =& $this->pointer['_idx'];
        $this->pointer =& $this->pointer['_parent'];
        unset($this->pointer['child_nodes'][$idx]['_idx']);
        unset($this->pointer['child_nodes'][$idx]['_parent']);
    }
}
?>
jesdisciple at gmail dot com
08-Mar-2008 06:00
@[emmett dot thesane at yahoo dot com]: That code didn't work for me, but it seems that using the DOM functions (http://php.net/manual/en/ref.dom.php) would be more efficient.
emmett dot thesane at yahoo dot com
11-Dec-2007 08:19
There's a couple of vital flaws in aquariusrick's example:
1. Multiple tags of the same name will overwrite one another.
2. Text nodes within an element are all strung together, with no information saved regarding their order with respect to non-text nodes.

It provided a good starting point, however, for a DOM-builder that *does* allow those things.  This should be a more familiar structure for people used to DOM-walking in the browser; children of each node are stored in "childNodes". Text nodes are simply a child node that is only a string, instead of an array.

$xml_parser = new xml();
$xml_parser->parse($xml);
$dom = $xml_parser->dom;
print_r($dom);

class xml  {
    var $parser;
    var $pointer;
    var $dom;
    function xml() {
        $this->pointer =& $this->dom;
        $this->parser = xml_parser_create();
        xml_set_object($this->parser, $this);
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($this->parser, "tag_open", "tag_close");
        xml_set_character_data_handler($this->parser, "cdata");
    }

    function parse($data) {
        xml_parse($this->parser, $data);
    }
   
    function makeChildNode() {
        if (!isset($this->pointer['childNodes'])){
            $this->pointer['childNodes'] = array();
        }
        return count($this->pointer['childNodes']);
    }

    function tag_open($parser, $tag, $attributes) {
        $idx = $this->makeChildNode();
        $this->pointer['childNodes'][$idx] = Array(
            '_idx' => $idx,
            'tagName' => $tag,
            'parentNode' => &$this->pointer,
            'attributes' => $attributes,
        );
        $this->pointer =& $this->pointer['childNodes'][$idx];
    }

    function cdata($parser, $cdata) {
        $idx = $this->makeChildNode();
        $this->pointer['childNodes'][$idx] = $cdata;
        //text node -- has no other attributes than the content
    }

    function tag_close($parser, $tag) {
        $idx =& $this->pointer['_idx'];
        $this->pointer =& $this->pointer['_parent'];
        unset($this->pointer['childNodes'][$idx]['_idx']);
    }
}
aquariusrick
06-Dec-2007 12:43
Here's another attempt at a very simple script that parses XML into a structure:

<?php
#Usage:
    //$xml_parser = new xml();
    //$xml_parser->parse($xml);
    //$dom = $xml_parser->dom;

class xml  {
    var $parser;
    var $pointer;
    var $dom;
    function xml() {
        $this->pointer =& $this->dom;
        $this->parser = xml_parser_create();
        xml_set_object($this->parser, $this);
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($this->parser, "tag_open", "tag_close");
        xml_set_character_data_handler($this->parser, "cdata");
    }

    function parse($data) {
        xml_parse($this->parser, $data);
    }

    function tag_open($parser, $tag, $attributes) {
        $this->pointer[$tag] = Array(
            '_parent'   => &$this->pointer,
            '_content'  => null,
            '_attributes' => $attributes,
        );
        $this->pointer =& $this->pointer[$tag];
    }

    function cdata($parser, $cdata) {
        $this->pointer['_content'] .= $cdata;
    }

    function tag_close($parser, $tag) {
        $this->pointer =& $this->pointer['_parent'];
        unset($this->pointer[$tag]['_parent']);
    }

}
// end xml class
?>
yousuf at philipz dot com
25-Nov-2007 03:53
Here is my modification of < dmeekins att gmail doot com > XMLParser class, as I have used it for quite a bit. There were 2 problems with his post, which of course was a modification of an earlier post, so the problems continued through the many versions. The problems were in the dataHandler function. The first problem was with '$data = trim($data);', which removed line breaks from data that spanned several lines, and the second problem was when a tag had the value 0. So here is the corrected function.

<?php
    function dataHandler($parser, $data)
    {
        if(!empty($data) || strval($data) != "" )
        {

            if(isset($this->currTag['data']))
                $this->currTag['data'] .= $data;
            else
                $this->currTag['data'] = $data;
        }
    }
?>

With '$data = trim($data);' removed, you will notice that some [data] elements, mainly the root ones, contain a lot of line breaks and no actual data.
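
The "value 0" problem mentioned above comes from the way empty() treats the string "0": PHP considers it empty, so a check based on empty() alone drops it. The sketch below compares the two checks for a few sample values. hasData() is a hypothetical stricter test written for this illustration and is not part of the XMLParser class.

```php
<?php
// Compare empty() with the strval() check for a few character-data values.
$samples = array('0', '', "\n", 'text');

foreach ($samples as $data) {
    printf(
        "%-8s empty: %-5s  strval != '': %s\n",
        var_export($data, true),
        var_export(empty($data), true),
        var_export(strval($data) != '', true)
    );
}

// A stricter test that keeps "0" and drops only the empty string.
function hasData($data)
{
    return $data !== '';
}
```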

The code by < geoffers [at] gmail [dot] com > was also quite good, as it keeps things a lot smaller than XMLParser. Here's my modification of part of his code, as I preferred to have it look similar to how XMLParser has it (it removes the ['child'] entry and changes 'attribs' to 'attr').

<?php
    function parse($data)
    {
        $this->parser = xml_parser_create('UTF-8');
        xml_set_object($this->parser, $this);
        xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
        xml_set_element_handler($this->parser, 'tag_open', 'tag_close');
        xml_set_character_data_handler($this->parser, 'cdata');
        if (!xml_parse($this->parser, $data))
        {
            $this->data = array();
            $this->error_code = xml_get_error_code($this->parser);
            $this->error_string = xml_error_string($this->error_code);
            $this->current_line = xml_get_current_line_number($this->parser);
            $this->current_column = xml_get_current_column_number($this->parser);
        }
        else
        {
            $this->data = $this->data;
        }
        xml_parser_free($this->parser);
    }

    function tag_open($parser, $tag, $attribs)
    {
        $this->data[$tag][] = array('data' => '', 'attr' => $attribs);
        $this->datas[] =& $this->data;
        $this->data =& $this->data[$tag][count($this->data[$tag])-1];
    }
?>

The code by < adamaflynn at criticaldevelopment dot net > and < geoff at spacevs dot com > are also quite good but use xmlObject object rather than standard arrays.
geoff at spacevs dot com
08-Nov-2007 01:13
Reading xml into a class:

<?PHP
       
class XmlData