
A special problem when using the simplexml_load_string function

Today, my boss asked me to fix a bug: simplexml_load_string could not parse an XML string correctly because the XML file contained some special characters.

Here is the simple script I used to reproduce it:

$xml_file = '139_1356677622_o1_13.jpg.xml';
if (file_exists($xml_file)) {
    // Read the raw file and hand it straight to SimpleXML.
    $xml = file_get_contents($xml_file);
    $xml = simplexml_load_string($xml);
    print_r($xml);
}

After running it, PHP shows warnings like these:

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 36: parser error : Input is not proper UTF-8, indicate encoding

and

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 56: parser error : CData section not finished
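Both warnings come from libxml, the parser behind SimpleXML. If you would rather inspect them in code than read them on screen, a minimal sketch using libxml_use_internal_errors() and libxml_get_errors() could look like this. The sample string is made up for illustration and contains a raw 0x01 byte:

```php
<?php
// Sketch: collect libxml parser errors instead of emitting PHP warnings.
// The sample XML is hypothetical; it contains a raw \x01 control byte,
// which XML 1.0 does not allow anywhere in a document.
$xml = "<root><item>bad\x01value</item></root>";

$previous = libxml_use_internal_errors(true); // buffer errors, no warnings
$doc = simplexml_load_string($xml);

$errors = [];
if ($doc === false) {
    foreach (libxml_get_errors() as $error) {
        // Each LibXMLError exposes the line, column and message text.
        $errors[] = $error;
        printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
    }
    libxml_clear_errors(); // empty libxml's error buffer
}

libxml_use_internal_errors($previous); // restore the previous setting
```

This makes it easy to log exactly which line of a failing file triggered the problem.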

I thought about replacing the special characters, but which characters count as special? How can we know which characters simplexml_load_string/simplexml_load_file cannot parse?
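The XML 1.0 specification answers this fairly precisely: its Char production allows only tab (#x9), line feed (#xA), carriage return (#xD) and the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. Anything else, such as the bytes 0x00-0x08, is forbidden even inside a CDATA section. A rough sketch that lists the offending characters, assuming the input is UTF-8 (the function name find_invalid_xml_chars is hypothetical):

```php
<?php
// Sketch: list characters that XML 1.0 does not allow at all.
// XML 1.0 permits only:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Assumes UTF-8 input; find_invalid_xml_chars is a hypothetical helper name.
function find_invalid_xml_chars(string $xml): ?array
{
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    // With the /u modifier, preg_match_all() returns false on malformed UTF-8,
    // i.e. the "Input is not proper UTF-8" situation.
    if (preg_match_all($pattern, $xml, $matches, PREG_OFFSET_CAPTURE) === false) {
        return null;
    }
    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        // $offset is a byte offset into the string; show the raw bytes in hex.
        $found[] = sprintf('0x%s at byte %d', bin2hex($char), $offset);
    }
    return $found;
}
```

A null result means the string is not valid UTF-8 at all, which is a separate problem from forbidden characters.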

After searching the internet, I found this article and, luckily, the code eZ Publish uses for the same problem 😀

$xmlDoc = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xmlDoc);
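The character class in that regex covers 0x00-0x08, 0x0B-0x0C and 0x0E-0x1F, which is every C0 control character except tab (0x09), line feed (0x0A) and carriage return (0x0D), exactly the control characters XML 1.0 forbids. A small demonstration; wrapping it in a function named strip_xml_control_chars is our own illustrative choice:

```php
<?php
// Sketch: what the eZ Publish-style regex removes and what it keeps.
// Removed: 0x00-0x08, 0x0B-0x0C, 0x0E-0x1F.
// Kept:    tab (0x09), line feed (0x0A), carriage return (0x0D).
// strip_xml_control_chars is a hypothetical helper name.
function strip_xml_control_chars(string $xml): string
{
    return preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xml);
}

$dirty = "<note>\x01Hello\x0b\tworld\r\n</note>";
$clean = strip_xml_control_chars($dirty);

var_dump($clean === "<note>Hello\tworld\r\n</note>"); // bool(true)
```

The pattern has no /u modifier, so it works byte by byte. That is still safe on UTF-8 text, because every byte of a multi-byte UTF-8 sequence is 0x80 or higher and can never match this class.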

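So my code became the following. Compared with the first version, it adds utf8_encode() for the encoding warning and the eZ Publish regex to strip control characters: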

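$xml_file = '139_1356677622_o1_13.jpg.xml';
if (file_exists($xml_file)) {
    $xml = file_get_contents($xml_file);
    // Re-encode to UTF-8. utf8_encode() assumes the input is ISO-8859-1 (Latin-1),
    // so text that is already UTF-8 gets double-encoded; it is deprecated as of PHP 8.2.
    $xml = utf8_encode($xml);
    // Strip the control characters that XML 1.0 does not allow.
    $xml = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xml);
    $xml = simplexml_load_string($xml);
    print_r($xml);
}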

It works! The problem is solved 🙂
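One caveat: utf8_encode() treats its input as ISO-8859-1, so a file that is already valid UTF-8 gets double-encoded ("café" turns into "cafÃ©"). A sketch that converts only when the input is not valid UTF-8, assuming the mbstring extension is available and that any non-UTF-8 input is Latin-1 (load_xml_leniently is a hypothetical name):

```php
<?php
// Sketch: re-encode only when the input is not already valid UTF-8.
// Assumptions: mbstring is installed, and non-UTF-8 input is ISO-8859-1
// (the same assumption utf8_encode() makes). Hypothetical helper name.
function load_xml_leniently(string $xml)
{
    if (!mb_check_encoding($xml, 'UTF-8')) {
        // Unlike an unconditional utf8_encode(), valid UTF-8 is left untouched.
        $xml = mb_convert_encoding($xml, 'UTF-8', 'ISO-8859-1');
    }
    // Remove the C0 control characters that XML 1.0 forbids.
    $xml = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xml);
    // Returns a SimpleXMLElement, or false if parsing still fails.
    return simplexml_load_string($xml);
}
```

If the file's XML declaration names another encoding (for example encoding="ISO-8859-1"), libxml follows that declaration, so after converting the bytes to UTF-8 the declaration would need to be updated as well.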
