Today my boss asked me to fix a bug: simplexml_load_string cannot parse an XML string correctly because the XML file contains some special characters.
Here is my simple test code:
$xml_file = '139_1356677622_o1_13.jpg.xml';
if (file_exists($xml_file)) {
    $xml = file_get_contents($xml_file);
    $xml = simplexml_load_string($xml);
    print_r($xml);
}
After running it, the system shows warning/error messages like the ones below:
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 36: parser error : Input is not proper UTF-8, indicate encoding
and
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 56: parser error : CData section not finished
I thought about replacing the special characters, but which characters are special? How do we know which characters simplexml_load_string/simplexml_load_file cannot parse?
After searching the internet, I found this article and, luckily, I found the code of eZ Publish too 😀
$xmlDoc = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xmlDoc);
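To see what that pattern actually removes, here is a small sketch. The sample string is made up for illustration. The pattern strips the control characters that XML 1.0 does not allow, and it keeps tab (\x09), line feed (\x0a) and carriage return (\x0d), which are legal in XML.

```php
<?php
// Made-up sample: a vertical tab (\x0b) and a NUL byte (\x00) hidden in the text.
$dirty = "<note>line one\x0b\tline\x00 two\n</note>";

// Same pattern as above: drop the control characters XML 1.0 forbids,
// but leave tab (\x09), line feed (\x0a) and carriage return (\x0d) alone.
$clean = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $dirty);

var_dump($clean === "<note>line one\tline two\n</note>"); // bool(true)
```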
So my code becomes:
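$xml_file = '139_1356677622_o1_13.jpg.xml';
if (file_exists($xml_file)) {
    $xml = file_get_contents($xml_file);
    // utf8_encode() assumes the input is ISO-8859-1 and converts it to UTF-8.
    // It is deprecated as of PHP 8.2; mb_convert_encoding($xml, 'UTF-8', 'ISO-8859-1') does the same job.
    $xml = utf8_encode($xml);
    // Remove control characters that are not allowed in XML 1.0 (tab, LF and CR are kept).
    $xml = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xml);
    $xml = simplexml_load_string($xml);
    print_r($xml);
}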
It works! The problem is solved 🙂
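If you need the same fix in more than one place, it could be wrapped in a small helper. This is only a sketch under some assumptions: the function name load_dirty_xml is made up, the mbstring and SimpleXML extensions are enabled, any input that is not valid UTF-8 is treated as ISO-8859-1 (the same assumption utf8_encode makes), and the XML declaration does not name a different encoding. It also collects the parser errors with libxml_use_internal_errors instead of letting them print as warnings.

```php
<?php
// Hypothetical helper (the name is made up): clean a raw XML string, then parse it.
function load_dirty_xml(string $raw)
{
    // Convert only when the bytes are not already valid UTF-8, to avoid double encoding.
    // Assumption: everything else is ISO-8859-1.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        $raw = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
    }

    // Strip control characters that XML 1.0 does not allow (tab, LF and CR are kept).
    $raw = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $raw);

    // Collect parser errors ourselves instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($raw);
    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            error_log(trim($error->message));
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml; // SimpleXMLElement on success, false on failure
}

// Made-up input: a Latin-1 "é" (\xe9) followed by a stray \x01 control character.
$xml = load_dirty_xml("<photo><title>Caf\xe9\x01</title></photo>");
echo $xml->title, "\n";
```

Checking the encoding first means a file that is already UTF-8 is not converted a second time, which utf8_encode would do blindly.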