Convert GBK encoding to UTF-8 after PHP curl
Issue
I use curl to fetch a web page, but the page is encoded in GBK, so some characters come out unreadable. The result is shown below.
<html>
<head>
<meta charset="gbk">
<meta charset="GBK" />
<title>����ΪP40 8GB/128GB/ȫ��ͨ/5G���������Ϊ P40 8GB/128GB/ȫ��ͨ/5G���ֻ�����_���_����_����-ZOL�йش�����</title>
<meta name="keywords" content="P40 8GB/128GB/ȫ��ͨ/5G�����,��ΪP40 8GB/128GB/ȫ��ͨ/5G�����,��ΪP40 8GB/128GB/ȫ��ͨ/5G����,��ΪP40 8GB/128GB/ȫ��ͨ/5G������,��ΪP40 8GB/128GB/ȫ��ͨ/5G�湦��" />
<meta name="description" content="ZOL�йش�����ΪP40 8GB/128GB/ȫ��ͨ/5G���ֻ������ṩ��ȫ�Ļ�ΪP40 8GB/128GB/ȫ��ͨ/5G���������ΪP40 8GB/128GB/ȫ��ͨ/5G���?�ΪP40 8GB/128GB/ȫ��ͨ/5G�����ܡ���ΪP40 8GB/128GB/ȫ��ͨ/5G�湦�ܽ���,Ϊ������ΪP40 8GB/128GB/ȫ��ͨ/5G���ֻ��ṩ�м�ֵ�IJο�" />
Solution
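The mb_convert_encoding function converts a string from one character encoding to another. It requires the mbstring extension; when building PHP from source, add --enable-mbstring to the configure command. The encodings it accepts are listed under Supported Character Encodings in the PHP manual.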
$url = "http://detail.zol.com.cn/1317/1316635/param.shtml";
$curl = curl_init();
...
$html = curl_exec($curl);
// Arguments: string, target encoding, source encoding.
$utf8Body = mb_convert_encoding($html, 'UTF-8', 'GBK');
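The snippet above elides the curl options. As a minimal sketch of the whole flow, the version below assumes the elided options include CURLOPT_RETURNTRANSFER (without it, curl_exec prints the body instead of returning it). The helper names fetchBody and gbkToUtf8, the redirect and timeout settings, and the "already UTF-8" shortcut are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: fetch a URL and return the raw response bytes.
function fetchBody(string $url): string
{
    $curl = curl_init($url);
    // Assumption: return the body as a string instead of printing it.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Assumptions for illustration only: follow redirects, 10 second timeout.
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);

    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('curl failed: ' . curl_error($curl));
    }
    return $body;
}

// Hypothetical helper: convert GBK bytes to UTF-8.
function gbkToUtf8(string $bytes): string
{
    // Heuristic, not a guarantee: bytes that already form valid UTF-8
    // (including plain ASCII) are returned unchanged so that calling
    // the function twice does not corrupt the text.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'GBK');
}
```

Usage would then look like `$utf8Body = gbkToUtf8(fetchBody($url));`. The validity check is only a convenience; if the source is known to be GBK, calling mb_convert_encoding directly, as in the original snippet, is just as good.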
After converting the encoding, the content displays correctly.
<html>
<head>
<meta charset="gbk">
<meta charset="GBK" />
<title>【华为P40 8GB/128GB/全网通/5G版参数】华为 P40 8GB/128GB/全网通/5G版手机参数_规格_性能_功能-ZOL中关村在线</title>
<meta name="keywords" content="P40 8GB/128GB/全网通/5G版参数,华为P40 8GB/128GB/全网通/5G版参数,华为P40 8GB/128GB/全网通/5G版规格,华为P40 8GB/128GB/全网通/5G版性能,华为P40 8GB/128GB/全网通/5G版功能" />
<meta name="description" content="ZOL中关村在线华为P40 8GB/128GB/全网通/5G版手机参数提供最全的华为P40 8GB/128GB/全网通/5G版参数、华为P40 8GB/128GB/全网通/5G版规格、华为P40 8GB/128GB/全网通/5G版性能、华为P40 8GB/128GB/全网通/5G版功能介绍,为您购买华为P40 8GB/128GB/全网通/5G版手机提供有价值的参考" />
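The converted markup still declares charset="gbk" in its meta tags, because only the bytes were converted, not the declaration. If the result is later served to a browser or passed to an HTML parser, that stale declaration may cause it to be decoded as GBK again. A small sketch of one way to patch it, assuming the declaration uses the `<meta charset=...>` form seen in this page; the function name fixCharsetMeta is hypothetical:

```php
<?php
// Hypothetical helper: rewrite <meta charset="..."> declarations to utf-8.
// It only handles the short <meta charset=...> form, not the older
// <meta http-equiv="Content-Type" ...> form.
function fixCharsetMeta(string $utf8Html): string
{
    $fixed = preg_replace(
        '/(<meta\s+charset\s*=\s*["\']?)[\w-]+/i', // keep the prefix, swap the name
        '${1}utf-8',
        $utf8Html
    );
    // preg_replace returns null on a regex error; fall back to the input.
    return $fixed ?? $utf8Html;
}
```

Applied to the output above, both `<meta charset="gbk">` and `<meta charset="GBK" />` would become utf-8 declarations, while the rest of the markup is left untouched.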