Convert GBK encoding to UTF-8 after PHP cURL

PHP cURL Jun 03, 2020

Issue

I use cURL to fetch a web page, but the page is encoded in GBK. When the response is displayed as UTF-8, the Chinese characters are unreadable. The result is shown below.

<html>
<head>
    <meta charset="gbk">
    <meta charset="GBK" />
    <title>����ΪP40 8GB/128GB/ȫ��ͨ/5G���������Ϊ P40 8GB/128GB/ȫ��ͨ/5G���ֻ�����_���_����_����-ZOL�йش�����</title>
    <meta name="keywords" content="P40 8GB/128GB/ȫ��ͨ/5G�����,��ΪP40 8GB/128GB/ȫ��ͨ/5G�����,��ΪP40 8GB/128GB/ȫ��ͨ/5G����,��ΪP40 8GB/128GB/ȫ��ͨ/5G������,��ΪP40 8GB/128GB/ȫ��ͨ/5G�湦��" />
    <meta name="description" content="ZOL�йش����߻�ΪP40 8GB/128GB/ȫ��ͨ/5G���ֻ������ṩ��ȫ�Ļ�ΪP40 8GB/128GB/ȫ��ͨ/5G���������ΪP40 8GB/128GB/ȫ��ͨ/5G���?�ΪP40 8GB/128GB/ȫ��ͨ/5G�����ܡ���ΪP40 8GB/128GB/ȫ��ͨ/5G�湦�ܽ���,Ϊ������ΪP40 8GB/128GB/ȫ��ͨ/5G���ֻ��ṩ�м�ֵ�IJο�" />

Solution

The mb_convert_encoding function converts a string from one character encoding to another. It requires the mbstring extension; when compiling PHP from source, pass --enable-mbstring to configure. The mbstring documentation lists the supported character encodings.

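$url = "http://detail.zol.com.cn/1317/1316635/param.shtml";
$curl = curl_init();
...

// curl_exec() returns the raw GBK bytes (assuming CURLOPT_RETURNTRANSFER is set in the elided options)
$html = curl_exec($curl);
// Arguments: string, target encoding, source encoding
$utf8Body = mb_convert_encoding($html, 'UTF-8', 'GBK');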
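
The snippet above elides its cURL options, so here is a minimal, self-contained sketch of the whole flow. It is not the original code: the function names fetchRaw and gbkToUtf8 are hypothetical, the cURL options are assumptions about what a typical request needs, and the iconv fallback is an addition for systems where mbstring is not installed.

```php
<?php
// Hypothetical helper: fetch a URL and return the raw response bytes.
// The options below are assumptions; the original snippet elides its own.
function fetchRaw(string $url): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds

    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }
    return $body;
}

// Hypothetical helper: convert GBK-encoded bytes to UTF-8.
function gbkToUtf8(string $bytes): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($bytes, 'UTF-8', 'GBK');
    }
    // Fallback (an addition, not from the original): use iconv if mbstring is missing.
    $converted = iconv('GBK', 'UTF-8//IGNORE', $bytes);
    if ($converted === false) {
        throw new RuntimeException('Could not convert GBK input to UTF-8');
    }
    return $converted;
}

// Usage (performs a network request, so it is left commented out):
// $utf8Body = gbkToUtf8(fetchRaw("http://detail.zol.com.cn/1317/1316635/param.shtml"));
```

Keeping the conversion in its own function makes it easy to check with a few known bytes before pointing it at a live page.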

After the conversion, the content is readable.

<html>
<head>
    <meta charset="gbk">
    <meta charset="GBK" />
    <title>【华为P40 8GB/128GB/全网通/5G版参数】华为 P40 8GB/128GB/全网通/5G版手机参数_规格_性能_功能-ZOL中关村在线</title>
    <meta name="keywords" content="P40 8GB/128GB/全网通/5G版参数,华为P40 8GB/128GB/全网通/5G版参数,华为P40 8GB/128GB/全网通/5G版规格,华为P40 8GB/128GB/全网通/5G版性能,华为P40 8GB/128GB/全网通/5G版功能" />
    <meta name="description" content="ZOL中关村在线华为P40 8GB/128GB/全网通/5G版手机参数提供最全的华为P40 8GB/128GB/全网通/5G版参数、华为P40 8GB/128GB/全网通/5G版规格、华为P40 8GB/128GB/全网通/5G版性能、华为P40 8GB/128GB/全网通/5G版功能介绍,为您购买华为P40 8GB/128GB/全网通/5G版手机提供有价值的参考" />
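
Note that the converted markup still declares charset="gbk" in its meta tags, because mb_convert_encoding changes only the bytes, not the markup. If the UTF-8 result is saved or served as HTML, a browser may trust that declaration and garble the text again. One possible fix is sketched below, assuming the declaration uses the short <meta charset="..."> form seen above; the function name fixMetaCharset is hypothetical.

```php
<?php
// Hypothetical helper: point <meta charset="..."> declarations at UTF-8
// after the body has been converted. It only handles the short HTML5 form,
// not <meta http-equiv="Content-Type" ...>.
function fixMetaCharset(string $utf8Html): string
{
    $result = preg_replace(
        '/(<meta\s+charset\s*=\s*["\']?)(?:gbk|gb2312|gb18030)(["\']?)/i',
        '${1}utf-8${2}',
        $utf8Html
    );
    // preg_replace returns null on a regex error; fall back to the input.
    return $result ?? $utf8Html;
}
```

A regular expression is enough for this one tag; for heavier rewriting, an HTML parser would be the more robust choice.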
Updated Jun 03, 2020