PHP Big5 Utf-8 GB2312 编码互转的解决办法_php编码

PHP Big5 Utf-8 GB2312 编码互转的解决办法: 发布时间：2020-10-27编辑：脚本学堂

PHP Big5 Utf-8 GB2312 编码互转的解决办法

编写 PHP 代码的过程中，经常会遇到需要对中文转码的问题，如 GB2312 <=> Unicode、GB2312 <=> Big5 等等。如果 PHP 编译时带有 mbstring 的话，可以使用 Multi-Byte String Function 实现部分转码工作。然而由于很多虚拟主机不支持 mbstring，或者 mbstring 的编译、配置过于麻烦，很多 PHP 代码无法使用这一序列的函数。

最近为了解决这个问题，找到一个不错的项目：PHP News Reader，这是一个基于 WEB 的新闻阅读器，支持基于 NNTP (RFC 977) 协议的新闻文章的阅读、发布、删除、回复等功能。这个项目实现了 GB2312 Big5 Unicode(UTF-8) 之间的相互转码，这个正是我所关心的部分。

使用 CVS 客户端（linux 下直接用命令行就行，Windows 下推荐使用 Tortoise CVS）将项目的代码 Check Out 出来：
# cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/pnews login
Logging in to :pserver:anonymous@cvs.sourceforge.net:2401/cvsroot/pnews
CVS password: (Press Enter)
# cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/pnews co pnews
cvs server: Updating pnews
…

查看 pnews/language 目录，此目录下包含了如下文件：
big5-gb.tab
big5-unicode.tab
gb-big5.tab
gb-unicode.tab
unicode-big5.tab
unicode-gb.tab

这些都是用于字符转换的码表，然后再看看 pnews/language.inc.php 文件，其中包含了几个用于编码转换的函数：

复制代码代码如下:

// Big5 => GB

function b2g( $instr ) {

 $fp = fopen( 'language/big5-gb.tab', 'r' );

 $len = strlen($instr);

 for( $i = 0 ; $i < $len ; $i++ ) {

  $h = ord($instr[$i]);

  if( $h >= 160 ) {

   $l = ord($instr[$i+1]);

   if( $h == 161 && $l == 64 )

    $gb = '  ';

   else {

    fseek( $fp, (($h-160)*255+$l-1)*3 );

    $gb = fread( $fp, 2 );

   }

   $instr[$i] = $gb[0];

   $instr[$i+1] = $gb[1];

   $i++;

  }

 }

 fclose($fp);

 return $instr;

}

// GB => BIG5

function g2b( $instr ) {

 $fp = fopen( 'language/gb-big5.tab', 'r' );

 $len = strlen($instr);

 for( $i = 0 ; $i < $len ; $i++ ) {

  $h = ord($instr[$i]);

  if( $h > 160 && $h < 248 ) {

   $l = ord($instr[$i+1]);

   if( $l > 160 && $l < 255 ) {

    fseek( $fp, (($h-161)*94+$l-161)*3 );

    $bg = fread( $fp, 2 );

   }

   else

    $bg = '  ';

   $instr[$i] = $bg[0];

   $instr[$i+1] = $bg[1];

   $i++;

  }

 }

 fclose($fp);

 return $instr;

}

// Big5 => Unicode(UtF-8)

function b2u( $instr ) {

 $fp = fopen( 'language/big5-unicode.tab', 'r' );

 $len = strlen($instr);

 $outstr = '';

 for( $i = $x = 0 ; $i < $len ; $i++ ) {

  $h = ord($instr[$i]);

  if( $h >= 160 ) {

   $l = ord($instr[$i+1]);

   if( $h == 161 && $l == 64 )

    $uni = '  ';

   else {

    fseek( $fp, ($h-160)*510+($l-1)*2 );

    $uni = fread( $fp, 2 );

   }

   $codenum = ord($uni[0])*256 + ord($uni[1]);

   if( $codenum < 0x800 ) {

    $outstr[$x++] = chr( 192 + $codenum / 64 );

    $outstr[$x++] = chr( 128 + $codenum % 64 );

#    printf("[%02X%02X]<br>n", ord($outstr[$x-2]), ord($uni[$x-1]) );

   }

   else {

    $outstr[$x++] = chr( 224 + $codenum / 4096 );

    $codenum %= 4096;

    $outstr[$x++] = chr( 128 + $codenum / 64 );

    $outstr[$x++] = chr( 128 + ($codenum % 64) );

#    printf("[%02X%02X%02X]<br>n", ord($outstr[$x-3]), ord($outstr[$x-2]), ord($outstr[$x-1]) );

   }

   $i++;

  }

  else

   $outstr[$x++] = $instr[$i];

 }

 fclose($fp);

 if( $instr != '' )

  return join( '', $outstr);

}

// Unicode(UTF-8) => BIG5

function u2b( $instr ) {

 $fp = fopen( 'language/unicode-big5.tab', 'r' );

 $len = strlen($instr);

 $outstr = '';

 for( $i = $x = 0 ; $i < $len ; $i++ ) {

  $b1 = ord($instr[$i]);

  if( $b1 < 0x80 ) {

   $outstr[$x++] = chr($b1);

#   printf( "[%02X]", $b1);

  }

  elseif( $b1 >= 224 ) { # 3 bytes UTF-8

   $b1 -= 224;

   $b2 = ord($instr[$i+1]) - 128;

   $b3 = ord($instr[$i+2]) - 128;

   $i += 2;

   $uc = $b1 * 4096 + $b2 * 64 + $b3 ;

   fseek( $fp, $uc * 2 );

   $bg = fread( $fp, 2 );

   $outstr[$x++] = $bg[0];

   $outstr[$x++] = $bg[1];

#   printf( "[%02X%02X]", ord($bg[0]), ord($bg[1]));

  }

  elseif( $b1 >= 192 ) { # 2 bytes UTF-8

   printf( "[%02X%02X]", $b1, ord($instr[$i+1]) );

   $b1 -= 192;

   $b2 = ord($instr[$i]) - 128;

   $i++;

   $uc = $b1 * 64 + $b2 ;

   fseek( $fp, $uc * 2 );

   $bg = fread( $fp, 2 );

   $outstr[$x++] = $bg[0];

   $outstr[$x++] = $bg[1];

#   printf( "[%02X%02X]", ord($bg[0]), ord($bg[1]));

  }

 }

 fclose($fp);

 if( $instr != '' ) {

#  echo '##' . $instr . " becomes " . join( '', $outstr) . "<br>n";

  return join( '', $outstr);

 }

}

// GB => Unicode(UTF-8)

function g2u( $instr ) {

 $fp = fopen( 'language/gb-unicode.tab', 'r' );

 $len = strlen($instr);

 $outstr = '';

 for( $i = $x = 0 ; $i < $len ; $i++ ) {

  $h = ord($instr[$i]);

  if( $h > 160 ) {

   $l = ord($instr[$i+1]);

   fseek( $fp, ($h-161)*188+($l-161)*2 );

   $uni = fread( $fp, 2 );

   $codenum = ord($uni[0])*256 + ord($uni[1]);

   if( $codenum < 0x800 ) {

    $outstr[$x++] = chr( 192 + $codenum / 64 );

    $outstr[$x++] = chr( 128 + $codenum % 64 );

#    printf("[%02X%02X]<br>n", ord($outstr[$x-2]), ord($uni[$x-1]) );

   }

   else {

    $outstr[$x++] = chr( 224 + $codenum / 4096 );

    $codenum %= 4096;

    $outstr[$x++] = chr( 128 + $codenum / 64 );

    $outstr[$x++] = chr( 128 + ($codenum % 64) );

#    printf("[%02X%02X%02X]<br>n", ord($outstr[$x-3]), ord($outstr[$x-2]), ord($outstr[$x-1]) );

   }

   $i++;

  }

  else

   $outstr[$x++] = $instr[$i];

 }

 fclose($fp);

 if( $instr != '' )

  return join( '', $outstr);

}

// Unicode(UTF-8) => GB

function u2g( $instr ) {

 $fp = fopen( 'language/unicode-gb.tab', 'r' );

 $len = strlen($instr);

 $outstr = '';

 for( $i = $x = 0 ; $i < $len ; $i++ ) {

  $b1 = ord($instr[$i]);

  if( $b1 < 0x80 ) {

   $outstr[$x++] = chr($b1);

#   printf( "[%02X]", $b1);

  }

  elseif( $b1 >= 224 ) { # 3 bytes UTF-8

   $b1 -= 224;

   $b2 = ord($instr[$i+1]) - 128;

   $b3 = ord($instr[$i+2]) - 128;

   $i += 2;

   $uc = $b1 * 4096 + $b2 * 64 + $b3 ;

   fseek( $fp, $uc * 2 );

   $gb = fread( $fp, 2 );

   $outstr[$x++] = $gb[0];

   $outstr[$x++] = $gb[1];

#   printf( "[%02X%02X]", ord($gb[0]), ord($gb[1]));

  }

  elseif( $b1 >= 192 ) { # 2 bytes UTF-8

   printf( "[%02X%02X]", $b1, ord($instr[$i+1]) );

   $b1 -= 192;

   $b2 = ord($instr[$i]) - 128;

   $i++;

   $uc = $b1 * 64 + $b2 ;

   fseek( $fp, $uc * 2 );

   $gb = fread( $fp, 2 );

   $outstr[$x++] = $gb[0];

   $outstr[$x++] = $gb[1];

#   printf( "[%02X%02X]", ord($gb[0]), ord($gb[1]));

  }

 }

 fclose($fp);

 if( $instr != '' ) {

#  echo '##' . $instr . " becomes " . join( '', $outstr) . "<br>n";

  return join( '', $outstr);

 }

}

在自己的 PHP 文件中需要转码时，只需要 .tab 码表文件及相对应的转码函数，将函数中的 fopen 打开的文件路径修改为正确的路径即可。

上一篇：返回列表
下一篇：PHP可用于地址栏的base64编码和解码

与 PHP Big5 Utf-8 GB2312 编码互转的解决办法有关的文章

本文标题：PHP Big5 Utf-8 GB2312 编码互转的解决办法
本页链接：http://www.jb200.com/article/1786.html

浏览排行

热点文章

PHP Big5 Utf-8 GB2312 编码互转的解决办法