MTGoogleSearch with Perl 5.8

by john on January 24, 2003

I’m really starting to regret upgrading to Perl 5.8.0. When I did it I didn’t realize I’d be on the bleeding edge, but that’s where I always seem to find myself, so be it.

Tonight’s problem was with an MT tag called MTGoogleSearch. Well, that’s not entirely true: the problem was actually in the Movable Type code, but it only surfaced when I tried to get MTGoogleSearch working. After enabling the tag I started getting this error when rebuilding:

Wide character in subroutine entry at C:/Program Files/Apache Group/Apache2/cgi-bin/lib/MT/FileMgr/Local.pm line 130.

Here is what that line looks like:

return $ctx->digest ne Digest::MD5::md5($$content);

According to the CPAN documentation for Digest::MD5, Perl 5.8 supports Unicode characters in strings, but MD5 is only defined for strings of bytes, so passing a string that contains characters with ordinal numbers above 255 to the MD5 functions will cause them to “croak.” The documentation suggests hashing the UTF-8 representation of the string instead, which seems to work like a charm.

So now the line looks like this (I also imported encode_utf8 from the Encode module at the top of Local.pm):

use Encode qw(encode_utf8);
...
return $ctx->digest ne Digest::MD5::md5(encode_utf8($$content));
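To see the behavior outside of Movable Type, here is a minimal standalone sketch. The sample string is made up for illustration and is not taken from my blog or from Local.pm; it just contains one character above 255 so the direct call fails.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical sample: a character string with one code point above 255.
my $content = "smile: \x{263A}";

# Hashing the character string directly croaks, so trap it with eval.
my $direct = eval { md5_hex($content) };
print "direct call failed: $@" unless defined $direct;

# Encoding to UTF-8 first produces plain bytes, which md5_hex accepts.
my $fixed = md5_hex(encode_utf8($content));
print "digest of UTF-8 bytes: $fixed\n";
```

One thing to keep in mind with this approach: for strings containing characters in the 128 to 255 range, the digest of the UTF-8 bytes will differ from the digest of the original string, so both sides of a comparison should be hashed the same way.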

I believe what is happening is that some of the Google results returned for one of my entries include Unicode strings with the problematic high-ordinal characters.
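One way to check that guess would be to scan the content for code points above 255 before it gets hashed. This is only a sketch: wide_chars is a hypothetical helper of my own, not something from Movable Type or MTGoogleSearch, and the sample string stands in for a Google result.

```perl
use strict;
use warnings;

# Hypothetical helper: list the code points above 255 found in a string.
sub wide_chars {
    my ($text) = @_;
    return map  { sprintf('U+%04X', ord($_)) }
           grep { ord($_) > 255 }
           split //, $text;
}

# Made-up sample: e-acute (U+00E9) fits in a byte, the curly quotes do not.
my $sample = "caf\x{E9} \x{201C}quoted\x{201D}";
my @found  = wide_chars($sample);
print "wide characters: @found\n";   # prints "wide characters: U+201C U+201D"
```

Anything this reports is a character that the MD5 functions would reject in an unencoded string.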

Now I just have to decide how I want the results to look…

{ 3 comments }

Eric Angel May 23, 2003 at 3:20 pm

Thanks for your post. I would like to add that this also helped in reading base64 files (for mail attachments).

cinu.net August 20, 2004 at 7:35 pm

Applying the Google Web API

I was curious how it is used. A few months ago I got a Google Web API license key and then forgot about it, until I tried applying it to the Individual Entry Archive Template. But on rebuild, the following error appeared. Wide character in p…

Tom Keating February 2, 2005 at 11:37 am

I use MTGoogleSearch to display Related Entries on my blog (http://blog.tmcnet.com/blog/tom-keating/).

MTGoogleSearch works well, but it changes the encoding on my blog from ANSI/iso-8859-1 encoding to UTF-8. The result is that quotes, apostrophes, em-dashes, etc. display with gibberish characters.

When I View Source on the page (Notepad) and do a File, Save, it says “UTF-8” in the File type instead of the usual “ANSI”, which indicates to me that MTGoogleSearch is changing the encoding.

Here’s a sample page – note the weird characters.
http://blog.tmcnet.com/blog/testblog/main-test.asp

As soon as I take out the Google code in the template, the webpage looks fine.
i.e.
http://blog.tmcnet.com/blog/testblog/main-test2.asp

Do you know if I can get the Google results to encode as ANSI/iso-8859-1 so it doesn’t mess up the page’s encoding?

Thanks in advance.

