Consider this code:
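use 5.010;
use strict;
use warnings;
use Encode 'encode';
use URI;
my $uri = URI->new( 'http://example.com/' );
# path() returns the OLD path when setting, so print the whole URI afterwards
$uri->path( encode("UTF-8", "can\x{00B4}t-make-it-work" ) );  # bytes 0xC2 0xB4
say $uri;
$uri->path( "can\x{00B4}t-make-it-work" );                     # character U+00B4
say $uri;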
The output (in perl 5.14.0) is:
http://example.com/can%C2%B4t-make-it-work
http://example.com/can%B4t-make-it-work
If your page is encoded in UTF-8, the first one is correct: %C2%B4 is the percent-encoded UTF-8 representation of the Unicode character ACUTE ACCENT (U+00B4). If your page is encoded in Latin-1, the second one would be correct, but only by accident: it works because U+00B4 happens to have the same value as the Latin-1 byte 0xB4. In that case you should still use encode("iso-8859-1", ...).
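The advice above boils down to: always turn characters into bytes yourself, in the page's charset, before handing them to URI. A minimal sketch, assuming your code knows the target charset; uri_with_path is a hypothetical helper, not part of URI:

```perl
use 5.010;
use strict;
use warnings;
use Encode qw(encode);
use URI;

# Hypothetical helper (not part of URI): build a URI whose path is the
# character string $chars encoded in $charset ("UTF-8", "iso-8859-1", ...).
sub uri_with_path {
    my ($base, $chars, $charset) = @_;
    my $uri = URI->new($base);
    # Encode characters to bytes first, so URI only ever sees bytes
    # and percent-escapes each one of them.
    $uri->path( encode($charset, $chars) );
    return $uri->as_string;
}

my $text = "can\x{00B4}t-make-it-work";
say uri_with_path('http://example.com/', $text, 'UTF-8');       # http://example.com/can%C2%B4t-make-it-work
say uri_with_path('http://example.com/', $text, 'iso-8859-1');  # http://example.com/can%B4t-make-it-work
```

Both results are now an explicit choice rather than a side effect of how URI treats code points below 256.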
There are probably many other string-manipulating libraries that should document whether their input is expected to be encoded binary data (bytes) or decoded character strings.
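For contrast, URI::Escape (shipped with the URI distribution) makes the distinction explicit by offering two functions: uri_escape treats each character as a single byte, while uri_escape_utf8 encodes the string as UTF-8 before escaping. Its documentation also says uri_escape will croak on characters above U+00FF. A short illustration:

```perl
use 5.010;
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_escape_utf8);

my $text = "can\x{00B4}t-make-it-work";

# uri_escape(): every character is taken as one byte, so U+00B4 -> %B4
say uri_escape($text);        # can%B4t-make-it-work

# uri_escape_utf8(): characters are encoded as UTF-8 first, so U+00B4 -> %C2%B4
say uri_escape_utf8($text);   # can%C2%B4t-make-it-work
```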