URI->path expects binary data

Tuesday, September 20, 2011


Update: I changed new to path in the example. With new, it would be reasonable to require that the URI fed to the parser is already an ASCII string containing the already URI-encoded URL.
Consider this code:
    use 5.010;
    use Encode 'encode';
    use URI;

    my $uri = URI->new( 'http://example.com/' );

    # path() used as a setter returns the old path, so print the whole URI afterwards
    $uri->path( encode( "UTF-8", "can\x{00B4}t-make-it-work" ) );
    say $uri;

    $uri->path( "can\x{00B4}t-make-it-work" );
    say $uri;
The output (in perl 5.14.0) is:
    http://example.com/can%C2%B4t-make-it-work
    http://example.com/can%B4t-make-it-work
If your page is encoded in UTF-8, then the first one is correct: %C2%B4 is the URI-encoded UTF-8 encoding of the Unicode character 'ACUTE ACCENT' (U+00B4). If your page encoding is Latin-1, then the second one would be correct, but only by accident: the code point U+00B4 happens to coincide with the Latin-1 byte 0xB4. In that case you should still use encode("iso-8859-1", ...).
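To see the difference between the two encodings without involving URI at all, here is a minimal sketch that uses only the core Encode module. The helper escape_bytes is hypothetical, written just for this illustration and not part of any library: it percent-encodes every byte outside the RFC 3986 unreserved set. It is not meant to reproduce URI's own escaping rules exactly.

```perl
use strict;
use warnings;
use 5.010;
use Encode 'encode';

# Hypothetical helper (not from URI or URI::Escape):
# percent-encode every byte that is not an RFC 3986 unreserved character.
sub escape_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

my $segment = "can\x{00B4}t-make-it-work";    # decoded character string

# Encode explicitly first, then escape the resulting bytes.
say escape_bytes( encode( 'UTF-8',      $segment ) );    # can%C2%B4t-make-it-work
say escape_bytes( encode( 'iso-8859-1', $segment ) );    # can%B4t-make-it-work
```

The point of the sketch is that the choice of encoding is made explicitly by the caller, so the escaped result does not depend on how Perl happens to store the string internally.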

There are probably many other string-manipulating libraries that should document whether their input should be encoded binary data or decoded character strings.

Perl Alchemy - notes of a programmer
