Friday, September 30, 2011

Courriel::MMS

I think it is time to announce Courriel::MMS - it is an extension for the new email handling library by Dave Rolsky for processing MMS messages forwarded as emails by the mobile operators. This is still just a github link, no CPAN package yet, but it works in our production servers since Wednesday - so I bet on this code :)

It is a bit heuristic - for example many operators send a subject like 'You have received an mms' - which is useless for us, so the library tries to find something else that would act as the subject we need. For dealing with the various mobile operators, that each send a slightly different format of these emails, I used the Factory design pattern together with Module::Pluggable - this is a novel design for me so I wait for comments.

Tuesday, September 20, 2011

URI->path expects binary data

Update: changed new to path - with new it would be reasonable to require that the uri fed to the parser is already an ASCI string containing the already URI encoded url.
Consider this code:
use 5.010;
use Encode 'encode';
use URI;

my $uri = URI->new( 'http://example.com/' );
say $uri->path( encode("UTF-8", "can\x{00B4}t-make-it-work" ) );
say $uri->path( "can\x{00B4}t-make-it-work" );

The output (in perl 5.14.0) is:
http://example.com/can%C2%B4t-make-it-work
http://example.com/can%B4t-make-it-work

If your page is encoded in UTF8 - then the first one is correct: %C2%B4 is the URI encoded UTF8 encoding of Unicode Character 'ACUTE ACCENT' (U+00B4). If your page encoding is Latin1 - then the second one would be correct - but this is only by accident - in that case you should still use encode("iso-8859-1", ...).

There are probably many other string manipulating libs that should document if their input should be binary encoded data or decoded character strings.

Thursday, September 01, 2011

Names are special

At perl.com Tom Christiansen talks about sorting names, among other things. It appears surprisingly difficult to do properly with many special rules for each language. By coincidence at programming.reddit.com just above a link to that article there is Personal names around the world - a link to a w3c article about even more complications with names. At PhilPapers we had to solve *somehow* a few of these - the result is Text::Names. It is still rather limited: "While it tries to accommodate non-Western names, this module definitely works better with Western names, especially English-style names" - but there is already lots of logic embedded there.