Sunday, August 28, 2011

is_utf8 is useless - can we have is_character?

Consider this code:

$data_structure = utf8::is_utf8($json)
    ? from_json($json)
    : decode_json($json);

This is taken, together with the is_character suggestion, from an otherwise very informative post: Quick note on using module JSON. I have seen similar code in many places. The idea is to check whether the string you have is character data or a string of bytes, and to treat it accordingly. Unfortunately, is_utf8 does not perform that check:

use strict;
use warnings;

use utf8;
use HTML::Entities;
use JSON;

my $a = HTML::Entities::decode( '&nbsp;' );
my $json = qq{{ "a": "$a" }};
print 'is_utf8: ' . ( utf8::is_utf8( $json ) ? 'yes' : 'no' ) . "\n";

my $data_structure = utf8::is_utf8( $json )
    ? from_json( $json )
    : decode_json( $json );

This fails (on my machine) with the following output:

is_utf8: no
malformed UTF-8 character in JSON string, at character offset 8 (before "\x{8a0}" }") at line 12.

If that is still a mystery, try this:

use strict;
use warnings;

use HTML::Entities;
use Devel::Peek;

Dump( HTML::Entities::decode( '&nbsp;' ) );

The output (on my machine) is:

SV = PV(0x24f2090) at 0x24f3de8
PV = 0x2501620 "\240"\0
CUR = 1
LEN = 16

This string is internally encoded as "\240", i.e. "\x{a0}", which is the Latin-1 encoding of the non-breaking space. It is character data, but it does not have the utf8 flag set - so the code above treats it as a UTF-8 encoded stream of bytes and fails.
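To see that the flag describes only the internal storage and not the meaning of the string, here is a minimal sketch that uses nothing beyond the built-in utf8:: functions. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

# The same one-character string (NO-BREAK SPACE), stored in two different ways.
my $plain    = "\xA0";      # stored internally as a single Latin-1 byte
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same character, now stored internally as UTF-8

# Perl treats both as the same string ...
print 'equal:  ', ( $plain eq $upgraded ? 'yes' : 'no' ), "\n";
print 'length: ', length($plain), ' and ', length($upgraded), "\n";

# ... and only the internal flag differs.
print 'is_utf8: ', ( utf8::is_utf8($plain)    ? 'yes' : 'no' ), ' and ',
                   ( utf8::is_utf8($upgraded) ? 'yes' : 'no' ), "\n";
```

Both strings are equal and have length 1, yet is_utf8 says 'no' for the first and 'yes' for the second - so the flag cannot tell characters from bytes.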

I don't know if we can have is_character easily - but the lack of introspection here is surely painful.
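Until something like is_character exists, one workaround is to not ask the string at all, but to keep track of what you hold and pick the decoder accordingly. The sketch below is only an illustration: it assumes the core JSON::PP module instead of JSON so that it runs without extra installs, and the variable names are made up. With the JSON module, from_json corresponds to the decoder without the utf8 option (it expects characters) and decode_json to the one with it (it expects UTF-8 bytes).

```perl
use strict;
use warnings;

use Encode ();
use JSON::PP;

# Character data: one NO-BREAK SPACE inside a JSON string, no utf8 flag set.
my $chars = qq{{ "a": "\xA0" }};

# The same document as a UTF-8 encoded byte string (as read from a socket or file).
my $bytes = Encode::encode( 'UTF-8', $chars );

# We know which one is which, so we choose the decoder ourselves.
my $from_chars = JSON::PP->new->decode($chars);          # expects characters
my $from_bytes = JSON::PP->new->utf8->decode($bytes);    # expects UTF-8 bytes

print 'same result: ',
    ( $from_chars->{a} eq $from_bytes->{a} ? 'yes' : 'no' ), "\n";
```

The design choice here is to decode bytes at the boundary of the program and to record, by convention or by variable naming, which strings are characters - not to try to recover that information from the string later.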

Wednesday, August 24, 2011

CPAN, decoupling and Dependency Injection

Consider the code:

sub fetch {
    my ($self, $uri) = @_;
    my $ua   = LWP::UserAgent->new;    # the dependency is created right here
    my $resp = $ua->get( $uri );
    # ... (rest of the method not shown)
}

Yes - this is taken from a post by chromatic.

Now imagine that this is code from a CPAN module you installed and that some security concerns require you to replace LWP::UserAgent with LWPx::ParanoidAgent there. Bad luck - you'll probably need to subclass it, override that whole fetch method and pray that it will not change too much with every new release of the original module.

This is really why I am drumming this Dependency Injection drum over and over again - code that uses it is more reusable, more universal:

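use Moose;
use LWP::UserAgent;

# The user agent is an attribute with a default, so callers can supply their own
has ua => ( is => 'ro', default => sub { LWP::UserAgent->new } );

sub fetch {
    my ($self, $uri) = @_;
    my $ua   = $self->ua;
    my $resp = $ua->get( $uri );
    # ... (rest of the method not shown)
}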

Now you would have no problem providing an LWPx::ParanoidAgent object for the fetch method to use.
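Here is a self-contained sketch of that injection. To keep it runnable without Moose or LWP installed it uses plain Perl objects; My::Fetcher and My::StubAgent are hypothetical names, and the stub stands in for whatever agent you want to supply (an LWPx::ParanoidAgent object in production, a test double in unit tests).

```perl
use strict;
use warnings;

package My::Fetcher;    # hypothetical class, plain-Perl stand-in for the Moose one

sub new {
    my ( $class, %args ) = @_;
    return bless { ua => $args{ua} }, $class;
}

# Use the injected agent; fall back to a real LWP::UserAgent only if none was given.
sub ua {
    my ($self) = @_;
    $self->{ua} //= do { require LWP::UserAgent; LWP::UserAgent->new };
    return $self->{ua};
}

sub fetch {
    my ( $self, $uri ) = @_;
    return $self->ua->get($uri);
}

package My::StubAgent;    # hypothetical replacement agent

sub new { return bless { seen => [] }, shift }

# Record the requested URI and return a canned response instead of using the network.
sub get {
    my ( $self, $uri ) = @_;
    push @{ $self->{seen} }, $uri;
    return "stub response for $uri";
}

package main;

my $stub    = My::StubAgent->new;
my $fetcher = My::Fetcher->new( ua => $stub );    # the dependency comes from outside

print $fetcher->fetch('http://example.com/'), "\n";
```

The fetch method never learns which kind of agent it talks to; it only relies on the object having a get method.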

By the way, with classical DI you would move that LWP::UserAgent->new out of the class completely; here it stays as a 'default' that can be overridden from outside if needed. The problem with classical DI is that you need a place to move that initialization code to. Here that problem is sidestepped for 'normal' usage, and you need to worry about it only in the cases where you really have to. Java probably does not have this 'default' mechanism.

Thursday, August 18, 2011

Dependency Injection - the cooking metaphor

Let's take a typical recipe. It first describes the goal - and then it goes:


  • 2-1/4 cups sifted cake flour
  • 2 teaspoons baking powder
  • 1/2 teaspoon salt
  • 1/2 pound (2 sticks) sweet butter, room temperature

Preheat the oven to 350 degrees Fahrenheit. Butter and line two 8 x 3-inch baking pans or one 12 x 3-inch pan with parchment.

It is not:

Preheat the oven to 350 degrees Fahrenheit. Find a cow and milk her, wait until ...

nor is it:

Preheat the oven to 350 degrees Fahrenheit. Take your credit card and go to the grocery around the corner ...

Dependency injection is about writing your programs in a very similar manner - you first declare the collaborators and then go on with using them.

Thursday, August 11, 2011

So what is Dependency Injection again?

The definition I like the most is that DI is simply about separating object creation from business logic. Object creation - wiring the application - is a special type of code, different from the rest, and it is useful to keep it separate. This is similar to how we move hardcoded magic constants out of our code into config files; it can be thought of as an object-oriented extension of that practice. We remove magic constants because we need to change them more frequently than the rest of the code. We do DI because we need to change the object wiring much more frequently than we change business logic, and in particular we need to change it in the tests - without that, unit tests would not be possible. But it is also different - because the object wiring code is much more complex than configuration files.

But this is not all - DI is also about keeping all of an object's collaborators in its attributes instead of reaching out for global objects (or singletons, which are globals in disguise, or class attributes). It thus improves the encapsulation of objects and makes them more self-reliant and testable. Or maybe this part is not DI - but simply writing object-oriented code?

On the other hand, the separated-out object factories are hard to test, because they depend on the classes of all the objects they create, so you want to keep them as small and simple as possible. How many such factories do you need? If we have something that has an HTTP request object as an attribute, then we cannot build it until the HTTP request arrives from the user. If we keep all collaborators in object attributes, then we cannot build an object until we have all the information needed to build all of its collaborators first. We thus need one factory per scope.
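A rough sketch of what "one factory per scope" could look like, assuming two scopes: the application and a single request. All class names and the 'dsn' setting are hypothetical and exist only for the illustration.

```perl
use strict;
use warnings;

package My::AppFactory;    # application scope: can be built once, at start-up

sub new {
    my ( $class, %config ) = @_;
    return bless { config => {%config} }, $class;
}

sub dsn { return $_[0]{config}{dsn} }

# The request-scoped factory can only be created once a request exists.
sub request_factory {
    my ( $self, $request ) = @_;
    return My::RequestFactory->new( app => $self, request => $request );
}

package My::RequestFactory;    # request scope: one instance per incoming request

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Wire a controller from collaborators that belong to both scopes.
sub controller {
    my ($self) = @_;
    return My::Controller->new(
        request => $self->{request},
        dsn     => $self->{app}->dsn,
    );
}

package My::Controller;    # business logic: receives everything it needs

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub handle {
    my ($self) = @_;
    return "handling $self->{request}{path} with $self->{dsn}";
}

package main;

my $app        = My::AppFactory->new( dsn => 'dbi:SQLite:dbname=app.db' );
my $controller = $app->request_factory( { path => '/hello' } )->controller;
print $controller->handle, "\n";
```

The controller contains no construction code at all, and each factory only builds objects whose inputs are available in its own scope.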

Tuesday, August 02, 2011

Subclassing applications

Subclassing is a great tool for making small changes to a piece of code to fit it to new requirements. It is as easy as copying code - but it still keeps the new constructs synchronized with later changes to the original. There are problems with inheritance hierarchies - but you need to have a 'hierarchy', not just two classes, to get there.

I imagine that it would be perfect for extended configuration of applications - including web applications. Wouldn't it be great if you could run a slightly changed version of your main web app by making its code available from PERL5LIB and then subclassing it to change the colors used, add some minor new features and remove some pages for an affiliated site? Or if you could install a blog engine from CPAN, and then subclass it to add new features and override old ones? This could even make distribution of CPANized applications more popular.
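As a toy illustration of the idea (not the API of any real framework), assume the application is a class whose look and page list are ordinary methods. My::Blog and My::AffiliateBlog are hypothetical names.

```perl
use strict;
use warnings;

package My::Blog;    # hypothetical application class, e.g. installed from CPAN

sub new { return bless {}, shift }

sub theme { return { background => 'white', text => 'black' } }

sub pages { return qw(index archive about admin) }

sub render_header {
    my ($self) = @_;
    my $theme = $self->theme;
    return "header: $theme->{text} on $theme->{background}";
}

package My::AffiliateBlog;    # the slightly changed version for an affiliated site

use parent -norequire, 'My::Blog';

# Change the colors, keeping anything else the parent theme defines.
sub theme {
    my ($self) = @_;
    return { %{ $self->SUPER::theme() }, background => 'navy', text => 'white' };
}

# Remove a page that the affiliated site should not expose.
sub pages {
    my ($self) = @_;
    return grep { $_ ne 'admin' } $self->SUPER::pages();
}

package main;

my $site = My::AffiliateBlog->new;
print $site->render_header, "\n";
print join( ' ', $site->pages ), "\n";
```

Everything that is not overridden, here render_header, keeps following later changes to the original class.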

This is one of the things I am experimenting with at Nblog (see also the screencast: Experiments with inheritance in WebNano based applications).