Perl Alchemy - notes of a programmer: October 2009

Tuesday, October 27, 2009

Controller immutability

On IRC I got an answer from t0m and others about why they would not use controller attributes for storing data related to the current request - they want to keep their controller object immutable during the request phase. The idea is that if an object is reused, it should be back in the same clean state after each use. In other words, it is good to keep changes contained in as small an area as possible.

I'll leave it to the experts to list all the advantages of immutable data structures, but I've seen why we needed a similar compartmentalisation in HTML::FormHandler. The FormHandler form object is reusable: you can use it to validate many parameter hashes. We started coding it as a mutable object, storing all the changing values in the form object itself. The first problem was that we kept forgetting to reset them at the start of processing new data. This led to many subtle bugs, some of them difficult to track down, so we started to just recreate the form object for each processing run. This worked OK, because the form object itself was not that heavy to create, until someone had a form object that contained other heavy objects, and recreating it would require recreating them too. So eventually we separated all the state that changes during processing into a separate Result object. Now, to clean the form state, we need to recreate only that separate object.
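The separation described above can be sketched in plain Perl. This is only an illustration of the idea, not the HTML::FormHandler code: the SketchForm and SketchResult packages and the 'required' option are made-up names, and the validation is reduced to a presence check.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for a reusable form object.
package SketchForm;

sub new {
    my ( $class, %args ) = @_;
    # Only configuration lives here; process() never changes it.
    return bless { required => $args{required} || [] }, $class;
}

# Each run builds and returns a brand-new result; $self is only read.
sub process {
    my ( $self, $params ) = @_;
    my $result = SketchResult->new;
    for my $name ( @{ $self->{required} } ) {
        if ( defined $params->{$name} && length $params->{$name} ) {
            $result->{values}{$name} = $params->{$name};
        }
        else {
            push @{ $result->{errors} }, "$name is required";
        }
    }
    return $result;
}

# All per-run state (values and errors) is kept in this separate object.
package SketchResult;

sub new {
    my ($class) = @_;
    return bless { values => {}, errors => [] }, $class;
}

sub is_valid { my ($self) = @_; return !@{ $self->{errors} } }
sub errors   { my ($self) = @_; return @{ $self->{errors} } }

package main;

my $form = SketchForm->new( required => [ 'name', 'email' ] );

# The same form object handles both runs; nothing needs to be reset.
my $first  = $form->process( { name => 'Ann' } );    # email is missing
my $second = $form->process( { name => 'Bob', email => 'bob@example.org' } );

print "first run valid?  ", ( $first->is_valid  ? 'yes' : 'no' ), "\n";
print "second run valid? ", ( $second->is_valid ? 'yes' : 'no' ), "\n";
```

The errors of the first run cannot leak into the second one, because they live only in the result object that the first call returned.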

It needs to be stated clearly that this separation of the immutable part is a trade-off (as always). It is an additional requirement, so it increases the complexity of the code, and it means for example that the $self object in controller actions is underused, because actions should not change its state (including storing something in its attributes). Certainly it is not feasible to make such a change now, and it would also be a trade-off, but for the sake of exercising imagination: maybe actions could be methods on the changing part (and not on the controller object) and only use the controller object in some way. This way all the basic code (like that from the Manual) would use the $self object in the way common to object oriented code.

Saturday, October 24, 2009

Catalyst::Component::InstancePerContext

How often do you use the controller object? It is the first, most important parameter passed to all the controller methods - but it is almost never used in the body of these methods. In the code samples in the whole Catalyst::Manual distribution $self is assigned 117 times - but is used only 38 times (and not all code samples there are controllers). This is a strange situation for object oriented code, whose very definition rests on the use of the $self object.

There is much discussion about the stash and how to replace it. The main problem with the stash is that it is global - there is no use in replacing it with another global thing. Just like global variables, it needs to be replaced by many specialized local things such as parameter passing and controller object attributes, and for that to be entirely clean we need a new controller object per response.
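A minimal plain-Perl sketch of that idea follows. It does not use Catalyst or the Catalyst::Component::InstancePerContext API; the SketchController package, the for_request method and the attribute names are all assumptions made up for the illustration.

```perl
use strict;
use warnings;

package SketchController;

# Long-lived configuration, set once when the application starts.
sub new {
    my ( $class, %config ) = @_;
    return bless { page_size => $config{page_size} || 10 }, $class;
}

# Build a fresh copy for one request; the shared prototype is never modified.
sub for_request {
    my ( $self, %request ) = @_;
    my %copy = ( %$self, %request );
    return bless \%copy, ref $self;
}

# An action that keeps request data in $self attributes,
# where a global stash entry would otherwise be used.
sub list {
    my ($self) = @_;
    $self->{title} = "Items for $self->{user}";    # request-local write
    return "$self->{title} (page size $self->{page_size})";
}

package main;

my $prototype = SketchController->new( page_size => 20 );

for my $user (qw(alice bob)) {
    my $controller = $prototype->for_request( user => $user );
    print $controller->list, "\n";
}

# The prototype never saw the per-request attributes.
print "prototype has a title? ", ( exists $prototype->{title} ? 'yes' : 'no' ), "\n";
```

Because every request gets its own object, writing to $self inside an action is safe and the next request starts from the clean prototype again.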

Wednesday, October 21, 2009

OpenID support in LoginSimple

This week I've added some OpenID support to LoginSimple. Please clone the git repository and try it out! The hard part of the whole LoginSimple idea is to make it universal enough. We need people to try to apply it in many different scenarios and give us some feedback.

The test application is in t/lib/TestAppOpenID.pm and has some tests in t/07-openid-live.t, but you can also run it with perl -Ilib t/lib/script/testappopenid_server.pl and then go to the http://localhost:3000/login page to test it manually.

Tuesday, October 13, 2009

HTML::FormHandler - a few notes

This is not an introduction to HTML::FormHandler - the docs there are very good - but I wanted to write down the main ideas I had when contributing to its design.

Let's look at the functions of a form processor:

processing the input into internal representation
checking the values
saving the internal representation into the persistence layer and loading values from it
rendering the form (with input, error messages, etc)

The 'best practice' for rendering the HTML view of an application page is to do it in the templates. Unfortunately, there is just too much logic needed for the correct rendering of forms (mostly for the display of errors) to leave that to a template mini-language (like that of TT) - that's why this needs to be done by the form processor. But because rendering requirements are so diverse, it also needs to be easily replaceable, ideally part by part. Then the application could start with a crude but working HTML rendering, which takes minimal effort from the programmer since it is built into the processor, and later gradually refine and replace it.

CGI.pm, and after it Catalyst and other libraries, parse the url-encoded and body parameters into a hash of scalar values, but internally we would like to manipulate objects like DateTime instead of lists of values for year, month and day. In HFH this conversion is done in two phases: first the input is turned into a deeper structure built of hashes and arrays, then the processor goes recursively from the leaves and at each node applies a conversion if one is defined for that node.
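The first phase can be sketched as follows. This is not the HFH implementation; it assumes that nested fields are named with dots (for example date_time.year) and that all-numeric name parts are array indexes, and expand_params is a made-up helper name.

```perl
use strict;
use warnings;

# Turn a flat hash with dotted keys into nested hashes and arrays,
# e.g. { 'a.b' => 1, 'c.0' => 2 } becomes { a => { b => 1 }, c => [2] }.
sub expand_params {
    my ($flat) = @_;
    my $tree;
    for my $name ( sort keys %$flat ) {
        my $slot = \$tree;    # reference to the place we are descending into
        for my $part ( split /\./, $name ) {
            if ( $part =~ /^\d+$/ ) {
                $$slot ||= [];                      # numeric part: array index
                $slot = \( $$slot->[$part] );
            }
            else {
                $$slot ||= {};                      # anything else: hash key
                $slot = \( $$slot->{$part} );
            }
        }
        $$slot = $flat->{$name};    # store the scalar value at the leaf
    }
    return $tree;
}

my $params = {
    'title'           => 'Launch',
    'date_time.year'  => 2009,
    'date_time.month' => 10,
    'date_time.day'   => 13,
    'tags.0'          => 'perl',
    'tags.1'          => 'forms',
};

my $tree = expand_params($params);
print "year: $tree->{date_time}{year}, second tag: $tree->{tags}[1]\n";
```

After this step the date_time node holds a hash of its three child values, which is the shape the second phase needs when it walks up from the leaves.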

There is always the question of when we should do the checks - sometimes it is easier to do them on the input data, sometimes on the internal values. A good example is DateTime: we can check whether the day field is a number between 1 and 31, but we also need to check whether the whole date is correct, and the business rules could add another requirement, such as that the day of the week is Monday, or some other operation that is most convenient on the DateTime object. In FormHandler all of this is possible.

How does this work? As explained above, the processor starts with the leaf nodes and checks the day, month and year fields (in this example they define only checks and no transformations, but they could). Then it goes to the parent node date_time, where it has the hash of values from the leaf nodes ready for the next transformation and checks - it is passed to:

sub { DateTime->new( shift ) }

If that subroutine triggers an exception, for example when someone passes

{ year => 2009, month => 2, day => 30 }

to it - then the error is recorded with the message 'This is not a correct date'. If it works, the value returned from it, a DateTime object, is passed to the next check, and later presumably saved to the database. The checks and transformations are taken from a single list, so you can interweave them as you need.
Instead of transform and check in the apply attribute we can also use Moose types and coercions, and we can mix these two styles (for example coerce some value using Moose types and then check it again). If someone needs even more power, they can also write their own field class with its own validation method and do both validation and transformations in there; that method would receive the same hash of child values.
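The walk from the leaves to the parent and the single list of interleaved checks and transformations can be sketched in plain Perl. This is not FormHandler code: run_apply and process_date are made-up names, the leaf checks and the Monday message are assumptions, and to stay within core modules the transformation produces epoch seconds with Time::Local instead of a DateTime object. Only the check, transform and message keys and the 'This is not a correct date' message follow the text above.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Run a list of actions over a value. A 'check' must return true; a
# 'transform' replaces the value, and an exception inside it is an error.
# Returns ( $value, $error ); processing stops at the first failure.
sub run_apply {
    my ( $value, @actions ) = @_;
    for my $action (@actions) {
        if ( my $transform = $action->{transform} ) {
            my $new;
            my $ok = eval { $new = $transform->($value); 1 };
            return ( undef, $action->{message} ) unless $ok;
            $value = $new;
        }
        elsif ( my $check = $action->{check} ) {
            return ( undef, $action->{message} ) unless $check->($value);
        }
    }
    return ( $value, undef );
}

# Leaf nodes: checks only (assumed rules for the illustration).
my %leaf_actions = (
    year  => [ { check => sub { $_[0] =~ /^\d{4}$/ }, message => 'Year must have four digits' } ],
    month => [ { check => sub { $_[0] =~ /^\d+$/ && $_[0] >= 1 && $_[0] <= 12 }, message => 'Month must be 1-12' } ],
    day   => [ { check => sub { $_[0] =~ /^\d+$/ && $_[0] >= 1 && $_[0] <= 31 }, message => 'Day must be 1-31' } ],
);

# Parent node: first a transformation, then a check on its result.
my @date_actions = (
    {   transform => sub {
            my ($d) = @_;
            # timegm dies on impossible dates such as February 30
            return timegm( 0, 0, 0, $d->{day}, $d->{month} - 1, $d->{year} );
        },
        message => 'This is not a correct date',
    },
    {   check   => sub { ( gmtime $_[0] )[6] == 1 },    # weekday 1 is Monday
        message => 'The date must be a Monday',
    },
);

# Process the leaves first, then hand the hash of their values to the parent.
sub process_date {
    my ($input) = @_;
    my %children;
    for my $name (qw(year month day)) {
        my ( $value, $error ) = run_apply( $input->{$name}, @{ $leaf_actions{$name} } );
        return ( undef, $error ) if defined $error;
        $children{$name} = $value;
    }
    return run_apply( \%children, @date_actions );
}

for my $input (
    { year => 2009, month => 2,  day => 30 },
    { year => 2009, month => 10, day => 13 },
    { year => 2009, month => 10, day => 12 },
) {
    my ( $epoch, $error ) = process_date($input);
    my $label = sprintf '%04d-%02d-%02d', @{$input}{qw(year month day)};
    print "$label: ", ( defined $error ? "error: $error" : "ok, epoch $epoch" ), "\n";
}
```

Keeping checks and transformations in one ordered list is what lets the Monday check run on the already converted value, while the range checks still run on the raw leaf input.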

To save the full data structure of values, which can contain many inter-related records, HTML::FormHandler::Model::DBIC uses DBIx::Class::ResultSet::RecursiveUpdate. I separated it so that HFH is not tied to the hacks I made there :) Or maybe I hoped that other form processors could reuse my module. Using it you can update a company record together with all its related addresses in one form.

Wednesday, October 07, 2009

Barriers to entry

Chromatic calls on us to Remove the Little Pessimizations, to sand off the tiny, rough corners of our software that individually are small, ignorable distractions. I've read somewhere that each barrier to entry reduces the number of users by some percentage - and it is a good model to think with. Each one reduces the number of possible uses and users, and the effect is cumulative: hundreds of small barriers make a tool useful only in a narrow niche, but even just one small barrier can make a tool unusable in a particular setting. The question is when fixing these little pessimizations becomes needlessly pedantic.

I believe that fixing CPAN installation issues, each usually a small problem occurring only in some special cases, is necessary (at least for the most popular modules) if we don't want to confine CPAN users to a narrow niche - and since CPAN is the killer application of Perl, that means Perl users too.

Another thing is how off-putting the pessimizations that seem gratuitous are: you immediately feel neglected and imagine that the arrogant developers just don't care about your case. Then there are also pessimizations that look small if you have a big project but are impenetrable barriers for small projects - this is how Perl lost its place as the most used language for WWW programming to PHP.

Tuesday, October 06, 2009

CPANHQ

There is a project to build a Catalyst based CPAN search engine, with the idea of making it more hackable and ultimately incorporating some more 'social' features into it. The original effort was started by Brian Cassidy, who created a cpanhq github repository and wrote some initial code. Later some features were added by Shlomi Fish (in his repo), and now I am working on it. In my repo you'll find a version that finally has a search page, and it even works. It uses the SQLite FTS3 full text search engine.

The biggest problem for me is that I had to modify the database quite extensively and now the loading scripts don't work any longer. To have something to show, I decided to load the database into the repo - so now it can take an awfully long time to download the software, but in return you should be able to try out the searching right after the download. It is not of end user quality - you need to know some internals to use it effectively (like using 'me.name desc' in the order field to sort the packages by name), but it should be reasonably fast. Now I am thinking about which search strategies would be the most useful.
