Monday, May 07, 2012

On the importance of intuitive names.

PARTICIPANT:

[Reading names of classes]

Binary reader, Buffered stream, Reads and writes...

[Pauses, Scrolls through list of classes].

So let me just...stream writer. Implements a text writer writing characters to a stream in a particular encoding. Umm...stores an underlying string. Text...hmmm. Text reader, text writer. Represents a writer that can write a sequential series of characters. This class is abstract.

[Pause]

Ummm...

[scrolls up and down through list of classes]

So it, you know, it sounds to me like it's, you know, it's more low-level kind of stuff. Whereas I just want to create a text file. Umm.

[Points at the description of one of the classes]

Characters to a stream in a particular encoding. I'm not sure what...obviously a text writer for writing characters to a stream.

[Clicks on the link to view more details about the TextWriter class. Then looks at list of classes that derive from TextWriter]

System dot IO dot StringWriter. This seems too low-level to me to be a text writer thing but maybe I am wrong. Umm...

[scrolls through description of the TextWriter class]

Text writer is designed for character output, whereas the stream class is designed for byte input and output.

[Sigh. Clicks on link to view TextWriter members]

Probably going where no man should have gone before here.

This guy did not make it.

Neither did any of the other 7 professional programmers who participated in that experiment! Their task was "to write a program that would write to and read from text files on disk". They had 2 hours for it and could browse all of the relevant documentation. They were testing the API of the then new programming framework called .NET - which none of the programmers knew yet, though they had all programmed in Visual Basic - and they were expected to discover and use the StreamWriter-based file writing API. After that experiment the designers added a new, higher-level 'File' API and ran the experiment again. This time all participants were able to complete each task in 20 minutes and without browsing the documentation. This is the story from Chapter 29, "How Usable Are Your APIs?", of Making Software.

Fascinating puzzle - isn't it? The article proposes the following solution: there are three types of programmers - opportunistic, pragmatic and systematic. The opportunists tend to use high-level abstractions, try things out and experiment with what works, and they intuitively reach for the File API as opposed to the StreamWriter API. And it just so happened that all 8 participants of that study were opportunistic programmers?!

The developers who participated in the file I/O studies were very much in the opportunistic camp. The way they used the FileObject class in the second study was almost a perfect example of opportunistic programming. The contrast with the more abstract classes they had to use in the first study was stark. Indeed, one of the main findings from the file I/O studies was that without such high-level concrete components, opportunistic developers are unlikely to be successful.
Sounds like a weak argument:
  1. Copy-pasting examples and playing with the code is the most efficient way to learn a new API - so I suspect that what they call 'opportunistic programming' is actually learned behaviour, characteristic of any experienced programmer.
  2. Expecting a file-related API sounds quite natural and is not really related to being opportunistic or systematic. The task at hand was exactly file-related IO, most programming languages have such an API, and it makes sense that one exists because file operations are very common.
  3. I don't see anything higher level or more concrete in the FileObject example code - it looks like it is on the same level of abstraction as the StreamWriter code. The authors claim that it is the fact that you have both StreamWriter and StreamReader that makes them lower level than FileObject, which is a single class - but I don't see how that follows.
Phil Karlton wrote:
There are only two hard things in Computer Science: cache invalidation and naming things.
Naming things also turns out to be quite important.

Wednesday, May 02, 2012

Tricky problems of the Perl language - a completely arbitrary list

  1. Overloading and parameter type validation.
  2. The clash between overloaded hashification and arrayification and the each, keys and values built-ins under the new dereferencing semantics.
  3. There is no way to know if you received characters or binary data; lots of libraries and even core functions work differently in these two cases - but often this is not documented.
  4. In Perl observing a variable changes it - for example, reading a variable containing a string in a number context will fill in its number slot (as far as I understand it - see perlguts for the details; a small demonstration follows below). Normally it does not matter - but it makes threading less efficient (because shared variables need to go through additional hoops to work).
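A minimal demonstration of point 4, using Devel::Peek to look at the internals (the exact flags printed depend on the Perl version):

use Devel::Peek;

my $answer = "42";      # a string - only the string (PV) slot is filled
Dump($answer);          # shows POK and a PV of "42", no IV

my $sum = $answer + 0;  # merely *reading* $answer in numeric context...
Dump($answer);          # ...fills in the IV slot and sets IOK - the variable
                        # has been changed just by observing it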
Two bonus points - using too much of the $ character makes
  1. Perl code ugly
  2. Perl programmers not team players
And one more, fixed recently - the one that made checking $@ after eval unreliable.

Friday, April 27, 2012

If all of your state is on the stack then you are doing functional programming

Consider an imperative subroutine (a sketch follows below): it uses variables with mutable state, but from the outside it is a pure mathematical function. This is because all of its variables are created anew when the function is executed and destroyed after that. It also does not matter whether the variables contain simple values or objects: a sub that builds and uses a Markdent::Simple::Document object can be purely functional as well, provided that Markdent::Simple::Document does not use globals or system calls.
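A minimal sketch of what I mean (a hypothetical example, not any particular real code):

sub sum_of_squares {
    my @numbers = @_;
    my $total = 0;            # mutable state...
    for my $n (@numbers) {    # ...updated imperatively in a loop
        $total += $n * $n;
    }
    return $total;            # all of that state lived on the stack,
}                             # so from the outside this is a pure function

print sum_of_squares( 1, 2, 3 ), "\n";    # always 14 for the same input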

Imagine that this was enforced by the compiler/interpreter - maybe with a new keyword function, or something. I have the feeling that this would be very similar to how use strict works - giving the end user some kind of safety.

Tuesday, April 17, 2012

A Data::Dumper bug - or Latin1 strikes again

I did not believe my boss when he said he had found a bug in Data::Dumper - but consider a test case along these lines:
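A sketch of the failing case, reconstructed from the description below (the exact original snippet may have differed):

use utf8;
use Data::Dumper;

my $initial = "\x{f3}";          # 'ó' - internally stored as Latin-1
my $dumped  = Dumper( $initial );
my $copy    = eval $dumped;      # eval picks up the 'use utf8' in effect here
                                 # and misreads the Latin-1 byte inside $dumped
print "not an identical copy\n"
    if !defined $copy or $copy ne $initial;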

To get a character that has an internal representation in Latin1 I could also use HTML::Entities::decode( '&oacute;' ) there, with the same result. The output I get on 5.14.2 and 5.10.1 shows that the evaled copy is not identical to the original.

When I check the dumped string - it has the right character, encoded in Latin1 - and apparently eval expects UTF8 when use utf8 is set. Without use utf8, eval works OK on it. If the internal representation of the initial character is UTF8 (like when the first line is my $initial = 'ó';) - then the dumped string contains UTF8 (which again might be interpreted incorrectly if the code does not have the use utf8 preamble).

Considering that Data::Dumper is a core module, one of the most commonly used, and that its docs say:

The return value can be evaled to get back an identical copy of the original reference structure.
this looks like a serious bug.

Is that a known problem? Should I post it to the Perl RT?

Update: Removed the initial eval - "\x{f3}" is enough to get the Latin1 encoded character. Some editing.
Update: I tested it also on 5.15.9 and it fails in the same way.
Update: I've reported it to the Perl RT - I am not sure about the severity chosen and the subject - this was my first Perl bug report.
Update: In reply to the ticket linked above Father Chrysostomos explains: "The real bug here is that ‘eval’ is respecting the ‘use utf8’ from outside it." and later adds that 'use v5.16' will fix the problem in 5.16.

Saturday, April 14, 2012

Breaking problems down and defaults

In a classic essay, Want Good Tools? Break Your Problems Down, Dave Rolsky makes the case for decomposition. I wish more people had read it and applied the advice - CPAN libraries would be more useful then. But stating the goal is probably not enough - we also need to talk about how it can be reached and about the problems encountered on the way there. For example, let's take Markdent, the module that was the result of the process described in that essay:
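Using it involves wiring a parser to a handler yourself - roughly like this (a sketch based on the Markdent synopsis as I remember it; details may be off):

use Markdent::Parser;
use Markdent::Handler::HTMLStream::Document;

my $markdown = "# Hello\n\nSome *markdown* text.\n";
my $html     = q{};
open my $fh, '>:encoding(UTF-8)', \$html or die $!;

my $handler = Markdent::Handler::HTMLStream::Document->new(
    title  => 'My Document',
    output => $fh,
);
my $parser = Markdent::Parser->new( handler => $handler );
$parser->parse( markdown => $markdown );

print $html;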

The problem is that the criticized approach, a unified library that just converts Markdown to HTML, would result in a simpler API - for example something like this:
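Something like the single call that Text::Markdown (mentioned below) already offers:

use Text::Markdown qw(markdown);

my $html = markdown( $markdown );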

Maybe the difference does not look very significant - but after a while it can get annoying. For 99% of cases you don't need the extra flexibility that comes with the replaceable parser - so why should you pay for it?  If I had to use Markdent frequently I would write a wrapper around it with an API like the one above.

By the way, Text::Markdown already has such a wrapper and it presents a double, functional/object-oriented API - the simple functional one shown above does the most common thing, while the object-oriented one gives you more control over the choices made.  Only it still couples parsing and generation.

Another way of simplifying an API is providing defaults for function arguments - for example in the object constructor.  Dependency Injection is all about breaking the problem down and making flexible tools - but it might become unbearable if we do not soften it up a bit with defaults (a sketch follows below).
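For instance, with a lazily built Moose attribute (the class names here are hypothetical):

package My::App;
use Moose;

# The collaborator can still be injected in tests or special setups,
# but ordinary callers get a sensible default for free.
has parser => (
    is      => 'ro',
    lazy    => 1,
    default => sub { My::Parser->new },   # hypothetical default collaborator
);

sub render {
    my ( $self, $text ) = @_;
    return $self->parser->parse( $text );
}

1;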

Programming is always about doing trade-offs - here we add some internal complexity (by adding the wrappers or providing the defaults) and in exchange get a simplified API that covers the most common cases while still maintaining the full power under the hood.  I think this is a good trade off in most cases, and especially in the case of libraries published to CPAN that need to be as universal as possible.

Wednesday, April 04, 2012

What if "character != its utf8 encoding" is overengineering?

"You shell not assume anything about the internal representation of characters in Perl" - is a mantra that has been repeated over and over by the Perl pundits for something like a decade. But there are still people who refuse to take that advice and want to peek into the internal representation of characters. What if our sophisticated approach about isolating the 'idea of a character' and its representation is a case of overengineering? People often overreact for past traumas - programming is not an exception - and the conversion from many national 'charsets' to unicode was a big event. Maybe expecting another conversion soon is such an overreaction?

Getting rid of the Latin1 internal encoding does not look like a big price to pay for improved simplicity and for getting rid of all these subtle mistakes. I think it is important that the language is understood by its users, and if it is not, then maybe, instead of blaming the programmers, we could make it easier to understand? Sure, it is nice to have the possibility of changing the internal encoding from UTF8 to UTF16, or maybe something completely different, in the future - but I have the feeling that this might be a case of architecture astronautics.

Saturday, March 31, 2012

Plack::Middleware::Auth::Form - some updates and a possible name change

I'll make a new release of Plack::Middleware::Auth::Form soon. There are quite a few fixes gathered in the Plack::Middleware::Auth::Form repository since the last release. They are all from external contributors - thanks a lot!

The bug reported in #75896: Cookie Expiry Date not set for "remember" session is quite interesting. Apparently Plack::Middleware::Session sends the session cookie on each request, and if you don't set the expiry date each time, it will happily unset it (a sketch of the workaround is below).
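Roughly, the fix on the application side looks like this (a sketch; the key names are how I understand the Plack::Middleware::Session interface, so double-check them):

my $app = sub {
    my $env = shift;
    # re-set the expiry on *every* request for 'remember me' sessions,
    # because the session cookie is re-sent each time
    if ( $env->{'psgix.session'}{remember} ) {
        $env->{'psgix.session.options'}{expires} = time + 30 * 24 * 60 * 60;
    }
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ 'hello' ] ];
};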

I am thinking about changing the name to WebPrototypes::LoginForm. Some people did not like the name Plack::Middleware::Auth::Form from the start, because it is a bit more high-level than the other Auth middlewares, and now I have two more elements for quick web application prototyping under the WebPrototypes namespace.

Sunday, March 25, 2012

Blog writing and assuming stupidity

Writing a blog is not easy. People have not changed much since the 'bread and circuses' times. You need to spice your writing up with strong statements or you won't get any audience. On the other hand, ridiculing someone while having only a superficial knowledge of the matter makes you a bully.

What happened there? Again, it is hard to say anything interesting without some speculation - and possibly I'll have to apologize to Dave for this - but I think Dave read "You must hate version control systems, we won't be using any" and assumed that it came from a company that casually rejected version control because they did not want to learn it - in other words, from someone who assumed that version control is useless. Talk about beams and eyes. That's not to say that I vouch for the 'pipelines' system or for replacing version control with it. I still don't know much about these pipelines - but new ideas don't have to work in every possible aspect to be worthwhile, and you'll never have a break-through idea if you always stick to the accepted wisdom.

It is easy to assume stupidity - on average people are mediocre - but the internet is a big search space - expect to be surprised from time to time :)

Monday, March 19, 2012

Verbs and Nouns

There is a popular, if a bit long and blurry, rant by Steve Yegge: Execution in the Kingdom of Nouns - it is about how we overuse nouns and under-use verbs when programming in Java. Of course it is no different in other object-oriented imperative languages. Programs do something, subroutines do something - verbs should be at least as prominent as nouns in programming - but when we need to write an application we build it out of objects. Even if it is a web application - something that translates an HTTP request into an HTTP response - we code it as an object with fields and all that stuff. Even if we code against an API that defines the web application as a subroutine reference, we still write it as an object and then make a closure over it to pass to the backend (sketched below).
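A minimal sketch of that last pattern (a hypothetical class, plain PSGI):

package My::WebApp;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

sub handle_request {                       # the noun doing the verb's job
    my ( $self, $env ) = @_;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ $self->{greeting} ] ];
}

package main;
my $app_object = My::WebApp->new( greeting => 'Hello' );
my $app = sub { $app_object->handle_request( shift ) };   # the closure PSGI wants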

Do we overuse nouns? Or maybe it is that actions are opaque and unstructured - and when we need to get to the details, to the parts that compose them, it is more natural to treat them as things? Wouldn't it be easier to incorporate streaming in PSGI if the application there was an object with methods and attributes?

Sunday, March 11, 2012

WebNano - code experiments

WebNano is only a few hundred lines - but you can arrange it in many, many ways - and then you need to test it with all kinds of URL schemas and controller architectures. I do a lot of exploratory coding - testing all the possible arrangements. I feel that I keep forgetting the things that made me choose one design over the others. Maybe I'll keep some notes here. In the past two weeks I tried a few things:
  1. Keeping the parsed path as an attribute in the controller.
  2. In addition to the above I tried adding three more controller methods: 'action_name', 'action_args' and 'action_postfix'.
  3. I wrote two additional test controllers for the simple URL schema, both redirecting handling to DvdDatabase::Controller::Dvd::Record for the case where we have a record to handle: one overriding local_dispatch, the other overriding handle.
The conclusions:
  1. having the path as an attribute is handy for code retrieving the record
  2. the additional controller methods help with writing custom dispatchers
  3. splitting the processing into two controllers - one for the case where you have an object to work on (like viewing, editing, deleting) and one for the case where you don't (like listing, creating) - is very clean: you can have the object as a controller attribute (see the sketch after this list)
  4. the additional dispatcher methods are less useful for that cleaner architecture
  5. the biggest problem was always preventing the methods that require the object from being called when we don't have the record id on the path (like '/view' when it should be '/1/view') - and the best way to do that is to have the two controller classes
  6. overriding 'handle' is actually simpler - because it is a very simple method
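A schematic of points 3 and 5 (all class and method names here are hypothetical - this is not the actual WebNano or DvdDatabase code): the collection controller peels the record id off the path and hands the rest over to a per-record controller that carries the record as an attribute, so record-requiring actions simply cannot be reached without an id.

package Controller::Dvd;
sub new { my ( $class, %args ) = @_; bless {%args}, $class }

sub dispatch {
    my ( $self, $path ) = @_;
    if ( $path =~ m{^(\d+)/(\w+)$} ) {            # '1/view' - there is a record id
        my $record = $self->{store}{$1};
        return Controller::Dvd::Record->new( record => $record )->dispatch($2);
    }
    return 'listing all dvds' if $path eq 'list'; # actions that need no record
    die "no such action: $path\n";                # a bare '/view' ends up here
}

package Controller::Dvd::Record;
sub new { my ( $class, %args ) = @_; bless {%args}, $class }

sub dispatch {
    my ( $self, $action ) = @_;
    return "viewing $self->{record}{title}" if $action eq 'view';
    die "no such action: $action\n";
}

package main;
my $store = { 1 => { title => 'Blade Runner' } };
print Controller::Dvd->new( store => $store )->dispatch('1/view'), "\n";
print Controller::Dvd->new( store => $store )->dispatch('list'), "\n";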

Tuesday, March 06, 2012

Why Bread::Board looks mostly redundant

This is based on two assumptions: that you don't use BB as a kind of Service Locator (I agree with, for example, 'Dependency Injection != using a DI container' that this is an anti-pattern), and - what mostly follows from the first one - that the product of your BB container is just one object: the application class. I believe these are good guidelines for software architecture. With those two assumptions, all that BB gives you is the ability to name your partial results and then use them in later computations - but Perl already has good support for this; it is called variables.

For example, take the original example from the Bread::Board synopsis and do the same with just variables; you can also feel fancy and do it with Moose lazy attributes (a sketch of both is below). It is not longer than the BB example and it uses generic tools.
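Roughly what I have in mind (FileLogger and MyApplication are hypothetical application classes, mirroring the kind of logger/dbh/application wiring in the BB synopsis rather than quoting it):

use DBI;

# 1. Just variables: the 'services' are plain lexicals.
my $log_file = 'logfile.log';
my $logger   = FileLogger->new( log_file => $log_file );
my $dbh      = DBI->connect( 'dbi:SQLite:dbname=my-app.db', '', '' );
my $app      = MyApplication->new( logger => $logger, dbh => $dbh );

# 2. The fancy version: an assembler class with lazy Moose attributes.
package MyApplication::Assembler;
use Moose;

has log_file => ( is => 'ro', default => 'logfile.log' );
has logger   => ( is => 'ro', lazy => 1, default => sub {
    FileLogger->new( log_file => $_[0]->log_file );
} );
has dbh      => ( is => 'ro', lazy => 1, default => sub {
    DBI->connect( 'dbi:SQLite:dbname=my-app.db', '', '' );
} );
has app      => ( is => 'ro', lazy => 1, default => sub {
    MyApplication->new( logger => $_[0]->logger, dbh => $_[0]->dbh );
} );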

Friday, March 02, 2012

Mason 2

Mason 2 looks very interesting. First of all, it has the 'a file is a page' modus operandi that works so well for PHP; then it has all the template inheritance and Moose template goodies that look very powerful; finally, the page code runs in the request scope - i.e. it can access the page parameters and related data from attributes, which is so much more convenient than passing these values around as method parameters as you do in Catalyst. The only part lacking, from my cursory look at the documentation, is anything that works in the application scope. Most probably it is just that I did not find anything in the most exposed documents - but this omission still looks ominous.

Saturday, January 14, 2012

Schlep

A schlep is a tedious, unpleasant task.  According to Paul Graham, schlep is also what really defines a company - it is doing the tasks that are so unpleasant and tedious for someone that they will pay you to do them.

Narrowing this down to my own Perl web development work - the schlep for me was always getting the basic web app running with user registration, login pages, password reset mechanisms, etc. - in every new project that was the most repeatable, boring work.  I think everyone has the feeling that it does not need to be like that.  I've started thinking about what a solution could look like, and here are my first experiments at fixing it: Plack-Middleware-Auth-Form, WebPrototypes::ResetPass, WebPrototypes::Registration (I might rename the first one into the WebPrototypes namespace as well).  The point is to solve it across multiple web frameworks, templating languages and storage layers - so that it can survive moving from project to project.

What is your schlep?

Saturday, December 10, 2011

A kind of call by name

I often write code like this:
$self->create_user( username => $username, email => $email, pass_token => $pass_token );
I wish I could get rid of the naming redundancy and write the call like this:
$self->create_user( $username, $email, $pass_token );
(without changing 'create_user' of course).

Probably some new syntax would be needed.
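A toy approximation that works today, using PadWalker to look up the caller's lexicals by name (just an illustration, not a serious proposal):

use PadWalker qw(peek_my);

sub named_args {
    my @names = @_;
    my $pad = peek_my(1);                               # lexicals of the caller
    return map { $_ => ${ $pad->{ '$' . $_ } } } @names;
}

# inside the method, where $username, $email and $pass_token are in scope:
# $self->create_user( named_args( qw(username email pass_token) ) );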

Friday, November 18, 2011

'use strict' and cargo cult programming

I've just read mjd's confession "Why I Hate strict"; it is from 2003 so he might have changed his views by now, but there is nothing on that web page to indicate it.  Anyway, his main argument is that the usual advice to use strict is automatic and mindless, and that it often does not really prevent the problems that people think it does.  In other words, it is cargo cult programming, which he contrasts with programming based on thinking and deep analysis of everything you do.

I used to program without use strict; use warnings, but after exposure to the usual propaganda I switched, and I found that the cost of mindlessly adding it is negligible, the cases where I need no strict are very rare, and there are many benefits, especially when working with old code.  This cult is rather effective at luring the cargo planes to land on my atoll.  On the other hand, I am all for deep analysis and checking your assumptions from time to time.  There are many valid points in Marc Lehmann's common::sense and I would like to see them discussed.  While we are on the road to having use strict by default we might also try to make it better.

Saturday, November 12, 2011

$ primes for money

The thesis above sounds uncontroversial.  It is also rather uncontroversial that '$' is relatively frequently used when programming in Perl.  Now - what can be the consequences of that?
Money has been said to change people's motivation (mainly for the better) and their behavior toward others (mainly for the worse). The results of nine experiments suggest that money brings about a self-sufficient orientation in which people prefer to be free of dependency and dependents. Reminders of money, relative to nonmoney reminders, led to reduced requests for help and reduced helpfulness toward others. Relative to participants primed with neutral concepts, participants primed with money preferred to play alone, work alone, and put more physical distance between themselves and a new acquaintance.
from one of the first links in the query above.  Pretty sad - can that apply to the Perl community? Another link from that list, an entertaining BBC video report, also suggests some other effects: hunger and pain insensitivity.

Monday, November 07, 2011

Thesis: simple - antithesis: easy - synthesis: ...

Rich Hickey's Simple Made Easy is a great talk, a must-see, with lots of insight, but it also misrepresents what Agile is about.  Hickey's main point is that we should try to write simple software, because this is the only way to have reliable software, and he is right of course.   He notes that when you encounter a new bug and try to fix it, all the existing tests pass - so they will not help you find its cause.  You need to do the bug analysis on your own, and the complexity of your code is your enemy there.  He is also right when he talks about how easy means familiar, not simple, and that this is a trap because it drives us away from the simple (in small increments, I would add).  He is insightful when he talks about the sources of complexity.   He is funny, but missing the point, in his critique of Agile.

The development sprints he attacks are not about doing the bulk of the work - they are about building a prototype on which we can test our assumptions.  Without the understanding that we get from these prototypes we could simplify as much as we want, but it would not change the fact that our solution solves the wrong problem.  Agile is not an enemy of simple; it puts a lot of weight on doing the easy - not because that is the goal, but because it uses easy as a means to get to the correct. Agile is the answer to the paradox that we don't know what we should build until we already have a prototype of it.  I wish more developers cared about simple - but only after they know what is needed.

Tuesday, November 01, 2011

Notes on the Synthesis of Form

According to Wikipedia, the origin of Design Patterns lies in the Pattern Language ideas of the unorthodox architect and philosopher Christopher Alexander, but his earlier work also used to be widely read by computer scientists:
Alexander's Notes on the Synthesis of Form was required reading for researchers in computer science throughout the 1960s. It had an influence in the 1960s and 1970s on programming language design, modular programming, object-oriented programming, software engineering and other design methodologies. Alexander's mathematical concepts and orientation were similar to Edsger Dijkstra's influential A Discipline of Programming.
The solution to the design problem that he proposes there does not look too attractive now, but his models, his metaphors, his insight into the design process - it is all still relevant and spot on.  I am surprised that the Agile movement does not cite the "Notes" as one of its foundational texts.

Saturday, October 15, 2011

Concentration and Flow or Yet Another Dependency Injection Note

Imagine that you need to do some small home improvement or maintenance work and you have all the needed tools: good quality, clean and well maintained, with all the cutting blades sharpened and no missing screwdriver bits. A nice feeling - isn't it? When you start work like this you can concentrate on the task at hand instead of wondering where you can borrow a drill.

Collaborators in an algorithm are like those tools: having them readily available lets you concentrate on the problem.

Friday, October 14, 2011

Object oriented versus functional interface

I use DateTime::Format::W3CDTF for parsing my dates:

my $w3c = DateTime::Format::W3CDTF->new;
my $dt = $w3c->parse_datetime( $date_string );

I wish it was:

my $dt = DateTime::Format::W3CDTF->parse_datetime( $date_string );

and that the library created the parser on the fly as needed. It's not only less typing - it is also a much simpler mental model. That simpler model is sometimes too simple - for example, if you parse a lot of dates then saving the parser creation each time can make a difference.

I think the optimal thing to do is provide two APIs - like JSON does - a functional one:

$perl_hash_or_arrayref = decode_json $utf8_encoded_json_text;

and an object oriented one:

$json = JSON->new->allow_nonref;
$perl_scalar = $json->decode( $json_text );

for those that need that extra control.
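Wrapping an existing parser this way is trivial - a sketch (a hypothetical wrapper module, with the real DateTime::Format::W3CDTF underneath):

package DateTime::Format::W3CDTF::Functions;   # hypothetical wrapper
use strict;
use warnings;
use DateTime::Format::W3CDTF;
use Exporter 'import';
our @EXPORT_OK = qw(parse_w3cdtf);

sub parse_w3cdtf {
    my ($date_string) = @_;
    # creates the parser on the fly; callers who parse dates in bulk can
    # still use the OO interface and reuse one parser object
    return DateTime::Format::W3CDTF->new->parse_datetime($date_string);
}

1;

# usage:
# use DateTime::Format::W3CDTF::Functions qw(parse_w3cdtf);
# my $dt = parse_w3cdtf('2011-10-14T12:00:00Z');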