Friday, June 13, 2008

DBIx::Class gotchas


DBIC is a complex beast - and it has it's rather obscure corners where it can burn the inexperienced. Here are two of those, comments with others are welcome.

Don't use $rs->update_or_create(...) (or $rs->find_or_create) on a table with autoincrement primary key



The ->find method requires that all the columns comprising the key used for the search are present in the query or in the resultsource conditions. Otherwise it will return some random record (and emit a warning). This is because currently there is no way to determine if all the needed columns are present. To do this you would have to parse the arbitrary SQL::Abstract query that can be in the resultsource condition and check how the columns are used there. Think for example about:
$new_rs = $cd_rs->search( { -and => [ id => [1, 2], id => { '>' => 1 } ] } )
and then $new_rs->update_or_create( { artist => 1 } ).


But for autoincrement primary keys the ->create method requires that the primary key is not present in the query. If you provide { pk => 'undef' } to ->create, this would generate a query like: INSERT INTO cd ( pk, ...) VALUES ( NULL, ... ) - some databases would accept that but others (PostgreSQL for example) not.


So in summary, for tables with autoincrement primary keys ->find requires that you provide the primary key in the query, and ->create requires that you don't - this means that you cannot pass the same query hash to both methods and this is what ->update_or_create does.


Update: Looks like the 4773 revision fixes the problem described below. That's why I moved this point to the end of the post. I did not check it too deeply though.

Don't call $row->some_relation when the $row has not loaded the columns required to resolve some_relation



Column values in DBIC are kept in a hash - this means that a column can have some defined value, have undefined value or can be completely non-existent - and DBIC interprets all of those states differently. When $row->some_relation is called and ->some_relation uses a foreign key column that is non-existent then the result is currently random (and in the future most probably an exception will be raised).


When this can happen? When you create a row and don't provide the value for the column (or if you retrieve the row from the database and explicitelly omit the column). For example if you do
$cd = $cd_rs->create( { name => "Cisza" } )
(or $cd = $cd_rs->new...). Now the 'artist' column in $cd is non-existent and when you try to call $cd->artist it blows up.
In a library where you don't know how a record was created you need to check first if the relevant columns are loaded before you call a relation on the record.


I thought that a good remedy would be also to always provide undef in the ->create and ->new calls for all the columns that we don't have values for, but allegedly this is not a good strategy (don't ask me why).

5 comments:

Anonymous said...

I made the switch to DBIx::Class for a refactoring project where the old code was using raw sql all over the shop and using a custom db library.

It's both easier to use and more efficient, and I'm happy to drop down to SQL - it's just a matter of grabbing the schema object and calling a search against a resultset, or grabbing a db handle if you need to do something directly.

I still miss CDBI's ability to add custom sql more easily - DBIx::Class requires subclassing, where as CDBI allowed you to specify a named query for a relation, search or even store/retrieve.

Over all though, I've learnt that even if the ORM can do something, you shouldn't be afraid of doing it yourself.

ivan kurmanov said...

> Column values in DBIC are kept in a hash - this means that a column can have some defined value, have undefined value or can be completely non-existent - and DBIC interprets all of those states differently. When $row->some_relation is called and ->some_relation uses a foreign key column that is non-existent then the result is currently random (and in the future most probably an exception will be raised).

Why shouldn't it return undef for undefined and uninitialized column values? Is it because that may happen in a number of different circumstances?

zby said...

Hi Kurmanov - unfortunately I cannot answer your question. This is what the author chosen - but fortunately it seems that he improved it a bit lately (see the update to the article above).

ivan kurmanov said...

Thanks, Zbignew.

It is not exactly clear how well does this fix improves things for the case we observe here, but it's hopeful.

Anonymous said...

Helpful?
http://search.cpan.org/~ash/DBIx-Class-0.08010/lib/DBIx/Class/Manual/Cookbook.pod#Arbitrary_SQL_through_a_custom_ResultSource