Thursday, October 26, 2006

Tags and search and DBIx::Class

Update: Advanced Search in web DBIx::Class based applications (with tags, full text search and searching by location) is a more elaborated version of this article.

Some time ago I had an idea for a bookmarking site - nothing really revolutionary, but with an effective interface. I've decided that it needs to combine search, browsing by tags and other properties, ordering and jumping to pages. I have really thought much how to make it effective - that is letting the user find some web page with least number of operations assuming that she remembers only random bits from it and that the data we display perhaps reminds her about some new info. These interface ideas are material for another post - here I would like to concentrate on implementation of tagging in DBIx::Class.

I had following requirements:
  • tags can be combined together and with other search terms
  • all search should use indices
  • use database paging of results
  • for tag clouds I need a list of tags used by bookmarks matching a search criterium
With only the first three criteria I could use a simple table with tags concatenated in one field and use full text search on it. In fact the first implementation used that technique. This meant tags can be only one word - but that is a reasonable constraint. The searches for a combination of three tags there looked like WHERE ? @@ tags AND ? @@ tags AND ? @@ tags AND THE_OTHER_CRITERIA where the @@ operator is a full text match for PostgreSQL and tags was the name of the aggregated tags column.

Unfortunately I did not found any way to meet the fourth requirement with this database schema, so I decided to have separate bookmark and tag tables. The tag table has two columns bookmark_id and tag_text with indices on both of them. One idea how to make the combined search here could be like:

SELECT b.*
FROM bookmark b, tag t
WHERE t.bookmark_id = b.id
AND (t.tag_text IN (?, ?, ?))
GROUP BY b.id
HAVING COUNT( b.id )=3

But that would mean a full scan on the bookmark table and I wanted a solution using indices. So I devised another query for that schema. For a combination of many tags I would do as many joins to the tag table as there are tags in the query:

WHERE tag_1.bookmark_id = bookmark.id AND
tag_1.tag_text = ? AND
tag_2.bookmark_id = bookmark.id AND
tag_2.tag_text = ? AND
...

For the first glance this looks hard to do in DBIx::Class - but actually it is a lot easier than it seems, the key learning here is: If the same join is supplied twice, it will be aliased to rel_2 (and similarly for a third time) (from POD for DBIx::Class::ResultSet).

I had to build hash with search parameters with the proper key names (tag, tag_2, tag_3 ...) and values from the @tags array :

my $suffix = '';
my $i = 1;
for my $tag (@tags){
$sqlparams{'tag' . $suffix . '.tag'} = $tag;
$suffix = '_' . ++$i;
}


And then the search:

my $it = $schema->resultset('Bookmark')->search(
\%sqlparams, {
join => [ ('tag') x scalar(@tags), 'usr' ],
order_by => \@order,
page => $page,
rows => $maxrows,
},
);


2 comments:

claco said...

Just for the sake of passing on the link, Jay Pipes of MySQL has a nice presentation on tags and folksonomy in MySQL.

http://jpipes.com/presentations/tagging.pdf

Chris said...

Dear Internet, you rock. You're always around to show me someone who already did all the hard work for me. Thanks!