Vendredi 31 octobre 2008
At last! I know why I feel unconfortable with simple database abstraction layers like propel. Just imagine you want to manage users. Users may be in one or several groups that grant them with credentials.

We need something like this:


If we do let our database schema as is, we let the complexity of data modelisation up to the application programmer. This often leads to multiple queries and poorly optimized database usage. In facts what we would really be interested in from this schema would be having a view combining every elements of app_user and the associated credentials so in one query we can get everything we want. The problem here is we might have multiple credentials for each user. The solution would be here to collect all the credentials associated with a user and put them in an array in our view.

Collecting data over a query and sending them as a single value is an aggregate, so we are going to declare an aggregate collect_elt that returns an array of all the values of a column and return them in an array.

CREATE AGGREGATE collect_elt(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND = '{}');

Now we can safely create our view in an other schema to supersed our app_user table:

CREATE VIEW greg.app_user AS
  SELECT
    a.first_name,
    a.last_name,
    a.email,

    collect_elt(b.credential)
  FROM
    app_user_group
      JOIN app_user a ON a.id = app_user_group.user_id
      JOIN app_group b ON b.name = app_user_group.group_name
  GROUP BY
    a.first_name,
    a.last_name,
    a.email
    ;

So if I am logged as the user greg, a SELECT on the app_user table should show me something like this

 first_name | last_name |       email        |      collect_elt
------------+-----------+--------------------+------------------------
 firstname2 | lastname2 | email2@domain.com  | {administrator,normal}
 firstname1 | lastname1 | email1@domain1.com | {normal}
(2 rows)

With only one query, we are able to retrieve all the relevant informations about our user. Of course, app_group is still available if you want to deal with groups administration. Doing this, we saved lot and lot of extra queries.

As we know now about arrays, it may be intersting to say : let s have several credentials for a group stored in an array in the table group. This presents some advantages :

  • no need for a new complex set of tables to handle a N-M relation between groups and credentials.
  • we let the application admins free to save whatever credentials they want without having a table credentials polluted with 80% of non used records.
           Table "public.app_group"
   Column    |        Type         | Modifiers
-------------+---------------------+-----------
 name        | character varying   | not null
 credentials | character varying[] |

SELECT * FROM app_group ;
   name    |       credentials
-----------+--------------------------
 normal    | {normal_users}
 admin     | {normal_users,admins}
 webmaster | {normal_users,webmasters}
The type VARCHAR[] indicates our credentials are now an array of VARCHAR. What we would like now is an aggregate that collect elements of arrays that are not already in my collected array of credentials. This means, if a user has several times the same credential, we want it to appear once in our view. Such a function does not exist with a fresh install of Postgresql, we need to create it (assuming the language plpgsql is available in this database).

CREATE OR REPLACE FUNCTION array_has_element(array1 anyarray, elt anyelement) RETURNS boolean AS $$
DECLARE
  i integer;
BEGIN
  FOR i IN SELECT * FROM generate_series(1, array_upper(array1, 1)) LOOP
    IF array1[i] = elt THEN
      RETURN true;
    END IF;
  END LOOP;
  RETURN false;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION array_merge(array1 anyarray, array2 anyarray) RETURNS anyarray AS $$
DECLARE
  i integer ;
  return_array array1%TYPE;
BEGIN
  return_array := array1;
  FOR i IN SELECT * FROM generate_series(1, array_upper(array2, 1)) LOOP
    IF NOT array_has_element(return_array, array2[i]) THEN
      return_array := return_array || array2[i];
    END IF;
  END LOOP;

  RETURN return_array;
END;
$$ LANGUAGE plpgsql;

Let's try it :

SELECT array_merge('{a,bb,ccc}'::VARCHAR[], '{bb, ccc, dddd}'::VARCHAR[]);
   array_merge
-----------------
 {a,bb,ccc,dddd}
(1 row)
That looks good. Let's now create an aggregate using our new function :
CREATE AGGREGATE merge_arrays (anyarray) (
  SFUNC = array_merge,
  STYPE = anyarray
);
This aggregate is even simplier than the previous one ! ... Here is now the view of the corresponding app_user :

SELECT * FROM app_user;
 first_name | last_name |       email        |     merge_arrays
------------+-----------+--------------------+-----------------------
 firstname2 | lastname2 | email2@domain.com  | {normal_users,admins}
 firstname1 | lastname1 | email1@domain1.com | {normal_users}
(2 rows)


Did I tell you I love Postgresql ? ah ... maybe yes.

This comforts me with the idea we can have 2 layers in the database:
  • one «physical» layer with the tables to handle all the data
  • one «logical» layer to present to data depending on the application
In general, we work with only the first layer so our applications' models are complicated and is merely about combining data from different tables using numerous queries. There must be a way to implement a second layer with some logic of data presentation in order to make the developpers life more easy and save database ressources.
Par greg - Publié dans : postgresql
Ecrire un commentaire - Voir les commentaires - Recommander
Retour à l'accueil
 
Créer un blog sur over-blog.com - Contact - C.G.U. - Rémunération en droits d'auteur - Signaler un abus - Articles les plus commentés