At last! I know why I feel unconfortable with simple database abstraction layers like propel. Just imagine you want to manage users. Users may be in one or several groups that grant them with
credentials.
We need something like this:
If we do let our database schema as is, we let the complexity of data modelisation up to the application programmer. This often leads to multiple queries and poorly optimized database usage. In
facts what we would really be interested in from this schema would be having a view combining every elements of app_user and the associated credentials so in one query we can get
everything we want. The problem here is we might have multiple credentials for each user. The solution would be here to collect all the credentials associated with a user and put them in an
array in our view.
Collecting data over a query and sending them as a single value is an aggregate, so we are going to declare an aggregate collect_elt that returns an array of all the values of a column
and return them in an array.
CREATE AGGREGATE collect_elt(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND = '{}');
Now we can safely create our view in an other schema to supersed our app_user table:
CREATE VIEW greg.app_user AS
SELECT
a.first_name,
a.last_name,
a.email,
collect_elt(b.credential)
FROM
app_user_group
JOIN app_user a ON a.id = app_user_group.user_id
JOIN app_group b ON b.name = app_user_group.group_name
GROUP BY
a.first_name,
a.last_name,
a.email
;
So if I am logged as the user greg, a SELECT on the app_user table should show me something like this
first_name | last_name | email | collect_elt
------------+-----------+--------------------+------------------------
firstname2 | lastname2 | email2@domain.com | {administrator,normal}
firstname1 | lastname1 | email1@domain1.com | {normal}
(2 rows)
With only one query, we are able to retrieve all the relevant informations about our user. Of course, app_group is still available if you want to deal with groups administration. Doing this, we
saved lot and lot of extra queries.
As we know now about arrays, it may be intersting to say : let s have several credentials for a group stored in an array in the table group. This presents some advantages :
- no need for a new complex set of tables to handle a N-M relation between groups and credentials.
- we let the application admins free to save whatever credentials they want without having a table credentials polluted with 80% of non used records.
Table "public.app_group"
Column | Type | Modifiers
-------------+---------------------+-----------
name | character varying | not null
credentials | character varying[] |
SELECT * FROM app_group ;
name | credentials
-----------+--------------------------
normal | {normal_users}
admin | {normal_users,admins}
webmaster | {normal_users,webmasters}
The type VARCHAR[] indicates our credentials are now an array of VARCHAR. What we would like now is an aggregate that collect elements of arrays that are not already in my collected array
of credentials. This means, if a user has several times the same credential, we want it to appear once in our view. Such a function does not exist with a fresh install of Postgresql, we need to
create it (assuming the language plpgsql is available in this database).
CREATE OR REPLACE FUNCTION array_has_element(array1 anyarray, elt anyelement) RETURNS boolean AS $$
DECLARE
i integer;
BEGIN
FOR i IN SELECT * FROM generate_series(1, array_upper(array1, 1)) LOOP
IF array1[i] = elt THEN
RETURN true;
END IF;
END LOOP;
RETURN false;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION array_merge(array1 anyarray, array2 anyarray) RETURNS anyarray AS $$
DECLARE
i integer ;
return_array array1%TYPE;
BEGIN
return_array := array1;
FOR i IN SELECT * FROM generate_series(1, array_upper(array2, 1)) LOOP
IF NOT array_has_element(return_array, array2[i]) THEN
return_array := return_array || array2[i];
END IF;
END LOOP;
RETURN return_array;
END;
$$ LANGUAGE plpgsql;
Let's try it :
SELECT array_merge('{a,bb,ccc}'::VARCHAR[], '{bb, ccc, dddd}'::VARCHAR[]);
array_merge
-----------------
{a,bb,ccc,dddd}
(1 row)
That looks good. Let's now create an aggregate using our new function :
CREATE AGGREGATE merge_arrays (anyarray) (
SFUNC = array_merge,
STYPE = anyarray
);
This aggregate is even simplier than the previous one ! ... Here is now the view of the corresponding
app_user :
SELECT * FROM app_user;
first_name | last_name | email | merge_arrays
------------+-----------+--------------------+-----------------------
firstname2 | lastname2 | email2@domain.com | {normal_users,admins}
firstname1 | lastname1 | email1@domain1.com | {normal_users}
(2 rows)
Did I tell you I love Postgresql ? ah ... maybe yes.
This comforts me with the idea we can have 2 layers in the database:
- one «physical» layer with the tables to handle all the data
- one «logical» layer to present to data depending on the application
In general, we work with only the first layer so our applications' models are complicated and is merely about combining data from different tables using numerous queries. There must be a way
to implement a second layer with some logic of data presentation in order to make the developpers life more easy and save database ressources.