Sunday, January 20, 2013

Simulation of outer joins in Jasper Analysis

JasperReports Server (Pentaho Mondrian behind the scenes) doesn't use outer joins when parses schema file to build analysis view. Today, at least.

Here I am describing the one of possible ways to change JasperServer behaviour as though it can handles outer joins. We don't have to make changes in any configuration, the solution just encloses manipulations with SQLs have used in schema definition.

As example I will use MySQL World database (available to download here).

World database has very simple structure, just look on World database's entity-relationship diagram
ERD of World database

Note

I have changed its initial structure a bit to have possibility change some data, viz. I set "not-null" attribute of City.CountryCode column to true.

I want to analyse how many urbanites live in each country and how they are apportioned by cities. Countries without any urbanite should be omitted.

So it should be analysis view like this

Returning to World database ERD it's obviously that I will use City table as fact table, Country table as a dimension and City.Population column as a measure.

To build such view I created the following schema
<?xml version="1.0" encoding = "UTF-8"?>
 
<Schema name="world_schema">
  <Cube name="World">
    <Table name="City" />
 
    <Dimension name="Country" foreignKey="CountryCode" >
      <Hierarchy hasAll="true" allMemberCaption="Countries" primaryKey="Code" >
        <Table name="Country" />   
        <Level name="Country" column="Code" nameColumn="Name" type="String" />
      </Hierarchy>
    </Dimension>
 
    <Dimension name="City" >
      <Hierarchy hasAll="true" allMemberCaption="Cities" >
 <Level name="City" table="City" column="ID" nameColumn="Name" ordinalColumn="Name" type="String" />
      </Hierarchy>
    </Dimension>
 
    <Measure name="Urbanite" column="population" aggregator="sum" formatString="#,##0" />
 
  </Cube> 
</Schema>

and the following MDX query
SELECT {[Measures].DefaultMember} ON COLUMNS,
       NON EMPTY([Country].DefaultMember, [City].DefaultMember) ON ROWS
  FROM [World]

Note

I assume you can create JDBC Data Source, Schema File, Mondrian Connection and OLAP View in JasperServer to observe how view's result will be changing.

Well, now I have working World view.

Let's take a look on Angola country. It has five cities. Than I am excluding one of them, for example Lobito, from Angola.

It means I set CountryCode to NULL for Lobito city
UPDATE city 
   SET CountryCode = NULL 
 WHERE id = 58

If I reopen the view I will see that Lobito city disappeared.

Note

I disabled OLAP cache so I see data changing in real time. To see the last data change you might have to flush cache if it is running.

If I design a SQL SELECT to see "lost" cities among others I need outer joins. In general case I need outer joins when some rows in fact table don't contain the reference to dimension table.

Because I don't have any element or attribute to define them in schema file I afford to replace data "on the fly".

To include rows with NULL values into view's result first I should do - replace these values by the concrete value. But I have to be sure that this dummy value will never come up in real data.

So, I changed the definition of fact table in the cube. Now I use element of schema's definition in such way
<Cube name="World">
  <View alias="City">
    <SQL dialect="mysql">
      <![CDATA[
select ID 
     , Name 
     , coalesce(countrycode, '###') as CountryCode 
     , Population 
  from City 
      ]]>
    </SQL>
  </View> 
  ...
 
Here I replaced NULLs in CountryCode column by '###' value.

Note

Be careful if you change view's alias you will have to change it in "table" attribute of City level.

And second step, per contra, I should add dummy row into dimension table (using UNION ALL clause). As identifier of this row I should use the same value I have chosen for NULLs replacement in fact table.

Using element again, I changed dimension's definition
  ...
  <Dimension name="Country" foreignKey="CountryCode" >
    <Hierarchy hasAll="true" allMemberCaption="Countries" primaryKey="Code" >
      <View alias="Country">
        <SQL dialect="mysql">
   <![CDATA[
select Code 
     , Name 
  from Country  
 union all 
select '###' as Code 
     , 'Lost Cities' as Name 
  from dual 
          ]]>
        </SQL>
      </View>    
      <Level name="Country" column="Code" nameColumn="Name" type="String" />
    </Hierarchy>
  </Dimension>
  ...

Now I should reload modified schema file, flush OLAP cache, run the view and make sure of "lost" Lobito city appeares under "Lost Cities" country.


All I have shown above concerns schema organized as star.

If you need outer joins building a dimension in snowflake-like schema you should use View element to create SQL statement with outer joins to obtain all dimension's levels. Instead of the way with Join element described in Mondrian documentation.

All have been tested on JasperServer 4.7.0, MySQL 5.5

No comments:

Post a Comment