To illustrate this I create NEWBORN table
CREATE TABLE newborn ( birthday DATE , name VARCHAR2(32) )
It contains name and date of the birth of fictional newborns. Let's fill it up.
BEGIN INSERT INTO newborn VALUES (DATE '2012-01-10', 'Andrey'); INSERT INTO newborn VALUES (DATE '2012-01-19', 'Kira'); INSERT INTO newborn VALUES (DATE '2012-03-02', 'Anastasia'); INSERT INTO newborn VALUES (DATE '2012-03-15', 'Ivan'); INSERT INTO newborn VALUES (DATE '2012-03-21', 'Andrey'); INSERT INTO newborn VALUES (DATE '2012-04-01', 'Sergey'); INSERT INTO newborn VALUES (DATE '2012-04-17', 'Roman'); INSERT INTO newborn VALUES (DATE '2012-04-22', 'Sergey'); COMMIT; END;So I want to know what names have been used in each month. Something like this
MONTH LIST_OF_NAMES
----- -------------
April Roman, Sergey, Sergey
January Andrey, Kira
March Anastasia, Andrey, Ivan
Before Oracle 11gR2 it was not trivial task. To implement such functionality we could
- create user-defined aggregate function
- use undocumented function WM_CONCAT
- use some "exotic" ways based on Oracle SQL features
Note
Varchars in Oracle SQL have length restriction - 4000 symbols. So keep in mind that it's also restriction for a result obtained in any way described below.LISTAGG function
The minimal form of LISTAGG isLASTAGG (expr) WITHIN GROUP (order_by_expr)
expr is any expression returns varchar (or converted to varchar implicitly) value. order_by_expr is ORDER BY clause to sort values in returned list.
To implement what I talk about above write the following query
SELECT TO_CHAR(birthday, 'Month') AS MONTH , LISTAGG(name) WITHIN GROUP (ORDER BY name) AS list_of_names FROM newborn GROUP BY TO_CHAR(birthday, 'Month')
The result
MONTH LIST_OF_NAMES
------- -------------
April RomanSergeySergey
January AndreyKira
March AnastasiaAndreyIvan
Obviously some delimiter should be to separate names in the list.
To achieve it just use the second parameter of LISTAGG where define a symbol-delimiter.
SELECT TO_CHAR(birthday, 'Month') AS MONTH , LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) AS list_of_names FROM newborn GROUP BY TO_CHAR(birthday, 'Month')
The result
MONTH LIST_OF_NAMES
----- -------------
April Roman, Sergey, Sergey
January Andrey, Kira
March Anastasia, Andrey, Ivan
It's not all. Also LISTAGG is analytic function. To use it as analytic one just add OVER operator and describe needed analytic clause. Unfortunately in analytic clause just PARTITION BY can be used.
For example, to show what names were used in the same month for each newborn we can write
SELECT birthday , name , LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) over (PARTITION BY TO_CHAR(birthday, 'Month')) AS list_of_names FROM newborn ORDER BY birthday
BIRTHDAY NAME LIST_OF_NAMES
-------- ---- -------------
10.01.2012 Andrey Andrey, Kira
19.01.2012 Kira Andrey, Kira
02.03.2012 Anastasia Anastasia, Andrey, Ivan
15.03.2012 Ivan Anastasia, Andrey, Ivan
21.03.2012 Andrey Anastasia, Andrey, Ivan
01.04.2012 Sergey Roman, Sergey, Sergey
17.04.2012 Roman Roman, Sergey, Sergey
22.04.2012 Sergey Roman, Sergey, Sergey
Disadvantage
One thing I regret about is that LISTAGG function does not have an ability to make its result to be unique. Kind of LISTAGG(DISTINCT name)...And a bit about performance... Here LISTAGG's performance is compared with performance of others possible implementations of SUM for strings function. LISTAGG shew itself as "the fastest technique for string aggregation".
User-defined aggregate function
Before Oracle 11R2 the most recommended way to implement SUM for strings is using of user-defined aggregate function.All what needs is create a type which implements some methods used in any aggregate function and a function based on this type used to call from SQL statements.
Any aggregate function performs the following four operations in their work
- initialization (ODCIAggregateInitialize method) - It's static factory method - here an instance of the type is created and returned. Created object is also named as aggregation context. The context is used in all following computations. This step is done just once before all computations.
- iteration (ODCIAggregateIterate method) - Here there are current aggregation context and the value is currently inputed into aggregation function. The context could be updated depending on the value and then is returned. This step is invoked for every not-null value.
- merging (ODCIAggregateMerge method) - Here two aggregation context are "gummed up" and one merged context is returned. It's optional step but it's needed, for example, when calculations are made in parallel mode - results of each process should be merged in one.
- termination (ODCIAggregateTerminate method) - Here any final actions are performed, such as, extra calculations with the context, any cleanups, etc. And the context is returned as completed result of aggregation computation. It's last step and is done just once after all.
So a type with implementation of operations described above should be created.
Then we should declare aggregate function with single input parameter and link it with created type. It means that in the declaration AGGREGATE USING
As example I want to create SUMSTR function.
First, I create T_SUMSTR type and its body
CREATE OR REPLACE TYPE t_sumstr AS object ( str VARCHAR2(4000) , delimiter VARCHAR2(4) , static FUNCTION ODCIAggregateInitialize (ctx IN OUT t_sumstr) RETURN NUMBER , member FUNCTION ODCIAggregateIterate (self IN OUT t_sumstr , newStr IN VARCHAR2) RETURN NUMBER , member FUNCTION ODCIAggregateMerge(self IN OUT t_sumstr , ctx2 IN t_sumstr) RETURN NUMBER , member FUNCTION ODCIAggregateTerminate (self IN t_sumstr , returnStr OUT VARCHAR2 , flag IN NUMBER) RETURN NUMBER ); CREATE OR REPLACE TYPE BODY t_sumstr IS static FUNCTION ODCIAggregateInitialize (ctx IN OUT t_sumstr) RETURN NUMBER IS BEGIN ctx := t_sumstr(NULL, ', '); RETURN ODCIConst.Success; END; member FUNCTION ODCIAggregateIterate (self IN OUT t_sumstr , newStr IN VARCHAR2) RETURN NUMBER IS BEGIN self.str := self.str || newStr || self.delimiter; RETURN ODCIConst.Success; END; member FUNCTION ODCIAggregateMerge(self IN OUT t_sumstr , ctx2 IN t_sumstr) RETURN NUMBER IS BEGIN self.str := self.str || ctx2.str; RETURN ODCIConst.Success; END; member FUNCTION ODCIAggregateTerminate (self IN t_sumstr , returnStr OUT VARCHAR2 , flag IN NUMBER) RETURN NUMBER IS BEGIN returnStr := RTRIM(self.str, self.delimiter); RETURN ODCIConst.Success; END; END;
Some comments concerning T_SUMSTR type. Inside I declare STR and DELIMITER fields. STR field accumulates all values in single string (in ODCIAggregateIterate method). DELIMITER field contains a symbol used as delimiter in result string. I set the delimiter when initialize the aggregate context (ODCIAggregateInitialize method). And I use this predefined delimiter through other methods.
Second, I create SUMSTR function and link it with T_SUMSTR type
CREATE OR REPLACE FUNCTION sumstr (str IN VARCHAR2) RETURN VARCHAR2 deterministic parallel_enable aggregate using t_sumstr;
Now let's check it out. I use STRSUM function instead of LISTAGG from the previous queries
SELECT TO_CHAR(birthday, 'Month') AS MONTH , SUMSTR(name) AS list_of_names FROM newborn GROUP BY TO_CHAR(birthday, 'Month')
The result
MONTH LIST_OF_NAMES
----- -------------
April Sergey, Sergey, Roman
January Andrey, Kira
March Anastasia, Andrey, Ivan
SUMSTR function also supports an ability to make the result list to be unique
SELECT TO_CHAR(birthday, 'Month') AS MONTH , SUMSTR(DISTINCT name) AS list_of_names FROM newborn GROUP BY TO_CHAR(birthday, 'Month')
The result
MONTH LIST_OF_NAMES
----- -------------
April Sergey, Roman
January Andrey, Kira
March Anastasia, Andrey, Ivan
And SUMSTR function can be used as analytic one. Without any restriction in analytic clause (unlike LISTAGG function).
SELECT birthday , name , SUMSTR(name) over (PARTITION BY TO_CHAR(birthday, 'Month') ORDER BY name RANGE BETWEEN unbounded preceding AND unbounded following) AS list_of_names FROM newborn ORDER BY birthday
The result
BIRTHDAY NAME LIST_OF_NAMES
-------- ---- -------------
10.01.2012 Andrey Andrey, Kira
19.01.2012 Kira Andrey, Kira
02.03.2012 Anastasia Anastasia, Andrey, Ivan
15.03.2012 Ivan Anastasia, Andrey, Ivan
21.03.2012 Andrey Anastasia, Andrey, Ivan
01.04.2012 Sergey Roman, Sergey, Sergey
17.04.2012 Roman Roman, Sergey, Sergey
22.04.2012 Sergey Roman, Sergey, Sergey
Disadvantage
The main thing I don't have enough is only one input parameter in aggregate function. Because of it I can't expand abilities of SUMSTR. For example, to pass desired delimiter or sort order.As workaround Tom Kyte advices to use sys_context, search here by 'sys_context'.
WM_CONCAT function
Often to achieve SUM for strings behaviour WMSYS.WM_CONCAT function is advised (before 11gR2 of course). But be careful it's undocumented! It's strongly not recommended to use in your production code. Because it's sintax or functionality might be changed in next Oracle versions. For example, in one of minor releases of 10g the returned type was substituted from VARCHAR2 to CLOB.WM_CONCAT's source code is wrapped, but I suspect it's identical with SUMSTR user-defined function we have created in the previous section. Just as delimiter ',' is used (we used ', ').
Example of using
SELECT TO_CHAR(birthday, 'Month') AS MONTH , WMSYS.WM_CONCAT(name) AS list_of_names FROM newborn GROUP BY TO_CHAR(birthday, 'Month')
The result
MONTH LIST_OF_NAMES
----- -------------
April Sergey, Sergey, Roman
January Andrey, Kira
March Anastasia, Andrey, Ivan
And one more time - I would not use WM_CONCAT in product code, the best way is create your own user-defined aggregate function. Nevertheless, nothing prevents to use WM_CONCAT in single-use queries.
Another Oracle SQL tricks to get SUM for strings
Here I show some ways to implement SUM for strings functionality using plain Oracle SQL. It's more like puzzles for the mind than to the recommendations for use. Just to demonstrate the power of Oracle SQL.I have found all these ideas on different forums and articles and transformed them to use example NEWBORN table.
All queries bellow return the same result (order of strings can be different)
MONTH LIST_OF_NAMES
----- -------------
April Roman, Sergey, Sergey
January Andrey, Kira
March Anastasia, Andrey, Ivan
Using hierarchical queries and rownumber analytic function
SELECT t.MONTH , LTRIM(SYS_CONNECT_BY_PATH(t.name, ', '), ', ') AS list_of_names FROM (SELECT TO_CHAR(birthday, 'Month') AS MONTH , name , ROW_NUMBER() over (PARTITION BY TO_CHAR(birthday, 'Month') ORDER BY name) AS rn FROM newborn) t WHERE connect_by_isleaf = 1 START WITH t.rn = 1 CONNECT BY PRIOR t.rn = (t.rn -1) AND PRIOR t.MONTH = t.MONTH ORDER BY t.MONTH
Using XML related functions
SELECT TO_CHAR(birthday, 'Month') AS MONTH , RTRIM(TO_CHAR(xmlcast(XMLAGG(XMLELEMENT("name", name || ', ')) AS clob)), ', ') AS list_of_names FROM newborn GROUP BY TO_CHAR(birthday, 'Month') ORDER BY MONTH
Using MODEL clause
SELECT t.MONTH , RTRIM(t.names, ', ') AS list_of_names FROM (SELECT * FROM newborn model PARTITION BY (TO_CHAR(birthday, 'Month') AS MONTH) dimension BY (ROW_NUMBER() over (PARTITION BY TO_CHAR(birthday, 'Month') ORDER BY name) AS rn) measures (name AS names) rules (names[ANY] ORDER BY rn DESC = names[cv()] || ', ' || names[cv() + 1]) ) t WHERE t.rn = 1 ORDER BY t.MONTH
No comments:
Post a Comment