Wednesday, September 12, 2012

Using DBMS_PARALLEL_EXECUTE package

In cases when you need to make any changes in data of huge table routines of DBMS_PARALLEL_EXECUTE package can help you to increase performance of your update statement.

The package was introduced in Oracle 11g.

In this article won't be illuminated all features of DBMS_PARALLEL_EXECUTE package, just basic moments. You can find more information in Oracle documentation.

The idea is simple - make changes by pieces, independently, in parallel mode.

The functionality allows you "cut" data of an updated table on fragments (Oracle calls them chunks) and apply your update statement to every fragment.

Oracle can "cut" table's data using:
  • rowid
  • value of any number column of the table
  • custom condition
In any case a chunk must have borders - start and end values. These values are calculated by Oracle in predefined way (except using custom condition).

Then start and end values will be used in your update stetment.

Chunks form queue and are processed one by one. Depends on parameter parallel_level of procedure which runs this process chunks are served in batches parallely. For example, we have 10 chunks and parallel_level is equal to 4. First four chunks start to be executed, other six are waiting for. When anyone of executing chunks has finished, a chunk from the queue is taken and executed. But in the same time only four chunks are run.

Note

You must have CREATE JOB privilege to run chunks in parallel mode.

To illustrate how to use the DBMS_PARALLEL_EXECUTE package I created and filled TEST_TABLE table
CREATE TABLE test_table AS
SELECT LEVEL AS id
     , 'Value ' || TO_CHAR(LEVEL) AS name
     , ROUND(DBMS_RANDOM.VALUE(1, 10)) AS RANK
  FROM dual
CONNECT BY LEVEL <= 1000000;

I want to update data in NAME column for all records using the following statement
UPDATE test_table
   SET name = TO_CHAR(RANK) || ' ' || name;

Using DBMS_PARALLEL_EXECUTE package you should do the next steps
  • create task
  • generate chunks in way you wish
  • run task
  • drop task
Task defines one action you want perform on table's data. You have a statement and chunks of table the statement should be executed for. And task just links all this together. Task has name that identifies task in package's routines.

To monitor tasks and their statuses use USER_PARALLEL_EXECUTE_TASKS view.

To create and drop task use CREATE_TASK and DROP_TASK respectively. These procedures take task's name as first parameter (task_name).

To run task use DBMS_PARALLEL_EXECUTE.RUN_TASK. As the first parameter task's name must be set.

The second parameter (sql_stmt) takes the SQL statement you want to perform on table's data. The statement must be modified - it must have two placeholders :start_id and :end_id. These placeholders will be replaced by values of begin and end of chunk's range. Below we will see how the update statement will be transformed depend on the way of chunking table.

The third parameter (language_flag) says to Oracle how to handle the statement. Use DBMS_SQL.NATIVE value.

Although all others parameters are optional we will use one more parameter - parallel_level. If it's set to 0 then all chunks will be executed one by one (no parallelism). If it's set to NULL then the default parallelism will be used. To achive that chunks will be executed in parallel mode set this parameter in some positive integer. And so much chunks as was set in parallel_level parameter will be processed in the same time.

Note

Keeep in mind Oracle performs COMMIT after finishing every chunk's process.

Now let's look closely how we can chuck table's data.

For each way I will provide anonymous PL/SQL block where a task will be created, table's data be chunked, task be run and then dropped.

All package's procedures for chunking table's data require task's name as first parameter (task_name).

To see created chunks, their borders and statuses use USER_PARALLEL_EXECUTE_CHUNKS view.

Chunking by ROWID

To chunk table by ROWID use CREATE_CHUNKS_BY_ROWID procedure.

Except task's name you should set chunking table's owner (table_owner) and name (table_name). Than you should define size of chunk (chunk_size) and in what way the size is measured (by_rows). If by_rows parameter is set to true then chunk_size is measured in table's row, else in table's block.

Note

It seems that Oracle uses statistic by table to determine count of rows, not doing select with count by table. So if you want to have more precise chunking you should gather statistic by table before.

I should modify my original update statement - add extra restriction by rowid
UPDATE test_table
   SET name = TO_CHAR(RANK) || ' ' || name
 WHERE ROWID BETWEEN :start_id AND :end_id

And the block that performs all steps to make our example update
DECLARE
  c_task_name  CONSTANT VARCHAR2(128) := 'TEST TASK. BY ROWID';
BEGIN
  dbms_parallel_execute.create_task(c_task_name);
 
 
  dbms_parallel_execute.create_chunks_by_rowid (task_name   => c_task_name
                                              , table_owner => USER
                                              , table_name  => 'TEST_TABLE'
                                              , by_row      => TRUE
                                              , chunk_size  => 50000);
 
  dbms_parallel_execute.run_task (task_name      => c_task_name
                                , sql_stmt       => q'$ update test_table
                                                           set name = to_char(rank) || ' ' || name
                                                         where rowid between :start_id and :end_id $'
                                , language_flag  => DBMS_SQL.native
                                , parallel_level => 5);
 
  dbms_parallel_execute.drop_task(c_task_name);
END;

Chunking by number column

To chunk table by number column use CREATE_CHUNKS_BY_NUMBER_COL procedure.

As in the previous case you should set task's name, chunking table's owner (table_owner) and name (table_name). Then you say what column should by use to chunk data. The column must have NUMBER datatype. In my example I chunk table's data by values of ID column. And the parameter is size of chunk (chunk_size) is measured by table's rows.

Oracle calculates min and max values of specified column and cuts data in the following way:
  • for first chunk start value is equals to min value of the column, end value is equals to min value plus chunk_size value
  • for second chunk start value is equals to end value of the first chunk, end value is equals to end value of the first chunk plus chunk_size value 
  • and so on untill end value of current chunk reaches max value of the column.
I should modify my original update statement - add extra restriction by ID column
UPDATE test_table
   SET name = TO_CHAR(RANK) || ' ' || name
 WHERE id BETWEEN :start_id AND :end_id

The block that performs all steps to make our example update
DECLARE
  c_task_name  CONSTANT VARCHAR2(128) := 'TEST TASK. BY NUMBER COLUMN';
BEGIN
 
  dbms_parallel_execute.create_task(c_task_name);
 
  dbms_parallel_execute.create_chunks_by_number_col (task_name    => c_task_name
                                                   , table_owner  => USER
                                                   , table_name   => 'TEST_TABLE'
                                                   , table_column => 'ID'
                                                   , chunk_size   => 50000);
 
  dbms_parallel_execute.run_task (task_name      => c_task_name
                                , sql_stmt       => q'$ update test_table
                                                           set name = to_char(rank) || ' ' || name
                                                         where id between :start_id and :end_id $'
                                , language_flag  => DBMS_SQL.native
                                , parallel_level => 5);
 
  dbms_parallel_execute.drop_task(c_task_name);
END;

Chuncking by custom SQL


To chunk table by number column use CREATE_CHUNKS_BY_SQL procedure.

It's the most flexible way to chunk table's data. The idea is to determine start and end values of every chunk using custom select statement. Obviously that select must return two columns - the first with start values and the second with end values. Count of created chunks will be equals to count of rows returned by the SQL statement. Values returned by the SQL statement must have ROWID or NUMBER datatype. And finally you should modify source update statement where you define a column for what generated start and end values should be applied.

We must pass to the procedure task's name, SQL statement for forming chunks (sql_stmt) and boolean flag (by_rowid) to indicate a type of start and end values of a chunk (true - ROWID, false - NUMBER).

Returning to my example call in memory that table test_table has RANK column. It can contain value from 1 to 10. I use this column for chunking with the help of the following select statement
SELECT DISTINCT
       RANK
     , RANK
  FROM test_table

It may return from 1 to 10 rows so I may have from 1 to 10 chunks. Start and end values of chunks will be equals to 1, 2, 3, etc.

I should modify my original update statement - add extra restriction by RANK column
UPDATE test_table
   SET name = TO_CHAR(RANK) || ' ' || name
 WHERE RANK BETWEEN :start_id AND :end_id

The block that performs all steps to make our example update
DECLARE
  c_task_name  CONSTANT VARCHAR2(128) := 'TEST TASK. BY CUSTOM SQL';
BEGIN
 
  dbms_parallel_execute.create_task(c_task_name);
 
  dbms_parallel_execute.create_chunks_by_sql (task_name => c_task_name
                                            , sql_stmt  => q'$ select distinct
                                                                      rank
                                                                    , rank
                                                                 from test_table $'
                                            , by_rowid  => FALSE);
 
  dbms_parallel_execute.run_task (task_name      => c_task_name
                                , sql_stmt       => q'$ update test_table
                                                           set name = to_char(rank) || ' ' || name
                                                         where rank between :start_id and :end_id $'
                                , language_flag  => DBMS_SQL.native
                                , parallel_level => 5);
 
  dbms_parallel_execute.drop_task(c_task_name);
END;

And about performance. On my environment I have got the following time results (for each case I recreated TEST_TABLE table)
  • without using DBMS_PARALLEL_EXECUTE package - 30.88 sec
  • using chunking by ROWID - 27.49 sec
  • using chunking by number column - 28.69 sec
  • using chunking by custom SQL - 35.45 sec

Code snippets have been tested on Oracle Database 11g Release 11.2.0.1.0

1 comment: