Statspack, Orca and Oracle Database Tuning Statistics

Historically, statistics for Oracle databases tended to be taken since instance start-up. This was reasonable in the days where databases were shut down regularly for maintenance. However, with hot backups and 24x7 operations, an average figure over the life-time of the instance doesn't say very much about what it has been up to lately. Oracle's introduction of statspack with 8.1.6 recognised this. Statspack is capable of collecting and displaying a great deal of useful and detailed information. There is a cost in space within the database, though, whilst getting an overview from a pile of statspack text files is also difficult.

Orca graphing can help to provide that overview. It is not a substitute for understanding the statspack reports, but can help with deciding which ones to read. As orca uses rrds to hold it's data, it can retain headline figures in the long term without consuming an excessive amount of space. This helps in understanding long term trends.

The statistics graphed here are almost all extracted from the statspack tables and derived in the same manner as the data in a text based statspack snapshot.

orca_oracle

For the orca_oracle "collection agent" and the cfg files for orca, drop me a line at orca at hoopoes dot com. Alternatively, Hoopoes Limited can provide support on a commercial basis.

The change log for orca_oracle is as follows:

CHANGES                                            DATE
Initial Version                                    August    1999
Add in support for changing number of data columns October   1999
First attempt to rewrite to use perfstat data      September 2003
Implementation of fully perfstat-based collection  May       2004
                             -- effectively v2.0
Correct free space accounting for temp tspaces     November  2004
Get stats for current instance correctly in RAC
  also, output shared sql area and free memory     January   2005
Output raw value for reloads per sec               February  2005
Correct bug in gathering instance start time       May       2005
Upgrade to Oracle 10, and assume RAC               April     2006
                             -- orca_oracle for 9i becomes "maintenance only"
Fix bug in snapshot selection for instance > 1     April     2006
Add option to adjust timezone of snapshots         May       200

Note on Pseudo Code

As these statistics tend to be derived from statspack, most of them are dependent on the changes in certain counters over the period. To indicate this, there is the pesudocode entry
valdiff( table, name )
valdiff = value at end snap in table where name equals given endsnap
- value in table at begin snap where name equals given endsnap

Contents

Ratios

The first section is ratios, but no ratio is meaningful of itself. As Oracle say "Database tuning never must be driven by hit ratios. They only provide additional information to understand how the instance is operating." (MetaLink note 228913.1)

A step towards monitoring actual values is taken with the SQL Area Size graph, which shows actual values. Otherwise, the most relevant instance parameters are listed with each parameter.

Counters

Over the Interval

The first few values are totals for the monitoring interval.

Per Second and Per Transaction

The remaining counters give average values per second and per transaction. This will tend to flatten the smaller value if your site has a very different profile between actions per second and actions per transaction.

Tablespace Information

Buffer Cache Hit Ratio

100*( 1 -  ( valdiff( stats$sysstat, 'physical reads' )
- valdiff( stats$sysstat, 'physical reads direct' )
- valdiff( stats$sysstat, 'physical reads direct (lob)' )
) / valdiff( stats$sysstat, 'session logical reads' )
);
The buffer cache hit ratio is a measure of the proportion of requests for data which is satisfied by data already in the buffer cache. Higher ratios are better as access to data in memory is speedier than an IO operation to disk. There comes a point of diminishing returns when increasing the size of the database buffer. Also, remember that this is part of the SGA and it may be more important to use additional memory for other parts of the SGA. It is vital that the whole SGA fits within main memory, as paging of the SGA is disastrous for performance.

Optimum High
init.ora parameter DB_BLOCK_BUFFERS

Sorts in Memory

 100*( valdiff( stats$sysstat, 'sorts (memory)' )
/ ( valdiff( stats$sysstat, 'sorts (memory)' )
+ valdiff( stats$sysstat, 'sorts (disk)' )
)
)
This is a measure of the proportion of data sorts which occur within memory rather than on disk. Sorts on disk make use of the user's tempoary table space. The maximum size of sort which will occur in memory is defined by the sort area size, which is the size within the PGA which will be used. Each Oracle process which sorts will allocate this much memory, though it may not need all of it. Use of memory for this purpose reduces that available to the SGA.

Optimum High
init.ora parameter SORT_AREA_SIZE

SQL Area Size

This graph stacks two values

SQL Area

select bytes from stats$sgastat
where snap_id = endsnap
and pool = 'shared pool'
and name = 'sql area'

Shared Pool Free

select bytes from stats$sgastat
where snap_id = endsnap
and pool = 'shared pool'
and name = 'free memory'

The current amount of space occupied by the SQL Area can vary dramatically, dependent on current usage levels. However, the sum of used SQL Area and free memory is likely to be less volatile. This sum needs to be sufficient for all currently required parsed SQL. Note that this graph cannot indicate whether space is fragmented.

Shared Pool

This graph includes two figures

Shared Pool Free

100*
select sum(bytes) from stats$sgastat
where snap_id = endsnap
and pool in ('shared pool','all pools')
and name = 'free memory'
/
select sum(bytes) from stats$sgastat
where snap_id = endsnap
and pool in ('shared pool','all pools')
The percentage of the shared pool not currently in use. If a large proportion of the shared pool is always free, it is likely that the size of the shared pool can be reduced. Low free values are not a cause for concern unless other factors also indicate problems, e.g. large values for reloads .

Optimum Small but non-zero
init.ora parameter SHARED_POOL_SIZE

Shared Pool Reloads

100*( valdiff( stats$librarycache, reloads)
/ valdiff( stats$librarycache, pins)
)
The proportion of attempts to execute code where the code was in memory but isn't any more. Shared pool reloads occur when Oracle has to implicitly reparse SQL or PL/SQL at the point when it attempts to execute it. A larger shared pool wil reduce the number of times that code needs to be reloaded. Also, ensuring that similar pieces of SQL are written identically will increase sharing of code.

See also Reloads per Second.

To take advantage of additional memory available for shared SQL areas, you may also need to increase the number of cursors permitted for a session. You can do this by increasing the value of the initialization parameter OPEN_CURSORS.

Optimum Low
init.ora parameter SHARED_POOL_SIZE
init.ora parameter OPEN_CURSORS

Reloads per Second

Library Cache Hits

This graph is closely related to shared pool reloads . Improving these figures is carried out in the same manner as discussed in that section. The graph includes two figures

Library Cache Get Hit Ratio

100*( valdiff( stats$librarycache, gethits)
/ valdiff( stats$librarycache, gets)
)
The proportion of requests for a lock on an object which were satisfied by finding that object's handle already in memory.

Optimum High
init.ora parameter SHARED_POOL_SIZE
init.ora parameter OPEN_CURSORS

Library Cache Pin Hit Ratio

100*( valdiff( stats$librarycache, pinhits)
/ valdiff( stats$librarycache, pins)
)
The proportion of attempts to pin an object which were satisfied by finding all the pieces of that object already in memory.

Optimum High
init.ora parameter SHARED_POOL_SIZE
init.ora parameter OPEN_CURSORS

Wait Percentages

Buffer Nowait Percentage

100*( 1 - ( valdiff( stats$waitstat, wait_count )
/ valdiff( stats$sysstat, 'session logical reads' )
)
)
The buffer no wait percentage is an indication of the proportion of requests for a buffer block which immediately succeeded. Buffer waits indicate a problem with the size of the buffer cache (though check the Buffer Cache hit Ratio) or with the database writer(s) becoming overloaded.

Optimum High
init.ora parameter DBWR_IO_SLAVES
init.ora parameter DB_WRITER_PROCESSES

Redo Nowait Percentage

100*( 1 - ( valdiff( stats$sysstat, 'redo log space requests' )
/ valdiff( stats$sysstat, 'redo entries' )
)
)
This figure gives an indication of the level of contention occuring for the redo buffer, which holds information about changes made to the database. A redo space wait is when there is insufficient space in the redo buffer for a transaction to write redo information. It is an indication that the redo buffer is too small given the rate of transactions occurring in relation to the rate at which the log writer is writing data to the redo logs, or that there are problems with checkpointing or archiving.

Optimum High
init.ora parameter LOG_BUFFER
init.ora parameter LOG_PARALLELISM

Non-Parse CPU

100*( 1 - ( valdiff( stats$sysstat, 'parse time cpu' )
/ valdiff( stats$sysstat, 'CPU used by this session' )
)
)
All of the values in the Parse graph may be insignificant if parsing is not using a great deal of CPU time. Essentially, minimising the parsing as per that section will increase the percentage of CPU time available for other activities.

Parsing

The parsing graph has three elements

Soft Parse

100*(1 - ( valdiff( stats$sysstat, 'parse count (hard)' )
/ valdiff( stats$sysstat, 'parse count (total)' )
)
)
This is the proportion of parsing in the database which qualifies as a soft parse. A hard parse occurs when a SQL statement is executed which is unknown to the shared pool, requiring memory allocation, semantic checking, etc. In comparison, a soft parse occurs when the database can match a SQL statement to one already present in the shared pool, and so is a much lighter process.

Low values indicate either that poor use is being made of shared SQL or that SQL is being aged out of the shared pool before it is re-used. The latter may indicate a problem with the size of the Shared Pool.

Optimum High
init.ora parameter SESSION_CACHED_CURSORS

Execute to Parse

100*( 1 - ( valdiff( stats$sysstat, 'parse count (total)' )
/ valdiff( stats$sysstat, 'execute count' )
)
)
This is an indication of how many times a piece of SQL is executed for each time it is parsed. A repeated execution without reparsing is achieved by holding an open cursor on the statement.

Optimum High
init.ora parameter OPEN_CURSORS
init.ora parameter SESSION_CACHED_CURSORS

Parse CPU to Parse Elapsed

100*( valdiff( stats$sysstat, 'parse time cpu' )
/ valdiff( stats$sysstat, 'parse time elapsed' )
)
This figure indicates the amount of CPU time taken by parsing compared with the actual elapsed time. It is, therefore, an indicator of whether CPU is available for parsing.

Latches

Latches are simple, low-level serialization mechanisms to protect shared data structures in the SGA. When attempting to get a latch a process may be willing to wait, hence this graph includes two figures.

Willing to Wait Latch Gets

100*(1 - ( valdiff( stats$latch, misses )
/ valdiff( stats$latch, gets )
)
)
An attempt by a process to obtain a latch which is willing to wait will sleep and retry until it obtains the latch.

Optimum High

Immediate Latch Gets

100*(1 - ( valdiff( stats$latch, immediate_misses )
/ valdiff( stats$latch, immediate_gets )
)
)
An attempt to obtain a latch which a process is not allowed to wait for will timeout.

Optimum High

Rollback Segment Contention

100*( valdiff( stats$rollstat, waits )
/ valdiff( stats$rollstat, gets )
)
This figure is an indication of whether a process had to wait to get access to a rollback segment. To improve figures, increase the number of rollback segments available.

Optimum Low

Total Transactions over interval

valdiff( stats$sysstat, 'user rollbacks' )
+
valdiff( stats$sysstat, 'user commits' )

Total Sessions over interval

valdiff( stats$sysstat, 'user rollbacks' )
+
valdiff( stats$sysstat, 'user commits' )

Transactions per Second

( valdiff( stats$sysstat, 'user rollbacks' )
+
valdiff( stats$sysstat, 'user commits' )
)
/
interval seconds

Average Open Cursors

stats$syssstat 'opened cursors current'
/ stats$syssstat 'logons current'
Note that this value is the number for the end snapshot time - not an average over the interval!

Rollbacks and Commits

This graph separates out the transaction total into its constituent parts

Rollbacks

valdiff( stats$sysstat, 'user rollbacks' )

Commits

valdiff( stats$sysstat, 'user commits' )
/ trancount

valdiff( stats$sysstat, 'user commits' )
/ interval seconds

Redo Size

valdiff( stats$sysstat, 'redo size' )
/ trancount

valdiff( stats$sysstat, 'redo size' )
/ interval seconds

Logical Reads

valdiff( stats$sysstat, 'session logical reads' )
/ trancount

valdiff( stats$sysstat, 'session logical reads' )
/ interval seconds

Block Changes

valdiff( stats$sysstat, 'db block changes' )
/ trancount

valdiff( stats$sysstat, 'db block changes' )
/ interval seconds

Physical Reads

valdiff( stats$sysstat, 'physical reads' ) * blocksize / 1024
/ trancount

valdiff( stats$sysstat, 'physical reads' ) * blocksize / 1024
/ interval seconds
This measures the amount of kilobytes read

Physical Writes

valdiff( stats$sysstat, 'physical writes' ) * blocksize / 1024
/ trancount

valdiff( stats$sysstat, 'physical writes' ) * blocksize / 1024
/ interval seconds
This measures the amount of kilobytes written

User Calls

valdiff( stats$sysstat, 'user calls' )
/ trancount

valdiff( stats$sysstat, 'user calls' )
/ interval seconds

Parse Count

valdiff( stats$sysstat, 'parse count (total)' )
/ trancount

valdiff( stats$sysstat, 'parse count (total)' )
/ interval seconds

Hard Parse Count

valdiff( stats$sysstat, 'parse count (hard)' )
/ trancount

valdiff( stats$sysstat, 'parse count (hard)' )
/ interval seconds

Sort Count

( 
valdiff( stats$sysstat, 'sorts (memory)' )
+
valdiff( stats$sysstat, 'sorts (disk)' )
)
/ trancount

(
valdiff( stats$sysstat, 'sorts (memory)' )
+
valdiff( stats$sysstat, 'sorts (disk)' )
)
/ interval seconds

Logons Count

valdiff( stats$sysstat, 'logons cumulative' )
/ trancount

valdiff( stats$sysstat, 'logons cumulative' )
/ interval seconds

Executes

valdiff( stats$sysstat, 'execute count' )
/ trancount

valdiff( stats$sysstat, 'execute count' )
/ interval seconds

Reloads per Second

valdiff( stats$librarycache, sum(reloads) )
/ trancount

valdiff( stats$librarycache, sum(reloads) )
/ interval seconds

Database Size

Data Size

select sum(bytes) from sys.dba_data_files
+
select sum(bytes) from sys.dba_temp_files
-
select sum(bytes) from sys.dba_free_space
The Data Size is the total amount of space allocated to database segments including data, indexes, rollback and temporary.

Free Space Size

select sum(bytes) from sys.dba_free_space
This is all the space in the database which has not been allocated to objects.

These add up to the total "external" size of the database i.e. the disk space allocated to Oracle datafiles and tempfiles.

Max & Mean Tablespace Percent Full

This graph uses the percent full values for each tablespace, so a large tablespace and a small one are evaluated equally.

Average Percent Full

This figure simply divides the sum of percent full by the number of tablespaces.

Maximum Percent Full

This picks out the fullest tablespace - excluding those with TEMP, ROLL or RBS in their names.

Further Information

There are many sources of further information on all the subjects included here.

Online

A valuable source is Oracle's Technology Network, particularly the Documentation on Concepts and Tuning.

Other sources include:

Naturally, all these sites also have links.

Offline

A large number of books have been written on the subject. Two publishers worth checking out are O'Reilly and Oracle Press.