Thursday, May 03, 2007

Improve SQL performance

1. Know your data and business application well.
Identical information can often be retrieved from different business data sources. Familiarize yourself with these sources; you must be aware of the data volume and distribution in your database. You should also have a thorough understanding of your data model--such as the relationships among business entities--before writing your SQL. This understanding will help you write much better queries for retrieving information from multiple tables. CASE tools such as Designer/2000 do a very good job of documenting relations among different business and database objects.

2. Test your queries with realistic data.

Most organizations have three database environments: development, test, and production. Programmers use the development database environment to create and test applications, which are then more rigorously examined in the test environment by programmers and users before they are migrated to the production environment.
When a SQL statement is tested in the test environment, make sure the test database contains data that reflects the production database. A SQL statement tested with unrealistic data may behave differently when used in production. To ensure rigorous testing, the data distribution in the test environment must also closely resemble that in the production environment.

3. Write identical SQL statements in your applications.
Take full advantage of bind variables, stored procedures, and packages wherever possible. The benefits of identical SQL statements include reduced memory use on the database server and faster execution, as re-parsing is unnecessary. For example, these statements are not identical:

select * from employee where empid = 10;
SELECT * FROM EMPLOYEE WHERE EMPID = 10;
select * from employee where empid = 20;
but when using a bind variable named i_empid, the statement would be:
select * from employee where empid = :i_empid;
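
The effect of identical, bind-variable SQL can be sketched outside Oracle as well; the snippet below uses Python's sqlite3 module purely as an illustration, with the DB-API `?` placeholder playing the role of the `:i_empid` bind variable so that a single statement text serves every lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(10, "Alice"), (20, "Bob")])

# One parameterized statement text serves every empid value, so the
# database can reuse the parsed statement instead of parsing each variant.
query = "SELECT name FROM employee WHERE empid = ?"
for empid in (10, 20):
    row = conn.execute(query, (empid,)).fetchone()
    print(empid, row[0])
```

The table contents here are invented; the point is only that the statement text never changes, which is what allows the shared parse.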
4. Use indexes on the tables carefully.
Be sure to create all the necessary indexes on the tables. However, too many of them can degrade performance. So how do you choose which columns to index?
* Create indexes on columns that are used frequently in the WHERE clause of application SQL or queries used by end users.
* Index those columns that are used frequently to join tables in SQL statements.
* Index only those columns in which a small percentage of rows share the same value (that is, highly selective columns).
* Do not create indexes on columns that are used only inside functions and operators in the WHERE clause of queries.
* Do not index columns that are frequently modified, or columns where the query speedup an index provides is outweighed by the cost of maintaining it: inserts, updates, and deletes slow down because the index must be updated along with the table.
* Unique indexes are better than nonunique indexes because of better selectivity. Use unique indexes on primary key columns and nonunique indexes on foreign keys and columns frequently used in the WHERE clause.
* Create the index so that the columns used in the WHERE clause make up a leading portion of the index.

5. Make an indexed path available.
To take advantage of indexes, write your SQL in such a manner that an indexed path is available to it. The optimizer cannot use an access path that uses an index simply because the index exists; the access path must be made available by the SQL. Using SQL hints is one of the ways to ensure the index is used. See the following tips to select a particular access path.

6. Use Explain Plan and TKPROF where possible.
If your SQL statements are not well-tuned, they may be inefficient even if the Oracle database itself is "well-oiled." Become familiar with Explain Plan and TKPROF tools and use them to your advantage. Explain Plan helps in discovering the access path used by a SQL; TKPROF shows its actual performance statistics. These tools are bundled with Oracle server software and can help you improve SQL performance.


8. Think globally when acting locally.
Always remember: Any changes you make in the database to tune one SQL statement may affect the performance of other statements used by applications and users.

9. The WHERE clause is crucial.
The following WHERE clauses would not use the index access path even if an index is available (COL1 and COL2 are in the same table and the index is created on COL1):
* COL1 > COL2
* COL1 < COL2
* COL1 >= COL2
* COL1 <= COL2
* COL1 IS NULL
* COL1 IS NOT NULL. An index does not store the ROWIDs for columns that have null values. So querying for rows with null values cannot use that index.
* COL1 NOT IN (value1, value2 )
* COL1 != expression
* COL1 LIKE '%pattern'. In this case, the leading edge of the index is suppressed, so the index cannot be used. On the other hand, COL1 LIKE 'pattern%' or COL1 LIKE 'pattern%pattern%' can use the index because they result in a bounded-range index scan.
* NOT EXISTS subquery
* expression1 = expression2. Any expressions, functions, or calculations involving indexed columns prohibit use of the index. In the following example, applying the UPPER function to the indexed column prevents an index scan and results in a full table scan:

SELECT DEPT_NAME
FROM DEPARTMENT
WHERE UPPER(DEPT_NAME) LIKE 'SALES%';
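
The same suppression can be watched in miniature with SQLite's EXPLAIN QUERY PLAN (used here only as a stand-in for Oracle's Explain Plan; the table and index names are invented for the demo). Wrapping the indexed column in a function turns an index search into a full scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_id INTEGER, dept_name TEXT)")
conn.execute("CREATE INDEX dept_name_idx ON department (dept_name)")

def plan(sql):
    # Join the optimizer's plan steps into one readable string.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# A bare comparison on the indexed column can use the index...
indexed = plan("SELECT * FROM department WHERE dept_name = 'SALES'")
# ...but applying a function to the column forces a table scan.
wrapped = plan("SELECT * FROM department WHERE upper(dept_name) = 'SALES'")
print(indexed)
print(wrapped)
```
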

10. Use WHERE instead of HAVING for record filtering.

Avoid using the HAVING clause along with GROUP BY on an indexed column;
in this case, the index is not used. Filter rows with the WHERE clause
rather than with HAVING. If the EMP table has an index on column
DEPTID, the following query cannot take advantage of it:

SELECT DEPTID,
SUM(SALARY)
FROM EMP
GROUP BY DEPTID
HAVING DEPTID = 100;


However, the same query can be rewritten to exploit the index:

SELECT DEPTID,
SUM(SALARY)
FROM EMP
WHERE DEPTID = 100
GROUP BY DEPTID;
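
Both forms return the same rows; the difference is only where the filtering happens. A quick equivalence check, sketched with SQLite and invented data (the index-usage argument itself is Oracle-specific):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptid INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(100, 3000), (100, 4000), (200, 5000)])

# HAVING groups every department first, then throws away the rest.
having = conn.execute(
    "SELECT deptid, SUM(salary) FROM emp "
    "GROUP BY deptid HAVING deptid = 100").fetchall()

# WHERE discards the unwanted rows before grouping ever starts.
where = conn.execute(
    "SELECT deptid, SUM(salary) FROM emp "
    "WHERE deptid = 100 GROUP BY deptid").fetchall()

print(having, where)  # both [(100, 7000)]
```
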

11. Specify the leading index columns in WHERE clauses.

For a composite index, the query would use the index as long as the
leading column of the index is specified in the WHERE clause. The
following query would use the composite index based on the primary key
constraint on the PART_NUM and PRODUCT_ID columns:

SELECT *
FROM PARTS
WHERE PART_NUM = 100;


whereas this query would not use the composite index:

SELECT *
FROM PARTS
WHERE PRODUCT_ID = 5555;


The same request can be rewritten to take advantage of the index. In
this query, it is assumed that the PART_NUM column will always have a
value greater than zero:

SELECT *
FROM PARTS
WHERE PART_NUM > 0
AND PRODUCT_ID = 5555;
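
The leading-column rule can also be observed with SQLite's EXPLAIN QUERY PLAN (again a stand-in for Oracle's tools; the names follow the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_num INTEGER, product_id INTEGER, descr TEXT)")
conn.execute("CREATE INDEX parts_pk_idx ON parts (part_num, product_id)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Predicate on the leading column: the composite index is usable.
lead = plan("SELECT * FROM parts WHERE part_num = 100")
# Predicate on the trailing column only: full scan, the index is skipped.
trail = plan("SELECT * FROM parts WHERE product_id = 5555")
print(lead)
print(trail)
```
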

12. Evaluate index scan vs. full table scan.

If selecting more than 15 percent of the rows from a table, full table
scan is usually faster than an index access path. In such cases, write
your SQLs so that they use full table scans.

The following statements would not use index scans even if an index
is created on the SALARY column. In the first SQL, the FULL hint
forces Oracle to employ a full table scan. When using an index does
more harm than good, you can use these techniques to suppress it.

SELECT /*+ FULL(EMP) */ *
FROM EMP
WHERE SALARY = 50000;
SELECT *
FROM EMP
WHERE SALARY+0 = 50000;


The following statements would not use index scans even if an index is created on the SS# column:

SELECT *
FROM EMP
WHERE SS# || ' ' = '111-22-333';


An index is also not used if Oracle has to perform implicit data
conversion. For the following example, SALARY is a numeric column in
the table EMP and a character value is converted into a numeric value:

SELECT *
FROM EMP
WHERE SALARY = '50000';


When the percentage of table rows accessed is 15 percent or less, an
index scan usually works better. An index scan requires multiple
logical reads per row accessed, whereas a full table scan can read all
the rows in a block with a single logical read; thus, a full table
scan performs better when accessing a large percentage of rows from a table.

To illustrate this point, say the ANALYZE command is issued against
the EMP table and all its indexes. Oracle generates the following
statistics in the data dictionary views USER_TABLES and USER_INDEXES:

Table Statistics:
NUM_ROWS = 1000
BLOCKS = 100


Index Statistics:

BLEVEL = 2
AVG_LEAF_BLOCKS_PER_KEY = 1
AVG_DATA_BLOCKS_PER_KEY = 1



Based on these statistics, the following would be the logical reads (block accessed) for different types of scans:

Use of index to return one row = 3
(BLEVEL + (AVG_LEAF_BLOCKS_PER_KEY - 1) + AVG_DATA_BLOCKS_PER_KEY)

Full table scan = 100
(BLOCKS)

Use of index to return all rows = 3000
(NUM_ROWS * blocks accessed to return one row using the index)
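
Plugging the statistics into those formulas (plain Python arithmetic over the numbers above):

```python
# Statistics from the ANALYZE example above.
num_rows = 1000
blocks = 100
blevel = 2
avg_leaf_blocks_per_key = 1
avg_data_blocks_per_key = 1

# Logical reads to fetch one row through the index.
index_one_row = blevel + (avg_leaf_blocks_per_key - 1) + avg_data_blocks_per_key
# Logical reads for a full table scan.
full_scan = blocks
# Logical reads to fetch every row through the index.
index_all_rows = num_rows * index_one_row

print(index_one_row, full_scan, index_all_rows)  # 3 100 3000
```
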

13. Use ORDER BY for index scan.
Oracle's optimizer will use an index scan if the ORDER BY clause is on an indexed column (the column must be NOT NULL, since nulls are not stored in the index). The following query illustrates this point. It would use the index available on the EMPID column even though the column is not specified in the WHERE clause; the query retrieves the ROWID for each row from the index and accesses the table by ROWID.

SELECT SALARY 
FROM EMP
ORDER BY EMPID;

If this query performs poorly, you can try another alternative by rewriting the same query using the FULL hint described earlier (Number 12).

14. Know thy data.
As I've already explained, you have to know your data intimately. For example, say you have a table called BOXER containing two columns--BOXER_NAME and SEX--with a nonunique index on column SEX. If there are an equal number of male and female boxers, the following query will run faster if Oracle performs a full table scan:

SELECT BOXER_NAME
FROM BOXER
WHERE SEX = 'F';
You can ensure the query performs a full table scan by rewriting it as :

SELECT /*+ FULL(BOXER) */ BOXER_NAME
FROM BOXER
WHERE SEX = 'F';
If the table contains 980 male boxers, this query would be faster because it results in an index scan:

SELECT /*+ INDEX(BOXER BOXER_SEX) */ BOXER_NAME
FROM BOXER
WHERE SEX = 'F';

This example illustrates the importance of being familiar with data distribution. SQL performance will vary as the database grows and data distribution changes. Oracle introduced histogram statistics in Oracle 7.3 so that the optimizer can be aware of the data distribution in a table and select an appropriate execution plan.

15. Know when to use large-table scans.
When retrieving rows from a very small table or a very large table, a full table scan may perform better than an index scan. An index scan on a very large table may require reading many index and table blocks. When these blocks are brought into the database buffer cache, they are kept as long as possible. Other queries may not need those blocks, so the database buffer hit ratio may decline and the performance of a multiuser system may suffer. The blocks read by a full table scan, however, are aged out of the database buffer cache much earlier, so the buffer hit ratio is not affected.

16. Minimize table passes.
Usually, reducing the number of table passes in a SQL query results in better performance. Queries with fewer table passes mean faster queries. Here's an example. The STUDENT table has four columns named NAME, STATUS, PARENT_INCOME, and SELF_INCOME. NAME is the primary key. The value of the STATUS column is 0 for independent students and 1 for dependent students.
The following query returns the name and income for dependent as well as independent students. It results in two passes through the STUDENT table, creates a temporary table for processing, and initiates a sort to eliminate duplicates:

SELECT NAME, PARENT_INCOME
FROM STUDENT
WHERE STATUS = 1
UNION
SELECT NAME, SELF_INCOME
FROM STUDENT
WHERE STATUS = 0;

The same request can be fulfilled by this query, which results in only one pass through the table:
SELECT NAME, PARENT_INCOME*STATUS + SELF_INCOME*(1-STATUS)
FROM STUDENT;
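
The arithmetic rewrite really is equivalent to the UNION form; here is a check with SQLite and two invented students:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT PRIMARY KEY, status INTEGER, "
             "parent_income INTEGER, self_income INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                 [("Ann", 1, 40000, 0), ("Bob", 0, 0, 15000)])

# Two passes over the table plus a sort to remove duplicates.
union = conn.execute(
    "SELECT name, parent_income FROM student WHERE status = 1 "
    "UNION "
    "SELECT name, self_income FROM student WHERE status = 0").fetchall()

# One pass: the arithmetic selects the right income column per row.
single = conn.execute(
    "SELECT name, parent_income*status + self_income*(1-status) "
    "FROM student").fetchall()

print(sorted(union), sorted(single))  # both [('Ann', 40000), ('Bob', 15000)]
```
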
17. Join tables in the proper order.
The order in which tables are joined in a multiple table join is crucial. Overall, fewer rows will be processed if tables are joined in the correct order.
Always perform the most restrictive search first to filter out the maximum number of rows in the early phases of a multiple table join. This way, the optimizer works with fewer rows in the subsequent phases of the join, improving performance. Furthermore, make sure that the driving table (the table searched first in a nested loop join) is the table that returns the fewest rows. In a join between master and detail tables (e.g., the ORDERS and ORDER_LINE_ITEMS tables), make sure the master table is searched first; starting with the detail table usually results in more rows being joined.
For the rule-based optimizer, the driving table should be the last table specified in the FROM clause. The rule-based optimizer makes the last table specified in the FROM clause the driving table in the nested loop join. If the join is resulting in nested loop join, consider making index available for the inner table search. Explain Plan and TKPROF output show the join type, table join order, and number of rows processed in every phase of a join.
For the cost-based optimizer, the order in which tables appear in the FROM clause is irrelevant, as the optimizer will try to find the best execution plan on its own. Consider using the ORDERED hint to control the table join order. In the following query, the tables are joined in the order in which they appear in the FROM clause:

SELECT /*+ ORDERED */ ORDERS.CUSTID, ORDERS.ORDERNO,
ORDER_LINE_ITEMS.PRODUCTNO
FROM ORDERS, ORDER_LINE_ITEMS
WHERE ORDERS.ORDERNO = ORDER_LINE_ITEMS.ORDERNO;
18. Use index-only searches when possible.
If possible, write queries that use index-only searches. The optimizer needs to search only the index, not the table, to satisfy such queries, so performance is better; an index-only search is used when all the information needed to satisfy the SQL can be found in the index itself. If the EMP table has a composite index on the LNAME and FNAME columns, the following query will result in an index-only search:

SELECT FNAME 
FROM EMP
WHERE LNAME = 'SMITH';

whereas this query results in an index scan as well as table search by ROWID:

SELECT FNAME , SALARY
FROM EMP
WHERE LNAME = 'SMITH';

Therefore, pay close attention to the column list in the SELECT clause. Include only those columns that you really need.
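
SQLite makes this distinction visible in its plan output, where an index-only search is reported as a "covering index" (again an illustration, not Oracle; the column names follow the example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (lname TEXT, fname TEXT, salary INTEGER)")
conn.execute("CREATE INDEX emp_name_idx ON emp (lname, fname)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Every selected column lives in the index: index-only search.
covered = plan("SELECT fname FROM emp WHERE lname = 'SMITH'")
# SALARY is not in the index, so the table row must be visited as well.
not_covered = plan("SELECT fname, salary FROM emp WHERE lname = 'SMITH'")
print(covered)
print(not_covered)
```
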

19. Redundancy is good.
Provide as much information as possible in the WHERE clause. For example, if the WHERE clause is WHERE COL1 = COL2 AND COL1 = 10, the optimizer will infer that COL2 = 10. But if the WHERE clause is WHERE COL1 = COL2 AND COL2 = COL3, the optimizer will not infer that COL1 = COL3.

20. Keep it simple, stupid.
Make your SQL statements as simple as possible. Very complex SQL statements can overwhelm the optimizer; sometimes writing multiple, simpler SQLs will yield better performance than a single complex SQL statement. Oracle's cost-based optimizer is not completely robust. It's in the development stage but getting better with each Oracle release. As a result, you have to keep an eye on the Explain Plan cost estimate. "Cost" is a relative term--nobody seems to know what the cost numeric value means--but the smaller this value, the better the SQL performance. So tune the statement for the lower cost.
It may often be more efficient to use temporary tables and break up a complex SQL join involving many tables. For example, if a join involves eight tables with a large amount of data, it might be better to split the SQL into two or three SQLs, each involving at most a four-table join and storing the intermediate results in precreated temporary tables.

21. You can reach the same destination in different ways.
In many cases, more than one SQL statement can get you the same desired results. Each SQL may use a different access path and may perform differently. For example, the MINUS operator can be much faster than using WHERE NOT IN (SELECT ) or WHERE NOT EXISTS.
Let's say we have an index on a STATE column and another index on an AREA_CODE column. Despite the availability of indexes, the following statement will require a full table scan due to the usage of the NOT IN predicate:

SELECT CUSTOMER_ID 
FROM CUSTOMERS
WHERE STATE IN ('VA', 'DC', 'MD')
AND AREA_CODE NOT IN (804, 410);
However, if the same query is rewritten as the following, it will result in index scans:

SELECT CUSTOMER_ID 
FROM CUSTOMERS
WHERE STATE IN ('VA', 'DC', 'MD')
MINUS
SELECT CUSTOMER_ID
FROM CUSTOMERS
WHERE AREA_CODE IN (804, 410);
If a SQL involves OR in the WHERE clause, it can also be rewritten by substituting UNION for OR in the WHERE clause. You must carefully evaluate execution plans of all SQLs before selecting one to satisfy the information request. You can use Explain Plan and TKPROF tools for this process.
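
SQLite's EXCEPT operator plays the role of Oracle's MINUS, so the rewrite can at least be checked for correctness with a few invented customers (the performance claim itself is Oracle-specific):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers "
             "(customer_id INTEGER, state TEXT, area_code INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "VA", 804), (2, "VA", 703), (3, "MD", 410), (4, "CA", 415)])

# The NOT IN form.
not_in = conn.execute(
    "SELECT customer_id FROM customers "
    "WHERE state IN ('VA','DC','MD') AND area_code NOT IN (804, 410)").fetchall()

# The set-difference form (EXCEPT is SQLite's spelling of MINUS).
minus = conn.execute(
    "SELECT customer_id FROM customers WHERE state IN ('VA','DC','MD') "
    "EXCEPT "
    "SELECT customer_id FROM customers WHERE area_code IN (804, 410)").fetchall()

print(sorted(not_in), sorted(minus))  # both [(2,)]
```
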

22. Use the special columns.
Take advantage of the ROWID and ROWNUM pseudocolumns. Remember, a ROWID search is the fastest access path. Here's an example of an UPDATE using a ROWID scan, written as a PL/SQL block:

DECLARE
  CURSOR emp_cur IS
    SELECT ROWID RID, SALARY
    FROM EMPLOYEE;
BEGIN
  FOR emp_rec IN emp_cur LOOP
    UPDATE EMPLOYEE
    SET SALARY = emp_rec.SALARY * 1.5
    WHERE ROWID = emp_rec.RID;
  END LOOP;
END;
A ROWID value is not constant in a database, so don't hard-code a ROWID value in your SQLs and applications.
Use the ROWNUM pseudocolumn to limit the number of rows returned. If you're not sure how many rows a SELECT statement will return, use ROWNUM to restrict the result set. The following statement would return no more than 99 rows:

SELECT EMPLOYEE.SS#, DEPARTMENT.DEPT_NAME
FROM EMPLOYEE, DEPARTMENT
WHERE EMPLOYEE.DEPT_ID = DEPARTMENT.DEPT_ID
AND ROWNUM < 100;

23. Use explicit cursors over implicit cursors.
Implicit cursors result in an extra fetch. Explicit cursors are opened by developers by using DECLARE, OPEN, FETCH and CLOSE cursor statements. Implicit cursors are opened by Oracle for DELETE, UPDATE, INSERT and SELECT statements.

24. Explore and take advantage of the Oracle parallel query option.
Using the parallel query option, you can execute SQL in parallel for faster performance. In Oracle7, only queries based on a full table scan can be processed in parallel. In Oracle8, queries based on indexed range scans can be parallelized if the index is partitioned. The parallel query option should be used only on SMP and MPP systems containing multiple disk drives.
The Oracle server has many features, but the presence of these features alone does not ensure faster performance. You must configure the database for these features and write SQL specifically to take advantage of them. For example, the following SQL would be executed in parallel:

SELECT /*+ PARALLEL(ORDERS, 6) */ *
FROM ORDERS;
25. Reduce network traffic and increase throughput.
Using array processing and PL/SQL blocks can achieve better performance as well as reduce network traffic. Array processing allows a single SQL statement to process multiple rows; for example, an INSERT that binds an array can insert 1,000 rows with one statement. Significant performance gains can be achieved in client/server and batch systems using this technique.
Multiple SQL statements can cause heavy network traffic. However, if the SQL statements are within a single PL/SQL block, the entire block can be sent to the Oracle server, processed there, and results returned to the application running on the client.
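
In Python's DB-API the array-processing idea surfaces as executemany; sketched below with sqlite3 (an in-process database, so there is no network hop here, but the one-call-many-rows pattern is the same one that cuts client/server round trips):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderno INTEGER, amount INTEGER)")

# Array processing: bind 1,000 rows to a single INSERT statement
# instead of issuing 1,000 separate INSERT calls.
rows = [(i, i * 10) for i in range(1000)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 1000
```
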

Encrypting DataSource Passwords

1. Encrypt the database password by using SecureIdentityLoginModule in JBoss.

JBoss 3.2.5



$ cd jboss-3.2.5
$ java -cp 'lib/jboss-jmx.jar;lib/jboss-common.jar;server/default/deploy/jboss-jca.sar;server/default/lib/jbosssx.jar' org.jboss.resource.security.SecureIdentityLoginModule password
Encoded password: 5dfc52b51bd35553df8592078de921bc



JBoss 4.0



$ cd jboss-4.0
$ java -cp "lib/jboss-jmx.jar;lib/jboss-common.jar;server/default/lib/jboss-jca.jar;server/default/lib/jbosssx.jar" org.jboss.resource.security.SecureIdentityLoginModule password
Encoded password: 5dfc52b51bd35553df8592078de921bc



(P.S. I failed to generate the encoded password with the JBoss 3.2.5 libraries, so I used the JBoss 4.0 libraries to generate it instead.)



2. The datasource descriptor oracle-ds.xml should then omit the user-name and password settings and instead specify the security-domain that maps to the login-config.xml entry for the SecureIdentityLoginModule config.


(JBoss 3.2.5: If oracle-ds.xml doesn't exist, copy it from %JBOSS_HOME%\docs\examples\jca)


In oracle-ds.xml (%JBOSS_HOME%\server\all\deploy), replace the <user-name> and <password> elements with <security-domain> as below:



<datasources>
  <local-tx-datasource>
    <jndi-name>DefaultDS</jndi-name>
    <connection-url>jdbc:oracle:thin:@dev-db:1000:abc</connection-url>
    <driver-class>oracle.jdbc.driver.OracleDriver</driver-class>
    <blocking-timeout-millis>5000</blocking-timeout-millis>
    <idle-timeout-minutes>15</idle-timeout-minutes>
    <max-pool-size>20</max-pool-size>
    <min-pool-size>10</min-pool-size>
    <!-- Use the security domain defined in conf/login-config.xml -->
    <security-domain>EncryptDBPassword</security-domain>
  </local-tx-datasource>
</datasources>



In login-config.xml (%JBOSS_HOME%\server\all\conf), add the following policy with the encrypted password:



<policy>
  <!-- Example usage of the SecureIdentityLoginModule -->
  <application-policy name="EncryptDBPassword">
    <authentication>
      <login-module code="org.jboss.resource.security.SecureIdentityLoginModule"
                    flag="required">
        <module-option name="username">system</module-option>
        <module-option name="password">5dfc52b51bd35553df8592078de921bc</module-option>
        <module-option name="managedConnectionFactoryName">jboss.jca:service=LocalTxCM,name=DefaultDS</module-option>
      </login-module>
    </authentication>
  </application-policy>
</policy>




Test Example Code in Java


import javax.naming.Context;
import javax.naming.InitialContext;
import javax.rmi.PortableRemoteObject;
import javax.naming.NamingException;
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Hashtable;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;


private String getDSData() {
    String a = "Nothing";
    try {
        // Look up the datasource bound under java:DefaultDS.
        Object obj = new InitialContext().lookup("java:DefaultDS");
        DataSource ds = (DataSource) obj;
        Connection conn = ds.getConnection();
        conn.setAutoCommit(true);

        PreparedStatement pStmt = conn.prepareStatement("select * from TABLE1");
        ResultSet rs = pStmt.executeQuery();

        // Collect the column names from the result set metadata.
        ResultSetMetaData rsmd = rs.getMetaData();
        int numberOfColumns = rsmd.getColumnCount();
        String[] columnNames = new String[numberOfColumns];
        for (int i = 0; i < columnNames.length; i++) {
            columnNames[i] = rsmd.getColumnName(i + 1);
        }

        // Keep the column/value pairs of the last row read.
        while (rs.next()) {
            Hashtable values = new Hashtable();
            for (int i = 0; i < columnNames.length; i++) {
                Object value = rs.getObject(columnNames[i]);
                if (value != null) values.put(columnNames[i], value);
            }
            a = values.toString();
        }

        if (conn != null)
            conn.close();
    } catch (NamingException ex) {
        a = ex.toString();
    } catch (SQLException ex) {
        a = ex.toString();
    } catch (Exception ex) {
        a = ex.toString();
    }
    return a;
}



Reference: http://wiki.jboss.org/wiki/Wiki.jsp?page=EncryptingDataSourcePasswords