Charles Vaz: Simple SQL Server Performance Tips

Always create a data model (ERD).
Consider using an application block or a best practice based design.
Make sure the database is normalized - very important else sql server will not give optimized query plans (Tips for SQL Server 2005 Query Plans) . For the 1 to many (1:m OR m:1) relation, -> ensure that the child table's primary key has one of its composite keys as the parent table's primary key. All dependent tables must have the parent-primary-key (foreign key) and a surrogate key as its primary key eg. a Person - Address relationship, or a Product - Attribute relationship. For an m:n relation ensure that the two tables have a third table to hold the primary key combinations of both the related tables eg. a many to many relationship.
Make sure database security is controlled through views/stored procedures and finally roles.
All commonly used joins have indexes on the where condition columns. Remember foreign key constraint doesn't mean an index.
Always use Inner Joins if possible then Outer Joins . Use Left Outer joins only when foreign keys are nullable. Try to design around NULL (avoid foreign keys being NULL). Use ANSI_NULL to ensure ANSI NULL compatibility. Remember: SELECT * FROM A1 where b not in (SELECT b from B1) would return null if any b is null.
Keep transactions as short as possible.
Reduce lock time. Try to develop your application so that it grabs locks at the latest possible time, and then releases them at the very earliest time.
Always run/display execution plan from query analyzer when testing out stored procs/ad-hoc sql and ensure clustered index seek or nested loops are used. NO HASH JOINs. I/O or hash joins would mean spikes in CPU usage in the performance monitor(perfmon).
Avoid where conditions with functions since SQL Server doesn't have Function based indices. eg don't use select a,b from X where CONVERT(date) > '10/10/2005', instead move the convert to the RHS constant. This guarantees query exec. plan reuse and also usage of index columns by query plan.
Always run sql profiler and run your client application and ensure that the duration column is not too much, if too much run index tuning wizard which will confirm that no indices are required for the queries.
Always use connection pools for guaranteeing caching of queries results etc. Connection Strings should exactly match for connection pooling, if NT USer use same user while connecting to the database from the client. Remember: NT based connection pooling through delegation doesn't work correctly in ASP.NET, also it isn't as scalable as a SQL user based connection pool. You can always encrypt the connection string in the web.config file
SQL Server .NET data provider is the fastest. The SQL Server .NET provider uses TDS (Tabular Data Stream, which is the native SQL Server data format) to communicate with SQL Server. The SQL Server .NET provider can be used to connect to SQL Server 7.0 and SQL Server 2000 databases, but not SQL Server 6.5 databases. If you need to connect to a SQL Server 6.5 database, the best overall choice is the OLE DB.NET data provider.
2 part name - Always use fully qualify tables/views/stored procs like exec dbo.sp_storeusers or sp_sqlexec rsdb.dbo.sp_storeusers to be compatible with future releases of SQL Server.
SQL Server 2005 places no limits on server RAM, supports XML natively, has an inbuilt tuning advisor and works with the same sql syntax as SQL Server 2000. Constant Scan and other operators of SQL 2005
Server side cursors are not scalable in SQL Server => avoid .
Cursors are degradable to the next higher cost cursor - when ORDER BY (not covered by index), TOP, GROUP BY, UNION, DISTINCT,.. is used.
Always use DataReaders, then DataTables then DataSets with ADO.NET in that order of performance hit.
Try to use SELECT ... (with NOLOCK) hint. NOLOCK gives dirty data, useful only when readers are much more than writers. If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK. Consider using the NOLOCK hint to prevent locking if the data being locked is not modified often.
Always de-allocate and close cursors, close connections.
To check io costs - set statistics io on -- just get stats for touches on the tables (could be index, clustered index or table)
Non-clustered index leads to a bookmark look-up when the clustered index/rowid data is accessed.
Internationalization: Always use UTC time in database and plan for Unicode. Don't assume locale and number of users, design for most scalability. Don't use the NVARCHAR or NCHAR data types unless you need to store 16-bit character (Unicode) data. They take up twice as much space as VARCHAR or CHAR data types, increasing server I/O and wasting unnecessary space in your buffer cache.
ADO.NET calls ad-hoc queries using sp_ExecuteSQL("...") so they will be cached, so no problems with search pages but use the same connection string/pooling. select * from syscacheobjects to check cache.
Avoid SELECT * ==> leads to table scan, also have at least 1 clustered index on a table (unless its very small) Because there is no index on the column to use for the query. It must do a table scan to evaluate each row. A table scan is also done if all columns are requested or the where condition doesn't contain any indices.
use SET => better for assigning single values rather than SELECT eg. SET @a =10
Openxml is costly - it loads the xml parser in sql server so use bulk insert/bulk copy
DBCC - database consistency check (misnomer now!) DBCC FREEPROCACHE (free proc cache) DBCC REINDEX - at night, high cost, table lock, reindex, reapply fill factor which is applied only initially DBCC CHECK - check db consistency DBCC SHOWCONTIG - show defragmentation (extent level, logical scan, scan density) DBCC INDEXDEFRAG - online operation - during day, low cost, page lock, fix logical scan frag.
Maintainance: update statistics every night, reindex every week.
sp_who - show spids currently running and deadlocked ones
Ask for less data over the wire - its better to work like explorer and ask for parent nodes first then child nodes based on user request.
use of DISTINCT is not very scalable => database model error (may not be relational)
Optimizer uses constraints - so use indices,foreign keys etc
Clustered Index Scan or Full Table Scan are because an index is missing, use index tuning wizard with thorough to find the missing index when the application is running and when profiler is used. Index Tuning Wizard can be run on individual queries too from SQL Query Analyser.
SQL Query Optimizer: Select column would affect BOOKMARK LOOKUP, Predicate column (where clause) determines clustered or non-clustered index seek/scan (scan=>between clause), Estimated Resultant rows determines a Clustered Index Scan is to be done or not.
dbcc memorystatus - value of Stolen under Buffer Distribution increase steadily? => either consuming a lot of memory within SQL Server or is not releasing something. When an application acquires a lot of Stolen memory, SQL Server cannot page this to disk like it can for a data or index page. This is memory that must remain in SQL Server’s Buffer Pool and cannot be aged out. If the application is using cursors, memory associated with a cursor requires Stolen Memory while the cursor is open => Perhaps the application is opening up cursors but not closing them before opening a new one.
OR/'in' clauses are not very performant (most of the time they result in a table scan) ==> use unions for large queries.
Always check for SQL Injection problems including comment -- web page injection issues.
A view - "virtual table" - based on views would all be materialized in the tempdb during execution so the query plan used would be based on the sql (if it contains CONVERT, RTRIM functions etc in the where clause, the index wouldn't be used because there are no function based indexes like ORACLE).
Data Types: char == trailing spaces (padded), varchar == no trailing spaces (not-padded).If the text data in a column varies greatly in length, use a VARCHAR data type instead of a CHAR data type. The amount of space saved by using VARCHAR over CHAR on variable length columns can greatly reduce I/O reads, improving overall SQL Server performance. Don't use FLOAT or REAL data types for primary keys, as they add unnecessary overhead that hurts performance. Use one of the integer data types instead.
Avoid SQL Server Application Roles which do not take advantage of connection pooling
Set following for all stored procs
========================
SET ANSI_NULLS ON -- guarantees ansi null behaviour during concat, IN operations
SET CONCAT_NULL_YIELDS_NULL ON -- any string concat with NULL is NULL
SET NOCOUNT ON -- minimize network traffic.
O/RM - Object-relational mapping - Object-relational mapping, or O/RM, is a programming technique that links relational databases to object-oriented language concepts, creating (in effect) a "virtual object database." http://en.wikipedia.org/wiki/Object-relational_mapping
Simple tips from http://www.sql-server-performance.com/david_gugick_interview2.aspBest way to optimize stored procedures:
- Limit the use of cursors wherever possible. Use temp tables or table variables instead. Use cursors for small data sets only.
- Make sure indexes are available and used by the query optimizer. Check the execution plan for confirmation.
- Avoid using local variables in SQL statements in a stored procedure. They are not as optimizable as using parameters.
- Use the SET NOCOUNT ON option to avoid sending unnecessary data to the client.
- Keep transactions as short as possible to prevent unnecessary locking.
- If your application allows, use the WITH (NOLOCK) table hint in SQL SELECT statements to avoid generating read locks. This is particularly helpful with reporting applications.
- Format and comment stored procedure code to allow others to properly understand the logic of the procedure.
- If you are executing dynamic SQL use SP_EXECUTESQL instead of EXEC. It allows for better optimization and can be used with parameters.
- Access tables across all stored procedures in the same logical order to prevent deadlocks from occurring.
- Avoid non-optimizable SQL search arguments like Not Equal, Not Like, and, Like ‘%x’.
- Use SELECT TOP n [PERCENT] instead of SET ROWCOUNT n to limit the number of rows returned.
- Avoid using wildcards such as SELECT * in stored procedures (or any SQL application for that matter).
- When executing stored procedures from a client, using ADO for example, avoid requesting a refresh of the parameters for the stored procedure using the Parameters.Refresh() command. This command forces ADO to interrogate the database for the procedure’s parameters and causes excessive traffic and application slowdowns.
- Break large queries into smaller, simpler ones. Use table variables or temp tables for temporary storage, if necessary.
- Understand your chosen client library (DB-LIB, ODBC, OLE DB, ADO, ADO.Net, etc.) Understand the necessary options to set to make queries execute as quickly as possible.
- If your stored procedure generates one or more result sets, fetch those results immediately from the client to prevent prolonged locking. This is especially important if your client library is set to use server-side cursors.
- Do not issue an ORDER BY clause in a SELECT statement if the order of rows returned is not important.
- Put all DDL statements (like CREATE TABLE) before any DML statements (like INSERT). This helps prevent unwanted stored procedure recompiles.
- Only use query hints if necessary. Query hints may help performance, but can prevent SQL Server from choosing the best execution plan. A query hint that works today may not work as well tomorrow if the underlying data changes in size or statistical distribution. Try not to out think SQL Server’s query processor.
- Consider using the SQL Server query governor cost limit option to prevent potentially long running queries from ever executing.
Best index tuning:
- Examine queries closely and keep track of column joins and columns that appear in WHERE clauses. It’s easiest to do this at query creation time.
- Look for queries that return result sets based on ranges of one or more columns and consider those columns for the clustered index.
- Avoid creating clustered primary keys if the PK is on an IDENTITY or incrementing DATETIME column. This can create hot-spots at the end of a table and cause slow inserts if the table is “write” heavy.
- Avoid excessive indexes on columns whose statistical distribution indicates poor selectivity, i.e. values found in a large number of rows, like gender (SQL Server will normally do a table scan in this case).
- Avoid excessive indexes on tables that have a high proportion of writes vs. reads.
- Run the Index Tuning Wizard on a Coefficient trace file or Profiler trace file to see if you missed any existing indexes.
- Do not totally rely on the Index Tuning Wizard. Rely on your understanding of the queries executed and the database.
- If possible, make sure each table has a clustered index, which may be declared in the primary key constraint (if you are using a data modeling tool, check the tool’s documentation on how to create a clustered PK).
- Indexes take up extra drive space, slow down INSERTs and UPDATEs slightly, and require longer backup/replication times, but since most tables have a much higher proportion of reads to writes, you can usually increase overall performance creating the necessary indexes, as opposed to not creating them.
- Remember that the order of columns in a multi-column index is important. A query must make use of the columns as they are listed in the index to get the most performance increase. While you don’t need to use all columns, you cannot skip a column in the index and still receive index performance enhancement on that column.
- Avoid creating unique indexes on columns that allow NULL values.
- On tables whose writes far outweigh reads, consider changing the FILLFACTOR during index creation to a value that allows for adequate free space on the index pages to allow for optimal table inserts.
- Make sure SQL Server is configured to auto update and auto create statistics. If these options cause undue strain on the server during business hours and you turn them off, make sure you manually update statistics, as needed. Also, note that sql server trace does cause a strain and slowdown on the server.
- Consider rebuilding indexes on a periodic basis, by recreating them (consider using the DROP_EXISTING clause), using DBCC INDEXDEFRAG (SQL 2000), or DBCC DBREINDEX. These commands defragment an index and return the fill factor space to the leaf level of each index page. Consider a mix/match of each of these commands for your environment.
- Do not create indexes that contain the same column. For example, instead of creating two indexes on LastName, FirstName and LastName, eliminate the second index on LastName.
- Avoid creating indexes on descriptive CHAR, NCHAR, VARCHAR, and NVARCHAR columns that are not accessed often. These indexes can be quite large. If you need an index on a descriptive column, consider using an indexed view on a smaller, computed portion of the column. For example, create a view:
```
CREATE VIEW view_name
WITH SCHEMABINDING
AS
SELECT ID, SUBSTRING(col, 1, 10) as col
FROM table

Then create an index on the reduced-sized column col:

CREATE INDEX name on view_name (col). This index can still be used by SQL Server when querying the table directly (although you would be limited in this example to searching for the first 10 characters only). Note: Indexed views are SQL Server 2000 only.
```
- Use surrogate keys, like IDENTITY columns, for as many primary keys as possible. INT and BIGINT IDENTITY columns are smaller than corresponding alpha-numeric keys, have smaller corresponding indexes, and allow faster querying and joining.
- If a column requires consistent sorting (ascending or descending order) in a query, for example:
```
SELECT LastName, FirstName
FROM Customers
WHERE LastName LIKE N%
ORDER BY LastName DESC

Consider creating the index on that column in the same order, for example:

CREATE CLUSTERED INDEX lastname_ndx
ON customers(LastName, FirstName) DESC. This prevents SQL Server from performing an additional sort on the data. 
```
- ```
Create covering indexes wherever possible. A covering index covers all columns selected and referenced in a query. This eliminates the need to go to the data pages, since all the information is available in the index itself. 
```
Benefits of using stored procedures
- Stored procedures facilitate code reuse. You can execute the same stored procedure from multiple applications without having to rewrite anything.
- Stored procedures encapsulate logic to get the desired result. You can change stored procedure code without affecting clients (assuming you keep the parameters the same and don’t remove any result sets columns).
- Stored procedures provide better security to your data. If you use stored procedures exclusively, you can remove direct Select, Insert, Update, and Delete rights from the tables and force developers to use stored procedures as the method for data access.
- Stored procedures are a part of the database and go where the database goes (backup, replication, etc.).
- Stored procedures improve performance. SQL Server combines multiple statements in a procedure into a unified execution plan.
- Stored procedures reduce network traffic by preventing users from having to send large queries across the network.
- SQL Server retains execution plans for stored procedures in the procedure cache. Execution plans are reused by SQL Server when possible, increasing performance. Note SQL 7.0/2000: this feature is available to all SQL statements, even those outside stored procedures, if you use fully qualified object names.
Top 10 Must Have Features in O/R Mapping Tools at http://www.alachisoft.com/articles/top_ten.html - 1. Flexible object mapping -Tables & views mapping, Multi-table mapping, Naming convention, Attribute mapping, Auto generated columns, Read-only columns, Required columns, Validation, Formula Fields, Data type mapping, 2. Use of existing Domain objects, 3. Transactional operations - COM+/MTS,Stand-alone, 4. Relationships and life cycle management - 1 to 1, many to 1, 1 to many, many to many, 5. Object inheritance - 1 table per object or 1 table for all objects - handling insert, update, delete and load data, 6. Static and dynamic queries, 7. Stored procedure calls,8. Object caching, 9. Customization of generated code and re-engineering support, 10. Code Template Customization
Perform an audit of the SQL Code http://www.sql-server-performance.com/sql_server_performance_audit8.asp
Transact-SQL Checklist

Does the Transact-SQL code return more data than needed?
Are cursors being used when they don't need to be?
Are UNION and UNION SELECT properly used?
Is SELECT DISTINCT being used properly?
Does the WHERE clause make use of indexes in search criteria?
Are temp tables being used when they don't need to be?
Are hints being properly used in queries?
Are views unnecessarily being used?
Are stored procedures being used whenever possible?
Inside stored procedures, is SET NOCOUNT ON being used?
Do any of your stored procedures start with sp_?
Are all stored procedures owned by DBO, and referred to in the form of databaseowner.objectname?
Are you using constraints or triggers for referential integrity?
Are transactions being kept as short as possible?

Application Checklist

Is the application using stored procedures, strings of Transact-SQL code, or using an object model, like ADO, to communicate with SQL Server?
What method is the application using to communicate with SQL Server: DB-LIB, DAO, RDO, ADO, .NET?
Is the application using ODBC or OLE DB to communication with SQL Server?
Is the application taking advantage of connection pooling?
Is the application properly opening, reusing, and closing connections?
Is the Transact-SQL code being sent to SQL Server optimized for SQL Server, or is it generic SQL?
Does the application return more data from SQL Server than it needs?
Does the application keep transactions open when the user is modifying data?

http://www.sql-server-performance.com/