SQL Server DB Design Guidelines
by Stephen R. Mull
http://home.earthlink.net/~a013452/index.htm
Contents
Security
Abstraction
Physical Design
I/O & Logical Design
Locks & Deadlocks
Naming Conventions
Coding for Readability
Release & Documentation
Replication, Clustering, and Log Shipping
Appendix A, Stored Procedures and Code Example(s)
Appendix B, Temporary Tables vs. Table Valued Variables
Appendix C, Stored Procedure(s) Template with Example
Appendix D, Optimize Disk Configuration in SQL Server
New:
I sat through a presentation on the "Fusion-io" (www.fusionio.com) brand of SSD. FYI, Steve Wozniak is their Chief Scientist. Here are some specifics:
- It exists as a PCIe card, in order to avoid bottlenecking due to SATA or SCSI type storage communication protocols.
- Price point is $7,500 for a 320 GB card. Current max card size is 640 GB.
- The cards use up some server memory (RAM) to store a redundant set of metadata while running. They have config software that installs with them.
- Throughput is something like 600 MB/sec, maybe 40,000 IOPS (fantastically huge compared to a mechanical HDD).
- Reliability and longevity are at least as good as HDDs: an 8-10 year life at a data rate of 2 TB/day (an industry ~standard for 80 GB drives). Specifically, P(undetected bad data) = 1E-20 and P(lost data) = 1E-30. Also, since the card monitors itself, it will wear out gradually, and signal to you as it is doing so.
- The cards use something like a RAID 5 algorithm internally, algorithms to wear themselves evenly, and algorithms to proactively monitor and fix any data anomalies. They also carry (20%) extra, entirely internal, storage to replace worn-out flash cells as the drive ages.
- Combining the cards in a server-level RAID configuration doesn't noticeably slow anything down.
- Adding a second card (and striping it as RAID 0) not only doubles storage size, it doubles storage throughput (!). This linear scaling applies until you saturate the computer bus, resulting in a massively huge I/O rate. Often servers with a few of these cease to be I/O bound, and become either CPU or network I/O bound.
- The card is insensitive to sequential vs. random I/O operations, since it is massively parallel. There is then no benefit to separating data and log file operations onto separate cards, etc.
- There is no need, as with spindle loading, to keep extensive freeboard on the drive.
- My (strong) recommendation is to:
o buy a card and test it, if you are highly cautious (but these things are more like other PCI cards than HDDs, failure-wise; if they fail they might bring down the whole box)
o test it on a different (from production) server first, by putting tempdb and the logs on it
o assuming it tests well, move that card to production
- Combined with sufficient RAM and CPU, it should massively increase the speed of the box, and dramatically speed current query response, particularly updates and things using tempdb for query processing. You might even be able to put all the (mdf) data files on it too.
- Use existing storage for backups, etc.
- The hot ticket is these cards on 2 redundant servers, for all storage, with SQL 2k8 EE, maybe using built-in compression, and synchronous mirroring, on the (x64) system(s).
Update: I/we installed a 320 GB card on each of two 8x, x64, 64 GB SQL 2k5 StdEd, Win 2k3 EE servers, for one of my clients. I also re-configured the RAID array into aligned RAID 5, with caching + battery backup, and write-through OFF. Data throughput went from ~5 MB/sec (before) to 300 MB/sec (after). The only issue was that the Fusion-io cards and the Dell (& Broadcom) firmware needed the latest releases to all play nicely together (due to higher I/O throughput on the bus).
1. Security
§ Review the Microsoft Technet site on SQL Server security best practices: http://www.microsoft.com/technet/treeview/default.asp?url=/technet/prodtechnol/sql/maintain/security/sp3sec/Default.asp
§ Review Microsoft’s SQL Server Security Checklist:
http://www.microsoft.com/sql/techinfo/administration/2000/security/securingsqlserver.asp
§ Using only Windows integrated security, as Microsoft suggests, is usually impractical. Therefore one needs to take care configuring SQL Server permissions & security.
§ Don’t expose a SQL Server directly to the internet. Consider both firewalls and network segmentation for securing a SQL Server.
§ All SQL Server production installations using SQL Server security should have a very strong “sa” password, to prevent exhaustive search cracks.
§ Don’t grant a login or service any more permissions than it needs to do its specific tasks.
§ A set of non-administrative roles should be defined with respect to each production SQL Server application.
§ A set of SQL Server production UserIds should be created and assigned to the created roles.
§ All production processes should access production dbs either through these production UserIds or through application roles.
§ Production UserIds/roles should be granted execute permission on stored procedures only, on an as-needed basis (excepting requirements due to dynamic SQL).
§ Before QA of each production release, Quality should change -all- passwords on the test server & confirm that no code is broken as a result (i.e. there is no unmanaged db access by any component, tier, or 3rd party service, especially under the “sa” SQL Server UserId).
§ All sensitive information should be stored in the db in masked form. The masking function should be simple enough so that it can be executed or reversed in SQL, VB, Java, or C++. Masking functions should not be stored on the db server anywhere.
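As a hedged illustration only (the key and function here are made up; per the guideline above, the real masking function should live in the application tiers, not on the db server), a trivially reversible mask might XOR a numeric value with a key:

    -- Hypothetical sketch: the XOR key 0x5A3C would be held in application code
    -- (SQL, VB, Java, or C++), never stored on the database server.
    DECLARE @AccountNo int, @Masked int
    SET @AccountNo = 1234567

    SET @Masked = @AccountNo ^ 0x5A3C      -- mask before writing to the db
    SELECT @Masked ^ 0x5A3C AS Unmasked    -- the same XOR reverses it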
2. Abstraction
Most auto-generated SQL (from many RAD products) scales very inefficiently. With these architectures, as the user load increases, it is necessary to buy more hardware, in increasing proportion to the users added, just to maintain response time. For maintainability, and ultimately scalability, put SQL into stored procedures. This builds modifiability into the DBMS layer, making future maintenance, troubleshooting, and modification faster, easier, and potentially transactionally consistent.
§ Use stored procedures for all production access (updating and reading) on all new databases.
o On SQL Server 2000, views can be substituted for stored procedures with minimal execution speed impact; however this can still make SQL statement optimization more difficult. On SQL 7, views are universally slower and significantly less featured than stored procedures, and thus should not be used in lieu of stored procedures for abstraction.
§ On existing databases, views are minimally acceptable as a means of after-the-fact abstraction for read-only operations, when implementation of stored procedures would require extensive modifications.
§ Abstract connection information wherever feasible.
§ Some even abstract stored procedure/SQL call text and parameter name(s)/type(s), especially on high end systems (by making a metadata db). Using this method one can port a system between db platforms much more easily, while maintaining abstraction, and hence the capability to tune the specific SQL to each platform.
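A minimal sketch of this abstraction, assuming a hypothetical Customer table: the application calls the procedure by name, so the underlying schema and SQL can be tuned later without touching application code.

    CREATE PROCEDURE dbo.usp_GetCustomerByID
        @CustomerID int
    AS
    BEGIN
        SET NOCOUNT ON

        -- All reads of Customer go through here; the table, its indexes, or
        -- the query itself can change without breaking callers.
        SELECT CustomerID, CustomerName, CreatedDate
        FROM dbo.Customer
        WHERE CustomerID = @CustomerID

        -- Return-code convention (see section 4): row count for success, 0 for none.
        RETURN @@ROWCOUNT
    END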
3. Physical Design
As a database grows, either in size or load, regularly evaluate partitioning different portions of the database onto different physical files/filegroups, storage devices, and/or servers. Depending on storage subsystem design, this can dramatically speed up db lookup, modification, or recovery. Stay well ahead of storage space & speed needs.
§ Optimum physical design varies dramatically depending on db size, load size, and load type. For a small departmental or testing db, the default db file specifications are fine. Often these small implementations grow into a departmental server, all on RAID 5.
§ Putting everything on RAID 5 is a convenient but low write performance configuration. On RAID 5 write operations, the RAID subsystem has to calculate and write parity information before the write operation is committed (excluding controller caching). The rule of thumb is that RAID 5 is acceptable for less than 5% writes and unacceptable for more than 10% writes. With RAID 5, it is generally not necessary to partition onto different files/filegroups for the sake of I/O speed. However, filegroup partitioning may still be convenient for backup/recovery speed optimization.
§ Optimum read/write striping is RAID 0 + 1, which is 100% space redundant, and therefore the most expensive storage configuration. To balance between cost & performance, often data is placed on RAID 5 and the logs (& sometimes tempdb & the system dbs) are placed on RAID 0 + 1. Thus write intensive operations are placed on the fastest (& costliest) redundant storage subsystem, where they will be helped most.
§ Especially consider separating log and data operations onto different physical storage subsystems and controllers. This can be accomplished regardless of whether or not one is using RAID.
§ Note: Don’t use Microsoft’s OS based RAID. For small systems, buy an IDE RAID card instead.
§ Since write performance is generally proportional to HDD speed, sometimes a ~fast & cheap way to debottleneck a box is simply to upgrade the HDDs to higher speed drives. This may be much faster & cheaper than a redesign & optimization of the data access layer. Generally, however, it just postpones the problem.
· A great write-up on physical db design by Denny Cherry is included in Appendix D, “Optimize disk configuration in SQL Server”.
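As a hedged sketch of the filegroup partitioning discussed above (the database name, file path, and table are hypothetical), moving a write-intensive table onto its own filegroup on a separate physical drive looks like this:

    ALTER DATABASE Sales ADD FILEGROUP FG_OrderData

    -- Hypothetical path on a separate physical drive/array (E:)
    ALTER DATABASE Sales
    ADD FILE (
        NAME = 'Sales_OrderData1',
        FILENAME = 'E:\SQLData\Sales_OrderData1.ndf',
        SIZE = 500MB,
        FILEGROWTH = 100MB
    ) TO FILEGROUP FG_OrderData

    -- New write-intensive tables/indexes can then be placed on that filegroup:
    CREATE TABLE dbo.OrderDetail (
        OrderDetailID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        OrderID int NOT NULL,
        Quantity int NOT NULL
    ) ON FG_OrderData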
4. I/O & Logical Design
Persistence design based on table rows as persistent instances of object properties (one row in one table per instance of the object) can often result in degraded dbms performance. This object based design can lead to increased round trips to the dbms, by encouraging ISAM operations, which often get executed in a loop in the middle tier. From the whole application’s perspective, encapsulation of related db operations into a single (T-SQL) procedure on the server side will enable enormous relational algorithmic efficiencies, as well as eliminate the encapsulation and parsing overhead that results from multiple dbms round trips. This architectural limitation of ISAMs applies regardless of hardware or dbms platform selection.
§ Design to:
o Minimize round trips to the dbms (RAM cache on client(s) as much as possible)
o Minimize the size of returned data (no lazy select *, redundant fields, or redundant selects)
o Share/pool connections
§ XML
o Being both delimited and character based, XML is verbose. Further, it can’t be streamed into anything without first defining the relevant schema. This means that for high throughput applications, one should carefully weigh the cross-platform compatibility benefits of XML against its throughput limitations compared to TDS 8.
o Current high throughput best practices still include separating XML services onto a separate server, and communicating with SQL Server using TDS 8. The ADO.Net System.Data.SqlClient class uses native TDS 8. Microsoft’s early rhetoric on this subject, “XML everything”, is misleading.
o Several of Microsoft’s SQL Server/XML architectural models, similar to models implying use of Analysis Services alongside a production SQL Server, are 2-tier. These models were designed for a departmental or development server, and ignore many production considerations, such as throughput under load and maintainability.
§ Use the minimum possible number of select statements to accomplish any lookup or modification. For this purpose sub-queries count as separate select statements.
§ For scalable applications you have to stay stateless. Do -not- use server side cursors, or components that create server side cursors, at all. They’re too resource intensive to scale. For server side looping, try “while” loops on (small) table variables w/identities instead (v2k); a sketch follows at the end of this section. For larger or more complex applications, carefully managed temp tables can also be used. Note that either creating or dropping a temp table loads the system tables and can create concurrency problems. Actually compare/test table valued variables vs. temp tables for OLTP applications, and design to avoid either one if possible. Especially avoid “select into” for OLTP applications. Consider this KB article on tempdb concurrency issues: http://support.microsoft.com/default.aspx?scid=kb;en-us;328551. Generally you still have to manage concurrency manually. Be cautious about trying to scale the new .Net disconnected recordset objects. Test first at load.
§ Avoid/delay use of most Declarative Referential Integrity (DRI) in immature or fast OLTP databases - put check logic in stored procedures or the middle tier.
o Exception – when an “in production” schema is not fully abstracted or the new db is deployed in an environment consisting of multiple data updaters of varying skill levels.
§ Where row order is constrained always use order by & asc/desc qualifiers – never rely on default sort order or unconstrained data order in a table.
§ Use outer joins in lieu of correlated sub-queries (for “not exists”, etc ..), unless you first closely check the query plan of the statement to make sure it’s not “acting” like a correlated sub-query.
§ Avoid custom data types. They’re equivalent to DRI and slow down updates to the db.
§ Consider use of the “no lock” optimizer hint for DSS, catalog, and read only table operations.
§ Enable “select into/bulk copy” on all new (v7 only) dbs.
§ Enable “auto create statistics”, “auto update statistics” & “auto shrink” on new small/medium sized dbs. Evaluate changing these configurations on larger/faster dbs.
§ Always use a stored procedure template for header/footer code fragments.
§ Always include a usage example, author, & date in the comment section of the stored procedure definition.
§ Always use return codes in stored procedures such as: 1 or a relevant row count for success, 0 for failure or no records affected/found.
§ Use integer type keys when designing primary keys for lookup speed.
§ Use timestamps in preference to other primary key candidates when designing tables for high speed write access, since you can then guarantee uniqueness while still writing to an un-indexed heap.
§ Use triggers only to auto-generate log files or timestamps – use procedures elsewhere – it keeps the db more maintainable.
§ Avoid bit & smallmoney data types, use integer & money types instead – (word alignment).
§ Use datetime / smalldatetime & CurrentTimestamp / getdate() functions instead of the timestamp data type (DRI).
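The server side looping approach mentioned above, as a minimal sketch (the Customer table and the per-row processing step are hypothetical):

    -- Load the (small) working set once, then iterate without a server side cursor.
    DECLARE @Work table (
        RowID int IDENTITY(1,1) PRIMARY KEY,
        CustomerID int NOT NULL
    )
    DECLARE @i int, @Max int, @CustomerID int

    INSERT @Work (CustomerID)
    SELECT CustomerID FROM dbo.Customer WHERE IsActive = 1

    SELECT @i = 1, @Max = @@ROWCOUNT

    WHILE @i <= @Max
    BEGIN
        SELECT @CustomerID = CustomerID FROM @Work WHERE RowID = @i
        -- ... per-row work here, e.g. EXEC dbo.usp_ProcessCustomer @CustomerID ...
        SET @i = @i + 1
    END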
5. Locks & Deadlocks
Locking is a normal part of db operation. While specifying “NO LOCK” optimizer hints for static reads can sometimes be useful (and sometimes dangerous), other locking tweaks can cause more trouble than they solve. Often locking problems are due to complex, inefficient, multi-part read/write T-SQL transactions (which often result over time from uncoordinated, organic development), combined with a slow I/O subsystem.
§ Theoretically deadlocks can be avoided on all intra-db I/O, by carefully designing all transactions compatibly, with a consistent locking sequence. As a practical matter, that much planning is usually uneconomical. However, on well designed systems, locking problems are minimal. The typical scenario is to repair, well after rollout, SQL code that fails intermittently under increasing load, as deadlocks or timeouts. Use Profiler or sp_lock & sp_who2 to find the locks/deadlocks, and examine the sequence and structure of the associated transaction & T-SQL.
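A minimal sketch of the consistent locking sequence idea, with hypothetical Account and Ledger tables: if every transaction that touches both tables updates them in the same order, the classic two-transaction deadlock cycle cannot form.

    -- Every transaction in the system touches Account first, then Ledger.
    -- If another procedure updated Ledger first and then Account, the two
    -- could deadlock under load; a consistent order prevents the cycle.
    DECLARE @From int
    SET @From = 42

    BEGIN TRAN
        UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = @From
        UPDATE dbo.Ledger SET LastActivity = getdate() WHERE AccountID = @From
    COMMIT TRAN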