You see things; and you say 'Why?' But I dream things that never were; and I say 'Why not?'

Sunday, June 28, 2009

Database programming guidelines, conventions

Databases are the heart and soul of many recent enterprise applications, and it is essential to pay special attention to database programming. I've seen many occasions where database programming is overlooked, on the assumption that it's something easy that can be done by anyone. This is wrong. For a better performing database you need a real DBA and a specialist database programmer, be it Microsoft SQL Server, Oracle, Sybase, DB2 or whatever! If you don't use database specialists during your development cycle, the database often ends up becoming the performance bottleneck. I decided to write this article to put together some of the database programming best practices, so that my fellow DBAs and database developers can benefit!

Here are some of the programming guidelines and best practices, keeping quality, performance and maintainability in mind. This list may not be complete at this moment, and will be updated constantly.
  • Decide upon a database naming convention, standardize it across your organization and be consistent in following it. It helps make your code more readable and understandable. Click here to see the database object naming convention that I follow.
  • Do not depend on undocumented functionality. The reasons being:
- You will not get support from Microsoft, when something goes wrong with your undocumented code
- Undocumented functionality is not guaranteed to exist (or behave the same) in a future release or service pack, thereby breaking your code

  • Try not to use system tables directly. System table structures may change in a future release. Wherever possible, use the sp_help* stored procedures or INFORMATION_SCHEMA views. There will be situations where you cannot avoid accessing system tables, though!
  • Make sure you normalize your data to at least third normal form. At the same time, do not compromise on query performance. A little bit of denormalization helps queries perform faster.
  • Write comments in your stored procedures, triggers and SQL batches generously, whenever something is not very obvious. This helps other programmers understand your code clearly. Don't worry about the length of the comments, as it won't impact performance, unlike in interpreted languages such as ASP 2.0.
  • Do not use SELECT * in your queries. Always write the required column names after the SELECT statement, like SELECT CustomerID, CustomerFirstName, City. This technique results in less disk IO and less network traffic and hence better performance.
  • Try to avoid server side cursors as much as possible. Always stick to a 'set based approach' instead of a 'procedural approach' for accessing and manipulating data. Cursors can often be avoided by using SELECT statements instead. If a cursor is unavoidable, use a simple WHILE loop instead, to loop through the table. I personally tested and concluded that a WHILE loop is faster than a cursor most of the time. But for a WHILE loop to replace a cursor you need a column (primary key or unique key) to identify each row uniquely, and I personally believe every table must have a primary or unique key. Click here to see one of the many examples of using WHILE loop.
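As a minimal sketch of that technique (the table and column names here are hypothetical), a WHILE loop can walk a table one row at a time by seeking the next primary key value, with no cursor involved:

DECLARE @EmpID int
SELECT @EmpID = MIN(EmpID) FROM Employees --Start with the lowest key value
WHILE @EmpID IS NOT NULL
BEGIN
--Per-row processing goes here; PRINT is just a placeholder
PRINT 'Processing employee ' + CAST(@EmpID AS varchar(10))
--Seek the next key value
SELECT @EmpID = MIN(EmpID) FROM Employees WHERE EmpID > @EmpID
END
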
  • Avoid the creation of temporary tables while processing data as much as possible, as creating a temporary table means more disk IO. Consider advanced SQL, views, SQL Server 2000 table variables or derived tables instead of temporary tables. Keep in mind that, in some cases, using a temporary table performs better than a highly complicated query.
  • Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan, which defeats the purpose of having an index. The first of the following statements results in an index scan, while the second results in an index seek:
1. SELECT LocationID FROM Locations WHERE Specialities LIKE '%pples'
2. SELECT LocationID FROM Locations WHERE Specialities LIKE 'A%s'

Also avoid searching with not equals operators (<> and NOT) as they result in table and index scans. If you must do heavy text-based searches, consider using the Full-Text search feature of SQL Server for better performance.

  • Use 'Derived tables' wherever possible, as they perform better. Consider the following query to find the second highest salary from Employees table:
SELECT MIN(Salary)
FROM Employees
WHERE EmpID IN
(
SELECT TOP 2 EmpID
FROM Employees
ORDER BY Salary Desc
)

The same query can be re-written using a derived table as shown below, and it performs twice as fast as the above query:

SELECT MIN(Salary)
FROM
(
SELECT TOP 2 Salary
FROM Employees
ORDER BY Salary Desc
) AS A

This is just an example; the results might differ in different scenarios depending upon the database design, indexes, volume of data, etc. So, test all the possible ways a query could be written and go with the most efficient one. With some practice and understanding of 'how the SQL Server optimizer works', you will be able to come up with the best possible queries without this trial and error method.
  • While designing your database, design it keeping 'performance' in mind. You can't really tune performance later, when your database is in production, as it involves rebuilding tables and indexes and re-writing queries. Use the graphical execution plan in Query Analyzer or the SHOWPLAN_TEXT or SHOWPLAN_ALL commands to analyze your queries. Make sure your queries do 'index seeks' instead of 'index scans' or 'table scans'. A table scan or an index scan is a very bad thing and should be avoided where possible (sometimes, when the table is too small or when the whole table needs to be processed, the optimizer will choose a table or index scan).
  • Prefix table names with owner names, as this improves readability and avoids unnecessary confusion. Microsoft SQL Server Books Online even states that qualifying table names with owner names helps in execution plan reuse.
  • Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments, as this suppresses messages like '(1 row(s) affected)' after executing INSERT, UPDATE, DELETE and SELECT statements. This in turn improves the performance of stored procedures by reducing network traffic.
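For example, a typical stored procedure skeleton would look like this (the procedure, table and column names are illustrative):

CREATE PROC GetCustomersByCity
@City varchar(30)
AS
BEGIN
SET NOCOUNT ON --Suppress the 'n row(s) affected' messages
SELECT CustomerID, CustomerFirstName, City
FROM Customers
WHERE City = @City
END
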
  • Use the more readable ANSI-standard join clauses instead of the old style joins. With ANSI joins, the WHERE clause is used only for filtering data. Whereas with older style joins, the WHERE clause handles both the join condition and filtering data. The first of the following two queries shows an old style join, while the second one shows the new ANSI join syntax:
SELECT a.au_id, t.title
FROM titles t, authors a, titleauthor ta
WHERE
a.au_id = ta.au_id AND
ta.title_id = t.title_id AND
t.title LIKE '%Computer%'

SELECT a.au_id, t.title
FROM authors a
INNER JOIN
titleauthor ta
ON
a.au_id = ta.au_id
INNER JOIN
titles t
ON
ta.title_id = t.title_id
WHERE t.title LIKE '%Computer%'

Be aware that the old style *= and =* left and right outer join syntax may not be supported in a future release of SQL Server, so you are better off adopting the ANSI standard outer join syntax.
  • Do not prefix your stored procedure names with 'sp_'. The prefix sp_ is reserved for system stored procedures that ship with SQL Server. Whenever SQL Server encounters a procedure name starting with sp_, it first tries to locate the procedure in the master database, then looks for any qualifiers (database, owner) provided, then tries dbo as the owner. So, you can really save time in locating the stored procedure by avoiding the sp_ prefix. But there is an exception! While creating general purpose stored procedures that are called from all your databases, go ahead and prefix those stored procedure names with sp_ and create them in the master database.
  • Views are generally used to show specific data to specific users based on their interest. Views are also used to restrict access to the base tables by granting permission only on the views. Yet another significant use of views is that they simplify your queries. Incorporate your frequently required, complicated joins and calculations into a view, so that you don't have to repeat those joins/calculations in all your queries; instead, just select from the view.
  • Use 'User Defined Datatypes' if a particular column repeats in a lot of your tables, so that the datatype of that column is consistent across all your tables.
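In SQL Server 7.0/2000, user defined datatypes are created with sp_addtype. A small sketch (the type name, base type and table are just examples):

--Create a reusable datatype for phone number columns
EXEC sp_addtype 'udt_PhoneNumber', 'varchar(20)', 'NULL'
GO
--Use the same datatype in every table that stores phone numbers
CREATE TABLE Suppliers
(
SupplierID int PRIMARY KEY,
Phone udt_PhoneNumber
)
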
  • Do not let your front-end applications query/manipulate the data directly using SELECT or INSERT/UPDATE/DELETE statements. Instead, create stored procedures, and let your applications access these stored procedures. This keeps the data access clean and consistent across all the modules of your application, at the same time centralizing the business logic within the database.
  • Try not to use the text and ntext datatypes for storing large textual data. The 'text' datatype has some inherent problems associated with it. You cannot directly write or update text data using INSERT or UPDATE statements (you have to use special statements like READTEXT, WRITETEXT and UPDATETEXT). There are also a lot of bugs associated with replicating tables containing text columns. So, if you don't have to store more than 8 KB of text, use the char(8000) or varchar(8000) datatypes.
  • If you have a choice, do not store binary files, image files (binary large objects or BLOBs) etc. inside the database. Instead, store the path to the binary/image file in the database and use that as a pointer to the actual file. Retrieving and manipulating these large binary files is better performed outside the database, and after all, a database is not meant for storing files.
  • Use the char datatype for a column only when the column is non-nullable. If a char column is nullable, it is still treated as a fixed length column in SQL Server 7.0+. So, a char(100), when NULL, will eat up 100 bytes, resulting in space wastage. Use varchar(100) in this situation. Of course, variable length columns do have a slight processing overhead compared to fixed length columns. Carefully choose between char and varchar depending upon the length of the data you are going to store.
  • Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than static SQL, as SQL Server must generate an execution plan at runtime every time. IF and CASE statements come in handy to avoid dynamic SQL. Another major disadvantage of using dynamic SQL is that it requires the users to have direct access permissions on all accessed objects, like tables and views. Generally, users are given access to the stored procedures which reference the tables, but not directly to the tables. In this case, dynamic SQL will not work. Consider the following scenario, where a user named 'dSQLuser' is added to the pubs database and is granted access to a procedure named 'dSQLproc', but not to any of the tables in the pubs database. The procedure dSQLproc executes a direct SELECT on the titles table and that works. The second statement runs the same SELECT on the titles table using dynamic SQL, and it fails with the following error:
Server: Msg 229, Level 14, State 5, Line 1
SELECT permission denied on object 'titles', database 'pubs', owner 'dbo'.

To reproduce the above problem, use the following commands:

sp_addlogin 'dSQLuser'
GO
sp_defaultdb 'dSQLuser', 'pubs'
USE pubs
GO
sp_adduser 'dSQLuser', 'dSQLuser'
GO
CREATE PROC dSQLProc
AS
BEGIN
SELECT * FROM titles WHERE title_id = 'BU1032' --This works
DECLARE @str CHAR(100)
SET @str = 'SELECT * FROM titles WHERE title_id = ''BU1032'''
EXEC (@str) --This fails
END
GO
GRANT EXEC ON dSQLProc TO dSQLuser
GO

Now log in to the pubs database using the login dSQLuser and execute the procedure dSQLproc to see the problem.
  • Consider the following drawbacks before using the IDENTITY property for generating primary keys. IDENTITY is very much SQL Server specific, and you will have problems if you want to support different database backends for your application. IDENTITY columns have other inherent problems too: they run out of numbers one day or the other; numbers can't be reused automatically after deleting rows; and replication and IDENTITY columns don't always get along well. So, consider coming up with an algorithm to generate a primary key in the front-end or from within the inserting stored procedure. There could be issues with generating your own primary keys too, like concurrency while generating the key, or running out of values. So, consider both options and go with the one that suits you well.
  • Minimize the usage of NULLs, as they often confuse the front-end applications, unless the applications are coded intelligently to eliminate NULLs or convert the NULLs into some other form. Any expression that deals with NULL results in a NULL output. ISNULL and COALESCE functions are helpful in dealing with NULL values. Here's an example that explains the problem:
Consider the following table, Customers, which stores the names of the customers; the middle name can be NULL.

CREATE TABLE Customers
(
FirstName varchar(20),
MiddleName varchar(20),
LastName varchar(20)
)

Now insert a customer into the table whose name is Tony Blair, without a middle name:

INSERT INTO Customers
(FirstName, MiddleName, LastName)
VALUES ('Tony',NULL,'Blair')

The following SELECT statement returns NULL, instead of the customer name:

SELECT FirstName + ' ' + MiddleName + ' ' + LastName FROM Customers

To avoid this problem, use ISNULL as shown below:

SELECT FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName FROM Customers
  • Use Unicode datatypes, like nchar, nvarchar and ntext, if your database is going to store not just plain English characters but a variety of characters from all over the world. Use these datatypes only when they are absolutely needed, as they need twice as much space as non-Unicode datatypes.
  • Always use a column list in your INSERT statements. This helps in avoiding problems when the table structure changes (like adding a column). Here's an example which shows the problem.
Consider the following table:

CREATE TABLE EuropeanCountries
(
CountryID int PRIMARY KEY,
CountryName varchar(25)
)

Here's an INSERT statement without a column list, that works perfectly:

INSERT INTO EuropeanCountries
VALUES (1, 'Ireland')

Now, let's add a new column to this table:


ALTER TABLE EuropeanCountries
ADD EuroSupport bit

Now run the above INSERT statement. You get the following error from SQL Server:

Server: Msg 213, Level 16, State 4, Line 1
Insert Error: Column name or number of supplied values does not match table definition.

This problem can be avoided by writing an INSERT statement with a column list as shown below:

INSERT INTO EuropeanCountries
(CountryID, CountryName)
VALUES (2, 'England')

  • Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers, as constraints are faster. Use triggers only for auditing, custom tasks and validations that cannot be performed using these constraints. These constraints save you time as well, as you don't have to write code for these validations, and the RDBMS will do all the work for you.
  • Always access tables in the same order in all your stored procedures and triggers consistently. This helps avoid deadlocks. Other things to keep in mind to avoid deadlocks are: keep your transactions as short as possible; touch as little data as possible during a transaction; never, ever wait for user input in the middle of a transaction; do not use higher level locking hints or restrictive isolation levels unless they are absolutely needed; and make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction in case the previous transaction fails with error 1205. In your applications, process all the results returned by SQL Server immediately, so that the locks on the processed rows are released, hence no blocking.
  • Offload tasks like string manipulations, concatenations, row numbering, case conversions, type conversions etc. to the front-end applications, if these operations are going to consume more CPU cycles on the database server (It's okay to do simple string manipulations on the database end though). Also try to do basic validations in the front-end itself during data entry. This saves unnecessary network roundtrips.
  • If back-end portability is your concern, stay away from bit manipulations with T-SQL, as this is very much RDBMS specific. Further, using bitmaps to represent different states of a particular entity conflicts with the normalization rules.
  • Consider adding a @Debug parameter to your stored procedures. This can be of bit datatype. When a 1 is passed for this parameter, print all the intermediate results and variable contents using SELECT or PRINT statements, and when 0 is passed, do not print debug information. This helps in quick debugging of stored procedures, as you don't have to add and remove these PRINT/SELECT statements before and after troubleshooting problems.
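Here is a minimal sketch of this pattern (the procedure and its logic are hypothetical):

CREATE PROC UpdateOrderStatus
@OrderID int,
@Debug bit = 0
AS
BEGIN
DECLARE @Rows int
UPDATE Orders SET OrderStatus = 5 WHERE OrderID = @OrderID
SET @Rows = @@ROWCOUNT --Capture before it is reset by later statements
IF @Debug = 1
PRINT 'Rows affected: ' + CAST(@Rows AS varchar(10))
END
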
  • Do not call functions repeatedly within your stored procedures, triggers, functions and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function whenever it's needed, instead, call the LEN function once, and store the result in a variable, for later use.
  • Make sure your stored procedures always return a value indicating the status. Standardize on the return values of stored procedures for success and failures. The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters.
  • If your stored procedure always returns a single row resultset, consider returning the resultset using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than resultsets returned by SELECT statements.
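The following sketch combines the last two points: the single-row result travels back through OUTPUT parameters, while RETURN carries only the execution status (all object names here are illustrative):

CREATE PROC GetCustomerName
@CustomerID int,
@FirstName varchar(20) OUTPUT,
@LastName varchar(20) OUTPUT
AS
BEGIN
SELECT @FirstName = FirstName, @LastName = LastName
FROM Customers
WHERE CustomerID = @CustomerID
IF @@ROWCOUNT = 0
RETURN 1 --Failure: no such customer
RETURN 0 --Success
END
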
  • Always check the global variable @@ERROR immediately after executing a data manipulation statement (like INSERT/UPDATE/DELETE), so that you can roll back the transaction in case of an error (@@ERROR will be greater than 0 in case of an error). This is important because, by default, SQL Server will not roll back all the previous changes within a transaction if a particular statement fails. This behavior can be changed by executing SET XACT_ABORT ON. The @@ROWCOUNT variable also plays an important role in determining how many rows were affected by a previous data manipulation (or retrieval) statement, and based on that you could choose to commit or roll back a particular transaction.
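A typical sketch of this checking pattern inside a transaction (table and column names are made up):

BEGIN TRAN
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN --Undo everything done so far in this transaction
RETURN
END
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2
IF @@ERROR <> 0
BEGIN
ROLLBACK TRAN
RETURN
END
COMMIT TRAN --Both updates succeeded
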
  • To make SQL statements more readable, start each clause on a new line and indent when needed. Following is an example:
SELECT title_id, title
FROM titles
WHERE title LIKE 'Computing%' OR
title LIKE 'Gardening%'
  • Though we survived Y2K, always store 4-digit years in dates (especially when using char or int datatype columns) instead of 2-digit years, to avoid confusion and problems. This is not a problem with datetime columns, as the century is stored even if you specify a 2-digit year. But it's always a good practice to specify 4-digit years even with datetime datatype columns.
  • In your queries and other SQL statements, always represent date in yyyy/mm/dd format. This format will always be interpreted correctly, no matter what the default date format on the SQL Server is. This also prevents the following error, while working with dates:
Server: Msg 242, Level 16, State 3, Line 2
The conversion of a char data type to a datetime data type resulted in an out-of-range datetime value.
  • As is true with any other programming language, do not use GOTO or use it sparingly. Excessive usage of GOTO can lead to hard-to-read-and-understand code.
  • Do not forget to enforce unique constraints on your alternate keys.
  • Always be consistent in your usage of case in your code. On a case insensitive server your code might work fine, but it will fail on a case sensitive SQL Server if your code is not consistent in case. For example, if you create a table in a SQL Server database that has a case-sensitive or binary sort order, all references to the table must use the same case that was specified in the CREATE TABLE statement. If you name the table 'MyTable' in the CREATE TABLE statement and use 'mytable' in a SELECT statement, you get an 'object not found' or 'invalid object name' error.
  • Though T-SQL has no concept of constants (like the ones in the C language), variables serve the same purpose. Using variables instead of constant values within your SQL statements improves the readability and maintainability of your code. Consider the following example:
UPDATE dbo.Orders
SET OrderStatus = 5
WHERE OrdDate < '2001/10/25'

The same update statement can be re-written in a more readable form as shown below:

DECLARE @ORDER_PENDING int
SET @ORDER_PENDING = 5

UPDATE dbo.Orders
SET OrderStatus = @ORDER_PENDING
WHERE OrdDate < '2001/10/25'
  • Do not use column numbers in the ORDER BY clause, as this impairs the readability of the SQL statement. Further, changing the order of columns in the SELECT list has no impact on the ORDER BY when the columns are referred to by name instead of number. Consider the following example, in which the second query is more readable than the first one:
SELECT OrderID, OrderDate
FROM Orders
ORDER BY 2

SELECT OrderID, OrderDate
FROM Orders
ORDER BY OrderDate

What are federated database servers?

"Federated database servers" is a feature introduced in SQL Server 2000. A federation is a group of SQL Servers that cooperate to share the processing load of a system. Federated database servers let you scale out a set of servers to support the processing needs of large systems and websites.

Scaling out is the process of increasing the processing power of a system by adding one or more additional computers, or nodes, instead of beefing up the hardware of a single computer (Scaling up).

Federated database servers can be implemented using "Distributed Partitioned Views" (DPV). You can partition tables horizontally across several servers, and define a distributed partitioned view on one server, covering all member server partitions. This view makes it appear as if a full copy of the original table is stored on one server.

SQL Server 7.0 supports partitioned views too, but SQL Server 2000 came up with the following enhancements, that allow the views to scale out and form federations of database servers:
  • In SQL Server 2000, partitioned views are updateable
  • The SQL Server 2000 query optimizer supports new optimizations that minimize the amount of distributed data that has to be transferred. The distributed execution plans generated by SQL Server 2000 result in good performance for a larger set of queries than the plans generated by SQL Server 7.0
Why federated database servers and when to scale out?
When websites and applications generate processing loads that exceed the capacity of large individual servers, scaling out is the best option for increasing the processing capacity of the system. When the server for a particular application is at its maximum potential and is no longer able to meet user demands, you should consider scaling out.

According to Microsoft, a federation of servers running SQL Server 2000 is capable of supporting the growth requirements of any Web site, or of the largest enterprise systems.

How are we able to gain performance by scaling out?
  • Distributed processing of data, which means individual member servers of the federation are working with smaller subsets of data, instead of the complete data.
  • More CPUs working in parallel.
  • Parallel disk I/O.
  • Availability of more RAM, as each member server is working with a smaller subset of data.
How to create distributed partitioned views?

The creation of partitioned views is explained in SQL Server Books Online, on the page titled "Creating a Partitioned View". The rules for creating updateable partitioned views are also explained on that page.
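In short, each member server holds a horizontal slice of the table, with a CHECK constraint on the partitioning column telling the optimizer which rows live where, and the view UNION ALLs the slices together. A simplified sketch (all server, database and table names are made up):

--On each member server, create one partition with a CHECK constraint
--on the partitioning column (shown here for the first server):
CREATE TABLE Customers_1
(
CustomerID int PRIMARY KEY CHECK (CustomerID BETWEEN 1 AND 1000000),
CustomerName varchar(50)
)

--On the server hosting the view, combine all the partitions:
CREATE VIEW Customers
AS
SELECT * FROM Customers_1
UNION ALL
SELECT * FROM [Server2].SomeDB.dbo.Customers_2
UNION ALL
SELECT * FROM [Server3].SomeDB.dbo.Customers_3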

What is the impact of partitioned views on my front-end applications?

Instead of accessing the base tables directly, your front-end applications or stored procedures will access the partitioned view. SQL Server takes care of getting the right data from the right servers, transparently to your application. Even in OLTP scenarios, SQL Server manages INSERT, UPDATE and DELETE commands transparently and sends them to the right partition.

What other things should I consider before implementing federated database servers?

  • Availability: Make sure all table partitions spread across different servers are accessible all the time, or else you will not be able to use the partitioned view.
  • Backup/Restore: If transactional consistency across the partitions is not a concern, you can back up and restore some or all of your partitions individually. If you must achieve transactional consistency, perform coordinated backup and restore operations. This means that you back up all your partitions simultaneously, and in case you have to restore them, you will have to restore all of them to a certain point in time.
  • Security: Since you have your data partitioned across multiple database servers, you have to follow consistent security practices across all your servers.
What is the performance gain I can expect by scaling out?

There is no easy way to calculate the performance gain without actually implementing and testing the scenario, as the performance depends on a lot of other factors too. But in general, in a simulated DSS system, I achieved a performance gain of 25 to 30% by scaling out to just two SQL Servers.

Evaluation of federated databases

Okay, there are no high end systems, no quad processor boxes, no RAID, and no dedicated network. It's an evaluation scenario in its simplest form.

I horizontally partitioned a huge table (5 million rows, not so huge anyway!) across two database servers, with 2.5 million rows on each server. The idea is to compare the response times in the following two scenarios:
  • Having the table on one server (5 million rows)
  • Having the table spread across two servers (2.5 million rows on each server)
Here is the hardware configuration of both the servers:
Server1:
OS: Microsoft Windows 2000 Advanced Server with SP1
System model: DELL OptiPlex GX1
Processor: Pentium III, 500MHz
RAM: 128 MB

Server2:
OS: Microsoft Windows 2000 Advanced Server with SP1
System model: DELL OptiPlex GX1
Processor: Pentium III, 500MHz
RAM: 128 MB

Here is the table structure:
CREATE TABLE [dbo].[tblPriceDefn] (
[ConfigID] [numeric](19, 0) NOT NULL ,
[PriceListID] [varchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[ValidityDate] [datetime] NOT NULL ,
[Price] [money] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[tblPriceDefn] WITH NOCHECK ADD
CONSTRAINT [PK_PriceDefn] PRIMARY KEY CLUSTERED
(
[ConfigID],
[PriceListID],
[ValidityDate]
) ON [PRIMARY]

Horizontal partitioning of the data:
For this evaluation I took the tblPriceDefn table, which has 5 million rows. I divided the table into two equal halves based on the value of the PriceListID column, and stored the first half on the first server (Server1) and the second half on the second server (Server2), as a table named table1 in the test database on each server. Then I created a distributed partitioned view named View1 on the first server (Server1), which combines both halves of the table tblPriceDefn. Here is the view definition:

CREATE VIEW View1
AS
SELECT * FROM [Server1].test.dbo.table1
UNION ALL
SELECT * FROM [Server2\inst1].test.dbo.table1


Performance testing:
The following queries were executed on the first server, locally, against both the base table and the partitioned view, and the response times were recorded. Each query was executed 5 times and the average response time recorded.

Query 1:
--On the base table:
SELECT
PriceListID,
COUNT(*) [# of rows]
FROM
ArenaDBRel2.dbo.tblPriceDefn
GROUP BY
PriceListID
ORDER BY
PriceListID

Average response time on the base table with 5 million rows: 59 seconds

--On the partitioned view:
SELECT
PriceListID,
COUNT(*) [# of rows]
FROM
test.dbo.View1
GROUP BY
PriceListID
ORDER BY
PriceListID

Average response time on the partitioned view: 41 seconds
A gain of 18 seconds, which is a 30% improvement in performance.

Query 2:

--On the base table:
SELECT
DISTINCT PriceListID
FROM
ArenaDBRel2.dbo.tblPriceDefn
Average response time on the base table with 5 million rows: 45 seconds

--On the partitioned view:
SELECT
DISTINCT PriceListID
FROM
test.dbo.View1

Average response time on the partitioned view: 32 seconds
A gain of 13 seconds, which is a 28% improvement in performance.


Query 3:

--On the base table:
SELECT
PriceListID,
ConfigID,
Price
FROM
ArenaDBRel2.dbo.tblPriceDefn
WHERE
PriceListID = 'M05-05'
OR
PriceListID = 'M10-01'

Average response time on the base table with 5 million rows: 59 seconds
--On the partitioned view:
SELECT
PriceListID,
ConfigID,
Price
FROM
test.dbo.View1
WHERE
PriceListID = 'M05-05'
OR
PriceListID = 'M10-01'

Average response time on the partitioned view: 44 seconds
A gain of 15 seconds, which is a 25% improvement in performance.

Conclusion:
The federated database servers feature in SQL Server 2000 is a very useful one, and you can expect to see gains in query response times. As I mentioned earlier, this is a very simple evaluation of the feature. I am going to try it out with more concurrent connections and complicated queries, and will post more information on this site. Be sure to check back after a while! Please let me know if you have any suggestions regarding this page!

SQL Server security model

Security is a major concern for modern-day systems, network and database administrators. It is natural for an administrator to worry about hackers and external attacks while implementing security. But there is more to it. It is essential to first implement security within the organization, to make sure the right people have access to the right data. Without these security measures in place, you might find someone destroying your valuable data, selling your company's secrets to your competitors, or invading the privacy of others. Primarily, a security plan must identify which users in the organization can see which data and perform which activities in the database.

SQL Server security model
To be able to access data from a database, a user must pass through two stages of authentication: one at the SQL Server level and the other at the database level. These two stages are implemented using login names and user accounts respectively. A valid login is required to connect to SQL Server, and a valid user account is required to access a database.

Login: A valid login name is required to connect to an SQL Server instance. A login could be:
  • A Windows NT/2000 login that has been granted access to SQL Server
  • An SQL Server login, that is maintained within SQL Server

These login names are maintained within the master database. So, it is essential to backup the master database after adding new logins to SQL Server.

User: A valid user account within a database is required to access that database. User accounts are specific to a database. All permissions and ownership of objects in the database are controlled by the user account. SQL Server logins are associated with these user accounts. A login can have associated users in different databases, but only one user per database.

During a new connection request, SQL Server verifies the login name supplied, to make sure, that login is authorized to access SQL Server. This verification is called Authentication. SQL Server supports two authentication modes:

  • Windows authentication mode: With Windows authentication, you do not have to specify a login name and password to connect to SQL Server. Instead, your access to SQL Server is controlled by the Windows NT/2000 account (or the group to which your account belongs) that you used to log in to the Windows operating system on the client computer/workstation. A DBA must first specify to SQL Server all the Microsoft Windows NT/2000 accounts or groups that can connect to SQL Server.
  • Mixed mode: Mixed mode allows users to connect using Windows authentication or SQL Server authentication. Your DBA must first create valid SQL Server login accounts and passwords. These are not related to your Microsoft Windows NT/2000 accounts. With this authentication mode, you must supply the SQL Server login and password when you connect to SQL Server. If you do not specify SQL Server login name and password, or request Windows Authentication, you will be authenticated using Windows Authentication.

The point to note is that whatever mode you configure your SQL Server to use, you can always log in using Windows authentication.

Windows authentication is the recommended security mode, as it is more secure and you don't have to send login names and passwords over the network. You should avoid mixed mode, unless you have a non-Windows NT/2000 environment or when your SQL Server is installed on Windows 95/98 or for backward compatibility with your existing applications.

SQL Server's authentication mode can be changed using Enterprise Manager (Right click on the server name and click on Properties. Go to the Security tab).

Authentication mode can also be changed using SQL DMO object model.

Here is a list of helpful stored procedures for managing logins and users:

sp_addlogin Creates a new login that allows users to connect to SQL Server using SQL Server authentication
sp_grantlogin Allows a Windows NT/2000 user account or group to connect to SQL Server using Windows authentication
sp_droplogin Drops an SQL Server login
sp_revokelogin Drops a Windows NT/2000 login/group from SQL Server
sp_denylogin Prevents a Windows NT/2000 login/group from connecting to SQL Server
sp_password Adds or changes the password for an SQL Server login
sp_helplogins Provides information about logins and their associated users in each database
sp_defaultdb Changes the default database for a login
sp_grantdbaccess Adds an associated user account in the current database for an SQL Server login or Windows NT/2000 login
sp_revokedbaccess Drops a user account from the current database
sp_helpuser Reports information about the users and roles in the current database


Now let's talk about controlling access to objects within the database and managing permissions. Apart from managing permissions at the individual database user level, SQL Server 7.0/2000 implements permissions using roles. A role is nothing but a group to which individual logins/users can be added, so that the permissions can be applied to the group, instead of applying the permissions to all the individual logins/users. There are three types of roles in SQL Server 7.0/2000:
  • Fixed server roles
  • Fixed database roles
  • Application roles
Fixed server roles: These are server-wide roles. Logins can be added to these roles to gain the associated administrative permissions of the role. Fixed server roles cannot be altered and new server roles cannot be created. Here are the fixed server roles and their associated permissions in SQL Server 2000:

Fixed server role Description
sysadmin Can perform any activity in SQL Server
serveradmin Can set server-wide configuration options, shut down the server
setupadmin Can manage linked servers and startup procedures
securityadmin Can manage logins and CREATE DATABASE permissions, also read error logs and change passwords
processadmin Can manage processes running in SQL Server
dbcreator Can create, alter, and drop databases
diskadmin Can manage disk files
bulkadmin Can execute BULK INSERT statements

Here is a list of stored procedures that are helpful in managing fixed server roles:

sp_addsrvrolemember Adds a login as a member of a fixed server role
sp_dropsrvrolemember Removes an SQL Server login, Windows user or group from a fixed server role
sp_helpsrvrole Returns a list of the fixed server roles
sp_helpsrvrolemember Returns information about the members of fixed server roles
sp_srvrolepermission Returns the permissions applied to a fixed server role


Fixed database roles: Each database has a set of fixed database roles, to which database users can be added. These fixed database roles are unique within the database. While the permissions of fixed database roles cannot be altered, new database roles can be created. Here are the fixed database roles and their associated permissions in SQL Server 2000:

Fixed database role Description
db_owner Has all permissions in the database
db_accessadmin Can add or remove user IDs
db_securityadmin Can manage all permissions, object ownerships, roles and role memberships
db_ddladmin Can issue all DDL statements, but cannot issue GRANT, REVOKE, or DENY statements
db_backupoperator Can issue DBCC, CHECKPOINT, and BACKUP statements
db_datareader Can select all data from any user table in the database
db_datawriter Can modify any data in any user table in the database
db_denydatareader Cannot select any data from any user table in the database
db_denydatawriter Cannot modify any data in any user table in the database


Here is a list of stored procedures that are helpful in managing fixed database roles:

sp_addrole Creates a new database role in the current database
sp_addrolemember Adds a user to an existing database role in the current database
sp_dbfixedrolepermission Displays permissions for each fixed database role
sp_droprole Removes a database role from the current database
sp_helpdbfixedrole Returns a list of fixed database roles
sp_helprole Returns information about the roles in the current database
sp_helprolemember Returns information about the members of a role in the current database
sp_droprolemember Removes users from the specified role in the current database

Application roles: Application roles are another way of implementing permissions. These are quite different from the server and database roles. After you create an application role and assign the required permissions to it, the client application needs to activate this role at run-time to get those permissions. Application roles simplify the job of DBAs, as they don't have to worry about managing permissions at the individual user level. All they need to do is create an application role and assign permissions to it. The application that is connecting to the database activates the application role and inherits the permissions associated with that role. Here are the characteristics of application roles:

  • There are no built-in application roles
  • Application roles contain no members
  • Application roles need to be activated at run-time, by the application, using a password
  • Application roles override standard permissions. For example, after activating the application role, the application will lose all the permissions associated with the login/user account used while connecting to SQL Server and gain the permissions associated with the application role.
  • Application roles are database specific. After activating an application role in a database, if that application wants to run a cross-database transaction, the other database must have a guest user account enabled

Here are the stored procedures that are required to manage application roles:

sp_addapprole Adds an application role in the current database
sp_approlepassword Changes the password of an application role in the current database
sp_dropapprole Drops an application role from the current database
sp_setapprole Activates the permissions associated with an application role in the current database
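For example, after the DBA creates the role and grants it permissions, the client application activates it at run-time (the role name, password and table are placeholders):

--DBA: create the application role and grant it permissions
EXEC sp_addapprole 'OrderEntryApp', 'SomeStrongPassword'
GRANT SELECT, INSERT ON dbo.Orders TO OrderEntryApp
GO
--Client application: activate the role right after connecting
EXEC sp_setapprole 'OrderEntryApp', 'SomeStrongPassword'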


Now that we have discussed the different kinds of roles, let's talk about granting and revoking permissions to and from database users, database roles and application roles. The following T-SQL commands are used to manage permissions at the user and role level.
  • GRANT: Grants the specific permission (Like SELECT, DELETE etc.) to the specified user or role in the current database
  • REVOKE: Removes a previously granted or denied permission from a user or role in the current database
  • DENY: Denies a specific permission to the specified user or role in the current database
Using the above commands, permissions can be granted, denied or revoked to users and roles on all database objects. You can manage permissions down to the column level, as shown below.
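A few illustrative statements (the user, role and table names are hypothetical):

--Grant SELECT on a table to a role
GRANT SELECT ON dbo.Orders TO OrderReaders
--Grant UPDATE on specific columns only
GRANT UPDATE (OrderStatus) ON dbo.Orders TO OrderReaders
--Deny DELETE to a specific user, overriding any grants
DENY DELETE ON dbo.Orders TO SomeUser
--Remove a previously granted or denied permission
REVOKE DELETE ON dbo.Orders FROM SomeUser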

Note: There is no way to manage permissions at the row level. That is, in a given table, you can't grant SELECT permission on a specific row to User1 and deny SELECT permission on another row to User2. This kind of security can be implemented by using views and stored procedures effectively. Click here to read about row level security implementation in SQL Server databases. Just an FYI, Oracle has a feature called "Virtual Private Databases" (VPD) that allows DBAs to configure permissions at row level.
SQL Server security best practices
Here is an ideal implementation of security in a Windows NT/2000 environment with SQL Server 7.0/2000 database server:

  • Configure SQL Server to use Windows authentication mode
  • Depending upon the data access needs of your domain users, group them into different global groups in the domain
  • Consolidate these global groups from all the trusted domains into the Windows NT/2000 local groups in your SQL Server computer
  • The Windows NT/2000 local groups are then granted access to log into the SQL Server
  • Add these Windows NT/2000 local groups to the required fixed server roles in SQL Server
  • Associate these local group logins with individual user accounts in the databases and grant them the required permissions using the database roles.
  • Create custom database roles if required, for finer control over permissions

Here is a security checklist and some standard security practices and tips:

  • Restrict physical access to the SQL Server computer. Always lock the server while not in use.
  • Make sure all the file and disk shares on the SQL Server computer are read-only. In case you have read-write shares, make sure only the right people have access to those shares.
  • Use the NTFS file system as it provides advanced security and recovery features.
  • Prefer Windows authentication to mixed mode. If mixed mode authentication is inevitable, for backward compatibility reasons, make sure you have complex passwords for sa and all other SQL Server logins. It is recommended to have mixed case passwords with a few numbers and/or special characters, to counter the dictionary based password guessing tools and user identity spoofing by hackers.
  • Rename the Windows NT/2000 Administrator account on the SQL Server computer to discourage hackers from guessing the administrator password.
  • In a website environment, keep your databases on a different computer than the one running the web service. In other words, keep your SQL Server off the Internet, for security reasons.
  • Keep yourself up-to-date with the information on latest service packs and security patches released by Microsoft. Carefully evaluate the service packs and patches before applying them on the production SQL Server. Bookmark this page for the latest in the security area from Microsoft: http://www.microsoft.com/security/
  • If it is appropriate for your environment, hide the SQL Server service from appearing in the server enumeration box in Query Analyzer, using the /HIDDEN:YES switch of NET CONFIG SERVER command.
  • Enable login auditing at the Operating System and SQL Server level. Examine the audit for login failure events and look for trends to detect any possible intrusion.
  • If it fits your budget, use Intrusion Detection Systems (IDS), especially on high-risk online database servers. IDS can constantly analyze the inbound network traffic, look for trends and detect Denial of Service (DoS) attacks and port scans. IDS can be configured to alert the administrators upon detecting a particular trend.
  • Disable guest user account of Windows. Drop guest user from production databases using sp_dropuser.
  • Do not let your applications query and manipulate your database directly using SELECT/INSERT/UPDATE/DELETE statements. Wrap these commands within stored procedures and let your applications call these stored procedures. This helps centralize business logic within the database, at the same time hides the internal database structure from client applications.
  • Let your users query views instead of giving them access to the underlying base tables.
  • Discourage applications from executing dynamic SQL statements. To execute a dynamic SQL statement, users need explicit permissions on the underlying tables. This defeats the purpose of restricting access to base tables using stored procedures and views.
  • Don't let applications accept SQL commands from users and execute them against the database. This could be dangerous (known as SQL injection), as a skilled user can input commands that can destroy the data or gain unauthorized access to sensitive information.
  • Take advantage of the fixed server and database roles by assigning users to the appropriate roles. You could also create custom database roles that suit your needs.
  • Carefully choose the members of the sysadmin role, as the members of the sysadmin role can do anything in the SQL Server. Note that, by default, the Windows NT/2000 local administrators group is a part of the sysadmin fixed server role.
  • Constantly monitor error logs and event logs for security related alerts and errors.
  • SQL Server error logs can reveal a great deal of information about your server. So, secure your error logs by using NTFS permissions.
  • Secure your registry by restricting access to the SQL Server specific registry keys like HKEY_LOCAL_MACHINE\Software\Microsoft\MSSQLServer.
  • If your databases contain sensitive information, consider encrypting the sensitive pieces (like credit card numbers and Social Security Numbers (SSN)). There are undocumented encryption functions in SQL Server, but I wouldn't recommend those. If you have the right skills available in your organization, develop your own encryption/decryption modules using Crypto API or other encryption libraries.
  • If you are running SQL Server 7.0, you could use the encryption capabilities of the Multi-Protocol net library for encrypted data exchange between the client and SQL Server. SQL Server 2000 supports encryption over all protocols using Secure Sockets Layer (SSL). See SQL Server 7.0 and 2000 Books Online (BOL) for more information on this topic. Please note that enabling encryption is always a tradeoff between security and performance, because of the additional overhead of encryption and decryption.
  • Prevent unauthorized access to linked servers by deleting the linked server entries that are no longer needed. Pay special attention to the login mapping between the local and remote servers. Use logins with the bare minimum privileges for configuring linked servers.
  • DBAs generally tend to run the SQL Server service using a domain administrator account. That is asking for trouble. A malicious SQL Server user could take advantage of these domain admin privileges. Most of the time, a local administrator account is more than enough for the SQL Server service.
  • DBAs also tend to drop system stored procedures like xp_cmdshell and all the OLE automation stored procedures (sp_OACreate and the like). Instead of dropping these procedures, deny EXECUTE permission on them to specific users/roles. Dropping these procedures would break some SQL Server functionality.
  • Be prompt in dropping the SQL Server logins of employees leaving the organization. Especially, in the case of a layoff, drop the logins of those poor souls ASAP as they could do anything to your data out of frustration.
  • When using mixed mode authentication, consider customizing the system stored procedure sp_password, to prevent users from using simple and easy-to-guess passwords.
  • To set up secure data replication over the Internet or Wide Area Networks (WAN), implement Virtual Private Networks (VPN). Securing the snapshot folder is important too, as the snapshot agent exports data and object scripts from published databases to this folder in the form of text files. Only the replication agents should have access to the snapshot folder.
  • It is good to have a tool like Lumigent Log Explorer handy, for a closer look at the transaction log to see who is doing what in the database.
  • Do not save passwords in your .udl files, as the password gets stored in clear text.
  • If your database code is proprietary, encrypt the definition of stored procedures, triggers, views and user defined functions using the WITH ENCRYPTION clause. dbLockdown is a tool that automates the insertion of the WITH ENCRYPTION clause and handles all the archiving of encrypted database objects so that they can be restored again in a single click. Click here to find out more information about this product.
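For example (a trivial procedure, shown only to illustrate the clause; the object names are made up):

CREATE PROC GetSecretFormula
WITH ENCRYPTION
AS
SELECT FormulaText FROM dbo.Formulas
--The procedure text is now stored in an obfuscated form and
--can no longer be viewed with sp_helptext
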
  • In database development environments, use a source code control system like Visual Source Safe (VSS) or Rational Clear Case. Control access to source code by creating users in VSS and giving permissions by project. Reserve the 'destroy permanently' permission for VSS administrator only. After project completion, lock your VSS database or leave your developers with just read-only access.
  • Store the data files generated by DTS or BCP in a secure folder/share and delete these files once you are done.
  • Install anti-virus software on the SQL Server computer, but exclude your database folders from regular scans. Keep your anti-virus signature files up to date.
  • SQL Server 2000 allows you to specify a password for backups. If a backup is created with a password, you must provide that password to restore from that backup. This discourages unauthorized access to backup files.
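For example (the database name, file path and password are placeholders):

--Create a password protected backup
BACKUP DATABASE MyDB
TO DISK = 'D:\Backups\MyDB.bak'
WITH PASSWORD = 'SomeStrongPassword'
--The same password must be supplied to restore from it
RESTORE DATABASE MyDB
FROM DISK = 'D:\Backups\MyDB.bak'
WITH PASSWORD = 'SomeStrongPassword', REPLACE
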
  • Windows 2000 introduced Encrypted File System (EFS) that allows you to encrypt individual files and folders on an NTFS partition. Use this feature to encrypt your SQL Server database files. You must encrypt the files using the service account of SQL Server. When you want to change the service account of SQL Server, you must decrypt the files, change the service account and encrypt the files again with the new service account.

Tuesday, June 23, 2009

Q & A

What is a self join? Explain it with an example.
A.A self join is just like any other join, except that two instances of the same table are joined in the query. Here is an example: an Employees table contains rows for regular employees as well as their managers. So, to find out the manager of each employee, you need a self join.

CREATE TABLE emp
(
empid int,
mgrid int,
empname char(10)
)

INSERT emp SELECT 1,2,'Vyas'
INSERT emp SELECT 2,3,'Mohan'
INSERT emp SELECT 3,NULL,'Shobha'
INSERT emp SELECT 4,2,'Shridhar'
INSERT emp SELECT 5,2,'Sourabh'

SELECT t1.empname [Employee], t2.empname [Manager]
FROM emp t1, emp t2
WHERE t1.mgrid = t2.empid

Here's an advanced query using a LEFT OUTER JOIN that also returns the employees without managers (super bosses):

SELECT t1.empname [Employee], COALESCE(t2.empname, 'No manager') [Manager]
FROM emp t1
LEFT OUTER JOIN
emp t2
ON
t1.mgrid = t2.empid


There is a trigger defined for INSERT operations on a table in an OLTP system. The trigger is written to instantiate a COM object and pass the newly inserted rows to it for some custom processing. What do you think of this implementation? Can this be implemented better?
A.Instantiating COM objects is a time consuming process, and since you are doing it from within a trigger, it slows down the data insertion process. The same is the case with sending emails from triggers. This scenario can be better implemented by logging all the necessary data into a separate table, and having a job which periodically checks this table and does the required processing.

What are triggers? How many triggers you can have on a table? How to invoke a trigger on demand?
A.Triggers are a special kind of stored procedure that gets executed automatically when an INSERT, UPDATE or DELETE operation takes place on a table.
In SQL Server 6.5 you could define only 3 triggers per table: one for INSERT, one for UPDATE and one for DELETE. From SQL Server 7.0 onwards this restriction is gone, and you can create multiple triggers for each action. But in 7.0 there's no way to control the order in which the triggers fire. In SQL Server 2000 you can specify which trigger fires first or last using sp_settriggerorder (see the example below).
Triggers can't be invoked on demand. They get triggered only when an associated action (INSERT, UPDATE, DELETE) happens on the table on which they are defined.
Triggers are generally used to implement business rules, auditing. Triggers can also be used to extend the referential integrity checks, but wherever possible, use constraints for this purpose, instead of triggers, as constraints are much faster.
Up to SQL Server 7.0, triggers fire only after the data modification operation happens. So, in a way, they are post triggers. But SQL Server 2000 lets you create pre triggers as well; search SQL Server 2000 Books Online for INSTEAD OF triggers.
Also check out books online for 'inserted table', 'deleted table' and COLUMNS_UPDATED()
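Here is how the firing order is specified in SQL Server 2000 (the trigger name is hypothetical):

--Make trg_Audit fire first among the INSERT triggers on its table
EXEC sp_settriggerorder @triggername = 'trg_Audit', @order = 'First', @stmttype = 'INSERT'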

What is the system function to get the current user's user id?
A.USER_ID(). Also check out other system functions like USER_NAME(), SYSTEM_USER, SESSION_USER, CURRENT_USER, USER, SUSER_SID(), HOST_NAME().

What is an extended stored procedure? Can you instantiate a COM object by using T-SQL?
A.An extended stored procedure is a function within a DLL (written in a programming language like C, C++ using Open Data Services (ODS) API) that can be called from T-SQL, just the way we call normal stored procedures using the EXEC statement.
Yes, you can instantiate a COM object (written in languages like VB or VC++) from T-SQL by using the sp_OACreate stored procedure. Also see Books Online for sp_OAMethod, sp_OAGetProperty, sp_OASetProperty and sp_OADestroy. For an example of creating a COM object in VB and calling it from T-SQL, see the 'My code library' section of this site.

Can you have a nested transaction?
A.Yes, very much. Check out BEGIN TRAN, COMMIT, ROLLBACK, SAVE TRAN and @@TRANCOUNT.
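A small sketch showing how @@TRANCOUNT tracks the nesting level:

BEGIN TRAN --@@TRANCOUNT is now 1
BEGIN TRAN --@@TRANCOUNT is now 2
SELECT @@TRANCOUNT --Returns 2
COMMIT --Only decrements @@TRANCOUNT to 1
COMMIT --The outer COMMIT actually commits the work
--Note: a ROLLBACK at any level rolls back ALL the nested transactions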

What is a join and explain different types of joins.
A.Joins are used in queries to describe how different tables are related. Joins also let you select data from one table based on data from another table.

Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINS, RIGHT OUTER JOINS and FULL OUTER JOINS.

Write down the general syntax for a SELECT statement covering all the options.
A.Here's the basic syntax:
SELECT select_list
[INTO new_table]
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC | DESC] ]

What are cursors? Explain different types of cursors. What are the disadvantages of cursors? How can you avoid cursors?
A.Cursors allow row-by-row processing of resultsets.

Types of cursors: Static, Dynamic, Forward-only, Keyset-driven. See books online for more information.
Disadvantages of cursors: Each time you fetch a row from the cursor, it results in a network roundtrip, whereas a normal SELECT query makes only one roundtrip, however large the resultset is. Cursors are also costly because they require more resources and temporary storage (resulting in more IO operations). Further, there are restrictions on the SELECT statements that can be used with some types of cursors.
Most of the times, set based operations can be used instead of cursors. Here is an example:
If you have to give a flat hike to your employees using the following criteria:
Salary between 30000 and 40000 -- 5000 hike
Salary between 40000 and 55000 -- 7000 hike
Salary between 55000 and 65000 -- 9000 hike
In this situation many developers tend to use a cursor, determine each employee's salary and update it according to the above formula. But the same can be achieved with multiple UPDATE statements, or combined into a single UPDATE statement as shown below:
UPDATE tbl_emp SET salary =
CASE WHEN salary BETWEEN 30000 AND 40000 THEN salary + 5000
WHEN salary BETWEEN 40000 AND 55000 THEN salary + 7000
WHEN salary BETWEEN 55000 AND 65000 THEN salary + 9000
END
Another situation in which developers tend to use cursors: you need to call a stored procedure when a column in a particular row meets a certain condition. You don't have to use cursors for this. This can be achieved using a WHILE loop, as long as there is a unique key to identify each row.

What is database replication? What are the different types of replication you can set up in SQL Server?
A.Replication is the process of copying/moving data between databases on the same or different servers. SQL Server supports the following types of replication scenarios:
  • Snapshot replication
  • Transactional replication (with immediate updating subscribers, with queued updating subscribers)
  • Merge replication

Explain the different types of backups available in SQL Server. Given a particular scenario, how would you go about choosing a backup plan?
A.Types of backups you can create in SQL Server 7.0+ are: full database backup, differential database backup, transaction log backup and filegroup backup.
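For example (a sketch; the database name and backup paths are illustrative):

-- Full database backup
BACKUP DATABASE MyDB TO DISK = 'E:\Backups\MyDB_full.bak'

-- Differential backup (only the changes since the last full backup)
BACKUP DATABASE MyDB TO DISK = 'E:\Backups\MyDB_diff.bak' WITH DIFFERENTIAL

-- Transaction log backup (enables point-in-time recovery)
BACKUP LOG MyDB TO DISK = 'E:\Backups\MyDB_log.trn'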

What are the different ways of moving data/databases between servers and databases in SQL Server?
A.There are lots of options available; you have to choose your option depending upon your requirements. Some of the options you have are: BACKUP/RESTORE, detaching and attaching databases, replication, DTS, BCP, log shipping, INSERT...SELECT, SELECT...INTO, creating INSERT scripts to generate data.

What are statistics? Under what circumstances do they go out of date? How do you update them?
A.Statistics determine the selectivity of the indexes. If an indexed column has unique values, the selectivity of that index is higher than that of an index on a column with non-unique values. The query optimizer uses these statistics in determining whether or not to choose an index while executing a query.

Some situations under which you should update statistics:
1) If there is significant change in the key values in the index
2) If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated
3) Database is upgraded from a previous version

Look up SQL Server books online for the following commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats
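A few quick examples of those commands (the table name and the index name IX_tbl_emp_salary are illustrative):

-- Rebuild the statistics for one table, sampling every row
UPDATE STATISTICS tbl_emp WITH FULLSCAN

-- When were the statistics of index 1 on this table last updated?
SELECT STATS_DATE(OBJECT_ID('tbl_emp'), 1)

-- Inspect the distribution statistics of a specific index
DBCC SHOW_STATISTICS ('tbl_emp', 'IX_tbl_emp_salary')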

As a part of your job, what are the DBCC commands that you commonly use for database maintenance?
A.DBCC CHECKDB, DBCC CHECKTABLE, DBCC CHECKCATALOG, DBCC CHECKALLOC, DBCC SHOWCONTIG, DBCC SHRINKDATABASE, DBCC SHRINKFILE etc. But there is a whole load of DBCC commands which are very useful for DBAs.

How to restart SQL Server in single user mode? How to start SQL Server in minimal configuration mode?
A.SQL Server can be started from the command line, using SQLSERVR.EXE. This EXE has some very important parameters a DBA should be familiar with. -m is used for starting SQL Server in single user mode and -f is used to start SQL Server in minimal configuration mode. Check out SQL Server books online for more parameters and their explanations.

Explain CREATE DATABASE syntax
A.Many of us are used to creating databases from the Enterprise Manager or by just issuing the command: CREATE DATABASE MyDB. But what if you have to create a database with two filegroups, one on drive C and the other on drive D, with the log on drive E, with an initial size of 600 MB and a growth factor of 15%? That's why, being a DBA, you should be familiar with the CREATE DATABASE syntax. Check out SQL Server books online for more information.
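Here's a sketch of the scenario just described (the database name, file paths and the secondary file's size are illustrative):

CREATE DATABASE MyDB
ON PRIMARY
(NAME = MyDB_data1, FILENAME = 'C:\MSSQL\Data\MyDB_data1.mdf',
SIZE = 600MB, FILEGROWTH = 15%),
FILEGROUP SecondaryFG
(NAME = MyDB_data2, FILENAME = 'D:\MSSQL\Data\MyDB_data2.ndf',
SIZE = 600MB, FILEGROWTH = 15%)
LOG ON
(NAME = MyDB_log, FILENAME = 'E:\MSSQL\Data\MyDB_log.ldf',
SIZE = 100MB, FILEGROWTH = 15%)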

What is blocking and how would you troubleshoot it?
A.Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first.

What is a deadlock and what is a live lock? How will you go about resolving deadlocks?
A.Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user's process.

A livelock is one where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.


What are the steps you will take, if you are tasked with securing an SQL Server?
A.Again, this is another open ended question. Here are some things you could talk about: preferring NT authentication, using server, database and application roles to control access to the data, securing the physical database files using NTFS permissions, using an unguessable SA password, restricting physical access to the SQL Server, renaming the Administrator account on the SQL Server computer, disabling the Guest account, enabling auditing, using multiprotocol encryption, setting up SSL, setting up firewalls, isolating SQL Server from the web server etc.
http://vyaskn.tripod.com/sql_server_security_best_practices.htm



What are the steps you will take to improve performance of a poor performing query?
A.This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables.
Some of the tools/ways that help you troubleshoot performance problems are: SET SHOWPLAN_ALL ON, SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler, Windows NT/2000 Performance Monitor, and the graphical execution plan in Query Analyzer.

What is RAID and what are different types of RAID configurations?
A.RAID stands for Redundant Array of Inexpensive Disks, and is used to provide fault tolerance to database servers. There are six RAID levels, 0 through 5, offering different levels of performance and fault tolerance.
http://www.raid-advisory.com/

What is an index? What are the types of indexes? How many clustered indexes can be created on a table? If I create a separate index on each column of a table, what are the advantages and disadvantages of this approach?
A.Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.
Indexes are of two types: clustered indexes and non-clustered indexes. When you create a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage, separate from the table's data storage. Non-clustered indexes are stored as B-tree structures (as are clustered indexes), with the leaf level nodes having the index key and its row locator. The row locator could be the RID or the clustered index key, depending on the absence or presence of a clustered index on the table.
If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same time, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that indexes need disk space: the more indexes you have, the more disk space is used.

What's the difference between DELETE TABLE and TRUNCATE TABLE commands?
A.DELETE TABLE is a logged operation, so the deletion of each row gets logged in the transaction log, which makes it slow. TRUNCATE TABLE also deletes all the rows in a table, but it won't log the deletion of each row; instead it logs the deallocation of the data pages of the table, which makes it faster. Of course, TRUNCATE TABLE can be rolled back (when issued inside a transaction).

What is lock escalation?
A.Lock escalation is the process of converting a lot of low level locks (like row locks, page locks) into higher level locks (like table locks). Every lock is a memory structure, so too many locks would mean more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks. The lock escalation threshold was definable in SQL Server 6.5, but from SQL Server 7.0 onwards it's dynamically managed by SQL Server.

CREATE INDEX myIndex ON myTable(myColumn)
What type of Index will get created after executing the above statement?
A.Non-clustered index. Important thing to note: By default a clustered index gets created on the primary key, unless specified otherwise.

Explain different isolation levels
A.An isolation level determines the degree of isolation of data between concurrent transactions. The default SQL Server isolation level is Read Committed. Here are the other isolation levels (in the ascending order of isolation): Read Uncommitted, Read Committed, Repeatable Read, Serializable. See SQL Server books online for an explanation of the isolation levels. Be sure to read about SET TRANSACTION ISOLATION LEVEL, which lets you customize the isolation level at the connection level.
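For example (a sketch; tbl_emp is an illustrative table):

-- Raise the isolation level for the current connection
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

BEGIN TRAN
SELECT COUNT(*) FROM tbl_emp WHERE salary > 50000
-- Under SERIALIZABLE, other connections cannot insert rows that would
-- change this result (no phantom reads) until this transaction finishes
COMMIT

-- Revert to the default
SET TRANSACTION ISOLATION LEVEL READ COMMITTED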

What is a transaction and what are ACID properties?
A.A transaction is a logical unit of work in which, all the steps must be performed or none. ACID stands for Atomicity, Consistency, Isolation, Durability. These are the properties of a transaction.

What are defaults? Is there a column to which a default can't be bound?
A.A default is a value that will be used by a column, if no value is supplied to that column while inserting data. IDENTITY columns and timestamp columns can't have defaults bound to them.

Define candidate key, alternate key, composite key.
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.

A key formed by combining two or more columns is called a composite key.

What is bit datatype and what's the information that can be stored inside a bit column?
A.The bit datatype is used to store boolean information like 1 or 0 (true or false). Until SQL Server 6.5, the bit datatype could hold either a 1 or a 0, and there was no support for NULL. But from SQL Server 7.0 onwards, the bit datatype can represent a third state, which is NULL.

What are user defined datatypes and when you should go for them?
A.User defined datatypes let you extend the base SQL Server datatypes by providing a descriptive name and format to the database. Take for example: in your database there is a column called Flight_Num which appears in many tables, and in all these tables it should be varchar(8). In this case you could create a user defined datatype called Flight_num_type of varchar(8) and use it across all your tables.
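A minimal sketch of that Flight_num_type example, using sp_addtype (the tbl_flights table is illustrative):

-- Create the user defined datatype once per database
EXEC sp_addtype 'Flight_num_type', 'varchar(8)', 'NOT NULL'

-- Use it like any base datatype
CREATE TABLE tbl_flights
(
Flight_Num Flight_num_type,
Departure datetime
)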

What's the difference between a primary key and a unique key?
A.Both primary key and unique key enforce uniqueness of the column on which they are defined. But by default, primary key creates a clustered index on the column, whereas unique creates a nonclustered index by default. Another major difference is that primary key doesn't allow NULLs, but unique key allows one NULL only.

How can you rename a database?
A. To rename a database, first get the database into single user mode, by using the query below (run these queries in Query Analyzer):

ALTER DATABASE DBMydb SET SINGLE_USER WITH ROLLBACK IMMEDIATE

Then rename the database using sp_renamedb:

sp_renamedb 'DBMydb', 'DBmydb_new'

Then change the access mode of the database back to multi-user:

ALTER DATABASE DBmydb_new SET MULTI_USER



What is meant by the 'suspect' status of a database?
A. SQL Server gives a database 'suspect' status when it is unable to recover the database. The reasons may be many: maybe torn page detection, or maybe a data file location was not found.

How many access modes of a database are there in MS SQL Server?
A. There are three access modes of a database: SINGLE_USER, RESTRICTED_USER and MULTI_USER. SINGLE_USER mode restricts database access to one user connection. RESTRICTED_USER mode restricts database access to members of the db_owner, dbcreator and sysadmin roles; all other users are disconnected. MULTI_USER mode lets all users with proper access connect to the database.

What are the default databases in an MS SQL Server instance after a fresh installation?
A. There are 4 default databases:
  • Master
    This is your most critical database, and is the core of your SQL Server implementation.
    Limit changes and access to this database as much as possible. Without the master
    database, your SQL Server instance will not start up.
  • MSDB
    This database contains tables that control and track the elements of the SQL Server
    Agent, as well as replication, log shipping (Enterprise Edition), database backup and
    restore history, database maintenance plan configurations and history, and a subset of
    tables participating in replication status and jobs. DTS packages saved to SQL Server are also stored within this database.
  • Model
    The model database is a template database, which, when modified, will affect the default
    values of databases that are created in the future. For example, if a table is added to the
    model database, future databases created for the SQL Server instance will include this
    table.
  • Tempdb
    The tempdb database is used to hold all temporary tables and temporary stored
    procedures. SQL Server also uses it for internal sorting and query operations (called
    work tables, or spooling). This database is re-created each time SQL Server is started,
    and cannot be used to store permanent tables.


What is replication?
A. Replication is a process which enables data and database objects to be copied and modified from one database to another across different networks and platforms. The physical separation of the databases and latency are integral parts of the design process in replication. MS SQL Server 2000 permits 3 different kinds of replication: snapshot, transactional and merge.

How many types of backup are supported by MS SQL Server?

A. There are three types of backup:
  • Full Database Backup.
  • Transaction Log Backup
  • Differential Backup
What is log shipping?
A. Log shipping is the process of backing up of database and transaction log files on a production SQL server, and then restoring them onto a standby server. The key feature of log shipping is that it will automatically backup transaction logs throughout the day (for whatever interval you specify) and automatically restore them on the standby server. This in effect keeps the two SQL Servers in "sync". Should the production server fail, all you have to do is point the users to the new server, and you are all set.

Name the types of recovery model in SQL Server.
A. There are three recovery models in MS SQL Server:
• Simple Recovery Model .
• Full Recovery Model.
• Bulk Logged Recovery Model.

What is a deadlock?

A. A deadlock occurs when two users have locks on separate objects and each user wants a lock on the other's object. For example, User1 has a lock on object "A" and wants a lock on object "B", while User2 has a lock on object "B" and wants a lock on object "A". In this case, SQL Server ends the deadlock by choosing one of the users as the deadlock victim. It then rolls back the victim's transaction, sends message number 1205 to notify the victim's application, and allows the other user's process to continue.
Note: To reduce the chance of a deadlock, you should minimize the size of transactions and transaction times.

How many types of locks are there?

A. There are three main types of locks that SQL Server 7.0/2000 uses:
Shared locks
Update locks
Exclusive locks

A database consists of how many types of files? Name their extensions.

A. A database can consist of three types of files:
primary files, secondary files and transaction log files.
The primary data file is the starting point of the database and points to any other files in the database. If the primary file is not large enough to hold all the data in the database, then secondary files are required. Transaction log files hold records of updates to the data files, and can be used to recover the database after an error. Every database has at least one transaction log file, and some databases may have more than one.
The recommended file name extension for primary data files is .mdf.
The recommended file name extension for secondary data files is .ndf.
The recommended file name extension for log files is .ldf

How many types of indexes are there?
How many indexes can be created on a table?
A. There are two types of indexes, clustered and non-clustered. A table can have 249 non-clustered indexes and 1 clustered index.

How can you get the last identity value inserted in a table?
A.
SQL Server has a system variable, @@IDENTITY, which gives the last identity value inserted on the current connection.
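For example (a sketch, assuming tbl_emp has an IDENTITY column):

INSERT INTO tbl_emp (emp_name) VALUES ('John Doe')

-- Last identity value generated on this connection, in any scope
-- (including values generated by triggers)
SELECT @@IDENTITY

-- SQL Server 2000 onwards: last identity value generated in the current scope only
SELECT SCOPE_IDENTITY()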

How many types of constraints are there in MS SQL Server?
A.
SQL Server 2000 supports five classes of constraints, as shown in the sketch after this list:
1) NOT NULL
2) CHECK
3) UNIQUE
4) PRIMARY KEY
5) FOREIGN KEY
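Here's a minimal sketch showing all five classes in one table definition (the table and column names are illustrative, and tbl_dept is assumed to already exist with a dept_id primary key):

CREATE TABLE tbl_emp
(
emp_id int NOT NULL PRIMARY KEY,            -- NOT NULL and PRIMARY KEY
email varchar(100) UNIQUE,                  -- UNIQUE
salary money CHECK (salary >= 0),           -- CHECK
dept_id int REFERENCES tbl_dept (dept_id)   -- FOREIGN KEY
)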

How can you trap errors in SQL Server?
A. By using @@ERROR.
After each Transact-SQL statement, the server sets the variable to an integer value:
0 if the statement was successful
the error number if the statement has failed

What is @@FETCH_STATUS?

A. @@FETCH_STATUS is a function (or global variable) that returns the success code of the last FETCH statement executed on the current connection. It is often used as an exit criterion in loops that fetch records from a cursor, as in the sketch below.
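A minimal sketch (tbl_emp and its emp_name column are illustrative):

DECLARE @name varchar(50)
DECLARE emp_cursor CURSOR FOR SELECT emp_name FROM tbl_emp

OPEN emp_cursor
FETCH NEXT FROM emp_cursor INTO @name

-- 0 = last fetch succeeded; -1 = past the end of the resultset; -2 = row missing
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @name
FETCH NEXT FROM emp_cursor INTO @name
END

CLOSE emp_cursor
DEALLOCATE emp_cursor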

What's the limitation of user-defined functions?

A. User-defined functions have one serious limitation: they cannot have side effects. A function side effect is any permanent change to resources (such as tables) that have a scope outside of the function (such as a non-temporary table that is not declared in the function). Basically, this requirement means that a function should return a value while changing nothing in the database. In short, a user-defined function cannot use UPDATE/DELETE on permanent table objects in MS SQL Server.

In how many ways can you receive information from a stored procedure?
A. There are 4 ways to receive information from a stored procedure:
a resultset, parameters, the return value, and a global cursor that can be referenced outside the stored procedure.

How many types of triggers are there in SQL Server?
A. There are two types of triggers:
• After Triggers
• Instead of Triggers

What's the difference between the datetime and smalldatetime datatypes in SQL Server?

A. The main difference between these two datatypes is in the amount of space they occupy. datetime occupies eight bytes and smalldatetime only four. The difference in size is due to a difference in precision. The precision of smalldatetime is one minute, and it covers dates from January 1, 1900, through June 6, 2079, which is usually more than enough. The precision of datetime is 3.33 ms, and it covers dates from January 1, 1753, to December 31, 9999.

How many types of authentication methods are there in SQL Server 2005 and 2000, and what are they?

A. There are two types of authentication methods in SQL Server 2005 and 2000:
Windows Authentication
SQL Server Authentication

How can you generate a GUID in Transact-SQL?

A. GUIDs can be generated using the NEWID() function, e.g. SELECT NEWID().

How many system datatypes are there in SQL Server?

A. 27

What is the maximum number of input and output parameters of a stored procedure in SQL Server 2000?
A. 2100 (the limit was 1024 in SQL Server 7.0).

How can we improve stored procedure performance? Explain the performance tuning techniques.

1. Use WHERE clauses to restrict the rows your SELECT statements process.
2. Select only those columns which you really require.
3. Joins are expensive in terms of time, so join tables on related fields, preferably indexed fields.
4. Don't reference tables the query doesn't actually need.

NOTES: When you execute a stored proc, SQL Server generates an execution plan based on your indexes and the statistics available. Now it is possible that the stored proc inserts a lot of rows into some table which is used in another statement in a join, and the plan that was built is not efficient for the subsequent SQL statement, because the profile of the table has changed since the plan was generated. In that case, consider splitting the stored proc into two, such that the second proc deals with the updated table. When the second proc is executed, it will have a more realistic execution plan for the updated table.

Wednesday, June 3, 2009

TRY...CATCH in SQL Server 2005

Introduction
SQL Server 2005 offers a number of new features over its predecessor, including many features aimed at making working with databases more like writing .NET application code. For example, in SQL Server 2005, stored procedures, triggers, UDFs, and so on can be written using any .NET Framework programming language (such as Visual Basic or C#). Another feature, and the focus of this article, is SQL Server 2005's support for TRY...CATCH blocks. TRY...CATCH blocks are the standard approach to exception handling in modern programming languages, and involve:
  • A TRY Block - the TRY block contains the instructions that might cause an exception
  • A CATCH Block - if an exception occurs from one of the statements in the TRY block, control is branched to the CATCH block, where the exception can be handled, logged, and so on.
See the Wikipedia Exception Handling entry for more information on the TRY...CATCH construct as well as exception handling concepts in general.

Prior to SQL Server 2005, detecting errors resulting from T-SQL statements could only be handled by checking a global error variable, @@ERROR. Because the @@ERROR variable value is reset after each SQL statement, this antiquated approach leads to rather bloated stored procedures, as the variable must be checked after each statement with code to handle any problems.

The TRY...CATCH block in SQL Server 2005 offers a much more readable syntax and one that developers are more familiar with. In this article we'll look at the new TRY...CATCH block and examine how it can be used to rollback a transaction in the face of an error. Read on to learn more!


Checking @@ERROR - the Old Way of Handling Errors in T-SQL
SQL Server provides an @@ERROR variable that indicates the status of the last completed SQL statement in a given batch. If a SQL statement is completed successfully, @@ERROR is assigned 0. If, however, an error occurs, @@ERROR is set to the number of the error message.

To see how the @@ERROR variable can be used, imagine that we have a data-driven web application that maintains employee information. Let's assume that our database has Employees and EmployeePhoneNumbers tables, among others. These two tables share a one-to-many relationship; that is, each Employees record can have an arbitrary number of related records in the EmployeePhoneNumbers table. There might be one for their office phone, one for their pager, one for their cell phone, and so on. Imagine that our database includes a stored procedure, DeleteEmployee, which is comprised of two DELETE statements - one to delete the employee's related phone numbers from the system and one to delete the actual employee record:

CREATE PROCEDURE DeleteEmployee ( @EmployeeID int )
AS

-- Delete the Employee's phone numbers
DELETE FROM EmployeePhoneNumbers
WHERE EmployeeID = @EmployeeID

-- Delete the Employee record
DELETE FROM Employees
WHERE EmployeeID = @EmployeeID

Since we want these two delete statements to be atomic and either both fail or both succeed, we need to wrap up these two statements into a transaction. By using a transaction, we can rollback the transaction in the face of an error and undo any changes made since the start of the transaction. To accomplish this we might initially try to use the following syntax:

CREATE PROCEDURE DeleteEmployee ( @EmployeeID int )
AS

BEGIN TRANSACTION -- Start the transaction

-- Delete the Employee's phone numbers
DELETE FROM EmployeePhoneNumbers
WHERE EmployeeID = @EmployeeID

-- Delete the Employee record
DELETE FROM Employees
WHERE EmployeeID = @EmployeeID


-- See if there is an error
IF @@ERROR <> 0
-- There's an error b/c @@ERROR is not 0, rollback
ROLLBACK
ELSE
COMMIT -- Success! Commit the transaction

This stored procedure (it appears) starts a transaction, runs the two DELETE statements, and then checks to see if there was an error. If there was one, it rolls the transaction back, else it commits the transaction. I say "it appears" because this syntax, while legal, is semantically incorrect because the @@ERROR variable is set after every SQL statement. Therefore, if the first DELETE statement has an error the @@ERROR variable will be set to its error number. Then, the second DELETE will execute. If this second DELETE succeeds, @@ERROR will be set back to 0, in which case the transaction will be committed even though there was a problem with the first statement! Whoops!

Instead, a check must be made after every SQL statement to see if there has been an error. If so, the transaction must be rolled back and the stored procedure exited. This can lead to bulky scripts as a stored procedure with, say, five statements will have five checks against the @@ERROR variable. And if you forget to cut and paste a check in for a particular statement you're opening yourself up to a potential problem.

For more information on transactions and the @@ERROR syntax used for checking for errors and rolling back as needed, see Managing Transactions in SQL Server Stored Procedures.

Handling Errors With SQL Server 2005's TRY...CATCH Blocks
While SQL Server 2005 still supports the @@ERROR approach, a better alternative exists with its new TRY...CATCH blocks. As with programming languages like Visual Basic, C#, and Java, the SQL Server 2005 TRY...CATCH block executes a number of statements in the TRY block. If there are no errors in any of the statements, control proceeds to after the CATCH block. If, however, one of the statements causes an error, control branches immediately to the start of the CATCH block. Furthermore, like programming languages, nested TRY...CATCH blocks are allowed, meaning that you can have an entire TRY...CATCH block in the TRY or CATCH portions of an "outer" TRY...CATCH block.

BEGIN TRY
Try Statement 1
Try Statement 2
...
Try Statement M
END TRY
BEGIN CATCH
Catch Statement 1
Catch Statement 2
...
Catch Statement N
END CATCH

The following system functions are available in the CATCH block and can be used to determine additional error information:

ERROR_NUMBER() - Returns the number of the error
ERROR_SEVERITY() - Returns the severity of the error
ERROR_STATE() - Returns the error state number
ERROR_PROCEDURE() - Returns the name of the stored procedure or trigger where the error occurred
ERROR_LINE() - Returns the line number inside the routine that caused the error
ERROR_MESSAGE() - Returns the complete text of the error message. The text includes the values supplied for any substitutable parameters, such as lengths, object names, or times

Also realize that not all errors generated by the TRY block statements are passed on to the CATCH block. Any errors with a severity of 10 or less are considered to be warnings and do not branch control flow to the CATCH block. Also, any errors that sever the database connection will not cause the CATCH block to be reached.

Let's look at a quick example of using TRY...CATCH, after which we'll turn our attention to using this new construct for rolling back transactions in the face of an error. The following example shows a very simple INSERT query on the Northwind database's Products table. The Products table's ProductID column is an IDENTITY column and therefore its value can't be specified when inserting a new record. However, I've specified this value in the following INSERT statement. Hence, control is turned over to the CATCH block where error information is displayed.

BEGIN TRY
-- This will generate an error, as ProductID is an IDENTITY column
-- Ergo, we can't specify a value for this column...
INSERT INTO Products(ProductID, ProductName)
VALUES(1, 'Test')
END TRY
BEGIN CATCH
SELECT 'There was an error! ' + ERROR_MESSAGE()
END CATCH

This query will return a single record with a single column with the contents: "There was an error! Cannot insert explicit value for identity column in table 'Products' when IDENTITY_INSERT is set to OFF."

Using TRY...CATCH to Rollback a Transaction in the Face of an Error
As discussed earlier in this article, one of the downsides of the @@ERROR variable approach is that for transactions a check against this variable must be added after each and every SQL statement to determine if an error occurred and, if so, to rollback the transaction. With SQL Server 2005's TRY...CATCH block, however, these types of scripts are greatly simplified, as the following example illustrates:

CREATE PROCEDURE DeleteEmployee ( @EmployeeID int )
AS

BEGIN TRY
BEGIN TRANSACTION -- Start the transaction

-- Delete the Employee's phone numbers
DELETE FROM EmployeePhoneNumbers
WHERE EmployeeID = @EmployeeID

-- Delete the Employee record
DELETE FROM Employees
WHERE EmployeeID = @EmployeeID

-- If we reach here, success!
COMMIT
END TRY
BEGIN CATCH
-- Whoops, there was an error
IF @@TRANCOUNT > 0
ROLLBACK

-- Raise an error with the details of the exception
DECLARE @ErrMsg nvarchar(4000), @ErrSeverity int
SELECT @ErrMsg = ERROR_MESSAGE(),
@ErrSeverity = ERROR_SEVERITY()

RAISERROR(@ErrMsg, @ErrSeverity, 1)
END CATCH

In the TRY block a transaction is started and the two DELETE statements are performed. If both DELETEs succeed, the COMMIT will be reached and the transaction committed. If, however, either one produces an error, control will be routed to the CATCH block where the transaction will be rolled back. Also, the CATCH block re-raises the error (using RAISERROR) so that the error information will be percolated up to the application that invoked the stored procedure. For an ASP.NET web application, that means that an exception will be raised in the .NET code that invoked this stored procedure, as chances are you not only want to rollback the transaction, but have some sort of error message handled in the web application as well so that the end user knows that their action failed.

Adding the call to RAISERROR in the CATCH block is tantamount to having a TRY...CATCH block in a programming language that re-throws the exception in the CATCH block after logging it or performing other actions. The action performed in the example above is rolling back the transaction, but could also include logging logic. If you omit the RAISERROR, the ASP.NET application won't throw an exception from executing the database command. Chances are you want to have an exception thrown on the ASP.NET side (so that you don't fail silently). If so, leave in the RAISERROR call.

Conclusion
SQL Server 2005's new TRY...CATCH block brings the familiar TRY...CATCH exception handling to T-SQL. Prior to SQL Server 2005, errors could only be detected in SQL scripts through the use of the @@ERROR variable, which annoyingly reset after each SQL statement, thereby requiring checks after each and every statement. This led to bloated script that was prone to typos or cut and paste errors leading to potentially serious problems. With SQL Server 2005's TRY...CATCH block, on the other hand, anytime an error is raised by one of the statements in the TRY block, control is directed to the CATCH block. As we saw in this article, the TRY...CATCH block allows for much more readable and cleaner error handling in transaction settings.
