Friday, 7 January 2011

What Are Character Sets And Sort Orders?

Okay, see you again in SQL black book, I just read the book and tell what I see, so I didn't teach you, never :)

 

 

Another preinstallation issue is choosing a character set and sort order. A character set is the basic text and symbols that are loaded in your system. Regardless of the character set you choose, the first 128 characters are the same. The extended characters, including language-specific characters, reside in the remaining half of the character set. Your decision depends on whether you are doing business overseas or in other languages and need to store text and special characters. In most cases, the default is fine and should provide you with what you need to function.

You should make this determination prior to installation. Changing character sets can be a daunting task with many system ramifications. If your company is concerned about character sets, chances are you are experienced in these issues and this feature should be nothing new to you.

Another interesting issue concerns sort orders. Sort orders determine the way the data is organized when stored by Microsoft SQL Server. The default sort order for Microsoft SQL Server is dictionary order and case-insensitive. This is fine and probably the best default setting. It is not, however, the fastest setting you can use on your system.

Note: Microsoft is not trying to slow you down. Most programmers are not as careful as they could be and do not always exercise consistent case sensitivity when they write code. The default for Microsoft SQL Server should be used if you have legacy systems that might contain this kind of SQL code.

 

The fastest sort order is binary. The use of this setting has some impact on how you perform certain tasks down the road, so choose it carefully. It will change all of your SQL scripts, stored procedures, and client pass-through code to be case-sensitive. If you type a statement and use a different case than was specified when the table was created, you will get an error message. Say, for instance, you have a table called MyTable on your system. To access it, you type “mytable”. An “Object Not Found” error is returned.

Binary sort order poses a few issues in developing client software, and great care should be taken when using it. Your ad hoc queries might not return what you expect back from the server, either. A capital “F” does not equal a lowercase “f”. Reports are not inherently smart enough to tell the difference, and your code needs to allow for this.

If you store, access, and check for case sensitivity on your entire system, binary is the way to go. I have configured two identical machines installed from scratch with the same data sets stored in different sort orders. My tests have proven that binary is faster for a lot of common operations. If you are putting third-party applications on your server, make sure they run as expected in this sort order. If in doubt, call the vendor or technical support for the product in question.

I often use binary sort orders as an example of a setting that restricts programmers in a way they might find difficult. Because of the case-sensitive nature, programmers must write code with more care than they would otherwise. The end result is faster, but getting there might be more difficult.

Users of the system should also be considered when selecting binary sort orders. If a system allows for ad hoc reports or queries and a user does not know that the data is stored with case sensitivity, he or she might not get the expected results. This can be dangerous when converting legacy systems. Make the decision to use a binary sort order only after carefully weighing the impact on your entire organization.

Note: Under “Performance Comparisons” in the Microsoft SQL Server Books Online, select Topic 5 from the SQL Server 6.0 Setup Guide for more information.

 

 

 

 

Another consideration in choosing a character set and a sort order is whether you are setting up a distributed server environment. If you are, you must use compatible character sets and sort orders among your servers. If you are going to share, replicate, or distribute data, use a common character set and sort order throughout your enterprise. Do not forget that in business today we must occasionally share data with other companies. If your system interacts with another company’s system, again make sure the character sets and sort orders are compatible.

 

References : Microsoft SQL Server Black Book

No comments:

Post a Comment