Characterset and Collation

This article will consist of two parts :

A)Characterset and Collation in General 

B)Characterset and Collation in Mysql 

 

A)Characterset and Collation in General

Characterset :

Simply put , a characterset (set of characters ) is a combination of two things. 

  •  Symbols
  •  Encoding

Symbols are characters such as , A , B , @ , X , 5 , 1 ....

Encoding is numeric representation of  these  characters . E.g. A might have a binary code of 1011 , B may have a binary code of 1101 and so on . 

Simple , isn't it ?

Collation :

Collation is  a set of rules which determines how these characters should behave in certain situations . 

e.g. a rule which says that capital A is equivalent to small a , will be a collation . This example uses one rule only . There can be several rules which make up a collation .

There's a standard naming convention for collation : e.g. latin1_swedish_ci (characterset_collation_ci) . There can be values other then ci at the end , but let's keep it simple for now . 

 

B)Characterset and Collation in MYSQL

 

Mysql uses Characterset and collation settings  at 4 levels . 

1) Server

2) Database

3) Table

4) Column

 

1)Characterset and Collation settings at Server Level :

 Initially, the server character set and collation depend on the options that you use when you start mysqld.

for Characterset , you can use --character-set-server [shell> mysqld --character-set-server=latin1 ]

 

for Collation , you can use --collation-server              [shell> mysqld --collation-server=latin1_swedish_ci]

What If you don't specify any setting for characterset and collation ?

Mysql uses default characterset which is latin1 and default collation which is latin1_swedish_ci

Thus following three commands will have same effect : 

 

shell> mysqld
shell> mysqld --character-set-server=latin1
shell> mysqld --character-set-server=latin1 \
--collation-server=latin1_swedish_ci

 

Server level settings is used by default when Database level settings is not used..Let's have a look at database level settings ..

2)Characterset and Collation settings at Database Level:

Every database has a database character set and a database collation. The CREATE DATABASE and ALTER DATABASE statements have optional clauses for specifying the database character set and collation:

CREATE DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]

ALTER DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]

Database Level Setting is used by default when Table level settings is not specified .. Let's have a look at Table Level Settings

3)Characterset and Collation settings at Table Level:

Every Table has a table character set and a table collation . The Create Table and Alter Table statements have optional clauses for specifying the table character set and collation:

CREATE TABLE tbl_name (column_list)
[[DEFAULT] CHARACTER SET charset_name]
[COLLATE collation_name]]

ALTER TABLE tbl_name
[[DEFAULT] CHARACTER SET charset_name]
[COLLATE collation_name]

Example:

CREATE TABLE virtism ( ... )
CHARACTER SET latin1 COLLATE latin1_german_ci;

Table level Character Settings and Collation is used by default when Column level settings is not specified.. Let's have a look at Column Level Settings .

 

4)Characterset and Collation settings at Column Level:

Every "character" Column has a Character set and a Column collation . The Create Table and Alter Table statements have optional clauses for specifying the Column character set and Collation:

 

col_name {CHAR | VARCHAR | TEXT} (col_length)
[CHARACTER SET charset_name]
[COLLATE collation_name]