What is the utf8mb4_0900_ai_ci collation?
What is the meaning of the MySQL collation utf8mb4_0900_ai_ci?
utf8mb4 means that each character is stored as a maximum of 4 bytes in the UTF-8 encoding scheme.
0900 refers to the Unicode Collation Algorithm version, here 9.0.0. (The Unicode Collation Algorithm is the method used to compare two Unicode strings in a way that conforms to the requirements of the Unicode Standard.)
ai refers to accent insensitivity. That is, there is no difference between e, è, é, ê and ë when sorting or comparing.
ci refers to case insensitivity. That is, there is no difference between p and P when sorting or comparing.
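To make the parts of the name more concrete, here is a minimal Python sketch. It is only an illustration: the helper names `utf8_width` and `ai_ci_key` are hypothetical, and the accent and case folding below is a rough approximation (Unicode decomposition plus case folding), not the Unicode Collation Algorithm that MySQL actually implements.

```python
import unicodedata


def utf8_width(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def ai_ci_key(text: str) -> str:
    """Build a rough accent- and case-insensitive comparison key.

    Assumption: decomposing to NFD, dropping combining marks, and then
    case-folding is close enough for illustration. Real collations use
    UCA weight tables and handle many more cases than this.
    """
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    return without_accents.casefold()


# Characters of increasing UTF-8 width, up to the 4-byte maximum.
widths = {ch: utf8_width(ch) for ch in ["p", "é", "€", "😀"]}

# Under this approximation, accented and unaccented forms share one key.
same_accent_class = len({ai_ci_key(c) for c in "eèéêë"}) == 1
same_case_class = ai_ci_key("p") == ai_ci_key("P")
```

With these definitions, `widths` maps the four sample characters to 1, 2, 3 and 4 bytes, and both `same_accent_class` and `same_case_class` are true, which mirrors the ai and ci behavior described above.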
utf8mb4 has been the default character set, with utf8mb4_0900_ai_ci as its default collation, since MySQL 8.0.1. Previously, the default character set was latin1 (with latin1_swedish_ci as its collation), and utf8mb4_general_ci was the default collation for utf8mb4. Because utf8mb4 is now the default character set, new tables can store characters outside the Basic Multilingual Plane, including emoji, without any extra configuration. If accent sensitivity and case sensitivity are required, you may use utf8mb4_0900_as_cs instead.
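The point about the Basic Multilingual Plane can be checked with a short Python sketch. The function names `outside_bmp` and `fits_in_three_bytes` are hypothetical helpers for illustration. They only inspect code points and UTF-8 byte lengths and do not talk to MySQL.

```python
def outside_bmp(ch: str) -> bool:
    """True if the code point is above U+FFFF, i.e. outside the BMP."""
    return ord(ch) > 0xFFFF


def fits_in_three_bytes(text: str) -> bool:
    """True if every character of text needs at most 3 bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# An emoji lies outside the BMP and needs the fourth byte.
emoji_needs_four_bytes = outside_bmp("😀") and not fits_in_three_bytes("😀")

# Accented Latin letters and the euro sign stay within 3 bytes.
latin_fits = fits_in_three_bytes("héllo €")
```

Any string for which `fits_in_three_bytes` returns false contains at least one character that requires a 4-byte encoding such as utf8mb4.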
If you are interested in the details, the MySQL developers have explained the motivation behind the switch to utf8mb4_0900_ai_ci as the default collation in this article: New collations in MySQL 8.0.0.