Chars
# Introduction
About chars
This is potentially a big subject! It is possible to write a long book about it, and several people have done so (search Amazon for “unicode book” to see some examples).
A very brief history
Handling characters in computers was much simpler in earlier decades, when programmers assumed that English was the only important language. So: 26 letters, upper and lower case, 10 digits, several punctuation marks, plus a code (0x07) to ring a bell, and it all fitted into 7 bits: the ASCII character set.
Naturally, people started asking what about à, ä and Ł, then other people started asking about ऄ, ஹ and ญ, and young people wanted emojis 😱. What to do?
To cut a long story short, many smart and patient people had to serve on committees for years, working out the details of the Unicode character set, and of encodings such as UTF-8, and lots of software needed a very complicated rewrite. Also, lots of new bugs were introduced.
To prevent everything breaking, the Unicode/UTF-8 design ensures that the first 128 codes (0 to 127) are identical to ASCII (even the bell).
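This compatibility is easy to check on the JVM. The sketch below is only an illustration: the helper name `utf8Bytes` is made up for this example, and it assumes the standard `Charsets.UTF_8` encoder.

```kotlin
// Hypothetical helper (not part of the standard library):
// the UTF-8 bytes of a string, shown as unsigned integers.
fun utf8Bytes(text: String): List<Int> =
    text.toByteArray(Charsets.UTF_8).map { it.toInt() and 0xFF }

fun main() {
    println(utf8Bytes("a"))        // [97]  one byte, same value as ASCII
    println(utf8Bytes("\u0007"))   // [7]   the bell survives too
    println(utf8Bytes("\u00E0"))   // [195, 160]  'à' needs two bytes
}
```

Anything in the ASCII range stays a single byte with its old value, while characters beyond it are spread over several bytes.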
Characters in Kotlin
Languages designed after about 2005 have the huge advantage that a reasonably stable Unicode standard already existed.
Kotlin (first announced in 2011) was able to assume that users would use a variety of (human) languages, and would need Unicode to express them.
[Characters][ref-char] in Kotlin are 16-bit values, the same as a JVM `char`: each one is a single UTF-16 code unit rather than a full Unicode [code point][wiki-codepoint].
This is enough to express most written alphabets, but not the entire range of emojis.
The full Unicode standard defines code points up to U+10FFFF (21 bits), and what a reader perceives as one character (called a [`grapheme`][wiki-grapheme]) may be built from several code points.
Kotlin `Strings` support this full standard by using two `Char`s (a surrogate pair) for each code point above U+FFFF.
For example, 😱 (U+1F631) would be `\uD83D` and `\uDE31`.
Unfortunately, Java strings have no notion of graphemes (`length` and indexing count 16-bit `char`s), and for compatibility neither do Kotlin strings.
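To make the surrogate-pair behaviour concrete, here is a small sketch. It assumes Kotlin on the JVM, where `codePointAt()` and `codePointCount()` are available on `String`; the helper name `countCodePoints` is made up for this example.

```kotlin
// Hypothetical helper: number of Unicode code points in a string (JVM only).
fun countCodePoints(text: String): Int =
    text.codePointCount(0, text.length)

fun main() {
    val scream = "😱"                                // one code point, U+1F631

    println(scream.length)                           // 2 (counted in 16-bit Chars)
    println(scream[0].isHighSurrogate())             // true
    println(scream[1].isLowSurrogate())              // true
    println(scream.codePointAt(0).toString(16))      // 1f631
    println(countCodePoints(scream))                 // 1
}
```

Note that this counts code points, not graphemes: an emoji built from several code points would still report more than 1.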
[wiki-codepoint]: https://en.wikipedia.org/wiki/Code_point
[wiki-grapheme]: https://en.wikipedia.org/wiki/Grapheme
[ref-char]: https://kotlinlang.org/docs/characters.html
Character literals are written in single-quotes, and are distinct from strings written in double quotes. This is probably obvious to people from the C/C++ world, but potentially confusing to Python and JavaScript programmers.
val a = 'a'
a::class.qualifiedName // => kotlin.Char
a.code // => 97
val jha = 'झ' // Devanagari alphabet
jha.code // => 2333
val heart = '❤' // heart emoji
heart.code // => 10084
Char.MAX_VALUE.code // => 65535 (0xFFFF, the largest value a 16-bit Char can hold)
val notChar = 'abc' // => compile error: Too many characters in a character literal
Converting between Char and Int is straightforward:
a.code // => 97
Char(97) // => 'a'
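One common trap is that the code of a digit character is not the digit's numeric value. The sketch below contrasts the two, assuming Kotlin 1.5 or later for `digitToInt()` and `digitToChar()`; `digitSum` is a made-up helper for illustration.

```kotlin
// Hypothetical helper: add up the numeric values of all digits in a string.
fun digitSum(text: String): Int =
    text.filter { it.isDigit() }.sumOf { it.digitToInt() }

fun main() {
    println('7'.code)             // 55 (the character code)
    println('7'.digitToInt())     // 7  (the numeric value)
    println(7.digitToChar())      // 7  (back to a Char)
    println(digitSum("a1b2c3"))   // 6
}
```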
The compiler allows some forms of integer arithmetic on Chars:
'a' + 5 // => 'f'
'c' - 'a' // => 2
'c' + 'a' // => compile error: two Chars cannot be added
'f' + ('A' - 'a') // => 'F' (same as 'f'.uppercaseChar())
'f'.dec() // => 'e' (decrement)
'f'.inc() // => 'g' (increment)
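Putting this arithmetic to work, a letter can be shifted along the alphabet with wrap-around. This is only a sketch: `rotateLower` is a made-up name, and it deliberately handles just the lowercase ASCII letters.

```kotlin
// Hypothetical helper: rotate a lowercase ASCII letter by `shift` places,
// wrapping around from 'z' back to 'a'. Other characters are returned unchanged.
fun rotateLower(c: Char, shift: Int): Char {
    if (c !in 'a'..'z') return c
    val offset = (c - 'a' + shift).mod(26)   // Char - Char gives an Int
    return 'a' + offset                      // Char + Int gives a Char
}

fun main() {
    println(rotateLower('a', 5))    // f
    println(rotateLower('y', 3))    // b
    println(rotateLower('c', -5))   // x
    println(rotateLower('!', 3))    // !
}
```

Using `mod()` rather than `%` keeps the offset non-negative, so negative shifts wrap correctly.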
Some functions for Char
As always, there are far too many functions to discuss here, so this is just a selection.
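- For appropriate alphabets, change case with `uppercase()` and `lowercase()` (which return a `String`), or with `uppercaseChar()` and `lowercaseChar()` (which return a `Char`).
- Test case with `isUpperCase()` and `isLowerCase()`.
- Test character type with:
  - `isLetter()`, covers many alphabets (the Lu, Ll, Lt, Lm, and Lo categories in Unicode)
  - `isDigit()`, decimal digits such as 0..9 (the Nd category in Unicode, which also includes digits from other scripts)
  - `isLetterOrDigit()`, combines the previous two
  - `isWhitespace()`, any whitespace character (the Zs, Zl, and Zp categories in Unicode, plus control characters such as tab and newline from Cc)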
'झ'.isLetter() // => true
'A'.isLowerCase() // => false
'4'.isDigit() // => true
'\t'.isWhitespace() // => true (tab character)
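These tests combine naturally when scanning a whole string. As a rough sketch, the made-up helper `classify` below tallies characters by broad type; the category names are arbitrary labels chosen for this example.

```kotlin
// Hypothetical helper: tally the characters of a string by broad type.
fun classify(text: String): Map<String, Int> =
    text.groupingBy { c ->
        when {
            c.isLetter() -> "letter"
            c.isDigit() -> "digit"
            c.isWhitespace() -> "whitespace"
            else -> "other"
        }
    }.eachCount()

fun main() {
    // 7 letters (including झ), 2 digits, 2 spaces, 3 punctuation marks
    println(classify("Kotlin 2.0, झ!"))
}
```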
Also, regular expressions (which will be the subject of a later Concept) allow powerful search and manipulation.
Char List and String interconversions
To convert from a String to a List of Chars, we can use toList().
To convert a List of Chars to a String, there is the joinToString() function, which takes a separator (often the empty string) as argument.
val kt = "kotlin".toList() // => [k, o, t, l, i, n]
kt.joinToString("") // => "kotlin"
kt.joinToString("_") // => "k_o_t_l_i_n"
Note that joinToString() operates on a List or Array.
To convert a single Char to a 1-character string, use toString().
'a'.toString() // => "a"
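A round trip through a list is handy whenever a list operation is needed on the characters. The sketch below uses two made-up helpers, `sortedLetters` and `initials`, purely as an illustration (`uppercaseChar()` assumes Kotlin 1.5 or later).

```kotlin
// Hypothetical helper: String -> List<Char> -> sorted -> String.
fun sortedLetters(word: String): String =
    word.toList().sorted().joinToString("")

// Hypothetical helper: first letter of each word, upper-cased and dot-separated.
fun initials(words: List<String>): String =
    words.filter { it.isNotEmpty() }
        .map { it.first().uppercaseChar() }
        .joinToString(".")

fun main() {
    println(sortedLetters("kotlin"))                    // iklnot
    println(initials(listOf("java", "virtual", "machine")))   // J.V.M
}
```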
To check if a character is present in a String, or a Char list or array, we have in, which maps to the contains() function.
val clist = "kotlin".toList() // => [k, o, t, l, i, n]
't' in clist // => true
't' in "kotlin" // => true
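The same `in` check works nicely as a predicate. As a small sketch, the made-up helper `countVowels` counts ASCII vowels, ignoring case:

```kotlin
// Hypothetical helper: count the ASCII vowels in a string, ignoring case.
fun countVowels(text: String): Int =
    text.count { it.lowercaseChar() in "aeiou" }

fun main() {
    println(countVowels("Kotlin"))     // 2
    println(countVowels("EXERCISM"))   // 3
}
```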
Originally from Exercism kotlin concepts