Strings and Characters
Strings and characters are used as follows:
- Strings are collections of characters.
- Strings have the type `String`, and characters have the type `Character`.
- Strings can be used to work with text in a Unicode-compliant way.
- Strings are immutable.

String and character literals are enclosed in double quotation marks (`"`):

    let someString = "Hello, world!"

String literals may contain escape sequences. An escape sequence starts with a backslash (`\`):

- `\0`: Null character
- `\\`: Backslash
- `\t`: Horizontal tab
- `\n`: Line feed
- `\r`: Carriage return
- `\"`: Double quotation mark
- `\'`: Single quotation mark
- `\u`: A Unicode scalar value, written as `\u{x}`, where `x` is a 1–8 digit hexadecimal number, which needs to be a valid Unicode scalar value (i.e., in the range 0 to 0xD7FF and 0xE000 to 0x10FFFF inclusive).

    // Declare a constant which contains two lines of text
    // (separated by the line feed character `\n`), and ends
    // with a thumbs up emoji, which has code point U+1F44D (0x1F44D).
    //
    let thumbsUpText =
        "This is the first line.\nThis is the second line with an emoji: \u{1F44D}"

The type `Character` represents a single, human-readable character. Characters are extended grapheme clusters, which consist of one or more Unicode scalars.

For example, the single character `ü` can be represented in several ways in Unicode. First, it can be represented by a single Unicode scalar value `ü` ("LATIN SMALL LETTER U WITH DIAERESIS", code point U+00FC). Second, the same single character can be represented by two Unicode scalar values: `u` ("LATIN SMALL LETTER U", code point U+0075), and "COMBINING DIAERESIS" (code point U+0308). The combining Unicode scalar value is applied to the scalar before it, which turns a `u` into a `ü`.

Still, both variants represent the same human-readable character `ü`:

    let singleScalar: Character = "\u{FC}"
    // `singleScalar` is `ü`
    let twoScalars: Character = "\u{75}\u{308}"
    // `twoScalars` is `ü`

Another example where multiple Unicode scalar values are rendered as a single, human-readable character is a flag emoji. These emojis consist of two "REGIONAL INDICATOR SYMBOL LETTER" Unicode scalar values:

    // Declare a constant for a string with a single character, the emoji
    // for the Canadian flag, which consists of two Unicode scalar values:
    // - REGIONAL INDICATOR SYMBOL LETTER C (U+1F1E8)
    // - REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6)
    //
    let canadianFlag: Character = "\u{1F1E8}\u{1F1E6}"
    // `canadianFlag` is `🇨🇦`

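Because characters are extended grapheme clusters, the number of characters in a string can differ from the number of scalars or bytes used to encode it. The following is a minimal sketch of that difference; it assumes the `length` and `utf8` fields of `String` and the `length` field of arrays, and the constant names are illustrative only.

```cadence
// A single character built from two Unicode scalar values:
// LATIN SMALL LETTER U (U+0075) followed by COMBINING DIAERESIS (U+0308).
let decomposed = "\u{75}\u{308}"

// `length` counts characters (extended grapheme clusters), not scalars or bytes.
let characterCount = decomposed.length
// `characterCount` is `1`

// The UTF-8 encoding uses one byte for U+0075 and two bytes for U+0308.
let byteCount = decomposed.utf8.length
// `byteCount` is `3`
```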
String fields and functions
Strings have the following built-in fields and functions:

- `let length: Int`

  Returns the number of characters in the string as an integer.

      let example = "hello"

      // Find the number of elements of the string.
      let length = example.length
      // `length` is `5`

- `let utf8: [UInt8]`

  The byte array of the UTF-8 encoding.

      let flowers = "Flowers \u{1F490}"
      let bytes = flowers.utf8
      // `bytes` is `[70, 108, 111, 119, 101, 114, 115, 32, 240, 159, 146, 144]`

- `view fun concat(_ other: String): String`

  Concatenates the string `other` to the end of the original string, but does not modify the original string. This function creates a new string whose length is the sum of the lengths of the string the function is called on and the string given as a parameter.

      let example = "hello"
      let new = "world"

      // Concatenate the new string onto the example string and return the new string.
      let helloWorld = example.concat(new)
      // `helloWorld` is now `"helloworld"`

- `view fun slice(from: Int, upTo: Int): String`

  Returns a string slice of the characters in the given string from start index `from` up to, but not including, the end index `upTo`. This function creates a new string whose length is `upTo - from`. It does not modify the original string. If either of the parameters is out of the bounds of the string, or the indices are invalid (`from > upTo`), then the function will fail.

      let example = "helloworld"

      // Create a new slice of part of the original string.
      let slice = example.slice(from: 3, upTo: 6)
      // `slice` is now `"low"`

      // Run-time error: Out of bounds index, the program aborts.
      let outOfBounds = example.slice(from: 2, upTo: 11)

      // Run-time error: Invalid indices, the program aborts.
      let invalidIndices = example.slice(from: 2, upTo: 1)

- `view fun decodeHex(): [UInt8]`

  Returns an array containing the bytes represented by the given hexadecimal string.

  The given string must only contain hexadecimal characters and must have an even length. If the string is malformed, the program aborts.

      let example = "436164656e636521"

      example.decodeHex() // is `[67, 97, 100, 101, 110, 99, 101, 33]`

- `view fun toLower(): String`

  Returns a string where all uppercase letters are replaced with their lowercase equivalents.

      let example = "Flowers"

      example.toLower() // is `flowers`

- `view fun replaceAll(of: String, with: String): String`

  Returns a string where all occurrences of `of` are replaced with `with`. If `of` is empty, it matches at the beginning of the string and after each UTF-8 sequence, yielding k+1 replacements for a string of length k.

      let example = "abababa"

      example.replaceAll(of: "a", with: "o") // is `obobobo`

- `view fun split(separator: String): [String]`

  Returns a variable-sized array of strings created by splitting the receiver string on the `separator`.

      let example = "hello world"

      example.split(separator: " ") // is `["hello", "world"]`

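The empty-match rule described for `replaceAll` is easier to see on a short input. The following is a minimal sketch of the case where `of` is empty; the constant names are illustrative, and the result in the comment is derived from the rule stated above rather than from a recorded run.

```cadence
let example = "abc"

// An empty `of` matches before the first character and after each character,
// so a string of length 3 receives 4 replacements.
let dashed = example.replaceAll(of: "", with: "-")
// `dashed` is expected to be `"-a-b-c-"`
```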
The `String` type also provides the following functions:

- `view fun String.encodeHex(_ data: [UInt8]): String`

  Returns a hexadecimal string for the given byte array.

      let data = [1 as UInt8, 2, 3, 0xCA, 0xDE]

      String.encodeHex(data) // is `"010203cade"`

- `view fun String.join(_ strings: [String], separator: String): String`

  Returns the string created by joining the array of `strings` with the provided `separator`.

      let strings = ["hello", "world"]
      String.join(strings, separator: " ") // is "hello world"

`String`s are also indexable, returning a `Character` value.

    let str = "abc"
    let c = str[0] // is the Character "a"

- `view fun String.fromUTF8(_ input: [UInt8]): String?`

  Attempts to convert a UTF-8 encoded byte array into a `String`. This function returns `nil` if the byte array contains invalid UTF-8, such as incomplete codepoint sequences or undefined graphemes.

  For a given string `s`, `String.fromUTF8(s.utf8)` is equivalent to wrapping `s` up in an optional.
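
The behavior of `String.fromUTF8` can be illustrated with one valid input, one invalid input, and a round trip through the `utf8` field. This is a minimal sketch based on the description above; the constant names are illustrative, and the single byte `0xC3` is chosen as an example of an incomplete sequence because it begins a two-byte UTF-8 encoding.

```cadence
// Valid UTF-8: the bytes for "Hi".
let validBytes: [UInt8] = [72, 105]
let decoded: String? = String.fromUTF8(validBytes)
// `decoded` is `"Hi"`, wrapped in an optional

// Invalid UTF-8: 0xC3 starts a two-byte sequence, but the second byte is missing.
let truncatedBytes: [UInt8] = [0xC3]
let failed: String? = String.fromUTF8(truncatedBytes)
// `failed` is `nil`

// Round trip: decoding the `utf8` field of a string yields the same string,
// wrapped in an optional.
let flowers = "Flowers \u{1F490}"
let roundTrip: String? = String.fromUTF8(flowers.utf8)
// `roundTrip` is `"Flowers \u{1F490}"`, wrapped in an optional
```

Since the result is optional, callers need to unwrap it (for example with `??` or `if let`) before using it as a `String`.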
Character fields and functions
`Character` values can be converted into `String` values using the `toString` function:

- `view fun toString(): String`

  Returns the string representation of the character.

      let c: Character = "x"

      c.toString() // is "x"

- `view fun String.fromCharacters(_ characters: [Character]): String`

  Builds a new `String` value from an array of `Character`s. Because `String`s are immutable, this operation makes a copy of the input array.

      let rawUwU: [Character] = ["U", "w", "U"]
      let uwu: String = String.fromCharacters(rawUwU) // "UwU"

- `let utf8: [UInt8]`

  The byte array of the UTF-8 encoding.

      let a: Character = "a"
      let a_bytes = a.utf8 // `a_bytes` is `[97]`

      let bouquet: Character = "\u{1F490}"
      let bouquet_bytes = bouquet.utf8 // `bouquet_bytes` is `[240, 159, 146, 144]`