Strings

length(s)

The number of characters in string s.

sizeof(s::String)

The number of bytes in string s.
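For ASCII text the two counts coincide, but they diverge once a string holds multibyte characters. The sketch below assumes the literal is stored as UTF-8, so each Greek letter takes two bytes:

```julia
s = "αβγ"                 # three Greek letters, two UTF-8 bytes each
nchars = length(s)        # 3
nbytes = sizeof(s)        # 6

t = "abc"                 # pure ASCII: one byte per character
same = length(t) == sizeof(t)   # true
```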

*(s, t)

Concatenate strings. The * operator is an alias to this function.

julia> "Hello " * "world"
"Hello world"

^(s, n)

Repeat string s n times. The ^ operator is an alias to this function.

julia> "Test "^3
"Test Test Test "

string(xs...)

Create a string from any values using the print function.

repr(x)

Create a string from any value using the showall function.
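The practical difference between string and repr is whether strings come out quoted and escaped. Here is a short sketch with illustrative values:

```julia
# string() joins the print() forms of its arguments.
a = string("x = ", 1, ", y = ", 2.5)   # "x = 1, y = 2.5"

# Printing a string writes its bare characters...
b = string("hi")                        # "hi"

# ...while repr() gives the quoted, escaped form.
c = repr("hi")                          # "\"hi\""
d = repr("a\nb")                        # "\"a\\nb\""
```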

bytestring(::Ptr{Uint8}[, length])

Create a string from the address of a C (NUL-terminated) string encoded in ASCII or UTF-8. A copy is made, so the pointer can safely be freed afterwards. If length is specified, the string does not have to be NUL-terminated.

bytestring(s)

Convert a string to a contiguous byte array representation appropriate for passing it to C functions. The string will be encoded as either ASCII or UTF-8.

ascii(::Array{Uint8, 1})

Create an ASCII string from a byte array.

ascii(s)

Convert a string to a contiguous ASCII string (all characters must be valid ASCII characters).

utf8(::Array{Uint8, 1})

Create a UTF-8 string from a byte array.

utf8(s)

Convert a string to a contiguous UTF-8 string (all characters must be valid UTF-8 characters).

normalize_string(s, normalform::Symbol)

Normalize the string s according to one of the four “normal forms” of the Unicode standard: normalform can be :NFC, :NFD, :NFKC, or :NFKD. Normal forms C (canonical composition) and D (canonical decomposition) convert different visually identical representations of the same abstract string into a single canonical form, with form C being more compact. Normal forms KC and KD additionally canonicalize “compatibility equivalents”: they convert characters that are abstractly similar but visually distinct into a single canonical choice (e.g. they expand ligatures into the individual characters), with form KC being more compact.

Alternatively, finer control and additional transformations may be obtained by calling normalize_string(s; keywords...), where any number of the following Boolean keyword options (which all default to false except for compose) are specified:

  • compose=false: do not perform canonical composition
  • decompose=true: do canonical decomposition instead of canonical composition (compose=true is ignored if present)
  • compat=true: compatibility equivalents are canonicalized
  • casefold=true: perform Unicode case folding, e.g. for case-insensitive string comparison
  • newline2lf=true, newline2ls=true, or newline2ps=true: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
  • stripmark=true: strip diacritical marks (e.g. accents)
  • stripignore=true: strip Unicode’s “default ignorable” characters (e.g. the soft hyphen or the left-to-right marker)
  • stripcc=true: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
  • rejectna=true: throw an error if unassigned code points are found
  • stable=true: enforce Unicode Versioning Stability

For example, NFKC corresponds to the options compose=true, compat=true, stable=true.

is_valid_ascii(s) → Bool

Returns true if the string or byte vector is valid ASCII, false otherwise.

is_valid_utf8(s) → Bool

Returns true if the string or byte vector is valid UTF-8, false otherwise.

is_valid_char(c) → Bool

Returns true if the given char or integer is a valid Unicode code point.

is_assigned_char(c) → Bool

Returns true if the given char or integer is an assigned Unicode code point.

ismatch(r::Regex, s::String) → Bool

Test whether a string contains a match of the given regular expression.

match(r::Regex, s::String[, idx::Integer[, addopts]])

Search for the first match of the regular expression r in s and return a RegexMatch object containing the match, or nothing if the match failed. The matching substring can be retrieved by accessing m.match, and the captured sequences by accessing m.captures. The optional idx argument specifies an index at which to start the search.
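This brief sketch covers the three outcomes described above: captures, a later start index, and a failed match. The patterns and input text are made up for illustration.

```julia
m = match(r"(\d+)-(\d+)", "call 555-1234 now")
whole = m.match               # "555-1234"
first_part = m.captures[1]    # "555"
second_part = m.captures[2]   # "1234"

# Start searching at index 3 of "a1b22", skipping the "1".
m2 = match(r"\d+", "a1b22", 3)
later = m2.match              # "22"

# A pattern that never matches yields nothing.
missed = match(r"xyz", "abc")
```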

eachmatch(r::Regex, s::String[, overlap::Bool=false])

Search for all matches of the regular expression r in s and return an iterator over the matches. If overlap is true, the matching sequences are allowed to overlap indices in the original string; otherwise they must come from distinct character ranges.
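The sketch below gathers each matched substring from the iterator and leaves overlap at its default of false. The pattern and input are illustrative.

```julia
found = String[]
for m in eachmatch(r"\d+", "1 22 333")
    push!(found, m.match)     # each m is a RegexMatch
end
# found now holds "1", "22" and "333"
```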

matchall(r::Regex, s::String[, overlap::Bool=false]) → Vector{String}

Return a vector of the matching substrings from eachmatch.

lpad(string, n, p)

Make a string at least n characters long by padding on the left with copies of p.

rpad(string, n, p)

Make a string at least n characters long by padding on the right with copies of p.
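This quick sketch exercises both padding functions with single-character pad strings. The inputs are illustrative.

```julia
a = lpad("7", 3, "0")          # "007"
b = rpad("ab", 5, ".")         # "ab..."

# n is a minimum: longer strings come back unchanged, never truncated.
c = lpad("toolong", 3, " ")    # "toolong"
```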

search(string, chars[, start])

Search for the first occurrence of the given characters within the given string. The second argument may be a single character, a vector or a set of characters, a string, or a regular expression (though regular expressions are only allowed on contiguous strings, such as ASCII or UTF-8 strings). The third argument optionally specifies a starting index. The return value is a range of indices where the matching sequence is found, such that s[search(s,x)] == x:

search(string, "substring") = start:end such that string[start:end] == "substring", or 0:-1 if unmatched.

search(string, 'c') = index such that string[index] == 'c', or 0 if unmatched.

rsearch(string, chars[, start])

Similar to search, but returns the last occurrence of the given characters within the given string, searching in reverse from start.

searchindex(string, substring[, start])

Similar to search, but return only the start index at which the substring is found, or 0 if it is not.

rsearchindex(string, substring[, start])

Similar to rsearch, but return only the start index at which the substring is found, or 0 if it is not.

contains(haystack, needle)

Determine whether the second argument is a substring of the first.

replace(string, pat, r[, n])

Search for the given pattern pat, and replace each occurrence with r. If n is provided, replace at most n occurrences. As with search, the second argument may be a single character, a vector or a set of characters, a string, or a regular expression. If r is a function, each occurrence is replaced with r(s) where s is the matched substring.

split(string, [chars, [limit,] [include_empty]])

Return an array of substrings by splitting the given string on occurrences of the given character delimiters, which may be specified in any of the formats allowed by search's second argument (i.e. a single character, collection of characters, string, or regular expression). If chars is omitted, it defaults to the set of all space characters, and include_empty is taken to be false. The last two arguments are also optional: they are a maximum size for the result and a flag determining whether empty fields should be included in the result.
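The two defaults for include_empty behave differently, as this sketch with illustrative inputs suggests:

```julia
# Explicit delimiter: empty fields are kept.
fields = split("a,b,,c", ',')         # ["a", "b", "", "c"]

# No delimiter: split on whitespace and drop empty fields.
words = split("  one  two three ")    # ["one", "two", "three"]
```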

rsplit(string, [chars, [limit,] [include_empty]])

Similar to split, but starting from the end of the string.

strip(string[, chars])

Return string with any leading and trailing whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

lstrip(string[, chars])

Return string with any leading whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

rstrip(string[, chars])

Return string with any trailing whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.
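This brief sketch runs the three stripping functions with and without an explicit character set. The inputs are illustrative.

```julia
a = strip("  padded\t\n")             # "padded"
b = lstrip("xxcorexx", 'x')           # "corexx"
c = rstrip("xxcorexx", 'x')           # "xxcore"
d = strip("-=core=-", ['-', '='])     # "core"
```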

beginswith(string, prefix | chars)

Returns true if string starts with prefix. If the second argument is a vector or set of characters, tests whether the first character of string belongs to that set.

endswith(string, suffix | chars)

Returns true if string ends with suffix. If the second argument is a vector or set of characters, tests whether the last character of string belongs to that set.

uppercase(string)

Returns string with all characters converted to uppercase.

lowercase(string)

Returns string with all characters converted to lowercase.

ucfirst(string)

Returns string with the first character converted to uppercase.

lcfirst(string)

Returns string with the first character converted to lowercase.

join(strings, delim[, last])

Join an array of strings into a single string, inserting the given delimiter between adjacent strings. If last is given, it will be used instead of delim between the last two strings. For example, join(["apples", "bananas", "pineapples"], ", ", " and ") == "apples, bananas and pineapples".

strings can be any iterable over elements x which are convertible to strings via print(io::IOBuffer, x).
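This sketch illustrates the last point. The first collection holds integers, so each one is printed into the result rather than being passed in as a string. The sample data are illustrative.

```julia
a = join([1, 2, 3], "-")                                    # "1-2-3"
b = join(["apples", "bananas", "pineapples"], ", ", " and ")
# b == "apples, bananas and pineapples"

# With a single element there is nothing to delimit.
c = join(["solo"], ", ", " and ")                           # "solo"
```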

chop(string)

Remove the last character from a string.

chomp(string)

Remove a trailing newline from a string.
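chop and chomp differ when the string does not end in a newline, as this sketch shows:

```julia
a = chop("abc")        # "ab"
b = chomp("line\n")    # "line"

# Without a trailing newline, chomp is a no-op but chop still drops a character.
c = chomp("line")      # "line"
d = chop("line")       # "lin"
```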

ind2chr(string, i)

Convert a byte index to a character index.

chr2ind(string, i)

Convert a character index to a byte index.

isvalid(str, i)

Tells whether i is a valid index into the given string.

nextind(str, i)

Get the next valid string index after i. Returns a value greater than endof(str) at or after the end of the string.

prevind(str, i)

Get the previous valid string index before i. Returns a value less than 1 at the beginning of the string.
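String indices are byte offsets, so a multibyte character leaves gaps among the valid indices. The sketch below probes those gaps and then walks a string with nextind. char_starts is a hypothetical helper, not part of the library.

```julia
# In "aβc", 'a' is byte 1, 'β' spans bytes 2-3 and 'c' is byte 4.
s = "aβc"
v2 = isvalid(s, 2)      # true: first byte of 'β'
v3 = isvalid(s, 3)      # false: second byte of 'β'

p = prevind(s, 4)       # 2
past = nextind(s, 4)    # 5, beyond the last index
before = prevind(s, 1)  # 0, before the first index

# Hypothetical helper: collect the starting byte index of every character.
function char_starts(str)
    idx = Int[]
    i = 1
    while i <= sizeof(str)
        push!(idx, i)
        i = nextind(str, i)
    end
    return idx
end

starts = char_starts(s) # [1, 2, 4]
```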

randstring(len)

Create a random ASCII string of length len, consisting of upper- and lower-case letters and the digits 0-9.

charwidth(c)

Gives the number of columns needed to print a character.

strwidth(s)

Gives the number of columns needed to print a string.

isalnum(c::Union(Char, String)) → Bool

Tests whether a character is alphanumeric, or whether this is true for all elements of a string.

isalpha(c::Union(Char, String)) → Bool

Tests whether a character is alphabetic, or whether this is true for all elements of a string.

isascii(c::Union(Char, String)) → Bool

Tests whether a character belongs to the ASCII character set, or whether this is true for all elements of a string.

isblank(c::Union(Char, String)) → Bool

Tests whether a character is a tab or space, or whether this is true for all elements of a string.

iscntrl(c::Union(Char, String)) → Bool

Tests whether a character is a control character, or whether this is true for all elements of a string.

isdigit(c::Union(Char, String)) → Bool

Tests whether a character is a numeric digit (0-9), or whether this is true for all elements of a string.

isgraph(c::Union(Char, String)) → Bool

Tests whether a character is printable, and not a space, or whether this is true for all elements of a string.

islower(c::Union(Char, String)) → Bool

Tests whether a character is a lowercase letter, or whether this is true for all elements of a string.

isprint(c::Union(Char, String)) → Bool

Tests whether a character is printable, including space, or whether this is true for all elements of a string.

ispunct(c::Union(Char, String)) → Bool

Tests whether a character is printable, and not a space or alphanumeric, or whether this is true for all elements of a string.

isspace(c::Union(Char, String)) → Bool

Tests whether a character is any whitespace character, or whether this is true for all elements of a string.

isupper(c::Union(Char, String)) → Bool

Tests whether a character is an uppercase letter, or whether this is true for all elements of a string.

isxdigit(c::Union(Char, String)) → Bool

Tests whether a character is a valid hexadecimal digit, or whether this is true for all elements of a string.
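The string form of these predicates amounts to applying the character form to every element. The sketch below illustrates that equivalence with all_digits, a hypothetical helper built on all. It is not one of the functions listed above.

```julia
# Hypothetical helper: true when every character is a decimal digit.
all_digits(str) = all(isdigit, str)

ok_year = all_digits("2024")     # true
bad = all_digits("20x4")         # false
hex_ok = all(isxdigit, "ff80")   # true
empty_ok = all_digits("")        # true: an empty string has no non-digit character
```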

symbol(str) → Symbol

Convert a string to a Symbol.

escape_string(str::String) → String

General escaping of traditional C and Unicode escape sequences. See print_escaped() for more general escaping.

unescape_string(s::String) → String

General unescaping of traditional C and Unicode escape sequences. Reverse of escape_string(). See also print_unescaped().
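This sketch shows a round trip between the two functions, using an illustrative input that contains a tab and a newline:

```julia
raw = "tab\there\nnew line"
escaped = escape_string(raw)     # tab and newline become two-character escapes
back = unescape_string(escaped)  # recovers raw
```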

utf16(s)

Create a UTF-16 string from a byte array, array of Uint16, or any other string type. (Data must be valid UTF-16. Conversions of byte arrays check for a byte-order marker in the first two bytes, and do not include it in the resulting string.)

Note that the resulting UTF16String data is terminated by the NUL codepoint (16-bit zero), which is not treated as a character in the string (so that it is mostly invisible in Julia); this allows the string to be passed directly to external functions requiring NUL-terminated data. This NUL is appended automatically by the utf16(s) conversion function. If you have a Uint16 array A that is already NUL-terminated valid UTF-16 data, then you can instead use UTF16String(A) to construct the string without making a copy of the data, treating the NUL as a terminator rather than as part of the string.

utf16(::Union(Ptr{Uint16}, Ptr{Int16})[, length])

Create a string from the address of a NUL-terminated UTF-16 string. A copy is made; the pointer can be safely freed. If length is specified, the string does not have to be NUL-terminated.

is_valid_utf16(s) → Bool

Returns true if the string or Uint16 array is valid UTF-16.

utf32(s)

Create a UTF-32 string from a byte array, array of Uint32, or any other string type. (Conversions of byte arrays check for a byte-order marker in the first four bytes, and do not include it in the resulting string.)

Note that the resulting UTF32String data is terminated by the NUL codepoint (32-bit zero), which is not treated as a character in the string (so that it is mostly invisible in Julia); this allows the string to be passed directly to external functions requiring NUL-terminated data. This NUL is appended automatically by the utf32(s) conversion function. If you have a Uint32 array A that is already NUL-terminated UTF-32 data, then you can instead use UTF32String(A) to construct the string without making a copy of the data, treating the NUL as a terminator rather than as part of the string.

utf32(::Union(Ptr{Char}, Ptr{Uint32}, Ptr{Int32})[, length])

Create a string from the address of a NUL-terminated UTF-32 string. A copy is made; the pointer can be safely freed. If length is specified, the string does not have to be NUL-terminated.

wstring(s)

This is a synonym for either utf32(s) or utf16(s), depending on whether Cwchar_t is 32 or 16 bits, respectively. The synonym WString for UTF32String or UTF16String is also provided.