Strings

length(s)
The number of characters in string s.

sizeof(s::AbstractString)
The number of bytes in string s.

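The difference between length and sizeof shows up as soon as a string contains non-ASCII characters. A minimal sketch, assuming the string is stored as UTF-8 (the variable names are illustrative only):

```julia
# Each Greek letter below occupies two bytes in UTF-8.
s = "αβγ"
nchars = length(s)   # 3 characters
nbytes = sizeof(s)   # 6 bytes

# For pure ASCII text the two counts coincide.
t = "abc"
same = length(t) == sizeof(t)   # true
```
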
*(s, t)
Concatenate strings. The * operator is an alias to this function.

    julia> "Hello " * "world"
    "Hello world"

^(s, n)
Repeat the string s n times. The repeat function is an alias to this operator.

    julia> "Test "^3
    "Test Test Test "

string(xs...)
Create a string from any values using the print function.

repr(x)
Create a string from any value using the showall function.

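The practical difference between the two is easiest to see on a string argument: string yields the printed text, while repr yields the quoted form that would be displayed for the value. A small sketch with illustrative variable names:

```julia
# string concatenates the printed form of each argument.
msg = string("x = ", 1, ", y = ", 2.5)   # "x = 1, y = 2.5"

# repr keeps the quotes that are displayed around a string value.
quoted = repr("hi")      # a 4-character string: "hi" including its quotes
plain  = string("hi")    # just the 2 characters hi
```
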
bytestring(::Ptr{UInt8}[, length])
Create a string from the address of a C (0-terminated) string encoded in ASCII or UTF-8. A copy is made; the ptr can be safely freed. If length is specified, the string does not have to be 0-terminated.

bytestring(s)
Convert a string to a contiguous byte array representation appropriate for passing it to C functions. The string will be encoded as either ASCII or UTF-8.

ascii(::Array{UInt8, 1})
Create an ASCII string from a byte array.

ascii(s)
Convert a string to a contiguous ASCII string (all characters must be valid ASCII characters).

ascii(::Ptr{UInt8}[, length])
Create an ASCII string from the address of a C (0-terminated) string encoded in ASCII. A copy is made; the ptr can be safely freed. If length is specified, the string does not have to be 0-terminated.

utf8(::Array{UInt8, 1})
Create a UTF-8 string from a byte array.

utf8(::Ptr{UInt8}[, length])
Create a UTF-8 string from the address of a C (0-terminated) string encoded in UTF-8. A copy is made; the ptr can be safely freed. If length is specified, the string does not have to be 0-terminated.

utf8(s)
Convert a string to a contiguous UTF-8 string (all characters must be valid UTF-8 characters).

@r_str -> Regex
Construct a regex, such as r"^[a-z]*$". The regex also accepts one or more flags, listed after the ending quote, to change its behaviour:

- i enables case-insensitive matching.
- m treats the ^ and $ tokens as matching the start and end of individual lines, as opposed to the whole string.
- s allows the . modifier to match newlines.
- x enables “comment mode”: whitespace is ignored except when escaped with \, and # is treated as starting a comment.

For example, this regex has the first three flags enabled:

    julia> match(r"a+.*b+.*?d$"ism, "Goodbye,\nOh, angry,\nBad world\n")
    RegexMatch("angry,\nBad world")

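Each flag can also be tried in isolation. The following sketch uses one short, made-up pattern per flag; the variable names are illustrative only:

```julia
# i: case-insensitive matching.
m_i = match(r"hello"i, "HELLO world")

# m: ^ and $ match at line boundaries, so the second line can match on its own.
m_m = match(r"^world$"m, "hello\nworld")

# s: . also matches a newline.
m_s = match(r"a.b"s, "a\nb")

# x: literal whitespace in the pattern is skipped and # starts a comment.
m_x = match(r"a b  # trailing comment"x, "ab")
```

Without the s flag, the third pattern would not match, because . does not cross a newline by default.
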
@html_str -> Docs.HTML
Create an HTML object from a literal string.

@text_str -> Docs.Text
Create a Text object from a literal string.

normalize_string(s, normalform::Symbol)
Normalize the string s according to one of the four “normal forms” of the Unicode standard: normalform can be :NFC, :NFD, :NFKC, or :NFKD. Normal forms C (canonical composition) and D (canonical decomposition) convert different visually identical representations of the same abstract string into a single canonical form, with form C being more compact. Normal forms KC and KD additionally canonicalize “compatibility equivalents”: they convert characters that are abstractly similar but visually distinct into a single canonical choice (e.g. they expand ligatures into the individual characters), with form KC being more compact.

Alternatively, finer control and additional transformations may be obtained by calling normalize_string(s; keywords...), where any number of the following boolean keyword options (which all default to false except for compose) are specified:

- compose=false: do not perform canonical composition
- decompose=true: do canonical decomposition instead of canonical composition (compose=true is ignored if present)
- compat=true: compatibility equivalents are canonicalized
- casefold=true: perform Unicode case folding, e.g. for case-insensitive string comparison
- newline2lf=true, newline2ls=true, or newline2ps=true: convert various newline sequences (LF, CRLF, CR, NEL) into a linefeed (LF), line-separation (LS), or paragraph-separation (PS) character, respectively
- stripmark=true: strip diacritical marks (e.g. accents)
- stripignore=true: strip Unicode’s “default ignorable” characters (e.g. the soft hyphen or the left-to-right marker)
- stripcc=true: strip control characters; horizontal tabs and form feeds are converted to spaces; newlines are also converted to spaces unless a newline-conversion flag was specified
- rejectna=true: throw an error if unassigned code points are found
- stable=true: enforce Unicode Versioning Stability

For example, NFKC corresponds to the options compose=true, compat=true, stable=true.

graphemes(s) → iterator over substrings of s
Returns an iterator over substrings of s that correspond to the extended graphemes in the string, as defined by Unicode UAX #29. (Roughly, these are what users would perceive as single characters, even though they may contain more than one codepoint; for example a letter combined with an accent mark is a single grapheme.)

isvalid(value) → Bool
Returns true if the given value is valid for its type, which currently can be one of Char, ASCIIString, UTF8String, UTF16String, or UTF32String.

isvalid(T, value) → Bool
Returns true if the given value is valid for that type. Types currently can be Char, ASCIIString, UTF8String, UTF16String, or UTF32String.

- Values for Char can be of type Char or UInt32
- Values for ASCIIString and UTF8String can be of that type, or Vector{UInt8}
- Values for UTF16String can be UTF16String or Vector{UInt16}
- Values for UTF32String can be UTF32String, Vector{Char} or Vector{UInt32}

isvalid(str, i)
Tells whether index i is valid for the given string.

is_assigned_char(c) → Bool
Returns true if the given char or integer is an assigned Unicode code point.

ismatch(r::Regex, s::AbstractString) → Bool
Test whether a string contains a match of the given regular expression.

match(r::Regex, s::AbstractString[, idx::Integer[, addopts]])
Search for the first match of the regular expression r in s and return a RegexMatch object containing the match, or nothing if the match failed. The matching substring can be retrieved by accessing m.match and the captured sequences can be retrieved by accessing m.captures. The optional idx argument specifies an index at which to start the search.

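The fields of the returned object, the idx argument, and the failure case can be sketched as follows. The pattern and input are made up for illustration:

```julia
m = match(r"(\d+)-(\d+)", "range 10-20")
whole  = m.match        # the full matched text, "10-20"
groups = m.captures     # the two captured groups, "10" and "20"

# Starting the search at index 8 skips the "1" at index 7.
later = match(r"\d+", "range 10-20", 8)

# A failed match returns nothing rather than raising an error.
no_match = match(r"xyz", "range 10-20")
```

Because a failed match yields nothing, callers typically test the result before touching m.match or m.captures.
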
eachmatch(r::Regex, s::AbstractString[, overlap::Bool=false])
Search for all matches of the regular expression r in s and return an iterator over the matches. If overlap is true, the matching sequences are allowed to overlap indices in the original string, otherwise they must be from distinct character ranges.

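A minimal sketch of consuming the iterator, collecting the matched text of each non-overlapping match into a vector (the names found and m are illustrative):

```julia
# Gather the text of every run of digits, in order of appearance.
found = AbstractString[]
for m in eachmatch(r"\d+", "a1b22c333")
    push!(found, m.match)
end
# found now holds "1", "22" and "333"
```
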
matchall(r::Regex, s::AbstractString[, overlap::Bool=false]) → Vector{AbstractString}
Return a vector of the matching substrings from eachmatch.

lpad(string, n, p)
Make a string at least n columns wide when printed, by padding on the left with copies of p.

rpad(string, n, p)
Make a string at least n columns wide when printed, by padding on the right with copies of p.

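A short sketch of both functions, including the case where the input is already wide enough. The padding strings are arbitrary choices for illustration:

```julia
id    = lpad("42", 5, "0")        # "00042"
label = rpad("name", 8, ".")      # "name...."

# "At least n columns": a string that is already wider is returned unchanged.
same  = lpad("toolong", 3, " ")   # "toolong"
```
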
search(string, chars[, start])
Search for the first occurrence of the given characters within the given string. The second argument may be a single character, a vector or a set of characters, a string, or a regular expression (though regular expressions are only allowed on contiguous strings, such as ASCII or UTF-8 strings). The third argument optionally specifies a starting index. The return value is a range of indexes where the matching sequence is found, such that s[search(s,x)] == x:

- search(string, "substring") = start:end such that string[start:end] == "substring", or 0:-1 if unmatched.
- search(string, 'c') = index such that string[index] == 'c', or 0 if unmatched.

rsearch(string, chars[, start])
Similar to search, but returning the last occurrence of the given characters within the given string, searching in reverse from start.

searchindex(string, substring[, start])
Similar to search, but return only the start index at which the substring is found, or 0 if it is not.

rsearchindex(string, substring[, start])
Similar to rsearch, but return only the start index at which the substring is found, or 0 if it is not.

contains(haystack, needle)
Determine whether the second argument is a substring of the first.

reverse(s::AbstractString) → AbstractString
Reverses a string.

replace(string, pat, r[, n])
Search for the given pattern pat, and replace each occurrence with r. If n is provided, replace at most n occurrences. As with search, the second argument may be a single character, a vector or a set of characters, a string, or a regular expression. If r is a function, each occurrence is replaced with r(s) where s is the matched substring. If pat is a regular expression and r is a SubstitutionString, then capture group references in r are replaced with the corresponding matched text.

split(string, [chars]; limit=0, keep=true)
Return an array of substrings by splitting the given string on occurrences of the given character delimiters, which may be specified in any of the formats allowed by search’s second argument (i.e. a single character, collection of characters, string, or regular expression). If chars is omitted, it defaults to the set of all space characters, and keep is taken to be false. The two keyword arguments are optional: they are a maximum size for the result and a flag determining whether empty fields should be kept in the result.

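The effect of an explicit delimiter, the whitespace default, and the limit keyword can be sketched as follows. The inputs are made up, and the keep keyword is left at its default:

```julia
# With an explicit delimiter, the empty field between the two commas is kept.
fields = split("a,b,,c", ',')

# With no delimiter, runs of whitespace separate fields and empty fields are dropped.
words = split("  two   words ")

# limit=2 stops after the first split, leaving the remainder intact.
head = split("k=v=w", '='; limit=2)
```
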
rsplit(string, [chars]; limit=0, keep=true)
Similar to split, but starting from the end of the string.

strip(string[, chars])
Return string with any leading and trailing whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

lstrip(string[, chars])
Return string with any leading whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

rstrip(string[, chars])
Return string with any trailing whitespace removed. If chars (a character, or vector or set of characters) is provided, instead remove characters contained in it.

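The three variants side by side, plus the chars form, in a small sketch with made-up inputs:

```julia
both   = strip("  padded  ")     # "padded"
left   = lstrip("  padded  ")    # "padded  " (trailing spaces remain)
right  = rstrip("  padded  ")    # "  padded" (leading spaces remain)

# With chars given, those characters are removed instead of whitespace.
custom = strip("--tag--", '-')   # "tag"
```
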
startswith(string, prefix | chars)
Returns true if string starts with prefix. If the second argument is a vector or set of characters, tests whether the first character of string belongs to that set.

endswith(string, suffix | chars)
Returns true if string ends with suffix. If the second argument is a vector or set of characters, tests whether the last character of string belongs to that set.

uppercase(string)
Returns string with all characters converted to uppercase.

lowercase(string)
Returns string with all characters converted to lowercase.

ucfirst(string)
Returns string with the first character converted to uppercase.

lcfirst(string)
Returns string with the first character converted to lowercase.

join(strings, delim[, last])
Join an array of strings into a single string, inserting the given delimiter between adjacent strings. If last is given, it will be used instead of delim between the last two strings. For example, join(["apples", "bananas", "pineapples"], ", ", " and ") == "apples, bananas and pineapples".

strings can be any iterable over elements x which are convertible to strings via print(io::IOBuffer, x).

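A brief sketch of the two- and three-argument forms, and of joining non-string elements (the variable names are illustrative):

```julia
fruits = ["apples", "bananas", "pineapples"]

csv   = join(fruits, ", ")            # same delimiter everywhere
prose = join(fruits, ", ", " and ")   # " and " between the last two items

# Elements only need to be printable; integers work as well.
mixed = join([1, 2, 3], "-")
```
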
chop(string)
Remove the last character from a string.

chomp(string)
Remove a trailing newline from a string.

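The two functions differ in that chop always removes one character, whereas chomp removes something only when the string ends in a newline. A minimal sketch:

```julia
a = chop("hello")      # "hell": the last character is always dropped
b = chomp("line\n")    # "line": the trailing newline is dropped
c = chomp("line")      # "line": no trailing newline, so nothing changes
d = chop("line\n")     # "line": here the last character happens to be the newline
```
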
ind2chr(string, i)
Convert a byte index to a character index.

chr2ind(string, i)
Convert a character index to a byte index.

nextind(str, i)
Get the next valid string index after i. Returns a value greater than endof(str) at or after the end of the string.

prevind(str, i)
Get the previous valid string index before i. Returns a value less than 1 at the beginning of the string.

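Index stepping matters when characters occupy more than one byte. The sketch below assumes a UTF-8 encoded string in which the first character takes two bytes; it also uses isvalid(str, i) to show which indices are valid:

```julia
s = "αb"                  # α occupies bytes 1-2, b is byte 3

ok1 = isvalid(s, 1)       # true: index 1 is the start of α
ok2 = isvalid(s, 2)       # false: index 2 falls inside α

nxt  = nextind(s, 1)      # 3, the index of b
prv  = prevind(s, 3)      # 1, the index of α
past = nextind(s, 3)      # 4, beyond the last valid index
```
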
randstring([rng, ]len=8)
Create a random ASCII string of length len, consisting of upper- and lower-case letters and the digits 0-9. The optional rng argument specifies a random number generator, see Random Numbers.

charwidth(c)
Gives the number of columns needed to print a character.

strwidth(s)
Gives the number of columns needed to print a string.

isalnum(c::Union{Char, AbstractString}) → Bool
Tests whether a character is alphanumeric, or whether this is true for all elements of a string. A character is classified as alphanumeric if it belongs to the Unicode general category Letter or Number, i.e. a character whose category code begins with ‘L’ or ‘N’.

isalpha(c::Union{Char, AbstractString}) → Bool
Tests whether a character is alphabetic, or whether this is true for all elements of a string. A character is classified as alphabetic if it belongs to the Unicode general category Letter, i.e. a character whose category code begins with ‘L’.

isascii(c::Union{Char, AbstractString}) → Bool
Tests whether a character belongs to the ASCII character set, or whether this is true for all elements of a string.

iscntrl(c::Union{Char, AbstractString}) → Bool
Tests whether a character is a control character, or whether this is true for all elements of a string. Control characters are the non-printing characters of the Latin-1 subset of Unicode.

isdigit(c::Union{Char, AbstractString}) → Bool
Tests whether a character is a numeric digit (0-9), or whether this is true for all elements of a string.

isgraph(c::Union{Char, AbstractString}) → Bool
Tests whether a character is printable, and not a space, or whether this is true for all elements of a string. Any character that would cause a printer to use ink should be classified with isgraph(c) == true.

islower(c::Union{Char, AbstractString}) → Bool
Tests whether a character is a lowercase letter, or whether this is true for all elements of a string. A character is classified as lowercase if it belongs to Unicode category Ll, Letter: Lowercase.

isnumber(c::Union{Char, AbstractString}) → Bool
Tests whether a character is numeric, or whether this is true for all elements of a string. A character is classified as numeric if it belongs to the Unicode general category Number, i.e. a character whose category code begins with ‘N’.

isprint(c::Union{Char, AbstractString}) → Bool
Tests whether a character is printable, including spaces, but not a control character. For strings, tests whether this is true for all elements of the string.

ispunct(c::Union{Char, AbstractString}) → Bool
Tests whether a character belongs to the Unicode general category Punctuation, i.e. a character whose category code begins with ‘P’. For strings, tests whether this is true for all elements of the string.

isspace(c::Union{Char, AbstractString}) → Bool
Tests whether a character is any whitespace character. Includes ASCII characters ‘\t’, ‘\n’, ‘\v’, ‘\f’, ‘\r’, and ‘ ’, Latin-1 character U+0085, and characters in Unicode category Zs. For strings, tests whether this is true for all elements of the string.

isupper(c::Union{Char, AbstractString}) → Bool
Tests whether a character is an uppercase letter, or whether this is true for all elements of a string. A character is classified as uppercase if it belongs to Unicode category Lu, Letter: Uppercase, or Lt, Letter: Titlecase.

isxdigit(c::Union{Char, AbstractString}) → Bool
Tests whether a character is a valid hexadecimal digit, or whether this is true for all elements of a string.

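A few of the character predicates above applied to single characters, as a rough sketch. The whole-string check is written with all(...) over the characters, which is an illustrative choice here and not the string method documented above:

```julia
d  = isdigit('7')     # true: decimal digit
x  = isxdigit('f')    # true: hexadecimal digit
nx = isxdigit('g')    # false: g is not a hexadecimal digit
sp = isspace('\t')    # true: tab counts as whitespace
up = isupper('Q')     # true
lo = islower('Q')     # false
pu = ispunct('!')     # true

# Character-by-character check over a whole string.
all_digits = all(isdigit, "2024")   # true
```
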
symbol(x...) → Symbol
Create a Symbol by concatenating the string representations of the arguments together.

escape_string(str::AbstractString) → AbstractString
General escaping of traditional C and Unicode escape sequences. See print_escaped() for more general escaping.

unescape_string(s::AbstractString) → AbstractString
General unescaping of traditional C and Unicode escape sequences. Reverse of escape_string(). See also print_unescaped().

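A round trip through both functions, sketched on a string containing a tab and a newline (the variable names are illustrative):

```julia
raw     = "tab\there\n"             # 9 characters, including a real tab and newline
escaped = escape_string(raw)        # tab and newline become backslash sequences
back    = unescape_string(escaped)  # the original text again

# Each control character is replaced by two characters (a backslash and a letter).
grew_by = length(escaped) - length(raw)   # 2
```
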
utf16(s)
Create a UTF-16 string from a byte array, array of UInt16, or any other string type. (Data must be valid UTF-16. Conversions of byte arrays check for a byte-order marker in the first two bytes, and do not include it in the resulting string.)

Note that the resulting UTF16String data is terminated by the NUL codepoint (16-bit zero), which is not treated as a character in the string (so that it is mostly invisible in Julia); this allows the string to be passed directly to external functions requiring NUL-terminated data. This NUL is appended automatically by the utf16(s) conversion function. If you have a UInt16 array A that is already NUL-terminated valid UTF-16 data, then you can instead use UTF16String(A) to construct the string without making a copy of the data and treating the NUL as a terminator rather than as part of the string.

utf16(::Union{Ptr{UInt16}, Ptr{Int16}}[, length])
Create a string from the address of a NUL-terminated UTF-16 string. A copy is made; the pointer can be safely freed. If length is specified, the string does not have to be NUL-terminated.

utf32(s)
Create a UTF-32 string from a byte array, array of Char or UInt32, or any other string type. (Conversions of byte arrays check for a byte-order marker in the first four bytes, and do not include it in the resulting string.)

Note that the resulting UTF32String data is terminated by the NUL codepoint (32-bit zero), which is not treated as a character in the string (so that it is mostly invisible in Julia); this allows the string to be passed directly to external functions requiring NUL-terminated data. This NUL is appended automatically by the utf32(s) conversion function. If you have a Char or UInt32 array A that is already NUL-terminated UTF-32 data, then you can instead use UTF32String(A) to construct the string without making a copy of the data and treating the NUL as a terminator rather than as part of the string.

utf32(::Union{Ptr{Char}, Ptr{UInt32}, Ptr{Int32}}[, length])
Create a string from the address of a NUL-terminated UTF-32 string. A copy is made; the pointer can be safely freed. If length is specified, the string does not have to be NUL-terminated.

wstring(s)
This is a synonym for either utf32(s) or utf16(s), depending on whether Cwchar_t is 32 or 16 bits, respectively. The synonym WString for UTF32String or UTF16String is also provided.