Constants
NORMALIZATION_FORMS = [:c, :kc, :d, :kd]
 

A list of all available normalization forms. See www.unicode.org/reports/tr15/tr15-29.html for more information about normalization.

UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]
 

The Unicode version that is supported by the implementation.

Attributes
[RW] default_normalization_form

The default normalization used for operations that require normalization. It can be set to any of the normalizations in NORMALIZATION_FORMS.

ActiveSupport::Multibyte::Unicode.default_normalization_form = :c
Instance Public methods
compose(codepoints)

Compose decomposed characters to the composed form.

# File activesupport/lib/active_support/multibyte/unicode.rb, line 49
def compose(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end
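
The effect of composition is easiest to see on a single accented letter. The following is a minimal plain-Ruby sketch that repeats the steps in the method body above without loading Rails; the variable names are illustrative only.

```ruby
# "e" (U+0065) followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = [101, 769]

# Same steps as the method body: pack to a UTF-8 string, normalize to NFC,
# then read the codepoints back.
composed = decomposed.pack("U*").unicode_normalize(:nfc).codepoints

p composed # => [233], the precomposed "é" (U+00E9)
```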
decompose(type, codepoints)

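Decompose composed characters to the decomposed form. Passing :compatibility as type applies compatibility decomposition (NFKD); any other value applies canonical decomposition (NFD).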

# File activesupport/lib/active_support/multibyte/unicode.rb, line 40
def decompose(type, codepoints)
  if type == :compatibility
    codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
  else
    codepoints.pack("U*").unicode_normalize(:nfd).codepoints
  end
end
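
The difference between the two branches shows up on compatibility characters such as ligatures. This is a small plain-Ruby sketch of the same normalization calls, outside Rails; the sample codepoints are chosen for illustration.

```ruby
accented = [233]   # "é" (U+00E9), has a canonical decomposition
ligature = [64257] # "ﬁ" (U+FB01), has only a compatibility decomposition

# Canonical decomposition (NFD) splits the accent off the base letter.
canonical_accent = accented.pack("U*").unicode_normalize(:nfd).codepoints

# NFD leaves the ligature alone; NFKD replaces it with "f" and "i".
canonical_ligature = ligature.pack("U*").unicode_normalize(:nfd).codepoints
compat_ligature    = ligature.pack("U*").unicode_normalize(:nfkd).codepoints

p canonical_accent   # => [101, 769]
p canonical_ligature # => [64257]
p compat_ligature    # => [102, 105]
```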
downcase(string)
# File activesupport/lib/active_support/multibyte/unicode.rb, line 117
def downcase(string)
  string.downcase
end
normalize(string, form = nil)

Returns the KC normalization of the string by default. NFKC is considered the best normalization form for passing strings to databases and validations.

  • string - The string to perform normalization on.

  • form - The form you want to normalize in. Should be one of the following: :c, :kc, :d, or :kd. Default is #default_normalization_form.

# File activesupport/lib/active_support/multibyte/unicode.rb, line 100
def normalize(string, form = nil)
  form ||= @default_normalization_form
  # See http://www.unicode.org/reports/tr15, Table 1
  case form
  when :d
    string.unicode_normalize(:nfd)
  when :c
    string.unicode_normalize(:nfc)
  when :kd
    string.unicode_normalize(:nfkd)
  when :kc
    string.unicode_normalize(:nfkc)
  else
    raise ArgumentError, "#{form} is not a valid normalization variant", caller
  end
end
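
To compare the four forms side by side, the sketch below reimplements the dispatch in plain Ruby. `FORMS` and `normalize_sketch` are hypothetical names introduced for this example, and the `:kc` default is hard-coded here as an assumption rather than read from default_normalization_form.

```ruby
# Hypothetical lookup table from the short form names to Ruby's own names.
FORMS = { c: :nfc, kc: :nfkc, d: :nfd, kd: :nfkd }.freeze

# Hypothetical stand-in for normalize; defaults to :kc by assumption.
def normalize_sketch(string, form = :kc)
  ruby_form = FORMS.fetch(form) do
    raise ArgumentError, "#{form} is not a valid normalization variant"
  end
  string.unicode_normalize(ruby_form)
end

# "ﬁ" ligature, then "e" followed by a combining acute accent.
sample = "\u{FB01}e\u{301}"

p normalize_sketch(sample, :c).codepoints  # => [64257, 233]
p normalize_sketch(sample, :kc).codepoints # => [102, 105, 233]
p normalize_sketch(sample, :d).codepoints  # => [64257, 101, 769]
p normalize_sketch(sample, :kd).codepoints # => [102, 105, 101, 769]
```

The canonical forms (:c, :d) keep the ligature intact, while the compatibility forms (:kc, :kd) replace it with plain letters.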
pack_graphemes(unpacked)

Reverse operation of unpack_graphemes.

Unicode.pack_graphemes(Unicode.unpack_graphemes('क्षि')) # => 'क्षि'
# File activesupport/lib/active_support/multibyte/unicode.rb, line 35
def pack_graphemes(unpacked)
  unpacked.flatten.pack("U*")
end
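
As a quick illustration in plain Ruby, flattening the nested codepoint lists and packing them yields the original string again; the sample data is illustrative.

```ruby
# One inner list per grapheme cluster; the last cluster is "e" plus a
# combining acute accent.
clusters = [[67], [97], [102], [101, 769]]

# Same steps as the method body: drop the grouping, then pack as UTF-8.
packed = clusters.flatten.pack("U*")

p packed == "Cafe\u{301}" # => true
p packed.encoding         # => #<Encoding:UTF-8>
```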
swapcase(string)
# File activesupport/lib/active_support/multibyte/unicode.rb, line 125
def swapcase(string)
  string.swapcase
end
tidy_bytes(string, force = false)

Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string.

Passing true as force tidies every byte, on the assumption that the string is encoded entirely in CP1252 or ISO-8859-1.

# File activesupport/lib/active_support/multibyte/unicode.rb, line 60
def tidy_bytes(string, force = false)
  return string if string.empty?
  return recode_windows1252_chars(string) if force
  string.scrub { |bad| recode_windows1252_chars(bad) }
end
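
The method body relies on a private helper, recode_windows1252_chars, that is not shown on this page. The sketch below substitutes a hypothetical `recode_sketch` that transcodes from Windows-1252 with `String#encode`; that stand-in is an assumption for illustration, not necessarily what the library does.

```ruby
# Hypothetical stand-in for the private helper: treat the bytes as
# Windows-1252 and transcode them to UTF-8.
def recode_sketch(bytes)
  bytes.encode(Encoding::UTF_8, Encoding::Windows_1252,
               invalid: :replace, undef: :replace)
end

# Hypothetical mirror of tidy_bytes built on the stand-in above.
def tidy_bytes_sketch(string, force = false)
  return string if string.empty?
  return recode_sketch(string) if force
  string.scrub { |bad| recode_sketch(bad) }
end

# A stray Latin-1 byte (0xE9, "é") next to a valid UTF-8 "ï".
mixed = ("caf\xE9 na".b + "\u{EF}ve".b).force_encoding(Encoding::UTF_8)

tidied = tidy_bytes_sketch(mixed)
forced = tidy_bytes_sketch(mixed, true)

p tidied # => "café naïve"
p forced # => "café naÃ¯ve"
```

With force, the two bytes of the already valid "ï" are each reinterpreted as a CP1252 character, which is why forcing is only appropriate when the whole string is known to be in a single-byte encoding.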
unpack_graphemes(string)

Unpack the string at grapheme boundaries. Returns a list of codepoint lists, one inner list per grapheme cluster.

Unicode.unpack_graphemes('क्षि') # => [[2325, 2381], [2359], [2367]]
Unicode.unpack_graphemes('Café') # => [[67], [97], [102], [233]]
# File activesupport/lib/active_support/multibyte/unicode.rb, line 28
def unpack_graphemes(string)
  string.scan(/\X/).map(&:codepoints)
end
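
Grouping by grapheme matters when a character is written with combining marks. This plain-Ruby sketch runs the same scan on a precomposed and a decomposed spelling of the same word; the variable names are illustrative.

```ruby
precomposed = "Caf\u{E9}"   # "é" as the single codepoint U+00E9
decomposed  = "Cafe\u{301}" # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Same steps as the method body: split on \X (one extended grapheme
# cluster per match), then list each cluster's codepoints.
unpacked_precomposed = precomposed.scan(/\X/).map(&:codepoints)
unpacked_decomposed  = decomposed.scan(/\X/).map(&:codepoints)

p unpacked_precomposed # => [[67], [97], [102], [233]]
p unpacked_decomposed  # => [[67], [97], [102], [101, 769]]
```

Both spellings yield four clusters, even though the decomposed string contains five codepoints.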
upcase(string)
# File activesupport/lib/active_support/multibyte/unicode.rb, line 121
def upcase(string)
  string.upcase
end