Constants
NORMALIZATION_FORMS = [:c, :kc, :d, :kd]

A list of all available normalization forms. See www.unicode.org/reports/tr15/tr15-29.html for more information about normalization.

UNICODE_VERSION = RbConfig::CONFIG["UNICODE_VERSION"]

The Unicode version supported by the implementation, as reported by Ruby's RbConfig.
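Because the constant only reads Ruby's build configuration, the supported Unicode version follows the Ruby interpreter rather than Rails. A minimal check, assuming only the standard library, looks like this:

```ruby
require "rbconfig"

# The Unicode version of the data tables the running Ruby was built with.
unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
puts "Ruby #{RUBY_VERSION} supports Unicode #{unicode_version}"
```

Upgrading Ruby, not Rails, is what changes this value.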

Attributes
[RW] default_normalization_form

The default normalization used for operations that require normalization. It can be set to any of the normalizations in NORMALIZATION_FORMS.

ActiveSupport::Multibyte::Unicode.default_normalization_form = :c
Instance Public methods
compose(codepoints)

Compose decomposed characters to the composed form (canonical composition, NFC).

# File activesupport/lib/active_support/multibyte/unicode.rb, line 57
def compose(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end
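The method is a thin wrapper around String#unicode_normalize. The sketch below copies its body into a standalone top-level method so it runs without Rails, and composes "e" followed by U+0301 COMBINING ACUTE ACCENT:

```ruby
# Same body as the Rails source: pack codepoints into a UTF-8 string,
# apply NFC, and unpack the result back into codepoints.
def compose(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end

# "e" (101) + U+0301 (769) composes to U+00E9 "é" (233).
p compose([101, 769]) # => [233]
```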
decompose(type, codepoints)

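Decompose composed characters to the decomposed form. Pass :compatibility as type for compatibility decomposition (NFKD); any other value produces canonical decomposition (NFD).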

# File activesupport/lib/active_support/multibyte/unicode.rb, line 48
def decompose(type, codepoints)
  if type == :compatibility
    codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
  else
    codepoints.pack("U*").unicode_normalize(:nfd).codepoints
  end
end
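The difference between the two branches shows up with compatibility characters such as U+FB01 LATIN SMALL LIGATURE FI, which has no canonical decomposition. In this standalone copy of the method, :canonical is just an illustrative label: any value other than :compatibility takes the NFD branch.

```ruby
# Same body as the Rails source, as a standalone top-level method.
def decompose(type, codepoints)
  if type == :compatibility
    codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
  else
    codepoints.pack("U*").unicode_normalize(:nfd).codepoints
  end
end

p decompose(:canonical, [233])         # => [101, 769]  ("é" -> "e" + U+0301)
p decompose(:canonical, [0xFB01])      # => [64257]     (ligature kept by NFD)
p decompose(:compatibility, [0xFB01])  # => [102, 105]  (ligature -> "f", "i")
```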
normalize(string, form = nil)

Returns the KC normalization of the string by default. NFKC is considered the best normalization form for passing strings to databases and validations. This method is deprecated and will be removed in Rails 6.1; use String#unicode_normalize instead.

  • string - The string to perform normalization on.

  • form - The form you want to normalize in. Should be one of the following: :c, :kc, :d, or :kd. Default is #default_normalization_form.

# File activesupport/lib/active_support/multibyte/unicode.rb, line 108
def normalize(string, form = nil)
  form ||= @default_normalization_form

  # See https://www.unicode.org/reports/tr15, Table 1
  if alias_form = NORMALIZATION_FORM_ALIASES[form]
    ActiveSupport::Deprecation.warn(<<-MSG.squish)
      ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be
      removed from Rails 6.1. Use String#unicode_normalize(:#{alias_form}) instead.
    MSG

    string.unicode_normalize(alias_form)
  else
    ActiveSupport::Deprecation.warn(<<-MSG.squish)
      ActiveSupport::Multibyte::Unicode#normalize is deprecated and will be
      removed from Rails 6.1. Use String#unicode_normalize instead.
    MSG

    raise ArgumentError, "#{form} is not a valid normalization variant", caller
  end
end
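Since the method only maps a short form name to a String#unicode_normalize call, migrating away from it is mostly a matter of translating names. The sketch below is a hypothetical replacement: SHORT_FORM_TO_RUBY and normalize_short are illustrative names, not part of Rails, and the mapping assumes the usual UAX #15 correspondence (:c to NFC, :kc to NFKC, and so on).

```ruby
# Hypothetical mapping from the short names in NORMALIZATION_FORMS
# to the symbols accepted by String#unicode_normalize.
SHORT_FORM_TO_RUBY = { c: :nfc, kc: :nfkc, d: :nfd, kd: :nfkd }.freeze

# Hypothetical helper; defaults to :kc, matching the documented default.
def normalize_short(string, form = :kc)
  ruby_form = SHORT_FORM_TO_RUBY.fetch(form) do
    raise ArgumentError, "#{form} is not a valid normalization variant"
  end
  string.unicode_normalize(ruby_form)
end

puts normalize_short("\uFB01le")                       # prints "file" (NFKC expands the ligature)
p normalize_short("\uFB01le", :c) == "\uFB01le"        # => true (NFC keeps it)
p normalize_short("\u00E9", :d) == "e\u0301"           # => true
```

In new code, calling string.unicode_normalize(:nfkc) directly is simpler still; the helper only helps when short form names are already stored in configuration.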
pack_graphemes(unpacked)

The reverse operation of unpack_graphemes: flattens the nested codepoint lists and packs them into a UTF-8 string.

Unicode.pack_graphemes(Unicode.unpack_graphemes('क्षि')) # => 'क्षि'
# File activesupport/lib/active_support/multibyte/unicode.rb, line 43
def pack_graphemes(unpacked)
  unpacked.flatten.pack("U*")
end
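Because the method flattens its argument before packing, the grapheme grouping is simply discarded, so any nesting of codepoints produces the same string. A standalone copy of the method illustrates this, along with a round trip through a \X scan (the same scan unpack_graphemes uses):

```ruby
# Same body as the Rails source: flatten, then pack as UTF-8.
def pack_graphemes(unpacked)
  unpacked.flatten.pack("U*")
end

p pack_graphemes([[72, 105], [33]])     # => "Hi!"
p pack_graphemes([[72], [105, 33]])     # => "Hi!" (grouping does not matter)

# Round trip: splitting at grapheme boundaries and packing restores the string.
original = "क्षि"
p pack_graphemes(original.scan(/\X/).map(&:codepoints)) == original # => true
```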
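recode_windows1252_chars(string)

Reinterprets the bytes of the string as Windows-1252 (CP1252) and transcodes them to UTF-8, replacing invalid or undefined bytes.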
# File activesupport/lib/active_support/multibyte/unicode.rb, line 142
def recode_windows1252_chars(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end
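Passing the source encoding explicitly to String#encode makes Ruby ignore the string's own encoding tag and read the raw bytes as Windows-1252. The standalone copy below converts two single bytes: 0x80 is the euro sign and 0xE9 is "é" in that code page.

```ruby
# Same body as the Rails source, as a standalone top-level method.
def recode_windows1252_chars(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end

puts recode_windows1252_chars("\x80".b)     # prints €
puts recode_windows1252_chars("caf\xE9".b)  # prints café
```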
tidy_bytes(string, force = false)

Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string.

Passing true as force tidies all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.

# File activesupport/lib/active_support/multibyte/unicode.rb, line 68
def tidy_bytes(string, force = false)
  return string if string.empty?
  return recode_windows1252_chars(string) if force
  string.scrub { |bad| recode_windows1252_chars(bad) }
end
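Without force, only the byte sequences that are invalid UTF-8 reach the scrub block, so valid text is left alone. With force, the whole string is re-read as CP1252, which corrupts text that was already valid UTF-8. The sketch below copies both Rails methods as standalone top-level methods to show the two behaviors:

```ruby
def recode_windows1252_chars(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end

def tidy_bytes(string, force = false)
  return string if string.empty?
  return recode_windows1252_chars(string) if force
  # String#scrub yields each invalid UTF-8 sequence; only those are recoded.
  string.scrub { |bad| recode_windows1252_chars(bad) }
end

puts tidy_bytes("caf\xE9")     # prints café   (stray CP1252 byte repaired)
puts tidy_bytes("café")        # prints café   (valid UTF-8 untouched)
puts tidy_bytes("café", true)  # prints cafÃ©  (valid UTF-8 bytes re-read as CP1252)
```

The last line is why force should only be used when the input is known to be entirely CP1252 or ISO-8859-1.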
unpack_graphemes(string)

Unpacks the string at grapheme boundaries and returns a list of codepoint lists, one per grapheme cluster.

Unicode.unpack_graphemes('क्षि') # => [[2325, 2381], [2359, 2367]]
Unicode.unpack_graphemes('Café') # => [[67], [97], [102], [233]]
# File activesupport/lib/active_support/multibyte/unicode.rb, line 36
def unpack_graphemes(string)
  string.scan(/\X/).map(&:codepoints)
end
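Grapheme clusters and codepoints diverge as soon as combining marks appear. The standalone copy below shows a decomposed "é" (U+0065 followed by U+0301) counting as three codepoints but only two graphemes. Note that cluster boundaries for complex scripts can vary with the Unicode version in UNICODE_VERSION.

```ruby
# Same body as the Rails source: split on extended grapheme clusters (\X).
def unpack_graphemes(string)
  string.scan(/\X/).map(&:codepoints)
end

decomposed = "e\u0301x"          # "e" + COMBINING ACUTE ACCENT + "x"
p decomposed.length              # => 3 (codepoints)
p unpack_graphemes(decomposed)   # => [[101, 769], [120]] (2 graphemes)
p unpack_graphemes("Café")       # => [[67], [97], [102], [233]]
```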