
How Do I Handle Unicode and Special Characters in RCS Messages?

RCS fully supports Unicode including emojis, accented characters, Asian scripts, and right-to-left languages. Use UTF-8 encoding throughout your stack, test with diverse character sets, and be aware that emoji rendering varies by device. No special encoding required—just send UTF-8 text.

Key Points

  • Full Unicode support including emojis and all scripts
  • Use UTF-8 encoding throughout your stack
  • Test rendering across devices and carriers
  • Emoji appearance varies by device
  • Character limits count Unicode code points, not bytes

Unicode and Special Characters in RCS

RCS supports the full Unicode standard. Here's how to handle it properly.

What's Supported

All Unicode characters:

  • Latin accented characters (é, ñ, ü, ç, etc.)
  • Cyrillic (Russian, Ukrainian, etc.)
  • Greek, Armenian, Georgian
  • Arabic, Hebrew, Persian (right-to-left)
  • Chinese (Simplified and Traditional)
  • Japanese, Korean, Thai
  • Hindi, Bengali, Tamil, and all Indic scripts
  • Emojis (full Unicode emoji set)

What's supported technically:

  • UTF-8 encoding (industry standard)
  • Multiple scripts in single message
  • Mixed LTR/RTL content
  • Language-specific fonts (automatic)
  • Special symbols (math, currency, etc.)

Encoding Best Practices

Use UTF-8 everywhere:

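# Good - encode the text as UTF-8 before it leaves your code
message = "Hello 你好 🌍"
message_bytes = message.encode('utf-8')
api.send(message_bytes)  # api stands in for your RCS client

# Bad - ASCII (or Latin-1) cannot represent these characters
message = "Hello 你好 🌍"
message.encode('ascii')  # raises UnicodeEncodeError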
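Most RCS providers accept messages as JSON over HTTP, so the encoding step usually happens when the request body is built. The following is a minimal sketch under that assumption; the `text` field name and the header dictionary are hypothetical and not taken from any specific provider's API.

```python
import json


def build_payload(text: str) -> bytes:
    """Serialize a message body as UTF-8 JSON (the "text" field is hypothetical)."""
    # ensure_ascii=False keeps characters as-is instead of \uXXXX escapes.
    # Both forms are valid JSON; the raw form is smaller and easier to debug.
    return json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")


body = build_payload("Hello 你好 🌍")

# Declare the charset explicitly so intermediaries do not have to guess.
headers = {"Content-Type": "application/json; charset=utf-8"}
```

Passing the resulting bytes to your HTTP client unchanged avoids a second, implicit encoding step.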

HTML entities not needed:

  • Send raw UTF-8 text, not HTML entities
  • No need for &amp; or &#20320;
  • Just send the actual characters
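If message text comes from a CMS or web template, it may already contain HTML entities. One way to clean it before sending is Python's standard `html.unescape`; the function name `decode_entities` below is only illustrative.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML entities such as &amp; or &#20320; back into real characters."""
    return html.unescape(text)


cleaned = decode_entities("Tom &amp; Jerry: &#20320;&#22909;")
print(cleaned)
```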

Character Limits

Important: Limits count Unicode code points, not bytes

Examples:

  • "Hello" = 5 code points (5 bytes in ASCII, 5 in UTF-8)
  • "你好" = 2 code points (6 bytes in UTF-8)
  • "🌍" = 1 code point (4 bytes in UTF-8)

RCS message limits:

  • Single message: ~160-1,600 characters depending on encoding
  • Segmented automatically for longer messages
  • Emojis typically count as 2 characters on platforms that count UTF-16 code units, so confirm the counting unit with your provider
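Because the same text has a different length depending on the unit used, it can help to measure all three common units side by side. This sketch uses only built-in string methods; `length_report` is an illustrative name, and which unit your provider enforces is something to confirm with them.

```python
def length_report(text: str) -> dict:
    """Report the length of text in three common units."""
    return {
        # Python strings are sequences of Unicode code points.
        "code_points": len(text),
        # Bytes on the wire when encoded as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units (2 bytes each), as counted by JavaScript and Java.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


for sample in ("Hello", "你好", "🌍"):
    print(sample, length_report(sample))
```

Some emojis are sequences of several code points (for example a base symbol followed by a variation selector), so they can count as more than one even when counting code points.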

Right-to-Left (RTL) Support

Automatic RTL for Arabic and Hebrew:

  • RCS automatically handles RTL layout
  • Text direction is resolved from the directionality of the characters themselves
  • Mixed LTR/RTL in same message works
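To predict which way a message will be laid out, you can approximate the base direction with the "first strong character" heuristic. The sketch below uses the standard `unicodedata` module; it is a simplification of the Unicode Bidirectional Algorithm, not a full implementation, and `base_direction` is an illustrative name.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess the paragraph direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and similar letters
            return "rtl"
        if bidi_class == "L":  # Latin, Cyrillic, CJK and similar letters
            return "ltr"
    # Digits, spaces and punctuation alone give no strong direction.
    return "ltr"


print(base_direction("مرحبا بالعالم"))
print(base_direction("Hello שלום"))
```

This can be useful in tests to flag messages whose direction differs from what the template author expected.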

Testing RTL:

  • Test on actual devices, not just web previews
  • Verify numbers, emails, URLs display correctly
  • Check that buttons work correctly
  • Get native speaker review

Emoji Best Practices

Emoji rendering:

  • Each OS draws emojis with its own emoji font (iOS, Android, Windows, etc.)
  • Slight variations across platforms
  • Color emoji supported on most modern devices
  • Some older devices show monochrome only

Usage guidelines:

  • Use 2-3 emojis per message maximum
  • Avoid emojis in critical information (don't replace text)
  • Consider cultural context (some emojis offensive in certain cultures)
  • Test rendering on target devices

Emoji in buttons:

  • Can use emojis in button labels
  • "Shop Now 🛍️" works fine
  • Don't rely on emojis alone for meaning

Special Character Handling

Currency symbols:

  • $, €, £, ¥, ₹, ₽, ₩ all supported
  • Use Unicode symbols not HTML entities
  • Example: "$50" not "&#36;50" or "&dollar;50"

Mathematical symbols:

  • ∑, π, ∞, √, ±, ×, ÷ all supported
  • Use for technical or financial content

Quotes and apostrophes:

  • Use Unicode smart quotes: “ ” ‘ ’
  • Or straight quotes: " " ' '
  • Both work, but be consistent
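If you settle on straight quotes, one way to enforce consistency is to normalize text before sending. This is a small sketch using `str.translate`; the mapping covers only the four common curly quote characters, and `straighten_quotes` is an illustrative name.

```python
# Map curly quotes to their straight equivalents.
SMART_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones; all other characters are untouched."""
    return text.translate(SMART_TO_STRAIGHT)


print(straighten_quotes("\u201cIt\u2019s ready\u201d"))
```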

Dashes:

  • En dash (–) for ranges
  • Em dash (—) for breaks
  • Hyphen (-) for compounds
  • All supported, choose by style

Testing Diverse Characters

Test matrix should include:

  • Accented Latin characters (café, naïve, résumé)
  • Cyrillic (Привет мир)
  • Chinese (你好世界)
  • Japanese (こんにちは)
  • Arabic (مرحبا بالعالم)
  • Hebrew (שלום עולם)
  • Emojis (🌍 📱 ✉️)
  • Mixed scripts in one message
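The matrix above can be kept as a fixture and used to find out which parts of a pipeline would corrupt text. The sketch below checks which samples a given codec cannot represent; the sample names and the helper `codec_failures` are illustrative, and a real test suite would also send each sample to a device.

```python
SAMPLES = {
    "latin": "café, naïve, résumé",
    "cyrillic": "Привет мир",
    "chinese": "你好世界",
    "japanese": "こんにちは",
    "arabic": "مرحبا بالعالم",
    "hebrew": "שלום עולם",
    "emoji": "🌍 📱 ✉️",
    "mixed": "Hello 你好 مرحبا 🌍",
}


def codec_failures(codec: str) -> list[str]:
    """Return the names of samples that cannot be encoded with the given codec."""
    failures = []
    for name, text in SAMPLES.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            failures.append(name)
    return failures


print("utf-8:", codec_failures("utf-8"))
print("latin-1:", codec_failures("latin-1"))
```

Any component configured for Latin-1 would pass the accented Latin sample and fail the rest, which is why a test set limited to Western European text can hide encoding problems.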

Tools for testing:

  • Unicode test pages
  • Real device testing
  • Carrier-specific test numbers
  • Native speaker review

Common Issues and Solutions

Issue: Characters showing as ???

  • Cause: Wrong encoding (likely Latin-1 or ASCII)
  • Solution: Use UTF-8 encoding throughout stack
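The two most common symptoms can be reproduced in a few lines, which helps when tracing where in the stack the damage happens. This sketch uses only built-in codecs: question marks appear when text is forced into a smaller encoding, and sequences like "Ã©" appear when UTF-8 bytes are decoded as Latin-1.

```python
text = "café 你好"

# Forcing text into ASCII with replacement turns each unsupported character into "?".
replaced = text.encode("ascii", errors="replace")
print(replaced)

# Decoding UTF-8 bytes as Latin-1 produces mojibake instead of an error.
mojibake = "é".encode("utf-8").decode("latin-1")
print(mojibake)

# Mojibake is reversible as long as no bytes were lost along the way.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(repaired)
```

Question marks mean information is already lost, so the fix has to happen upstream of the component that produced them.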

Issue: Emojis showing as boxes

  • Cause: Device's fonts have no glyph for that emoji (often a newer emoji on an older OS)
  • Solution: Prefer older, widely supported emojis and test on target devices

Issue: RTL text reversed

  • Cause: Mixing LTR/RTL incorrectly
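  • Solution: Rely on the Unicode Bidirectional Algorithm, and wrap embedded LTR fragments (order numbers, URLs) in directional isolates (U+2066 to U+2069) when they land in the wrong place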
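When an LTR fragment such as an order number is embedded in RTL text, wrapping it in Unicode directional isolates keeps its internal order intact. The sketch below shows the idea; `isolate_ltr` is an illustrative helper, and whether a given RCS client honors isolates is an assumption to verify on real devices.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(fragment: str) -> str:
    """Wrap an LTR fragment so surrounding RTL text cannot reorder it."""
    return f"{LRI}{fragment}{PDI}"


# Arabic sentence with an embedded order number.
message = "الطلب " + isolate_ltr("ORD-1234 (v2)") + " جاهز"
print(message)
```

The isolate characters are invisible but do count toward message length.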

Issue: Character limits exceeded

  • Cause: Counting in a different unit than the platform (for example bytes instead of code points)
  • Solution: Count in the same unit as the limit you are enforcing, normally code points

Performance Considerations

UTF-8 vs other encodings:

  • UTF-8: 1-4 bytes per character, backward compatible with ASCII
  • UTF-16: 2 or 4 bytes per character, not ASCII-compatible, and byte order matters
  • Always use UTF-8 unless you have specific reason not to

Database storage:

  • Use utf8mb4 in MySQL (not utf8, which stores at most 3 bytes per character and cannot hold emojis)
  • Use nvarchar in SQL Server
  • Use UTF8 database encoding in PostgreSQL (usually the default)
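To find out whether existing content would be affected by a 3-byte storage limit, you can check for characters outside the Basic Multilingual Plane, which are the ones that need 4 bytes in UTF-8. This is a minimal sketch; `needs_utf8mb4` is an illustrative name.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if text contains a character that takes 4 bytes in UTF-8."""
    # Code points above U+FFFF (most emojis, some rare CJK) need 4 bytes.
    return any(ord(ch) > 0xFFFF for ch in text)


for sample in ("café", "你好", "Hello 🌍"):
    print(sample, needs_utf8mb4(sample))
```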

The Bottom Line

RCS supports the full Unicode standard. Use UTF-8 encoding throughout your stack, test with diverse character sets, and design for international audiences from day one.

Unicode is straightforward when you set up proper encoding from the start. Retrofitting is painful and error-prone.
