How Do I Handle Unicode and Special Characters in RCS Messages?

RCS fully supports Unicode, including emojis, accented characters, Asian scripts, and right-to-left languages. Use UTF-8 encoding throughout your stack, test with diverse character sets, and be aware that emoji rendering varies by device. No special encoding is required: just send UTF-8 text.

Unicode and Special Characters in RCS

RCS supports the full Unicode standard. Here's how to handle it properly.

What's Supported

All Unicode characters:

Latin accented characters (é, ñ, ü, ç, etc.)
Cyrillic (Russian, Ukrainian, etc.)
Greek, Armenian, Georgian
Arabic, Hebrew, Persian (right-to-left)
Chinese (Simplified and Traditional)
Japanese, Korean, Thai
Hindi, Bengali, Tamil, and all Indic scripts
Emojis (full Unicode emoji set)

What's supported technically:

UTF-8 encoding (industry standard)
Multiple scripts in single message
Mixed LTR/RTL content
Language-specific fonts (automatic)
Special symbols (math, currency, etc.)

Encoding Best Practices

Use UTF-8 everywhere:

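# Good - keep text as a Unicode string and encode to UTF-8 at the boundary
# (api.send is a placeholder for your provider's send call)
message = "Hello 你好 🌍"
message_bytes = message.encode('utf-8')
api.send(message_bytes)

# Bad - ASCII or Latin-1 cannot represent these characters
message = "Hello 你好 🌍"
message.encode('ascii')  # raises UnicodeEncodeError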

HTML entities not needed:

Send raw UTF-8 text, not HTML entities
No need for &amp;amp; or &amp;#20320;
Just send the actual characters
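If upstream content (a CMS export or a web form, for example) already contains HTML entities, one option is to decode them before sending. The sketch below uses Python's standard html module; the function name decode_entities is only illustrative.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML entities into the characters they stand for."""
    return html.unescape(text)


# Named, decimal, and ampersand entities all become plain characters.
print(decode_entities("Caf&eacute; &amp; &#20320;&#22909;"))
```

The decoded string can then be sent as ordinary UTF-8 text.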

Character Limits

Important: Limits count Unicode code points, not bytes

Examples:

"Hello" = 5 code points (5 bytes in ASCII, 5 in UTF-8)
"你好" = 2 code points (6 bytes in UTF-8)
"🌍" = 1 code point (4 bytes in UTF-8)

RCS message limits:

Single message: ~160-1,600 characters depending on encoding
Segmented automatically for longer messages
Emojis typically count as 2 characters where limits are measured in UTF-16 code units (🌍 is 1 code point but 2 UTF-16 code units)
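The difference between these counts is easy to check in code. The sketch below reports all three measures for a string; which one your provider applies to its limits is an assumption to verify against their documentation, and count_units is an illustrative name.

```python
def count_units(text: str) -> dict:
    """Return the length of text measured three different ways."""
    return {
        "code_points": len(text),                           # Python str length
        "utf8_bytes": len(text.encode("utf-8")),            # bytes on the wire
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 2 bytes per unit
    }


for sample in ("Hello", "你好", "🌍"):
    print(sample, count_units(sample))
```

For "Hello" all three counts are 5. For "你好" the result is 2 code points, 6 UTF-8 bytes, and 2 UTF-16 units. For "🌍" it is 1 code point, 4 UTF-8 bytes, and 2 UTF-16 units.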

Right-to-Left (RTL) Support

Automatic RTL for Arabic and Hebrew:

RCS automatically handles RTL layout
Text direction is resolved from each character's Unicode bidirectional class
Mixed LTR/RTL in same message works
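To decide which messages need RTL review before sending, you can check for right-to-left characters yourself. This is a minimal sketch using the bidirectional classes from Python's unicodedata module; contains_rtl is an illustrative helper, and it only detects RTL characters (layout is still handled by the receiving client).

```python
import unicodedata

# Bidirectional classes for strong right-to-left characters:
# "R" covers Hebrew and similar scripts, "AL" covers Arabic letters.
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text: str) -> bool:
    """Return True if any character has a strong right-to-left class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


print(contains_rtl("Hello"))        # False
print(contains_rtl("שלום עולם"))    # True
```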

Testing RTL:

Test on actual devices, not just web previews
Verify numbers, emails, URLs display correctly
Check that buttons work correctly
Get native speaker review

Emoji Best Practices

Emoji rendering:

Emojis are drawn with the recipient's OS emoji font (iOS, Android, Windows, etc.)
Slight variations across platforms
Color emoji supported on most modern devices
Some older devices show monochrome only

Usage guidelines:

Use 2-3 emojis per message maximum
Avoid emojis in critical information (don't replace text)
Consider cultural context (some emojis offensive in certain cultures)
Test rendering on target devices

Emoji in buttons:

Can use emojis in button labels
"Shop Now 🛍️" works fine
Don't rely on emojis alone for meaning
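A quick automated check can flag labels that consist only of emojis or symbols. The sketch below is a rough heuristic, not a full emoji detector: it treats a label as readable if it contains at least one character Unicode classifies as a letter, and a few symbols are classified that way, so treat the result as approximate. The name has_readable_text is illustrative.

```python
def has_readable_text(label: str) -> bool:
    """Return True if the label contains at least one alphabetic character."""
    return any(ch.isalpha() for ch in label)


print(has_readable_text("Shop Now 🛍️"))  # True
print(has_readable_text("🛍️"))           # False
```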

Special Character Handling

Currency symbols:

$, €, £, ¥, ₹, ₽, ₩ all supported
Use Unicode symbols not HTML entities
Example: "$50" not "&amp;#36;50" or "&amp;dollar;50"

Mathematical symbols:

∑, π, ∞, √, ±, ×, ÷ all supported
Use for technical or financial content

Quotes and apostrophes:

Use Unicode smart quotes: “ ” ‘ ’
Or straight quotes: " " ' '
Both work, but be consistent

Dashes:

En dash (–) for ranges
Em dash (—) for breaks
Hyphen (-) for compounds
All supported, choose by style

Testing Diverse Characters

Test matrix should include:

Accented Latin characters (café, naïve, résumé)
Cyrillic (Привет мир)
Chinese (你好世界)
Japanese (こんにちは)
Arabic (مرحبا بالعالم)
Hebrew (שלום עולם)
Emojis (🌍 📱 ✉️)
Mixed scripts in one message
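The test matrix above can be kept as data and run through a round-trip check before anything reaches a device. This is a sketch under the assumption that your pipeline encodes and decodes text at some point; SAMPLES and round_trips are illustrative names, and a passing check says nothing about how a handset renders the text.

```python
SAMPLES = {
    "accented_latin": "café, naïve, résumé",
    "cyrillic": "Привет мир",
    "chinese": "你好世界",
    "japanese": "こんにちは",
    "arabic": "مرحبا بالعالم",
    "hebrew": "שלום עולם",
    "emoji": "🌍 📱 ✉️",
}
# One extra entry that mixes several scripts in a single message.
SAMPLES["mixed"] = " ".join(SAMPLES.values())


def round_trips(text: str, encoding: str = "utf-8") -> bool:
    """Return True if text survives encoding and decoding unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


for name, text in SAMPLES.items():
    print(name, round_trips(text), round_trips(text, "latin-1"))
```

Every sample round-trips through UTF-8, while most fail through Latin-1, which is the kind of break this check is meant to surface early.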

Tools for testing:

Unicode test pages
Real device testing
Carrier-specific test numbers
Native speaker review

Common Issues and Solutions

Issue: Characters showing as ???

Cause: Wrong encoding (likely Latin-1 or ASCII)
Solution: Use UTF-8 encoding throughout stack
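Both common failure modes can be reproduced in a few lines, which helps when tracing where in the stack the damage occurs. The helper names below are illustrative.

```python
def lossy_ascii(text: str) -> str:
    """Simulate a layer that forces text into ASCII, replacing the rest."""
    return text.encode("ascii", errors="replace").decode("ascii")


def misread_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being decoded as Latin-1 by mistake."""
    return text.encode("utf-8").decode("latin-1")


print(lossy_ascii("café 你好"))     # caf? ??
print(misread_as_latin1("café"))   # cafÃ©
```

Question marks usually point to a lossy conversion on the way out. Sequences like "Ã©" usually mean UTF-8 bytes were decoded with the wrong encoding on the way in.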

Issue: Emojis showing as boxes

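Cause: The device's fonts have no glyph for that emoji (often a newer emoji on an older OS version)
Solution: Prefer widely supported emojis, keep the meaning in text, and test on target devices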

Issue: RTL text reversed

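Cause: LTR fragments (numbers, URLs, product codes) embedded in RTL text without isolation
Solution: Rely on the Unicode Bidirectional Algorithm and wrap embedded fragments in directional isolates (U+2066 to U+2069) where needed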
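One way to keep an embedded fragment such as an order code from being reordered is to wrap it in Unicode directional isolate characters. This is a sketch only: whether a particular RCS client honors these characters is an assumption to confirm on real devices, and isolate is an illustrative helper.

```python
# Directional isolate characters defined by the Unicode Bidirectional Algorithm.
FIRST_STRONG_ISOLATE = "\u2068"     # direction taken from the fragment itself
POP_DIRECTIONAL_ISOLATE = "\u2069"  # ends the isolated run


def isolate(fragment: str) -> str:
    """Wrap a fragment so surrounding text does not affect its direction."""
    return f"{FIRST_STRONG_ISOLATE}{fragment}{POP_DIRECTIONAL_ISOLATE}"


message = f"مرحبا {isolate('ABC-123')}"
print(len(isolate("ABC-123")))  # 9: the 7 visible characters plus 2 isolates
```

The isolate characters are invisible but still count toward message length.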

Issue: Character limits exceeded

Cause: Counting bytes instead of code points
Solution: Use code point counting for limits

Performance Considerations

UTF-8 vs other encodings:

UTF-8: 1-4 bytes per character, backward compatible with ASCII
UTF-16: 2 or 4 bytes per character, more complex (surrogate pairs, byte order)
Always use UTF-8 unless you have specific reason not to
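The size trade-off and the ASCII compatibility can be seen directly by encoding the same text both ways. The helper name encoded_sizes is illustrative, and UTF-16 is measured without a byte order mark.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded size of text in bytes for UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # little-endian, no BOM
    }


# ASCII text is byte-for-byte identical in UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True

for sample in ("Hello", "你好", "🌍"):
    print(sample, encoded_sizes(sample))
```

"Hello" takes 5 bytes in UTF-8 and 10 in UTF-16, "你好" takes 6 and 4, and "🌍" takes 4 in both.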

Database storage:

Use utf8mb4 in MySQL (utf8 is an alias for utf8mb3, which stores at most 3 bytes per character and cannot hold most emojis)
Use nvarchar in SQL Server
Use a UTF-8 database encoding in PostgreSQL (usually the default)
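To find out whether stored message text would be affected by a 3-byte column, you can check for characters outside the Basic Multilingual Plane, which need 4 bytes in UTF-8. A minimal sketch, with needs_utf8mb4 as an illustrative name:

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character needs 4 bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("你好"))       # False: fits in 3 bytes per character
print(needs_utf8mb4("Hello 🌍"))  # True: the emoji needs 4 bytes
```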

The Bottom Line

RCS supports the full Unicode standard. Use UTF-8 encoding throughout your stack, test with diverse character sets, and design for international audiences from day one.

Unicode is straightforward when you set up proper encoding from the start. Retrofitting is painful and error-prone.
