How Do I Handle Unicode and Special Characters in RCS Messages?
RCS fully supports Unicode including emojis, accented characters, Asian scripts, and right-to-left languages. Use UTF-8 encoding throughout your stack, test with diverse character sets, and be aware that emoji rendering varies by device. No special encoding required—just send UTF-8 text.
Key Points
- Full Unicode support including emojis and all scripts
- Use UTF-8 encoding throughout your stack
- Test rendering across devices and carriers
- Emoji appearance varies by device
- Character limits count Unicode code points, not UTF-8 bytes
Unicode and Special Characters in RCS
RCS supports the full Unicode standard. Here's how to handle it properly.
What's Supported
All Unicode characters:
- Latin accented characters (é, ñ, ü, ç, etc.)
- Cyrillic (Russian, Ukrainian, etc.)
- Greek, Armenian, Georgian
- Arabic, Hebrew, Persian (right-to-left)
- Chinese (Simplified and Traditional)
- Japanese, Korean, Thai
- Hindi, Bengali, Tamil, and all Indic scripts
- Emojis (full Unicode emoji set)
What's supported technically:
- UTF-8 encoding (industry standard)
- Multiple scripts in single message
- Mixed LTR/RTL content
- Language-specific fonts (automatic)
- Special symbols (math, currency, etc.)
Encoding Best Practices
Use UTF-8 everywhere:
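# Good - encode the text as UTF-8 before sending
# (api.send is a placeholder for your provider's client call)
message = "Hello 你好 🌍"
message_bytes = message.encode('utf-8')
api.send(message_bytes)
# Bad - ASCII or Latin-1 cannot represent these characters
message = "Hello 你好 🌍"
message_bytes = message.encode('ascii')  # raises UnicodeEncodeError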
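As a minimal sketch of what "UTF-8 everywhere" can look like at the HTTP boundary, the snippet below serializes a message payload to JSON and encodes it as UTF-8 bytes. The payload shape (`to`, `text`) and the function name `build_request_body` are assumptions for illustration, not any particular RCS provider's API.

```python
import json


def build_request_body(recipient: str, text: str) -> bytes:
    """Serialize a hypothetical message payload as UTF-8 encoded JSON."""
    payload = {"to": recipient, "text": text}  # field names are illustrative
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \uXXXX escapes.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


body = build_request_body("+15550100", "Hello 你好 🌍")
# Decoding the body as UTF-8 recovers the original text unchanged.
assert json.loads(body.decode("utf-8"))["text"] == "Hello 你好 🌍"
```

If you send a body like this yourself, declaring the charset in the request header (for example `Content-Type: application/json; charset=utf-8`) removes any guesswork on the receiving side.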
HTML entities not needed:
- Send raw UTF-8 text, not HTML entities
- No need for &amp;amp; or &amp;#20320;
- Just send the actual characters
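If your content comes from a CMS or web template that already contains HTML entities, one option is to decode them before sending. The sketch below uses Python's standard `html.unescape`; the wrapper name `to_plain_text` is just an illustrative helper.

```python
import html


def to_plain_text(text: str) -> str:
    """Convert HTML character references to the characters they stand for."""
    return html.unescape(text)


# Named and numeric references both become real characters.
print(to_plain_text("Tom &amp; Jerry say &#20320;&#22909;"))
```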
Character Limits
Important: Limits count Unicode code points, not UTF-8 bytes
Examples:
- "Hello" = 5 code points (5 bytes in ASCII, 5 in UTF-8)
- "你好" = 2 code points (6 bytes in UTF-8)
- "🌍" = 1 code point (4 bytes in UTF-8)
RCS message limits:
- Single message: ~160-1,600 characters depending on encoding
- Segmented automatically for longer messages
- Emojis typically count as 2 characters on platforms that count UTF-16 code units, even though most are a single code point
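To see how the same text measures differently depending on the unit, the following sketch reports code points, UTF-8 bytes, and UTF-16 code units side by side. The helper name `measure` is an assumption for illustration; which unit your provider actually enforces is something to confirm in its documentation.

```python
def measure(text: str) -> dict:
    """Return the length of text in three common units."""
    return {
        "code_points": len(text),  # Python str length counts code points
        "utf8_bytes": len(text.encode("utf-8")),
        # Each UTF-16 code unit is 2 bytes; characters above U+FFFF need two units.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


for sample in ("Hello", "你好", "🌍"):
    print(sample, measure(sample))
```

The UTF-16 column is where the "emoji counts as 2" rule of thumb comes from: 🌍 is one code point but two UTF-16 code units.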
Right-to-Left (RTL) Support
Automatic RTL for Arabic and Hebrew:
- RCS automatically handles RTL layout
- Text direction detected per character
- Mixed LTR/RTL in same message works
Testing RTL:
- Test on actual devices, not just web previews
- Verify numbers, emails, URLs display correctly
- Check that buttons work correctly
- Get native speaker review
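When building RTL messages programmatically, it can help to know whether a string contains RTL characters and to protect embedded left-to-right fragments such as order IDs. The sketch below uses the standard `unicodedata` module and Unicode directional isolates; the helper names are illustrative, and whether a given RCS client honors isolates is an assumption to verify on real devices.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left classes (Hebrew-type, Arabic-type)
FSI, PDI = "\u2068", "\u2069"  # FIRST STRONG ISOLATE, POP DIRECTIONAL ISOLATE


def contains_rtl(text: str) -> bool:
    """True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def isolate(fragment: str) -> str:
    """Wrap a fragment so its direction is resolved independently of its surroundings."""
    return f"{FSI}{fragment}{PDI}"


order_id = "ABC-123"
message = f"رقم الطلب: {isolate(order_id)}"
print(contains_rtl(message), contains_rtl(order_id))
```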
Emoji Best Practices
Emoji rendering:
- Each OS (iOS, Android, Windows, etc.) renders emojis with its own emoji font
- Slight variations across platforms
- Color emoji supported on most modern devices
- Some older devices show monochrome only
Usage guidelines:
- Use 2-3 emojis per message maximum
- Avoid emojis in critical information (don't replace text)
- Consider cultural context (some emojis offensive in certain cultures)
- Test rendering on target devices
Emoji in buttons:
- Can use emojis in button labels
- "Shop Now 🛍️" works fine
- Don't rely on emojis alone for meaning
Special Character Handling
Currency symbols:
- $, €, £, ¥, ₹, ₽, ₩ all supported
- Use Unicode symbols not HTML entities
- Example: "$50" not "&amp;#36;50" or "&amp;dollar;50"
Mathematical symbols:
- ∑, π, ∞, √, ±, ×, ÷ all supported
- Use for technical or financial content
Quotes and apostrophes:
- Use Unicode smart quotes: “ ” ‘ ’
- Or straight quotes: " '
- Both work, but be consistent
Dashes:
- En dash (–) for ranges
- Em dash (—) for breaks
- Hyphen (-) for compounds
- All supported, choose by style
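If you choose straight quotes as your house style, one way to enforce consistency is to normalize text before sending. This is a minimal sketch using `str.translate`; the name `straighten_quotes` is illustrative, and the mapping covers only the four common curly quote characters.

```python
STRAIGHT_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
})


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with their straight ASCII equivalents."""
    return text.translate(STRAIGHT_QUOTES)


print(straighten_quotes("\u201cIt\u2019s on sale\u201d"))
```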
Testing Diverse Characters
Test matrix should include:
- Accented Latin characters (café, naïve, résumé)
- Cyrillic (Привет мир)
- Chinese (你好世界)
- Japanese (こんにちは)
- Arabic (مرحبا بالعالم)
- Hebrew (שלום עולם)
- Emojis (🌍 📱 ✉️)
- Mixed scripts in one message
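The matrix above can be turned into an automated check. In this sketch, `failed_samples` takes any function that represents your stack (text in, text out) and reports which samples come back altered. Both the helper name and the deliberately lossy `latin1_stack` are assumptions for illustration; in practice you would pass a function that round-trips text through your real storage and API layers.

```python
SAMPLES = {
    "latin": "café, naïve, résumé",
    "cyrillic": "Привет мир",
    "chinese": "你好世界",
    "japanese": "こんにちは",
    "arabic": "مرحبا بالعالم",
    "hebrew": "שלום עולם",
    "emoji": "🌍 📱 ✉️",
}
SAMPLES["mixed"] = " ".join(SAMPLES.values())


def failed_samples(pipeline) -> list:
    """Return the names of samples that the pipeline does not return unchanged."""
    return [name for name, text in SAMPLES.items() if pipeline(text) != text]


def latin1_stack(text: str) -> str:
    """A deliberately broken stack that stores text as Latin-1."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


print(failed_samples(lambda text: text))  # a lossless stack reports no failures
print(failed_samples(latin1_stack))       # everything except the Latin sample fails
```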
Tools for testing:
- Unicode test pages
- Real device testing
- Carrier-specific test numbers
- Native speaker review
Common Issues and Solutions
Issue: Characters showing as ???
- Cause: Wrong encoding (likely Latin-1 or ASCII)
- Solution: Use UTF-8 encoding throughout stack
Issue: Emojis showing as boxes
- Cause: The device's emoji font lacks the glyph (often a newer emoji on an older OS)
- Solution: Prefer widely supported emojis, keep the meaning in text, and test on target devices
Issue: RTL text reversed
- Cause: Mixing LTR/RTL incorrectly
- Solution: Rely on the Unicode Bidirectional Algorithm, and wrap embedded opposite-direction runs (IDs, URLs) in directional isolates (U+2066 to U+2069)
Issue: Character limits exceeded
- Cause: Counting bytes instead of code points
- Solution: Use code point counting for limits
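The first issue above is easy to reproduce, which helps when tracing where in a stack the damage happens. The sketch below shows two typical failure signatures: question marks from a lossy encode, and "Ã©"-style mojibake from decoding UTF-8 bytes with the wrong codec. The function names are illustrative.

```python
def lossy_ascii(text: str) -> str:
    """Simulate a layer that forces text into ASCII, replacing what it cannot encode."""
    return text.encode("ascii", errors="replace").decode("ascii")


def misdecoded_latin1(text: str) -> str:
    """Simulate a layer that reads UTF-8 bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


print(lossy_ascii("café 你好"))   # question marks: data was lost at encode time
print(misdecoded_latin1("café"))  # mojibake: bytes are intact but read with the wrong codec
```

Question marks usually mean the original characters are gone for good, while mojibake can often be reversed by re-encoding with the wrong codec and decoding as UTF-8.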
Performance Considerations
UTF-8 vs other encodings:
- UTF-8: 1-4 bytes per character, backward compatible with ASCII
- UTF-16: 2 or 4 bytes per character (surrogate pairs above U+FFFF), more complex
- Always use UTF-8 unless you have specific reason not to
Database storage:
- Use utf8mb4 in MySQL (not utf8 which is limited)
- Use nvarchar in SQL Server
- Use proper encoding in PostgreSQL (UTF-8 default)
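MySQL's legacy `utf8` character set stores at most 3 bytes per character, so it cannot hold characters above U+FFFF, which includes most emojis. A quick way to check whether a message would be affected is sketched below; the function name is an illustrative assumption.

```python
def needs_four_byte_utf8(text: str) -> bool:
    """True if text contains a character outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_four_byte_utf8("你好"))      # CJK here fits in 3 bytes per character
print(needs_four_byte_utf8("Hello 🌍"))  # the emoji needs 4 bytes
```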
The Bottom Line
RCS supports the full Unicode standard. Use UTF-8 encoding throughout your stack, test with diverse character sets, and design for international audiences from day one.
Unicode is straightforward when you set up proper encoding from the start. Retrofitting is painful and error-prone.
