DeepDive: Concealing Text Using Unicode
I recently stumbled upon an article titled Hidden Messages in Emojis and Hacking the US Treasury, and it reminded me of some code I wrote back in February 2025 that uses Unicode character encoding to hide a message inside a single character. This is possible thanks to Unicode variation selectors.
Unicode variation selectors
Variation selectors are special characters in the Unicode standard that modify the appearance of the preceding character. They are used where a slight variation in rendering is needed, such as choosing between different glyph styles for the same character.
For example, with emoji (source):
| Glyph | Description | Code point sequence |
|---|---|---|
| ☑︎ | U+2611 BALLOT BOX WITH CHECK in explicit text style | U+2611 U+FE0E (VS15) |
| ☑️ | U+2611 BALLOT BOX WITH CHECK in explicit emoji style | U+2611 U+FE0F (VS16) |
| ✉︎ | U+2709 ENVELOPE in explicit text style | U+2709 U+FE0E (VS15) |
| ✉️ | U+2709 ENVELOPE in explicit emoji style | U+2709 U+FE0F (VS16) |
| ✔︎ | U+2714 HEAVY CHECK MARK in explicit text style | U+2714 U+FE0E (VS15) |
| ✔️ | U+2714 HEAVY CHECK MARK in explicit emoji style | U+2714 U+FE0F (VS16) |
The base character is the same in each pair; only its presentation changes, depending on which variation selector follows it.
Variation selectors are also used in other contexts, such as:
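To see that each row of the table really is two code points, a base character followed by a selector, a quick check with Python's standard `unicodedata` module is enough. This is a small illustrative sketch; the `sequences` list simply copies the ballot-box and envelope rows from the table.

```python
import unicodedata

# Each table row is two code points: a base character plus VS15 (text) or VS16 (emoji).
sequences = ["\u2611\ufe0e", "\u2611\ufe0f", "\u2709\ufe0e", "\u2709\ufe0f"]

for seq in sequences:
    # Show every code point in U+XXXX form next to its official Unicode name.
    parts = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]
    print(len(seq), " + ".join(parts))
# First line printed: 2 U+2611 BALLOT BOX WITH CHECK + U+FE0E VARIATION SELECTOR-15
```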
- Mathematical Symbols
- Chinese, Japanese, and Korean Characters
How can we use this?
So I made a demonstration to show how this could be exploited:
How does this work?
The concept is straightforward:
- Convert the data you want to hide into bytes.
- Map each byte to a corresponding variation selector.
- Attach these variation selectors to a base character.
- On the receiving side, extract the hidden bytes from the variation selectors, convert them back to their original form, and reveal the concealed data.
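As an illustration, here are the four steps applied by hand to the payload "Hi" with "A" as the base character. This is a minimal sketch: the inline expressions use the same byte-to-selector mapping as the code below, and the reverse step assumes every code point after the base character is a selector.

```python
payload = "Hi".encode("utf-8")  # step 1: text -> bytes 72, 105

# step 2: bytes 0-15 -> U+FE00..U+FE0F, bytes 16-255 -> U+E0100..U+E01EF
selectors = [chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16) for b in payload]

hidden = "A" + "".join(selectors)  # step 3: attach the selectors to the base "A"
print([f"U+{ord(c):04X}" for c in hidden])  # ['U+0041', 'U+E0138', 'U+E0159']

# step 4: invert the mapping for every code point after the base character
# (assumes all of them are selectors, which holds for this toy example)
recovered = bytes(
    ord(c) - 0xFE00 if ord(c) <= 0xFE0F else ord(c) - 0xE0100 + 16
    for c in hidden[1:]
)
print(recovered.decode("utf-8"))  # Hi
```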
Let’s go through the code and try to understand it.
encode.py
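```python
def byte_to_variation_selector(byte: int) -> str:
    if byte < 16:
        # Bytes 0-15 map to VS1-VS16 (U+FE00-U+FE0F)
        return chr(0xFE00 + byte)
    else:
        # Bytes 16-255 map to VS17-VS256 (U+E0100-U+E01EF)
        return chr(0xE0100 + (byte - 16))


def encode(base: str, byte_list: list[int]) -> str:
    # Start from the visible base character and append one selector per byte
    result = base
    for byte in byte_list:
        result += byte_to_variation_selector(byte)
    return result


if __name__ == '__main__':
    # UTF-8 encoding turns any text (not just ASCII) into a list of bytes 0-255
    conf_text = list(input("Enter the text you want to hide: ").encode('utf-8'))
    char_uni = input("Enter the character you want to hide the text in: ")
    encoded = encode(char_uni, conf_text)
    # len() counts code points: the base character plus one selector per hidden byte
    print("Encoded length:", len(encoded))
    print(f"Encoded: {encoded}")
```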
The byte_to_variation_selector function maps each byte to a Unicode variation selector.
Unicode defines two blocks of variation selectors, which together provide exactly 256 code points, one for every possible byte value:
- 0xFE00 to 0xFE0F (VS1 to VS16, the Variation Selectors block): 16 selectors, including the VS15/VS16 text and emoji selectors shown above.
- 0xE0100 to 0xE01EF (VS17 to VS256, the Variation Selectors Supplement): 240 selectors, used mainly for ideographic (CJK) variation sequences.
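The key arithmetic is 16 + 240 = 256: every byte value gets its own selector, so the mapping can be reversed without ambiguity. The snippet below is only a sanity check of that claim; it copies the article's `byte_to_variation_selector` and uses the standard `unicodedata` module to print the official names of the boundary selectors.

```python
import unicodedata

basic = range(0xFE00, 0xFE0F + 1)         # VS1-VS16
supplement = range(0xE0100, 0xE01EF + 1)  # VS17-VS256
print(len(basic), len(supplement), len(basic) + len(supplement))  # 16 240 256


def byte_to_variation_selector(byte: int) -> str:
    # Same mapping as encode.py
    if byte < 16:
        return chr(0xFE00 + byte)
    return chr(0xE0100 + (byte - 16))


codepoints = [ord(byte_to_variation_selector(b)) for b in range(256)]
assert len(set(codepoints)) == 256                      # no two bytes share a selector
assert set(codepoints) == set(basic) | set(supplement)  # and every selector is used

print(hex(codepoints[15]), hex(codepoints[16]), hex(codepoints[255]))  # 0xfe0f 0xe0100 0xe01ef
print(unicodedata.name(chr(codepoints[16])))   # VARIATION SELECTOR-17
print(unicodedata.name(chr(codepoints[255])))  # VARIATION SELECTOR-256
```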
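The encode function then appends these selectors (chosen by byte_to_variation_selector) to the base character. A selector that does not form a recognized variation sequence with its base has no visible effect, so the result typically renders as the base character alone while still carrying the hidden bytes.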
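The hidden data is invisible but not free. The sketch below uses a condensed copy of `encode` (same arguments as in encode.py) to hide "hi" behind "a": the result is one visible character but three code points and nine UTF-8 bytes, because each selector in the supplementary range takes four bytes in UTF-8. `ascii()` is used only so the selectors appear as escapes; printing `encoded` directly in a UTF-8 terminal typically shows just "a".

```python
def byte_to_variation_selector(byte: int) -> str:
    # Same mapping as encode.py, written as a conditional expression
    return chr(0xFE00 + byte) if byte < 16 else chr(0xE0100 + (byte - 16))


def encode(base: str, byte_list: list[int]) -> str:
    # Base character followed by one selector per byte
    return base + "".join(byte_to_variation_selector(b) for b in byte_list)


encoded = encode("a", list("hi".encode("utf-8")))

print(ascii(encoded))                # 'a\U000e0158\U000e0159'
print(len(encoded))                  # 3 code points: 'a' + 2 selectors
print(len(encoded.encode("utf-8")))  # 9 bytes: 1 for 'a' + 4 per selector
```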
decode.py
```python
def variation_selector_to_byte(variation_selector: str) -> int | None:
    # Inverse of byte_to_variation_selector; None for anything that is not a selector
    codepoint = ord(variation_selector)
    if 0xFE00 <= codepoint <= 0xFE0F:
        return codepoint - 0xFE00
    elif 0xE0100 <= codepoint <= 0xE01EF:
        return codepoint - 0xE0100 + 16
    else:
        return None


def decode(variation_selectors: str) -> list[int]:
    result = []
    # Skip the base character
    for ch in variation_selectors[1:]:
        byte = variation_selector_to_byte(ch)
        if byte is not None:
            result.append(byte)
        # Instead of breaking, just ignore non-variation selectors.
    return result


if __name__ == '__main__':
    encoded = input("Enter the encoded text: ")
    decoded_bytes = decode(encoded)
    try:
        decoded_text = bytes(decoded_bytes).decode('utf-8')
    except UnicodeDecodeError as e:
        # The recovered bytes are not valid UTF-8 (e.g. truncated or corrupted input)
        decoded_text = f"Decoding error: {e}"
    print(f"Decoded: {decoded_text}")
```
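Decoding is the same process in reverse: skip the base character, map every variation selector back to its byte, ignore anything else, and decode the resulting bytes as UTF-8. And this is how we can hide text in plain sight.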
Conclusion
I could see this being used in a CTF, and I think it would cause a lot of head-scratching. I had a lot of fun discovering all of this and making it work, and I hope you will enjoy tinkering with this feature too.