Sounds obvious, right? It turns out that browsers silently encode text inside HTML.
What does that mean? Let’s say we have this piece of markup:
<div>
<b>bold</b> & safe
</div>
Reading with .innerHTML gives:
'\n <b>bold</b> &amp; safe\n'
Reading with .textContent gives:
'\n bold & safe\n'
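To make the encoding step concrete, here is a rough sketch of the escaping that serialization applies to text nodes. The helper name escapeTextLikeInnerHTML is made up for illustration and is not a browser API; it only mimics what happens to text content (not attributes) when .innerHTML is read.

```javascript
// Hypothetical helper, not a browser API: mimics the escaping that
// .innerHTML serialization applies to the text between tags.
function escapeTextLikeInnerHTML(text) {
  return text
    .replace(/&/g, '&amp;')        // must run first, or later entities get double-escaped
    .replace(/\u00A0/g, '&nbsp;')  // non-breaking space
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

console.log(escapeTextLikeInnerHTML('bold & safe')) // bold &amp; safe
console.log(escapeTextLikeInnerHTML('1 < 2'))       // 1 &lt; 2
```

This is why the & in the markup comes back as &amp; from .innerHTML but stays a plain & in .textContent.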
So how do you get the raw, decoded text back from that element? One solution: use a <textarea>:
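function decodeHtmlEntities(html) {
  // A detached <textarea> is never added to the page
  const textarea = document.createElement('textarea')
  // Entities in the assigned string are decoded; tags stay as literal text
  textarea.innerHTML = html
  return textarea.value
}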
Why does <textarea> do the trick?
(from GPT):
<textarea> parses whatever you assign to .innerHTML as text content, not as markup: entities are decoded, but tags never become elements. Setting textarea.innerHTML = "&lt;b&gt;" and then reading textarea.value returns the decoded string: <b>
Nothing is rendered and no child elements are created. It is a native, fast HTML entity decoder, with no custom parser needed.
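A quick way to check this behavior yourself is sketched below. The name decodeEntitiesWith and its doc parameter are hypothetical: the parameter is an assumption added only so the same function can be pointed at a non-browser DOM (jsdom, for example) instead of the global document.

```javascript
// Hypothetical variant of the decoder: `doc` is an assumption added here
// so the function can run against any DOM implementation.
function decodeEntitiesWith(html, doc = globalThis.document) {
  const textarea = doc.createElement('textarea')
  textarea.innerHTML = html // entities are decoded, tags are kept as text
  return textarea.value
}

// Only runs where a DOM is available (for example, a browser console).
if (typeof document !== 'undefined') {
  // Encoded tags come back as literal characters.
  console.log(decodeEntitiesWith('&lt;b&gt;bold&lt;/b&gt; &amp; safe'))
  // expected in a browser: <b>bold</b> & safe

  // Real tags are not turned into elements either; only &amp; changes.
  console.log(decodeEntitiesWith('<b>bold</b> &amp; safe'))
  // expected in a browser: <b>bold</b> & safe
}
```

Both inputs should end up as the same string, which is the point: the <textarea> only touches entities and leaves anything tag-shaped alone.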
Amazing, right?