Sounds obvious, right? It turns out that browsers silently encode special characters such as & and < when you read an element's content back as HTML.
What does that mean? Let’s say we have this piece of code:
<div>
<b>bold</b> & safe
</div>
Reading with .innerHTML often gives:
'\n <b>bold</b> &amp; safe\n'
Reading with .textContent gives:
'\n bold & safe\n'
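To make the difference concrete, here is a rough sketch of the escaping a browser applies to text when it serializes an element to HTML. The helper name escapeTextForHtml is made up for illustration and is not a browser API; it only approximates the escaping rules for text nodes (attributes are handled slightly differently).

```javascript
// Hypothetical helper: approximates how text nodes are escaped
// when the browser serializes them (for example via .innerHTML).
function escapeTextForHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not escaped twice
    .replace(/\u00A0/g, '&nbsp;') // non-breaking space
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

console.log(escapeTextForHtml(' bold & safe')) // ' bold &amp; safe'
console.log(escapeTextForHtml('1 < 2')) // '1 &lt; 2'
```

This is why the & in the example comes back as &amp; from .innerHTML while .textContent leaves it alone.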
So how do you get the raw, decoded text back when all you have is the encoded HTML string? The solution is to use a <textarea>:
function decodeHtmlEntities(html) {
  const textarea = document.createElement('textarea')
  textarea.innerHTML = html
  return textarea.value
}
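A quick usage sketch, assuming the code runs in a browser (a page script or the DevTools console). The sample strings are made up for illustration, and the guard around the calls is only there so the snippet does not throw where no document exists, such as plain Node.js.

```javascript
function decodeHtmlEntities(html) {
  // The textarea is never attached to the page; it is only used as a decoder.
  const textarea = document.createElement('textarea')
  textarea.innerHTML = html
  return textarea.value
}

// Only call it where a DOM is available.
if (typeof document !== 'undefined') {
  console.log(decodeHtmlEntities('&lt;b&gt;bold&lt;/b&gt; &amp; safe')) // '<b>bold</b> & safe'
  console.log(decodeHtmlEntities('&copy; 2024')) // '© 2024'
}
```

Note that the result is a plain string: the decoded <b> is just text, not an element.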
Why does <textarea> do the trick?
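(adapted from GPT):
A <textarea> parses whatever you assign to .innerHTML as HTML, but its content can only be text: entities are decoded, and tags stay as literal characters instead of becoming elements. Setting textarea.innerHTML = "&lt;b&gt;" passes in the encoded string. Reading textarea.value returns the decoded text: <b>. Nothing is rendered and no child elements are created. It is a native, fast HTML entity decoder, with no custom parser needed.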
Amazing, right?