Raw text from HTML Element

Sounds obvious, right? It turns out that browsers silently encode special characters when they hand an element's HTML back to you.

Here is what I mean. Let’s say we have this piece of markup:

<div>
    <b>bold</b> & safe
</div>

Reading with .innerHTML gives the markup back with the ampersand encoded:

'\n    <b>bold</b> &amp; safe\n'

Reading with .textContent drops the tags and leaves the ampersand as is:

'\n    bold & safe\n'
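To make the "silent encoding" concrete, here is a rough sketch of the escaping a browser applies to text nodes when it serializes them for .innerHTML. The helper name escapeTextLikeInnerHTML is made up for illustration and is not a browser API; it only mimics the text-node rules (&, <, > and the non-breaking space), not attribute escaping.

```javascript
// Hypothetical helper: approximates how innerHTML escapes the text of a text node.
// It is a sketch for illustration, not something the browser exposes.
function escapeTextLikeInnerHTML(text) {
    return text
        .replace(/&/g, '&amp;')        // first, so the entities added below are not escaped again
        .replace(/\u00A0/g, '&nbsp;')  // non-breaking space
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
}

console.log(escapeTextLikeInnerHTML('bold & safe')) // prints "bold &amp; safe"
```

This is why the same text node reads as "&" through .textContent and as "&amp;" through .innerHTML.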

So how do you get the raw text back from an encoded HTML string? The solution is to use a <textarea>:

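function decodeHtmlEntities(html) {
    // A detached <textarea> is never added to the page
    const textarea = document.createElement('textarea')
    // The browser decodes entities here but keeps any tags as literal text
    textarea.innerHTML = html
    // .value returns the decoded plain text
    return textarea.value
}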
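A quick usage sketch, with one assumption on my side: the trick needs a browser DOM, so the variant below fails loudly when there is no document (in Node, for example). The name decodeHtmlEntitiesGuarded is hypothetical and exists only for this example.

```javascript
// Hypothetical guarded variant of the textarea trick shown above.
function decodeHtmlEntitiesGuarded(html) {
    // Assumption: without a DOM there is no <textarea> to lean on, so fail early.
    if (typeof document === 'undefined') {
        throw new Error('decodeHtmlEntitiesGuarded needs a browser DOM')
    }
    const textarea = document.createElement('textarea')
    textarea.innerHTML = String(html) // coerce non-string input explicitly
    return textarea.value
}

// In a browser console:
// decodeHtmlEntitiesGuarded('&lt;b&gt;bold&lt;/b&gt; &amp; safe')
// returns '<b>bold</b> & safe'
```

Note that the result still contains the <b> tags as plain characters. The function decodes entities; it does not strip markup.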

Why does `<textarea>` do the trick?

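(from GPT): <textarea> parses whatever you assign to .innerHTML as text rather than markup: character references such as &lt; and &amp; are decoded, but tags are kept as literal characters and never become elements. Setting textarea.innerHTML = "&lt;b&gt;" stores the encoded string. Reading textarea.value returns the decoded one: <b>. No rendering. No child elements. Just plain text logic. It is a native, fast HTML entity decoder, so no custom parser is needed.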

Amazing, right?
