javascript - Strip all unwanted tags from an HTML string but preserve whitespace in JS


I am trying to strip unwanted tags from HTML content and return either text with basic formatting (ul, b, u, p, etc.) or plain text (while preserving new lines, spacing, etc.), but I am having trouble creating a catch-all solution that lets me keep the structure of the pasted content.

Example string:

    <p class="bodytext" style="color: rgb(51, 51, 51);background-color: rgb(255, 255, 255);">         <span lang="en-gb">hello             <span class="apple-converted-space"> world,   </span>             <span class="cross-reference">                 <a href="" style="color: rgb(66, 139, 202);background-color: transparent;">cough                 </a>             </span>             <span class="apple-converted-space"></span>and             <span class="apple-converted-space"></span>             <span class="cross-reference">                 <a href="" style="color: rgb(66, 139, 202);background-color: transparent;">feverish - risk assessment</a>             </span>.             <span class="apple-converted-space"></span>         </span>     </p>     <p class="bodytext" style="color: rgb(51, 51, 51);background-color: rgb(255, 255, 255);">         <span lang="en-gb">fin.  </span>     </p> 

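Here is a plain JavaScript solution that removes the span elements within the HTML but leaves their inner content:

    // getElementsByTagName returns a live collection, so it shrinks as spans are removed.
    var spans = document.getElementsByTagName('span');
    while (spans.length) {
        var parent = spans[0].parentNode;
        // Move each child of the span in front of it, preserving order.
        while (spans[0].firstChild) {
            parent.insertBefore(spans[0].firstChild, spans[0]);
        }
        // The span is now empty and can be removed.
        parent.removeChild(spans[0]);
    }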

You can do more using jQuery, as shown in this example, which removes span, p, b, ul, and li tags but leaves their inner content:

$("span, p, b, ul, li").contents().unwrap(); 

See also: Remove a HTML tag but keep the innerHTML

It may be beneficial to note that any time you have 2 or more consecutive spaces, a modern browser will typically collapse them to 1 space when it displays them. If you want to preserve the spacing of multiple spaces, replace the regularly typed space " " characters with "&nbsp;" HTML-encoded spaces. Ordinary JavaScript has a string replace method that you can use for that, if desired.
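The replacement described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `preserveWhitespace` is hypothetical, and the choice to convert line breaks to `<br>` tags is an assumption about what "preserving new lines" should mean for the output.

```javascript
// Hypothetical helper: make runs of spaces and line breaks survive HTML rendering.
function preserveWhitespace(text) {
    return text
        // Convert every space in a run of two or more spaces to &nbsp;.
        // Single spaces are left alone so normal line wrapping still works.
        .replace(/ {2,}/g, function (run) {
            return '&nbsp;'.repeat(run.length);
        })
        // Assumption: line breaks (Windows, old Mac, or Unix style) become <br> tags.
        .replace(/\r\n|\r|\n/g, '<br>');
}
```

If you control the page's styling, setting the CSS `white-space` property to `pre-wrap` on the container is an alternative that keeps spaces and new lines without changing the text at all.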

Edit: if you wish to remove all HTML tags from a JavaScript string, try the following (note that replace returns a new string rather than modifying the original):

mystring.replace(/<(?:.|\n)*?>/gm, ''); 

See also: Strip HTML from text JavaScript
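To keep basic formatting tags (such as ul, b, u, p) while removing everything else, the same regex idea can be extended with a list of allowed tags. The sketch below is one possible approach under stated assumptions: `stripTags` and its `allowedTags` parameter are hypothetical names, attributes on kept tags are dropped on purpose, and the regex assumes simple, well-formed markup (no `>` inside attribute values, no comments or scripts). It should not be treated as a sanitizer for untrusted input.

```javascript
// Hypothetical helper: strip every tag except those named in allowedTags.
// Text between tags, including all whitespace, is left untouched.
function stripTags(html, allowedTags) {
    var allowed = (allowedTags || []).map(function (tag) {
        return tag.toLowerCase();
    });
    // Matches an opening or closing tag; captures the slash (if any) and the tag name.
    return html.replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, function (match, slash, name) {
        name = name.toLowerCase();
        if (allowed.indexOf(name) === -1) {
            return ''; // unwanted tag: remove it, keep the surrounding text
        }
        return '<' + slash + name + '>'; // allowed tag: keep it, without attributes
    });
}
```

For example, `stripTags(html, ['p', 'b', 'u', 'ul', 'li'])` would keep those five tags and remove the span and a tags from the example string, while `stripTags(html, [])` would return plain text only.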

