javascript - Truncate text preserving keywords

- April 15, 2011

I have text retrieved from a search result that contains words matching the string that was searched for.

I need to truncate this text in a way similar to what Google does:

The keywords are highlighted, most of the text not containing the keywords is truncated and an ellipsis is added, and if the keywords appear more than once in the whole text, that part is still included. How can I structure a regex in JavaScript to do this?

Thanks

Here is a jsBin demo, and a quick look at the basic code:

var string = "lorem ipsum dummy book text of printing , text book long...";
var queryString = "book"; // the word we want highlighted

// \\S* extends each chunk to the nearest word edge (non-whitespace characters).
// If you want to account for newlines, replace the dots `.` with `[\\s\\S]`
var rgxp = new RegExp("(\\S*.{0,10})?("+ queryString +")(.{0,10}\\S*)?", "ig");
var results = [];

string.replace(rgxp, function(match, $1, $2, $3){
  // Add an ellipsis only on the side where a chunk was actually captured
  results.push( ($1?"…"+$1:"") +"<b>"+ $2 +"</b>"+ ($3?$3+"…":"") );
});

// Ways to use/test the above:
//
// console.log( results.join("\n") );
// someElement.innerHTML = results.join("<br>");
// someElement.innerHTML = string.replace(rgxp, "<span>$1<b>$2</b>$3</span>");

Usage example: jsBin demo

The RegExp:

Let's say we have a long string and want to match all appearances of the word book or Book.
The regex for it would be:

/book/ig

(ig are the case-insensitive and global flags)

But we need not only book, but also truncated portions of the text before and after the match. Let's say 10 characters before and 10 characters after:

/.{0,10}book.{0,10}/ig

. means any character except a line break, and {minN, maxN} is a quantifier stating how many such characters we want to match.

To be able to differentiate the prefixed chunk, the match, and the suffixed chunk, so that we can use them separately (e.g. wrapping the match in <b> bold tags), let's use capturing groups ()
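To illustrate the line-break caveat, here is a small sketch. The sample text and variable names are made up for the demonstration; it compares `.` with `[\s\S]`, which also matches line breaks.

```javascript
// Hypothetical sample text with a line break right before the keyword.
var multiline = "first line\nbook on the second line";

// `.` cannot cross the line break, so the captured prefix is empty.
var dotMatch = /(.{0,10})(book)/i.exec(multiline);

// `[\s\S]` matches any character, line breaks included.
var anyMatch = /([\s\S]{0,10})(book)/i.exec(multiline);

console.log(JSON.stringify(dotMatch[1])); // ""
console.log(JSON.stringify(anyMatch[1])); // "irst line\n"
```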

/(.{0,10})(book)(.{0,10})/ig

The above will match both Book and book in

"Book an apartment and read a book of nice little fluffy animals"

In order to know when to add an ellipsis, we need to make the chunks "optional", so let's apply the optional quantifier ? to them

/(.{0,10})?(book)(.{0,10})?/ig

Now a capturing group might come back empty. Used as a boolean with the conditional operator ?:, it tells us whether to add an ellipsis, like: ($1 ? "…"+$1 : "")

Now what we have captured looks like:

Book an apartm
nd read a book of nice l

(the queryString matches were bolded for visuals in the original post)
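As a rough sketch of that conditional, assuming the sample sentence below, the two helper names are hypothetical and only wrap the `?:` expression:

```javascript
// Hypothetical helpers: an unmatched optional group is falsy (undefined or ""),
// so the conditional operator decides whether an ellipsis is needed.
function withLeadingEllipsis(chunk) { return chunk ? "…" + chunk : ""; }
function withTrailingEllipsis(chunk) { return chunk ? chunk + "…" : ""; }

var sample = "Book an apartment and read a book of nice little fluffy animals";
var parts = /(.{0,10})?(book)(.{0,10})?/i.exec(sample);

// Nothing precedes the first match, so only the trailing ellipsis is added.
var snippet = withLeadingEllipsis(parts[1]) + parts[2] + withTrailingEllipsis(parts[3]);
console.log(snippet); // Book an apartm…
```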

To fix the ugly cut-off words, let's prepend (and append) any number * of non-whitespace characters \S

/(\S*.{0,10})?(book)(.{0,10}\S*)?/ig

The result is now:

Book an apartment
and read a book of nice little

(See the details of the above regex at regex101.)
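The word-edge behaviour can be checked with a short loop. This is only a sketch: the sentence is assumed to be the sample from above, and the variable names are illustrative.

```javascript
var sentence = "Book an apartment and read a book of nice little fluffy animals";

// \S* lets each chunk grow to the nearest word edge instead of cutting mid-word.
var wordSafe = /(\S*.{0,10})?(book)(.{0,10}\S*)?/ig;

var found = [];
var m;
while ((m = wordSafe.exec(sentence)) !== null) {
  found.push(m[0]); // m[0] is the whole match: prefix + keyword + suffix
}

console.log(found);
// [ 'Book an apartment', 'and read a book of nice little' ]
```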

Let's convert the regex notation to a RegExp string (escaping the backslash characters and putting our ig flags in the second argument).

new RegExp("(\\S*.{0,10})?(book)(.{0,10}\\S*)?", "ig");

Thanks to the use of the new RegExp constructor, we can pass variables into it:

var queryString = "book";
var rgxp = new RegExp("(\\S*.{0,10})?("+ queryString +")(.{0,10}\\S*)?", "ig");
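One caveat when passing a variable: if the query contains regex metacharacters such as `.` or `$`, they are interpreted as pattern syntax. A possible way around that is to escape the query first. The helper names `escapeRegExp` and `buildSnippetRegExp` below are hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the snippet regex around a literal query.
function buildSnippetRegExp(queryString) {
  return new RegExp(
    "(\\S*.{0,10})?(" + escapeRegExp(queryString) + ")(.{0,10}\\S*)?",
    "ig"
  );
}

console.log(buildSnippetRegExp("$5.00").test("The book costs $5.00 today")); // true
console.log(buildSnippetRegExp("$5.00").test("The book costs 15x00 today")); // false
```

A fresh RegExp is built for each test() call because a regex with the g flag remembers its lastIndex between calls.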

Finally, to retrieve and use our 3 captured groups, we can access them inside the .replace() string parameter using "$1", "$2" and "$3" (see the demos).
Or, for more freedom, instead of the string parameter we can use a callback function that receives the needed arguments: .replace(rgxp, function(match, $1, $2, $3){ ... })
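The two ways of calling .replace() can be compared side by side. This is a sketch under the assumption that the sample sentence from above is the input; the variable names are illustrative.

```javascript
var text = "Book an apartment and read a book of nice little fluffy animals";
var pattern = /(\S*.{0,10})?(book)(.{0,10}\S*)?/ig;

// String parameter: each match is rewritten in place, groups referenced as $1..$3.
var highlighted = text.replace(pattern, "<span>$1<b>$2</b>$3</span>");

// Callback: full control, e.g. adding an ellipsis only where a chunk exists.
var snippets = [];
text.replace(pattern, function (match, $1, $2, $3) {
  snippets.push(($1 ? "…" + $1 : "") + "<b>" + $2 + "</b>" + ($3 ? $3 + "…" : ""));
  return match; // the returned string is discarded here; we only collect snippets
});

console.log(highlighted);
console.log(snippets.join("\n"));
```

The string form keeps the untouched text between matches, while the callback form here collects only the truncated snippets.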

Note:

This code will not return overlapping matches. Say we search the above string for "an": if a second occurrence (such as the one in "and") lies too close to the first one, it will not be returned as a separate match, because the regex has already consumed those characters through the up-to-max 10 in .{0,10}. More info.

If the source string has HTML tags in it, make sure (for simplicity's sake) to search through the text content (not the HTML string); otherwise a more complicated approach is necessary.

Useful resources:
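A small sketch of that limitation, using a made-up string in which three occurrences of "an" sit close together:

```javascript
// Hypothetical sample: "an" occurs three times within a few characters.
var crowded = "an ant ran";

var occurrences = crowded.match(/an/gi);
var chunks = crowded.match(/(\S*.{0,10})?(an)(.{0,10}\S*)?/ig);

// The greedy prefix \S*.{0,10} swallows the first two occurrences,
// so only one (non-overlapping) match is reported.
console.log(occurrences.length); // 3
console.log(chunks);             // [ 'an ant ran' ]
```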

https://developer.mozilla.org/en/docs/web/javascript/reference/global_objects/regexp
https://developer.mozilla.org/en/docs/web/javascript/reference/global_objects/string/replace
http://www.rexegg.com/regex-quickstart.html
