javascript - Truncate text preserving keywords
I have text retrieved from a search result, and it contains words that match the string that was searched for.
I need to truncate this text in a similar way to what Google does:
the keywords are highlighted, most of the text not containing the keywords is truncated and an ellipsis is added, and if the keywords appear more than once in the whole text, that part is still included. How would I structure a regex in JavaScript to do this?
Thanks
jsBin demo, and a quick look at the basic code:

    var string = "lorem ipsum dummy book text of printing , text book long...";
    var queryString = "book"; // the word we want highlighted
    var rgxp = new RegExp("(\\S*.{0,10})?(" + queryString + ")(.{0,10}\\S*)?", "ig");
    // If you want to account for newlines, replace the dots `.` with `[\\s\\S]`
    var results = [];
    string.replace(rgxp, function(match, $1, $2, $3) {
      results.push(($1 ? "…" + $1 : "") + "<b>" + $2 + "</b>" + ($3 ? $3 + "…" : ""));
    });
    // Some ways to use/test the above:
    //
    // console.log( results.join("\n") );
    // someElement.innerHTML = results.join("<br>");
    // someElement.innerHTML = string.replace(rgxp, "<span>$1<b>$2</b>$3</span>");

The RegExp:
Let's say we have a long string and want to match all appearances of the word book or Book.
The regex for it would be:

    /book/ig

(i and g are the case-insensitive and global flags.)
But we need not only book, but also truncated portions of the text before and after that match. Let's say 10 characters before and 10 characters after:

    /.{0,10}book.{0,10}/ig

The . means any character except a line break, and {min,max} is the quantifier for how many of such characters we want to match.
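As a quick check of the {min,max} quantifier, here is a minimal sketch that collects every match together with up to 10 characters of context on each side. The sample sentence and the variable names are assumptions made for illustration.

```javascript
// Sample sentence (an assumption for illustration).
const text = "Book an apartment and read a book of nice little fluffy animals";

// Each match is "book" (any case) plus up to 10 characters,
// excluding line breaks, on either side.
const windows = text.match(/.{0,10}book.{0,10}/ig) || [];

console.log(windows);
// → [ "Book an apartm", "nd read a book of nice l" ]
```

The first match has no leading context because Book sits at the very start of the string; the second gets the full 10 characters on both sides.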
To be able to tell apart the prefixed chunk, the match and the suffixed chunk, so that we can use them separately (i.e. wrapping the match in <b> bold tags etc.), let's use capturing groups ():

    /(.{0,10})(book)(.{0,10})/ig

The above will match both Book and book in
"Book an apartment and read a book of nice little fluffy animals"
In order to know when to add the ellipsis … we need to make those chunks "optional", so let's apply the optional (zero or one) quantifier ? to the groups:

    /(.{0,10})?(book)(.{0,10})?/ig

Now a capturing group might come back empty. Using the conditional operator ?: as a boolean check, we can decide whether to add the ellipsis, like: ($1 ? "…"+$1 : "")
What is now captured looks like:
<b>Book</b> an apartm
nd read a <b>book</b> of nice l
(the <b> tags mark the query string, which is bolded for visuals)
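The optional groups and the conditional ellipsis can be sketched together as follows. The sample sentence is an assumption for illustration; the callback returns the match unchanged because only the collected results are of interest here.

```javascript
// Sample sentence (an assumption for illustration).
const text = "Book an apartment and read a book of nice little fluffy animals";

const results = [];
text.replace(/(.{0,10})?(book)(.{0,10})?/ig, (match, $1, $2, $3) => {
  // $1 / $3 are falsy when there is no context on that side,
  // so the ellipsis is added only where text was actually cut.
  results.push(
    ($1 ? "…" + $1 : "") + "<b>" + $2 + "</b>" + ($3 ? $3 + "…" : "")
  );
  return match; // leave the original text untouched
});

console.log(results.join("\n"));
// <b>Book</b> an apartm…
// …nd read a <b>book</b> of nice l…
```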
To fix the ugly cut-off words, let's prepend (and append) any number * of non-whitespace characters \S:

    /(\S*.{0,10})?(book)(.{0,10}\S*)?/ig

The result is now:
<b>Book</b> an apartment
and read a <b>book</b> of nice little
(see the details of the above regex at regex101)
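To see what the extra \S* buys, this small sketch runs the pattern with and without it on the same sentence. The sentence and variable names are assumptions for illustration.

```javascript
// Sample sentence (an assumption for illustration).
const text = "Book an apartment and read a book of nice little fluffy animals";

// Fixed 10-character windows: words at the edges get cut.
const cut = text.match(/.{0,10}book.{0,10}/ig) || [];

// \S* extends each window outwards to the nearest whitespace,
// so the words at the edges stay whole.
const whole = text.match(/\S*.{0,10}book.{0,10}\S*/ig) || [];

console.log(cut);   // [ "Book an apartm", "nd read a book of nice l" ]
console.log(whole); // [ "Book an apartment", "and read a book of nice little" ]
```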
Let's convert the regex notation into a RegExp string (escaping the backslash characters and putting our ig flags in the second argument):

    new RegExp("(\\S*.{0,10})?(book)(.{0,10}\\S*)?", "ig");

Thanks to the use of the new RegExp constructor we can now pass variables into it:

    var queryString = "book";
    var rgxp = new RegExp("(\\S*.{0,10})?("+ queryString +")(.{0,10}\\S*)?", "ig");

Finally, to retrieve and use our three captured groups, we can access them inside the .replace() string parameter using "$1", "$2" and "$3" (see demos).
Or, for more freedom, instead of a string parameter we can use a callback function, passing in the needed arguments: .replace(rgxp, function(match, $1, $2, $3){
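Putting these steps together, one possible wrapper might look like the sketch below. The names highlightSnippets, escapeRegExp and the context parameter are hypothetical and not part of the original answer. The escaping step is an added assumption as well: without it, a query containing regex metacharacters such as . or ( would be interpreted as part of the pattern.

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so the query string is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns one highlighted snippet per match, with up to
// `context` characters (extended to whole words) on each side.
function highlightSnippets(text, queryString, context) {
  const rgxp = new RegExp(
    "(\\S*.{0," + context + "})?(" + escapeRegExp(queryString) + ")(.{0," + context + "}\\S*)?",
    "ig"
  );
  const results = [];
  text.replace(rgxp, (match, $1, $2, $3) => {
    results.push(($1 ? "…" + $1 : "") + "<b>" + $2 + "</b>" + ($3 ? $3 + "…" : ""));
    return match; // the replaced string itself is not used
  });
  return results;
}

console.log(highlightSnippets("Price is 3.50 today", "3.50", 10));
// → [ "…Price is <b>3.50</b> today…" ]
```

Note that, as in the basic code, an ellipsis is added whenever a context chunk is non-empty, even if that chunk reaches the start or end of the text.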
Note:
This code will not return overlapping matches. For example, when searching for "an", an occurrence that starts inside the context already captured after a previous "an" is not returned as a separate match: it is too close to the first one, and the regex has already consumed those later characters through the up-to-10 allowance in .{0,10} (plus the trailing \S*).
If the source string has HTML tags in it, make sure (for simplicity's sake) to search through the text content only (not the HTML string); otherwise a more complicated approach will be necessary.
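The non-overlapping behaviour from the note can be seen in a small sketch. The short test string is made up for illustration and is deliberately chosen so that the second keyword falls within 10 characters of the first.

```javascript
// Made-up string: the second "book" starts 6 characters after the first ends.
const text = "one book, two books";

// Plain keyword search finds both occurrences.
const plain = text.match(/book/ig) || [];

// With context, the first match swallows ", two book" (10 characters)
// plus the trailing "s", so the second keyword is never matched on its own.
const snippets = text.match(/(\S*.{0,10})?(book)(.{0,10}\S*)?/ig) || [];

console.log(plain.length);    // 2
console.log(snippets.length); // 1
console.log(snippets[0]);     // "one book, two books"
```

One consequence is that only the first keyword of such a snippet is captured in the keyword group, so a second pass over each snippet would be needed to highlight every occurrence.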
Useful resources:
https://developer.mozilla.org/en/docs/web/javascript/reference/global_objects/regexp
https://developer.mozilla.org/en/docs/web/javascript/reference/global_objects/string/replace
http://www.rexegg.com/regex-quickstart.html
