JavaScript - Truncate text preserving keywords
I have text retrieved from a search result that contains words matching the string that was searched for.
I need to truncate the text in a similar way to how Google does it:
The keywords are highlighted, most of the text not containing the keywords is truncated and an ellipsis is added, and if the keywords appear more than once in the whole text, that part is still included. How should I structure a regex in JavaScript to do this?
Thanks
jsBin demo. A quick look at the basic code:
var string = "lorem ipsum dummy book text of printing , text book long...";
var querystring = "book"; // the keyword we want highlighted
var rgxp = new RegExp("(\\S*.{0,10})?(" + querystring + ")(.{0,10}\\S*)?", "ig");
// If you want to account for newlines, replace the dots `.` with `[\\s\\S]`
var results = [];
string.replace(rgxp, function(match, $1, $2, $3){
  results.push( ($1 ? "…" + $1 : "") + "<b>" + $2 + "</b>" + ($3 ? $3 + "…" : "") );
});
// Ways to use/test the above:
//
// console.log( results.join("\n") );
// someElement.innerHTML = results.join("<br>");
// someElement.innerHTML = string.replace(rgxp, "<span>$1<b>$2</b>$3</span>");
The RegExp:
Let's say we have a long string and want to match all appearances of the word book or Book.
The regex for it:
/book/ig
(ig are the case-insensitive and global flags.)
But we need not only book, but also the truncated portions of text before and after that match. Let's say 10 characters before and 10 characters after:
/.{0,10}book.{0,10}/ig
. means any character except a line break, and {minN, maxN} is the quantifier for how many of such characters we want to match.
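As a rough sketch of what that pattern returns, the snippet below runs it against a sample sentence. The sentence and the variable names are assumptions made for illustration, not part of the original demo.

```javascript
// Sample sentence (assumed here for illustration).
const text = "Book an apartment, and read a book of nice little fluffy animals";

// Each "book" (case-insensitive) plus up to 10 characters of context per side.
const snippets = text.match(/.{0,10}book.{0,10}/ig);

console.log(snippets);
// [ 'Book an apartm', 'nd read a book of nice l' ]
```

Note that the first snippet has no leading context simply because the keyword sits at the very start of the string.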
To be able to differentiate the prefixed chunk, the match and the suffixed chunk, so that we can use them separately (i.e. wrapping the match in <b> bold tags etc.), let's use capturing groups ():
/(.{0,10})(book)(.{0,10})/ig
The above will match both Book and book in
"Book an apartment, and read a book of nice little fluffy animals"
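In order to know when to add an ellipsis, we need to make those chunks "optional", so let's apply the optional quantifier ? to the groups (here ? means "zero or one", not "lazy"):
/(.{0,10})?(book)(.{0,10})?/ig
Now a capturing group might result empty. Used as a boolean with the conditional operator ?:, it tells us whether an ellipsis is needed, like: ($1 ? "…"+$1 : "")
What gets captured now looks like:
<b>Book</b> an apartm…
…nd read a <b>book</b> of nice l…
(The querystring is marked with <b> tags here for visuals.)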
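The capturing groups and the conditional ellipsis can be tried together in a small sketch. The sample sentence and the names `text` and `results` are assumptions for illustration; the callback mirrors the one in the basic code at the top.

```javascript
// Sample sentence (assumed here for illustration).
const text = "Book an apartment, and read a book of nice little fluffy animals";
const rgxp = /(.{0,10})?(book)(.{0,10})?/ig;

const results = [];
text.replace(rgxp, (match, $1, $2, $3) => {
  // An empty (falsy) chunk means there is nothing to truncate on that side.
  results.push(($1 ? "…" + $1 : "") + "<b>" + $2 + "</b>" + ($3 ? $3 + "…" : ""));
  return match; // leave the source text itself unchanged
});

console.log(results.join("\n"));
// <b>Book</b> an apartm…
// …nd read a <b>book</b> of nice l…
```

Using .replace() purely for its callback is a common trick; String.prototype.matchAll would work just as well in environments that support it.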
To fix the ugly cut-off words, let's prepend (and append) any number * of non-whitespace characters \S:
/(\S*.{0,10})?(book)(.{0,10}\S*)?/ig
The result is now:
<b>Book</b> an apartment,…
…and read a <b>book</b> of nice little…
(See the above regex's details at regex101.)
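To check that the added \S* really snaps the context to whole words, here is a minimal sketch using the same assumed sample sentence; `snippets` is a name introduced only for this example.

```javascript
// Sample sentence (assumed here for illustration).
const text = "Book an apartment, and read a book of nice little fluffy animals";
const rgxp = /(\S*.{0,10})?(book)(.{0,10}\S*)?/ig;

// Collect the full text of every match.
const snippets = Array.from(text.matchAll(rgxp), (m) => m[0]);

console.log(snippets);
// [ 'Book an apartment,', 'and read a book of nice little' ]
```

The context can now run slightly past 10 characters on either side, since \S* extends it to the nearest whitespace.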
Let's convert the regex notation into a RegExp string (escaping the backslash characters and putting our ig flags in the second argument):
new RegExp("(\\S*.{0,10})?(book)(.{0,10}\\S*)?", "ig");
Thanks to the use of the new RegExp constructor, we can pass variables into it:
var querystring = "book";
var rgxp = new RegExp("(\\S*.{0,10})?(" + querystring + ")(.{0,10}\\S*)?", "ig");
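One caveat when concatenating a user-supplied query into a pattern: characters such as + or ( have special meaning in a regex, so a query like "c++" would throw a SyntaxError if inserted as-is. A possible sketch of a safer builder follows; `escapeRegExp` and `buildSnippetRegExp` are hypothetical helper names, not built-ins and not part of the original answer.

```javascript
// Hypothetical helper: escape regex metacharacters so the query matches literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the snippet pattern for any query and context size.
function buildSnippetRegExp(query, context = 10) {
  const ctx = ".{0," + context + "}";
  return new RegExp("(\\S*" + ctx + ")?(" + escapeRegExp(query) + ")(" + ctx + "\\S*)?", "ig");
}

const found = "I like C++ a lot".match(buildSnippetRegExp("c++"));
console.log(found); // [ 'I like C++ a lot' ]
```

The `context` parameter is also an assumption; it just makes the hard-coded 10 adjustable.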
Finally, to retrieve and use our 3 captured groups, we can access them inside the .replace() string parameter using "$1", "$2" and "$3" (see demos).
Or, for more freedom, instead of a string parameter we can use a callback function that receives the needed arguments: .replace(rgxp, function(match, $1, $2, $3){ ... })
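Note:
This code will not return overlapping matches. Say we search for "an": two occurrences such as "an" and "and" that sit closer together than the context length are not returned as 2 matches. Only one match is returned, since the other occurrence is too close and the regex has already consumed the neighbouring characters due to the up-to-max 10 in .{0,10}. More info.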
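The limitation in the note can be reproduced with a short, made-up string in which the keyword occurs twice within the context window. The string and variable names are assumptions for illustration.

```javascript
// Made-up string: "an" occurs twice, only three characters apart.
const text = "an ant";
const rgxp = /(.{0,10})(an)(.{0,10})/ig;

const matches = Array.from(text.matchAll(rgxp), ([, before, keyword, after]) => ({ before, keyword, after }));
const occurrences = (text.match(/an/ig) || []).length;

console.log(occurrences);    // 2
console.log(matches.length); // 1
console.log(matches[0]);     // { before: 'an ', keyword: 'an', after: 't' }
```

Because the leading .{0,10} is greedy, the earlier "an" is swallowed as context here and the later one is reported as the keyword. If every occurrence must be reported, one option is to locate the keywords first with a plain /an/ig search and slice the surrounding context by index.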
If your source string has HTML tags in it, make sure (for ease's sake) to search through the text content only (not the HTML string); otherwise a more complicated approach would be necessary.
Useful resources:
https://developer.mozilla.org/en/docs/web/javascript/reference/global_objects/regexp
https://developer.mozilla.org/en/docs/web/javascript/reference/global_objects/string/replace
http://www.rexegg.com/regex-quickstart.html