regex - Remove tweet regular expressions from a string of text


I have an Excel sheet filled with tweets. Several entries contain @blah-type strings, among others. I need to keep the rest of the text and remove the @blah part. For example, "@villos hey dude" needs to be transformed into "hey dude". This is what I've done so far:

Sub Macro1()
'
' Macro1 Macro
'
    Dim counter As Integer
    Dim strIN As String
    Dim newstring As String

    For counter = 1 To 46
        Cells(counter, "E").Select
        ActiveCell.FormulaR1C1 = strIN
        StripChars (strIN)
        newstring = StripChars(strIN)
        ActiveCell.FormulaR1C1 = StripChars(strIN)
    Next counter
End Sub

Function StripChars(strIN As String) As String
    Dim objRegex As Object
    Set objRegex = CreateObject("vbscript.regexp")

    With objRegex
        .Pattern = "^@?(\w){1,15}$"
        .IgnoreCase = True
        StripChars = .Replace(strIN, vbNullString)
    End With
End Function

Moreover, there are entries like this one: Ÿ³é‡ï¼Ÿã€€åˆã‚ã¦çŸ¥ã‚Šã¾ã—ãŸã€‚ shiftã—ãªãŒã‚‰ã‚¨ã‚¯ã‚¹ãƒ

I need them gone too! Any ideas?

For every line in the spreadsheet, run the following regex on it: ^(@.+?)\s+?(.*)$

If the line matches the regex, the information you are interested in is in the second capturing group. (Capturing groups are usually 0-indexed, with position 0 containing the entire match.) The first capturing group contains the Twitter handle, if you need that too.
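As a rough illustration of the step above in VBA, here is a minimal sketch using the same late-bound VBScript.RegExp object as the question. The function name StripLeadingHandle is hypothetical and not from the original post. Note that VBScript.RegExp deviates from the usual indexing: the entire match is exposed as the match's Value, and SubMatches(0) is the first capturing group, so the second capturing group is SubMatches(1).

```vba
' Hypothetical helper (name is an assumption, not from the original post).
' Returns the tweet text with a leading @handle removed, or the input
' unchanged when the pattern does not match.
Function StripLeadingHandle(ByVal tweet As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^(@.+?)\s+?(.*)$"
    re.Global = False

    Set matches = re.Execute(tweet)
    If matches.Count > 0 Then
        ' SubMatches(0) is the handle, SubMatches(1) is the rest of the tweet.
        StripLeadingHandle = matches(0).SubMatches(1)
    Else
        ' Not a reply-style tweet: leave the text as it is.
        StripLeadingHandle = tweet
    End If
End Function
```

With the example from the question, StripLeadingHandle("@villos hey dude") returns "hey dude".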


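However, this will not match tweets that are not replies (that is, tweets that do not start with @). In that situation, the only way to distinguish between regular tweets and the junk you are not interested in is to restrict tweets to alphanumerics. This may mean that some tweets are missed if they contain any non-alphanumeric characters. The following regex will work if that is not an issue for you: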
^(?:(@.+?)\s+?)?([\w\t ]+)$
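One way this second pattern might be applied to the sheet is sketched below. The names CleanTweet and CleanTweetColumn are hypothetical. The range E1:E46 is taken from the question's macro, and writing the results to column F rather than overwriting column E is an assumption made to keep the original data intact. Lines that do not match, such as the garbled entries or tweets containing punctuation, come back as an empty string.

```vba
' Hypothetical helper (name is an assumption, not from the original post).
' Returns the tweet text without a leading @handle, or an empty string
' when the line does not match and is therefore treated as junk.
Function CleanTweet(ByVal tweet As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^(?:(@.+?)\s+?)?([\w\t ]+)$"

    Set matches = re.Execute(tweet)
    If matches.Count > 0 Then
        ' SubMatches(0) is the optional handle, SubMatches(1) is the text.
        CleanTweet = matches(0).SubMatches(1)
    Else
        CleanTweet = vbNullString
    End If
End Function

' Assumed layout: tweets in E1:E46 (the range used in the question).
' Writing to column F is an assumption; change it to "E" to overwrite.
Sub CleanTweetColumn()
    Dim rowIndex As Long

    For rowIndex = 1 To 46
        Cells(rowIndex, "F").Value = CleanTweet(CStr(Cells(rowIndex, "E").Value))
    Next rowIndex
End Sub
```

Reading and writing cell values directly avoids the Select/ActiveCell round trip, and the trade-off noted above still applies: a tweet such as "@villos hey, dude!" is dropped because of the comma and exclamation mark.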


