regex - Remove tweet regular expressions from string of text
I have an Excel sheet filled with tweets. Several entries contain @blah-type strings, among other things. I need to keep the rest of the text and remove the @blah part. For example, "@villos hey dude" needs to be transformed into "hey dude". This is what I've done so far:
    Sub Macro1()
    '
    ' Macro1 Macro
    '
        Dim counter As Integer
        Dim strIn As String
        Dim newString As String
        For counter = 1 To 46
            Cells(counter, "E").Select
            ActiveCell.FormulaR1C1 = strIn
            StripChars (strIn)
            newString = StripChars(strIn)
            ActiveCell.FormulaR1C1 = StripChars(strIn)
        Next counter
    End Sub

    Function StripChars(strIn As String) As String
        Dim objRegex As Object
        Set objRegex = CreateObject("vbscript.regexp")
        With objRegex
            .Pattern = "^@?(\w){1,15}$"
            .IgnoreCase = True
            StripChars = .Replace(strIn, vbNullString)
        End With
    End Function
Moreover, there are entries like this one: Ÿ³é‡ï¼Ÿã€€åˆã‚ã¦çŸ¥ã‚Šã¾ã—ãŸã€‚ shiftã—ãªãŒã‚‰ã‚¨ã‚¯ã‚¹ãƒ
I need them gone too! Any ideas?
For every line in the spreadsheet, run the following regex on it: ^(@.+?)\s+?(.*)$
If the line matches the regex, the information you are interested in will be in the second capturing group. (Groups are usually 0-indexed, with position 0 containing the entire match.) The first capturing group will contain the Twitter handle, if you need that too.
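A minimal sketch of how this could look in VBA, assuming the late-bound VBScript.RegExp object already used in the question. The function name StripHandle is hypothetical, not part of the original post. Note that in VBScript.RegExp the SubMatches collection holds only the capturing groups, so the second group is SubMatches(1) and the whole match is read from the Match object itself.

```vba
' Hypothetical helper (not from the original post): returns the tweet text
' without a leading @handle, or the input unchanged when the regex does not match.
Function StripHandle(strIn As String) As String
    Dim objRegex As Object
    Dim matches As Object

    Set objRegex = CreateObject("VBScript.RegExp")
    With objRegex
        .Pattern = "^(@.+?)\s+?(.*)$"
        .IgnoreCase = True
        .Global = False
        Set matches = .Execute(strIn)
    End With

    If matches.Count > 0 Then
        ' SubMatches is 0-indexed: (0) is the handle, (1) is the rest of the tweet.
        StripHandle = matches(0).SubMatches(1)
    Else
        ' Not a reply: leave the text as it is.
        StripHandle = strIn
    End If
End Function
```

With the example from the question, StripHandle("@villos hey dude") should give "hey dude", while a string with no leading handle comes back untouched.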
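However, this will not match tweets that are not replies (that is, tweets that do not start with @). In that situation, the only way to distinguish between regular tweets and the junk you are not interested in is to restrict the tweet to alphanumerics. This may mean that some tweets are missed if they contain any non-alphanumeric characters. The following regex will work if that is not an issue for you: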
^(?:(@.+?)\s+?)?([\w\t ]+)$
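As a rough sketch, the second pattern could be applied to the same range the question uses (rows 1 to 46 of column E). The names CleanTweet and CleanColumnE are hypothetical, and the code assumes every cell in that range holds plain text or is empty. Rows that do not match are treated as junk and blanked, which is an assumption about what "gone" should mean here.

```vba
' Hypothetical helper: returns the tweet without its leading @handle,
' or an empty string when the row does not match and is treated as junk.
Function CleanTweet(strIn As String) As String
    Dim objRegex As Object
    Dim matches As Object

    Set objRegex = CreateObject("VBScript.RegExp")
    With objRegex
        .Pattern = "^(?:(@.+?)\s+?)?([\w\t ]+)$"
        .IgnoreCase = True
        .Global = False
        Set matches = .Execute(strIn)
    End With

    If matches.Count > 0 Then
        ' The non-capturing group is not counted, so the text is SubMatches(1).
        CleanTweet = matches(0).SubMatches(1)
    Else
        CleanTweet = vbNullString
    End If
End Function

' Hypothetical macro: overwrites E1:E46 with the cleaned text.
Sub CleanColumnE()
    Dim counter As Long
    For counter = 1 To 46
        Cells(counter, "E").Value = CleanTweet(CStr(Cells(counter, "E").Value))
    Next counter
End Sub
```

Because the tweet body is limited to word characters, tabs and spaces, an ordinary tweet containing punctuation (for example "@villos hey!") is also blanked. That is the trade-off described above, so it is worth running this on a copy of the sheet first.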