C# - How to dynamically generate HTML code using .NET's WebBrowser or mshtml.HTMLDocument?


Most of the answers I have read concerning this subject point to either the System.Windows.Forms.WebBrowser class or the COM interface mshtml.HTMLDocument from the Microsoft HTML Object Library assembly.

The WebBrowser class did not lead me anywhere. The following code fails to retrieve the HTML code as rendered by my web browser:

    [STAThread]
    public static void Main()
    {
        WebBrowser wb = new WebBrowser();
        wb.Navigate("https://www.google.com/#q=where+am+i");

        wb.DocumentCompleted += delegate(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            mshtml.IHTMLDocument2 doc = (mshtml.IHTMLDocument2)wb.Document.DomDocument;
            foreach (IHTMLElement element in doc.all)
            {
                System.Diagnostics.Debug.WriteLine(element.outerHTML);
            }
        };
        Form f = new Form();
        f.Controls.Add(wb);
        Application.Run(f);
    }

The above is just an example. I'm not interested in finding a workaround for figuring out the name of the town where I am located. I just need to understand how to retrieve that kind of dynamically generated data programmatically.

(Call new System.Net.WebClient().DownloadString("https://www.google.com/#q=where+am+i"), save the resulting text somewhere, search for the name of the town where you are located, and let me know if you were able to find it.)

And yet, when I access "https://www.google.com/#q=where+am+i" from my web browser (IE or Firefox), I see the name of my town written on the web page. In Firefox, if I right-click on the name of the town and select "Inspect Element (Q)", I clearly see the name of the town written in the HTML code, which happens to look quite different from the raw HTML returned by WebClient.
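One likely contributor to the difference, as far as I can tell: everything after the # in that URL is a fragment, which stays on the client and is not sent in the HTTP request, so a plain download only ever asks for https://www.google.com/ and leaves the rest to the page's JavaScript. A minimal sketch with System.Uri that shows the split (the FragmentDemo class and its RequestedPart helper are hypothetical names made up for this illustration):

```csharp
using System;

public static class FragmentDemo
{
    // Hypothetical helper: returns scheme, host, path and query,
    // i.e. the URL without its fragment.
    public static string RequestedPart(string url)
    {
        return new Uri(url).GetLeftPart(UriPartial.Query);
    }

    public static void Main()
    {
        var uri = new Uri("https://www.google.com/#q=where+am+i");

        // The fragment is only meaningful to script running in the browser.
        Console.WriteLine(uri.Fragment);                    // #q=where+am+i

        // This is what a plain HTTP download effectively requests.
        Console.WriteLine(RequestedPart(uri.AbsoluteUri));  // https://www.google.com/
    }
}
```

That would explain why the raw download never contains the search result: nothing has run the script that reads the fragment and fills in the page.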

After I got tired of playing with System.Windows.Forms.WebBrowser, I decided to give mshtml.HTMLDocument a shot, just to end up with the same useless raw HTML:

    public static void Main()
    {
        mshtml.IHTMLDocument2 doc = (mshtml.IHTMLDocument2)new mshtml.HTMLDocument();
        doc.write(new System.Net.WebClient().DownloadString("https://www.google.com/#q=where+am+i"));

        foreach (IHTMLElement e in doc.all)
        {
            System.Diagnostics.Debug.WriteLine(e.outerHTML);
        }
    }

I suppose there must be an elegant way to obtain this kind of information. Right now all I can think of is to add a WebBrowser control to a form, have it navigate to the URL in question, send the keys "Ctrl, A", copy whatever happens to be displayed on the page to the clipboard, and attempt to parse it. That's a horrible solution, though.

I'd like to contribute some code to Alexei's answer. A few points:

  • Strictly speaking, it may not always be possible to determine with 100% probability when the page has finished rendering. Some pages are quite complex and use continuous AJAX updates. But we can get quite close by polling the page's current HTML snapshot for changes and checking the WebBrowser.IsBusy property. That's what LoadDynamicPage does below.

  • Some time-out logic has to be present on top of the above, in case the page rendering is never-ending (note the CancellationTokenSource).

  • Async/await is a great tool for coding this, as it gives a linear code flow to our asynchronous polling logic, which greatly simplifies it.

  • It's important to enable HTML5 rendering using Browser Feature Control, as WebBrowser runs in IE7 emulation mode by default. That's what SetFeatureBrowserEmulation does below.

  • This is a WinForms app, but the concept can easily be converted into a console app.

  • This logic works well on the URL you've specifically mentioned: https://www.google.com/#q=where+am+i.

    using Microsoft.Win32;
    using System;
    using System.ComponentModel;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    namespace WbFetchPage
    {
        public partial class MainForm : Form
        {
            public MainForm()
            {
                SetFeatureBrowserEmulation();
                InitializeComponent();
                this.Load += MainForm_Load;
            }

            // start the task
            async void MainForm_Load(object sender, EventArgs e)
            {
                try
                {
                    var cts = new CancellationTokenSource(10000); // cancel in 10s
                    var html = await LoadDynamicPage("https://www.google.com/#q=where+am+i", cts.Token);
                    // show only the first 1024 characters, it's too long!
                    MessageBox.Show(html.Substring(0, Math.Min(html.Length, 1024)) + "...");
                }
                catch (Exception ex)
                {
                    MessageBox.Show(ex.Message);
                }
            }

            // navigate and download
            async Task<string> LoadDynamicPage(string url, CancellationToken token)
            {
                // navigate and await DocumentCompleted
                var tcs = new TaskCompletionSource<bool>();
                WebBrowserDocumentCompletedEventHandler handler = (s, arg) =>
                    tcs.TrySetResult(true);

                using (token.Register(() => tcs.TrySetCanceled(), useSynchronizationContext: true))
                {
                    this.webBrowser.DocumentCompleted += handler;
                    try
                    {
                        this.webBrowser.Navigate(url);
                        await tcs.Task; // wait for DocumentCompleted
                    }
                    finally
                    {
                        this.webBrowser.DocumentCompleted -= handler;
                    }
                }

                // get the root element
                var documentElement = this.webBrowser.Document.GetElementsByTagName("html")[0];

                // poll the current HTML for changes asynchronously
                var html = documentElement.OuterHtml;
                while (true)
                {
                    // wait asynchronously, this will throw if cancellation is requested
                    await Task.Delay(500, token);

                    // continue polling if the WebBrowser is still busy
                    if (this.webBrowser.IsBusy)
                        continue;

                    var htmlNow = documentElement.OuterHtml;
                    if (html == htmlNow)
                        break; // no changes detected, end the poll loop

                    html = htmlNow;
                }

                // consider the page fully rendered
                token.ThrowIfCancellationRequested();
                return html;
            }

            // enable HTML5 (assuming we're running IE10+)
            // more info: https://stackoverflow.com/a/18333982/1768303
            static void SetFeatureBrowserEmulation()
            {
                if (LicenseManager.UsageMode != LicenseUsageMode.Runtime)
                    return;
                var appName = System.IO.Path.GetFileName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
                Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION",
                    appName, 10000, RegistryValueKind.DWord);
            }
        }
    }
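To see the polling and time-out logic on its own, without a WebBrowser control, here is a rough sketch of the same idea as a console program. It only assumes two delegates, one returning the current snapshot and one reporting whether the source is still busy. StablePoller, PollUntilStableAsync and the fake page are hypothetical names made up for this illustration, not part of the code above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StablePoller
{
    // Polls getSnapshot every `interval` until two consecutive snapshots are
    // equal while isBusy() returns false. Task.Delay throws an
    // OperationCanceledException once the token is cancelled (the time-out).
    public static async Task<string> PollUntilStableAsync(
        Func<string> getSnapshot, Func<bool> isBusy,
        TimeSpan interval, CancellationToken token)
    {
        var last = getSnapshot();
        while (true)
        {
            await Task.Delay(interval, token);

            if (isBusy())
                continue; // still loading, keep polling

            var now = getSnapshot();
            if (now == last)
                return now; // no changes detected, consider it stable

            last = now;
        }
    }

    public static void Main()
    {
        // Fake "page": its content changes on the first three reads, then settles.
        int reads = 0;
        Func<string> fakePage = () => "<html>" + Math.Min(++reads, 3) + "</html>";

        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5)))
        {
            var html = PollUntilStableAsync(
                fakePage, () => false, TimeSpan.FromMilliseconds(50), cts.Token)
                .GetAwaiter().GetResult();
            Console.WriteLine(html); // <html>3</html>
        }
    }
}
```

In the WinForms code above, the two delegates would correspond to () => documentElement.OuterHtml and () => this.webBrowser.IsBusy. The blocking GetAwaiter().GetResult() call is only acceptable here because a console app has no UI synchronization context; inside the form, await should be used instead.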
