android - How to parse HTML table using jsoup?
I am trying to parse HTML using jsoup. This is my first time working with jsoup, and it is proving a little hard for me. The HTML table I am trying to parse is below. It is complicated because of the many tr and td elements, and I don't know how to select the name of each column in table 1, "group block" (table 0 is "topline" and I don't need it).
I need to select "bbd, bbgen, bbtest, conn, cpu, disk, files, hobbitd, http, info, memory, msgs, ports, procs, trends" and set them in a TextView defined in my XML layout file. Is this possible using jsoup?
So far I am connecting to the URL as follows:
String username = "user";
String password = "pass";
String login = username + ":" + password;
String base64login = new String(android.util.Base64.encode(login.getBytes(), android.util.Base64.NO_WRAP));
Document document = Jsoup.connect("http://example.com")
        .header("Authorization", "Basic " + base64login)
        .get();
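For reference, the header value built above can be checked off-device. This is a minimal sketch, assuming a plain JVM (or Android API 26+) where java.util.Base64 is available instead of android.util.Base64; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Builds the value of an HTTP "Authorization" header for Basic auth.
    // java.util.Base64's basic encoder never inserts line breaks, which
    // matches the effect of android.util.Base64.NO_WRAP in the question.
    static String basicAuth(String username, String password) {
        String login = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(login.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // prints "Basic dXNlcjpwYXNz"
        System.out.println(basicAuth("user", "pass"));
    }
}
```

The returned string can be passed as the second argument of jsoup's header("Authorization", ...) call shown above.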
HTML code:
<table summary="topline" width="100%"> <tr><td height=16> </td></tr> <!-- menu bar --> <tr> <td valign=middle align=left width="30%"> <font face="arial, helvetica" size="+1" color="silver"><b>xymon</b></font </td> <td valign=middle align=center width="40%"> <center><font face="arial, helvetica" size="+1" color="silver"><b>current status</b></font></center> </td> <td valign=middle align=right width="30%"> <font face="arial, helvetica" size="+1" color="silver"><b>thu jul 23 16:05:06 2015</b></font> </td> </tr> <tr> <td colspan=3> <hr width="100%"> </td> </tr> </table> <br> <a name=hosts-blk> </a> <center><table summary="group block" border=0 cellpadding=2> <tr><td valign=middle rowspan=2><center><font color="#fffff0" size="+1"> </font></center></td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?bbd"><font color="#87a9e5" size="-1"><b>bbd</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?bbgen"><font color="#87a9e5" size="-1"><b>bbgen</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?bbtest"><font color="#87a9e5" size="-1"><b>bbtest</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?conn"><font color="#87a9e5" size="-1"><b>conn</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?cpu"><font color="#87a9e5" size="-1"><b>cpu</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?disk"><font color="#87a9e5" size="-1"><b>disk</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?files"><font color="#87a9e5" size="-1"><b>files</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?hobbitd"><font color="#87a9e5" size="-1"><b>hobbitd</b></font></a> </td> <td align=center valign=bottom width=45> <a 
href="/hobbit-cgi/hobbitcolumn.sh?http"><font color="#87a9e5" size="-1"><b>http</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?info"><font color="#87a9e5" size="-1"><b>info</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?memory"><font color="#87a9e5" size="-1"><b>memory</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?msgs"><font color="#87a9e5" size="-1"><b>msgs</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?ports"><font color="#87a9e5" size="-1"><b>ports</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?procs"><font color="#87a9e5" size="-1"><b>procs</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?trends"><font color="#87a9e5" size="-1"><b>trends</b></font></a> </td> </tr> <tr><td colspan=15><hr width="100%"></td></tr>
Edit:
I tried this, but it doesn't work:
ArrayList<String> groupblock = new ArrayList<String>();
Object[] objplace;
Element table = document.select("table").get(1); // select the second table: "group block"
Elements rows = table.select("tr");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements col = row.select("td");
    if (col.get(1).text().equals("bbd")) { // check only one field for the moment
        groupblock.add(col.get(1).text());
    }
}
objplace = groupblock.toArray();
Then I do:
TextView txtgroupblock = (TextView) findViewById(R.id.txtgroupblock);
txtgroupblock.setText("");
for (int i = 0; i < objplace.length; i++) {
    txtgroupblock.append(objplace[i].toString() + " ");
}
The error:
07-23 21:26:36.454: E/AndroidRuntime(330): FATAL EXCEPTION: AsyncTask #1
07-23 21:26:36.454: E/AndroidRuntime(330): java.lang.RuntimeException: An error occured while executing doInBackground()
07-23 21:26:36.454: E/AndroidRuntime(330): at android.os.AsyncTask$3.done(AsyncTask.java:200)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask$Sync.innerSetException(FutureTask.java:274)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask.setException(FutureTask.java:125)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:308)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask.run(FutureTask.java:138)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1088)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:581)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.lang.Thread.run(Thread.java:1019)
07-23 21:26:36.454: E/AndroidRuntime(330): Caused by: java.lang.IndexOutOfBoundsException: Invalid index 1, size is 1
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.ArrayList.throwIndexOutOfBoundsException(ArrayList.java:257)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.ArrayList.get(ArrayList.java:311)
07-23 21:26:36.454: E/AndroidRuntime(330): at org.jsoup.select.Elements.get(Elements.java:544)
07-23 21:26:36.454: E/AndroidRuntime(330): at activities.monitorapp.MainActivity$update.doInBackground(MainActivity.java:211)
07-23 21:26:36.454: E/AndroidRuntime(330): at activities.monitorapp.MainActivity$update.doInBackground(MainActivity.java:1)
07-23 21:26:36.454: E/AndroidRuntime(330): at android.os.AsyncTask$2.call(AsyncTask.java:185)
07-23 21:26:36.454: E/AndroidRuntime(330): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:306)
Edit 2:
Now I have a related problem. Right after the HTML shown before, I have the following HTML code (it directly follows the previous HTML code, in the same HTML file):
... <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?procs"><font color="#87a9e5" size="-1"><b>procs</b></font></a> </td> <td align=center valign=bottom width=45> <a href="/hobbit-cgi/hobbitcolumn.sh?trends"><font color="#87a9e5" size="-1"><b>trends</b></font></a> </td> </tr> <tr><td colspan=15><hr width="100%"></td></tr> <tr class=line> <td nowrap><a name="hostname1"> </a> <font size="+1" color="#ffffcc" face="tahoma, arial, helvetica"><span title="127.0.0.1">hostname1</span></font><td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1.&service=bbd"><img src="/hobbit/gifs/static/green.gif" alt="bbd:green:268d04h25m" title="bbd:green:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=bbgen"><img src="/hobbit/gifs/static/green.gif" alt="bbgen:green:268d04h24m" title="bbgen:green:268d04h24m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=bbtest"><img src="/hobbit/gifs/static/green.gif" alt="bbtest:green:268d04h25m" title="bbtest:green:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=conn"><img src="/hobbit/gifs/static/green.gif" alt="conn:green:268d04h25m" title="conn:green:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=cpu"><img src="/hobbit/gifs/static/green.gif" alt="cpu:green:169d00h15m" title="cpu:green:169d00h15m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=disk"><img src="/hobbit/gifs/static/green.gif" alt="disk:green:268d04h25m" title="disk:green:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=files"><img src="/hobbit/gifs/static/clear.gif" 
alt="files:clear:268d04h25m" title="files:clear:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=hobbitd"><img src="/hobbit/gifs/static/green.gif" alt="hobbitd:green:169d01h05m" title="hobbitd:green:169d01h05m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=http"><img src="/hobbit/gifs/static/green.gif" alt="http:green:268d04h19m" title="http:green:268d04h19m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=info"><img src="/hobbit/gifs/static/green.gif" alt="info:green:127.0.0.1" title="info:green:127.0.0.1" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=memory"><img src="/hobbit/gifs/static/green.gif" alt="memory:green:268d04h25m" title="memory:green:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=msgs"><img src="/hobbit/gifs/static/green.gif" alt="msgs:green:268d04h20m" title="msgs:green:268d04h20m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=ports"><img src="/hobbit/gifs/static/clear.gif" alt="ports:clear:268d04h25m" title="ports:clear:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=procs"><img src="/hobbit/gifs/static/clear.gif" alt="procs:clear:268d04h25m" title="procs:clear:268d04h25m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=trends"><img src="/hobbit/gifs/static/green.gif" alt="trends:green:" title="trends:green:" height="16" width="16" border=0></a></td> </tr> <tr class=line> <td nowrap><a name="hostname2"> </a> <font size="+1" color="#ffffcc" face="tahoma, arial, 
helvetica"><span title="127.0.0.2">hostname2</span></font><td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname2&service=bbd"><img src="/hobbit/gifs/static/red.gif" alt="bbd:red:16d06h46m" title="bbd:red:16d06h46m" height="16" width="16" border=0></a></td> <td align=center>-</td> <td align=center>-</td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname2&service=conn"><img src="/hobbit/gifs/static/green.gif" alt="conn:green:16d06h46m" title="conn:green:16d06h46m" height="16" width="16" border=0></a></td> <td align=center>-</td> <td align=center>-</td> <td align=center>-</td> <td align=center>-</td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname2&service=http"><img src="/hobbit/gifs/static/green.gif" alt="http:green:16d06h46m" title="http:green:16d06h46m" height="16" width="16" border=0></a></td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname2&service=info"><img src="/hobbit/gifs/static/green.gif" alt="info:green:127.0.0.2" title="info:green:127.0.0.2" height="16" width="16" border=0></a></td> <td align=center>-</td> <td align=center>-</td> <td align=center>-</td> <td align=center>-</td> <td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname2&service=trends"><img src="/hobbit/gifs/static/green.gif" alt="trends:green:" title="trends:green:" height="16" width="16" border=0></a></td> </tr> </table></center><br> <br><br>
In this case I have to parse the two hostnames (hostname1 and hostname2) and put them in separate TextViews. The problem is that a hostname can change in the future. In addition, I have to parse the "img src" in each td, for example:
<td align=center><a href="/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=http"><img src="/hobbit/gifs/static/green.gif" alt="http:green:268d04h19m" title="http:green:268d04h19m" height="16" width="16" border=0></a></td>
I need to parse /hobbit/gifs/static/green.gif, and I have to prepend the rest of the URL to it, giving http://example.com/hobbit/gifs/static/green.gif, to get the image.
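Turning the relative src into a full URL does not need string concatenation. A minimal sketch using only java.net.URI follows; the class and method names are hypothetical, and the base URL is the example.com placeholder from the question.

```java
import java.net.URI;

public class ImageUrlResolver {
    // Resolves a src attribute value (relative or already absolute)
    // against the URL of the page it was found on.
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // prints "http://example.com/hobbit/gifs/static/green.gif"
        System.out.println(resolve("http://example.com/", "/hobbit/gifs/static/green.gif"));
    }
}
```

jsoup can also do this step itself: when the document was fetched with Jsoup.connect(...) or parsed with a base URI, calling absUrl("src") on an img element returns the absolute URL.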
I know that once I have the image URL I can do something like:
InputStream input = new java.net.URL(imgsrc).openStream();
bitmap = BitmapFactory.decodeStream(input);
ImageView logoimg = (ImageView) findViewById(R.id.logo);
logoimg.setImageBitmap(bitmap);
But I am missing the previous steps. Any ideas? I don't know how to start.
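The problem is here:
if (col.get(1).text().equals("bbd")) { groupblock.add(col.get(i).text()); }
You access col.get(1) and col.get(i) without checking how many td elements the row contains. A row such as <tr><td colspan=15> has only one td, so the index can be out of bounds, which is exactly what the error tells you (index 1, size 1).
If you change the index to the one you want, you should be fine. Maybe like this: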
ArrayList<String> groupblock = new ArrayList<String>();
Object[] objplace;
Element table = document.select("table").get(1); // select the second table: "group block"
Elements rows = table.select("tr");
for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("td");
    // iterate over the cells instead of indexing, so short rows cannot go out of bounds
    for (Element col : cols) {
        switch (col.text()) {
            case "bbd":
            case "bbgen":
            case "bbtest":
                // ...more cases if you need them
                groupblock.add(col.select("a").first().attr("href"));
                System.out.println(col.text());
                break;
            default:
                break;
        }
    }
}
objplace = groupblock.toArray();
I am not sure what exactly you need from the DOM, but I think you get the idea.
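As an alternative to indexing rows and cells, CSS selectors can target the elements directly, which also covers the hostnames and images from Edit 2. The following is a sketch under assumptions: it requires jsoup on the classpath, the SAMPLE markup is a trimmed-down stand-in for the real page, the class and method names are hypothetical, and the selectors rely on the header links containing "hobbitcolumn.sh" and the host rows having class "line", as in the HTML quoted in the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StatusPageParser {
    // Trimmed-down, hypothetical sample of the page from the question.
    static final String SAMPLE =
        "<table summary=\"group block\">"
      + "<tr><td rowspan=2></td>"
      + "<td><a href=\"/hobbit-cgi/hobbitcolumn.sh?bbd\"><font><b>bbd</b></font></a></td>"
      + "<td><a href=\"/hobbit-cgi/hobbitcolumn.sh?conn\"><font><b>conn</b></font></a></td></tr>"
      + "<tr><td colspan=15><hr></td></tr>"
      + "<tr class=line><td nowrap><font><span title=\"127.0.0.1\">hostname1</span></font></td>"
      + "<td><a href=\"/hobbit-cgi/bb-hostsvc.sh?host=hostname1&service=bbd\">"
      + "<img src=\"/hobbit/gifs/static/green.gif\" alt=\"bbd:green:268d04h25m\"></a></td>"
      + "<td>-</td></tr>"
      + "</table>";

    // Column names: the bold text inside every header link.
    static List<String> columnNames(Document doc) {
        List<String> names = new ArrayList<>();
        for (Element b : doc.select("a[href*=hobbitcolumn.sh] b")) {
            names.add(b.text());
        }
        return names;
    }

    // Host names: read from the page, so renamed hosts are picked up automatically.
    static List<String> hostNames(Document doc) {
        List<String> hosts = new ArrayList<>();
        for (Element span : doc.select("tr.line span[title]")) {
            hosts.add(span.text());
        }
        return hosts;
    }

    // Absolute image URLs; absUrl needs the document to have a base URI.
    static List<String> imageUrls(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element img : doc.select("tr.line img[src]")) {
            urls.add(img.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE, "http://example.com/");
        System.out.println(columnNames(doc));
        System.out.println(hostNames(doc));
        System.out.println(imageUrls(doc));
    }
}
```

Because no row or cell is accessed by index, the single-cell separator row cannot trigger an IndexOutOfBoundsException. On Android, the parsing would still run in the background task, with the resulting lists applied to the TextViews on the UI thread.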