Python - Scrapy unable to extract some data from website
I am using Scrapy to crawl a page, and I am able to get simple things such as the visible text. However, some texts are not visible to the crawler and end up showing as blank spaces.
For instance, viewing the page source allows me to see these fields:
https://www.dropbox.com/s/f056mffmuah6uu4/screenshot%202015-07-23%2018.23.32.png?dl=0
I've tried numerous times to access these fields through XPath and CSS selectors, but I am not able to get them after each attempt.
When I try something like:
response.xpath('//text()').extract()
I am not able to see these fields in the text dump at all.
Would anyone have an idea why these fields are not visible to Scrapy? The website is: https://www.buzzbuzzhome.com/uc/units/houses/sapphire
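One way to narrow this down is to check whether the missing text is present in the HTML that the server returns at all. The following is a minimal sketch using only the standard library; the `TextCollector` class, the `text_in_static_html` helper, and the sample markup (including the `price-history` id) are hypothetical and only illustrate the check.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_in_static_html(html, needle):
    """Return True if `needle` occurs in any text node of the raw HTML."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return any(needle in chunk for chunk in parser.chunks)


# Hypothetical sample: the container for the missing fields is empty
# in the markup as downloaded, and a script is expected to fill it later.
SAMPLE_HTML = """
<html><body>
  <h1>Sapphire</h1>
  <div id="price-history"></div>
  <script>loadHistory();</script>
</body></html>
"""

print(text_in_static_html(SAMPLE_HTML, "Sapphire"))
print(text_in_static_html(SAMPLE_HTML, "price increased"))
```

In a Scrapy shell the same helper could be called with `response.text`. If the text is absent there, no selector will find it, and the data is probably being fetched by a separate request.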
In your spider, you need to make an additional XHR POST request to the https://www.buzzbuzzhome.com/bbhajax/development/unitpricehistory endpoint for the price history, providing the necessary headers and POST parameters:
import json

import scrapy


class BuzzSpider(scrapy.Spider):
    name = 'buzzbuzzhome'
    allowed_domains = ['buzzbuzzhome.com']
    start_urls = ['https://www.buzzbuzzhome.com/uc/units/houses/sapphire']

    def parse(self, response):
        # The unit ID and the New Relic ID are embedded in the initial page.
        # Attribute values and JSON keys are case-sensitive; check them against the page source.
        unit_id = response.xpath("//div[@id = 'unitdetails']/@data-unit-id").extract()[0]
        development_url = "uc"
        new_relic_id = response.xpath("//script[contains(., 'xpid')]").re(r'xpid:"(.*?)"')

        params = {"developmenturl": development_url, "unitid": unit_id}
        # Request the price history with an XHR-style POST carrying a JSON body.
        yield scrapy.Request(
            "https://www.buzzbuzzhome.com/bbhajax/development/unitpricehistory",
            method="POST",
            body=json.dumps(params),
            callback=self.parse_history,
            headers={
                "Accept": "*/*",
                "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36",
                "X-Requested-With": "XMLHttpRequest",
                "X-NewRelic-ID": new_relic_id,
                "Origin": "https://www.buzzbuzzhome.com",
                "Host": "www.buzzbuzzhome.com",
                "Content-Type": "application/json; charset=utf-8",
            },
        )

    def parse_history(self, response):
        # Each history entry is a div.row holding a title and a text block.
        for row in response.css("div.row"):
            title = row.xpath(".//div[@class='content-title']/text()").extract()[0].strip()
            text = row.xpath(".//div[@class='content-text']/text()").extract()[0].strip()
            print(title, text)
Prints:
05/25/2015 unit listed sold
12/18/2014 unit listed sale
11/24/2014 unit price increased 1.54% $461,990
11/04/2014 unit price increased 6.81% $454,990
10/02/2014 unit price increased 4.67% $425,990
01/22/2014 unit price increased 2.52% $406,990
12/06/2013 unit listed sale @ $396,990
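The two pieces the POST depends on, the `xpid` value and the JSON body, can be pulled into small functions and checked without touching the network. This is only a sketch under assumptions: `extract_xpid` and `build_history_request` are hypothetical helper names, the sample script text is made up, and the key names are copied as they appear in the spider above.

```python
import json
import re

HISTORY_URL = "https://www.buzzbuzzhome.com/bbhajax/development/unitpricehistory"


def extract_xpid(script_text):
    """Return the value of xpid:"..." from inline script text, or None."""
    match = re.search(r'xpid:"(.*?)"', script_text)
    return match.group(1) if match else None


def build_history_request(unit_id, development_url, new_relic_id):
    """Return keyword arguments that could be passed to scrapy.Request(**kwargs)."""
    payload = {"developmenturl": development_url, "unitid": unit_id}
    return {
        "url": HISTORY_URL,
        "method": "POST",
        "body": json.dumps(payload),
        "headers": {
            "X-Requested-With": "XMLHttpRequest",
            "X-NewRelic-ID": new_relic_id,
            "Content-Type": "application/json; charset=utf-8",
        },
    }


# Hypothetical inline script content and unit ID, for illustration only.
sample_script = 'window.NREUM={xpid:"AbC123=="};'
xpid = extract_xpid(sample_script)
kwargs = build_history_request("12345", "uc", xpid)
print(kwargs["method"], kwargs["body"])
```

Inside the spider this might be used as `yield scrapy.Request(callback=self.parse_history, **kwargs)`. Keeping the body construction separate makes it easy to compare against what the browser's network tab shows for the same request.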