How do I parse an HTML table with Nokogiri?

#!/usr/bin/ruby1.8

require 'nokogiri'
require 'pp'

html = <<-EOS
  (The HTML from the question goes here)
EOS

doc = Nokogiri::HTML(html)
rows = doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')
details = rows.collect do |row|
  detail = {}
  [
    [:title, 'td[3]/div[1]/a/text()'],
    [:name, 'td[3]/div[2]/span/a/text()'],
    [:date, 'td[4]/text()'],
    [:time, 'td[4]/span/text()'],
    [:number, 'td[5]/a/text()'],
    [:views, 'td[6]/text()'],
  ].each do |name, xpath|
    detail[name] = row.at_xpath(xpath).to_s.strip
  end
  detail
end
pp details
…

How to handle IncompleteRead: in python

The link you included in your question is simply a wrapper around urllib’s read() function that catches any incomplete-read exceptions for you. If you don’t want to apply the entire patch, you can wrap the read in a try/except block wherever you fetch your links. For example:

try:
    page = urllib2.urlopen(urls).read()
except httplib.IncompleteRead, …
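
The excerpt above is cut off before the except branch, so the following is only a sketch of one way the pattern might be completed, written in Python 3 syntax (where httplib is named http.client). It assumes that keeping whatever bytes arrived before the connection dropped is acceptable; read_tolerant is a hypothetical helper name, not something from the original answer.

```python
import http.client


def read_tolerant(response):
    """Return the body of `response`, or the partial body on IncompleteRead.

    `response` is any object with a read() method, for example the object
    returned by urllib.request.urlopen(). Falling back to the partial data
    is an assumption of this sketch, not the original answer's stated fix.
    """
    try:
        return response.read()
    except http.client.IncompleteRead as exc:
        # exc.partial holds the bytes received before the read was cut short.
        return exc.partial
```

A caller would pass the opened response to the helper, for instance read_tolerant(urllib.request.urlopen(url)), and should treat the result as possibly truncated.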

What should I do if socket.setdefaulttimeout() is not working?

While socket.setdefaulttimeout sets the default timeout for new sockets, the setting is easily overridden if you’re not using the sockets directly. In particular, if the library calls socket.setblocking on its socket, the timeout is reset. urllib2.urlopen has a timeout argument; however, urllib2.Request has none. As you’re using mechanize, you should …
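
The reset behaviour described above can be observed without any network traffic. This is a small sketch using only the standard socket module; the 5-second value is an arbitrary choice for illustration.

```python
import socket

previous_default = socket.getdefaulttimeout()
socket.setdefaulttimeout(5.0)  # arbitrary value chosen for this sketch
try:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A newly created socket inherits the default timeout.
    inherited = sock.gettimeout()
    # setblocking(True) is equivalent to settimeout(None), so the
    # inherited timeout is discarded.
    sock.setblocking(True)
    after_setblocking = sock.gettimeout()
    sock.close()
finally:
    # Restore the process-wide default so other code is unaffected.
    socket.setdefaulttimeout(previous_default)
```

Here inherited is 5.0 and after_setblocking is None, which is why a library that calls setblocking on its own sockets can silently ignore the default timeout.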

mechanize python click a button

Clicking a type="button" in a pure HTML form does nothing. For it to do anything, there must be JavaScript involved, and mechanize doesn’t run JavaScript. So your options are:

1. Read the JavaScript yourself and simulate with mechanize what it would be doing.
2. Use spidermonkey to run the JavaScript code.

I’d do the first one, since …

adding directory to sys.path /PYTHONPATH

This is working as documented. Any paths specified in PYTHONPATH are documented as normally coming after the working directory but before the standard interpreter-supplied paths. sys.path.append() appends to the existing path. See here and here. If you want a particular directory to come first, simply insert it at the head of sys.path:

import sys
sys.path.insert(0, '/path/to/mod_directory')
…
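
To make the difference between appending and inserting concrete, here is a brief sketch that records where the directory lands in each case. The path is the placeholder from the answer and does not need to exist for the comparison to work.

```python
import sys

mod_dir = "/path/to/mod_directory"  # placeholder path, as in the answer

# append() puts the directory at the end, so it is searched last.
sys.path.append(mod_dir)
position_appended = sys.path.index(mod_dir)
sys.path.remove(mod_dir)

# insert(0, ...) puts it at the head, so it is searched first.
sys.path.insert(0, mod_dir)
position_inserted = sys.path.index(mod_dir)
```

After this runs, position_inserted is 0, while position_appended is the last index sys.path had at the time of the append.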
