Pandas can do this right out of the box, saving you from having to parse the html yourself. read_html()
extracts all tables from your html and puts them in a list of dataframes. to_csv()
can be used to convert each dataframe to a csv file. For the web page in your example, the relevant table is the last one, which is why I used df_list[-1]
in the code below.
import requests
import pandas as pd
url="http://www.ffiec.gov/census/report.aspx?year=2011&state=01&report=demographic&msa=11500"
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_csv('my data.csv')
It’s simple enough to do in one line, if you prefer:
pd.read_html(requests.get(<url>).content)[-1].to_csv(<csv file>)
P.S. Just make sure you have lxml
, html5lib
, and BeautifulSoup4
packages installed in advance.