Selecting attribute values with HTML Agility Pack

You can grab the attribute directly if you use an HtmlNodeNavigator instead.

    // Load the document from an HTML string
    HtmlDocument hdoc = new HtmlDocument();
    hdoc.LoadHtml(htmlContent);

    // Create a navigator for the current document
    HtmlNodeNavigator navigator = (HtmlNodeNavigator)hdoc.CreateNavigator();

    // Get the value at the given XPath
    string xpath = "//div[@id='topslot']/a/img/@src";
    string val = navigator.SelectSingleNode(xpath).Value;

Parsing HTML page with HtmlAgilityPack

There are a number of ways to select elements using the Agility Pack. Let’s assume we have defined our HtmlDocument as follows:

    string html = @"<TD class=texte width=""50%"">
    <DIV align=right>Name :<B> </B></DIV></TD>
    <TD width=""50%"">
    <INPUT class=box value=John maxLength=16 size=16 name=user_name>
    </TD>
    <TR vAlign=center>";

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);

1. Simple LINQ
We could use … Read more

HTML Agility Pack strip tags NOT IN whitelist

heh, apparently I ALMOST found an answer in a blog post someone made….

    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    namespace Wayloop.Blog.Core.Markup
    {
        public static class HtmlSanitizer
        {
            private static readonly IDictionary<string, string[]> Whitelist;

            static HtmlSanitizer()
            {
                Whitelist = new Dictionary<string, string[]> {
                    { "a", new[] { "href" } },
                    { "strong", null },
                    { "em",
… Read more

HTML Agility Pack – parsing tables

How about something like:

Using HTML Agility Pack

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");
    foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table"))
    {
        Console.WriteLine("Found: " + table.Id);
        foreach (HtmlNode row in table.SelectNodes("tr"))
        {
            Console.WriteLine("row");
            foreach (HtmlNode cell in row.SelectNodes("th|td"))
            {
                Console.WriteLine("cell: " + cell.InnerText);
            }
        }
    }

Note that you can make it prettier with LINQ-to-Objects if … Read more