I had a task at work: fetch the content of a web page (using System.Net.WebRequest) so that the data can be sent on demand, by email or by other means.
The web page contains images and other resources that have to be sent along with the email for the HTML content to display properly.
Parsing the HTML by hand to locate the images and download them to the server would mean a lot of regex work and parsing issues. Instead, on the recommendation of Sharon Djabnoun, my allrise.com teammate, I found a great open source module called HTML Agility Pack. “This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).”
Now working with the HTML content is easy and fast: all I need to do is find the image nodes, download the images to the server, update each image's src attribute to point to the new location, and send the email with the attachments and the fixed content.
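To make those steps concrete, here is a rough sketch of the "find the images and fix their source" part using HTML Agility Pack. It is not the code I ended up shipping; the class name EmailImageRewriter, the method RewriteImageSources and the saveImage callback are names made up for this example, and the download and email steps are left to the caller.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class EmailImageRewriter
{
    // Finds every <img> with a src attribute, resolves the src against the
    // page address, hands the absolute URL to saveImage (which is assumed to
    // download the file and return its new location), and writes that new
    // location back into the tag. Returns the rewritten HTML.
    public static string RewriteImageSources(
        string html,
        Uri pageUri,
        Func<Uri, string> saveImage,
        IList<Uri> foundImages)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
        {
            return html;
        }

        foreach (HtmlNode img in nodes)
        {
            string src = img.GetAttributeValue("src", "");

            // Turn relative paths such as "/images/a.png" into absolute URLs.
            Uri absolute;
            if (!Uri.TryCreate(pageUri, src, out absolute))
            {
                continue;
            }

            // Skip anything that cannot be downloaded over HTTP (e.g. data: URIs).
            if (absolute.Scheme != Uri.UriSchemeHttp &&
                absolute.Scheme != Uri.UriSchemeHttps)
            {
                continue;
            }

            foundImages.Add(absolute);
            img.SetAttributeValue("src", saveImage(absolute));
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

In real use, saveImage would be something along the lines of a wrapper around System.Net.WebClient.DownloadFile that stores the image on the server and returns the path or URL the email should reference. Keeping it as a callback also makes the rewriting easy to try out without touching the network.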
You can find it here.