ScrapySharp in English :)

January 27, 2012 at 10:58 AM, by romain

 

Google Analytics showed me how many English-speaking readers visited my previous article, so I decided to make the effort to blog in English.
As you will see, my English is not perfect, so sorry for the errors :)

 

ScrapySharp has a web client able to simulate a real web browser (it handles the referrer, cookies, ...).

HTML parsing has to be as natural as possible, so I like to use CSS selectors and LINQ.

This framework wraps HtmlAgilityPack.

 

Source code is here: https://bitbucket.org/rflechner/scrapysharp

Blog is here: http://www.romcyber.com/post/2011/09/07/ScrapySharp.aspx

You can install it in your .NET project using NuGet.

 


 

 

Basic examples of CssSelect usage:

 

var divs = html.CssSelect("div");  //all div elements

var nodes = html.CssSelect("div.content"); //all div elements with CSS class 'content'

var nodes = html.CssSelect("div.widget.monthlist"); //all div elements with both CSS classes 'widget' and 'monthlist'

var nodes = html.CssSelect("#postPaging"); //all HTML elements with the id postPaging

var nodes = html.CssSelect("div#postPaging.testClass"); //all div elements with the id postPaging and CSS class testClass

 

var nodes = html.CssSelect("div.content > p.para"); //p elements with CSS class 'para' that are direct children of div elements with CSS class 'content'

 

var nodes = html.CssSelect("input[type=text].login"); //text inputs with CSS class login

 

We can also select ancestors of elements:

var nodes = html.CssSelect("p.para").CssSelectAncestors("div.content > div.widget");
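
The snippets above assume that an `html` node already exists. Here is a minimal sketch of one way to get it, by parsing a string with HtmlAgilityPack and then calling CssSelect on the document node. The markup and the class name `CssSelectDemo` are made up for this example, and I assume the CssSelect extension method lives in the `ScrapySharp.Extensions` namespace, which may differ depending on the version you install.

```csharp
using System;
using HtmlAgilityPack;
using ScrapySharp.Extensions; // assumed namespace of the CssSelect extension method

class CssSelectDemo
{
    static void Main()
    {
        // Hypothetical markup, invented for this sketch only.
        const string markup = @"
<div class='content'>
  <p class='para'>First</p>
  <p class='para'>Second</p>
</div>";

        // Parse the string with HtmlAgilityPack.
        var document = new HtmlDocument();
        document.LoadHtml(markup);

        // This root node plays the role of the 'html' variable used in the examples.
        HtmlNode html = document.DocumentNode;

        // Select the p.para elements that are direct children of div.content.
        var paragraphs = html.CssSelect("div.content > p.para");

        foreach (var paragraph in paragraphs)
        {
            Console.WriteLine(paragraph.InnerText);
        }
    }
}
```

Since the selected nodes are plain HtmlAgilityPack nodes, the result can be filtered or projected further with LINQ.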

 

 

Posted in: C# | Scraping
