1. Cheerio:
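Cheerio is a Node.js module that implements a subset of core jQuery on the server. It works with a simple, consistent DOM model and does not run a browser, which makes it fast. Cheerio is widely used for web scraping and sometimes for automation tasks. It wraps the parse5 parser for HTML and can optionally use htmlparser2 for XML, so it can handle most HTML and XML documents.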
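2. Puppeteer:
Puppeteer is widely used to automate browser tasks. It controls a real browser, Chrome or Chromium by default (recent versions also support Firefox), and runs it headless unless configured otherwise. Puppeteer can also be used for web scraping, and because it drives a full browser it offers many features that Cheerio does not, such as executing JavaScript, taking screenshots, and generating PDFs.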
Difference between Cheerio and Puppeteer:
S.No. | Cheerio | Puppeteer |
---|---|---|
1. | It is developed and maintained by the open-source cheeriojs project. | It is developed and maintained by Google. |
2. | It cannot execute JavaScript. | It can execute JavaScript. |
3. | Websites that render their content in the browser with React or Angular cannot be scraped with it. | Websites built with React or Angular can be scraped with it. |
4. | It does not provide features such as taking screenshots or generating PDFs. | It can take screenshots and save pages as PDFs. |
5. | It is faster than Puppeteer. | It is slower than Cheerio. |
6. | Cheerio is just a DOM parser that parses HTML and XML. | Puppeteer drives a whole browser engine. |
7. | Cheerio is a good fit for scraping static pages. | Puppeteer is mostly used for browser automation. |
8. | Cheerio does not need a browser; it runs directly in Node.js. | Puppeteer requires a browser (Chrome or Chromium by default) to run its scripts, and launches it headless by default. |
9. | Cheerio only works with static HTML or XML markup. | It works with HTML and XML and can also execute JavaScript. |