1. Cheerio:
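Cheerio is a Node.js module that implements a subset of core jQuery. It works with a very simple, consistent DOM model. Cheerio is widely used for web scraping and sometimes for automation tasks. It is very fast because that simple DOM model keeps parsing and manipulation cheap and no browser is involved, while the API stays familiar to jQuery users. Cheerio wraps the parse5 parser, which lets it handle almost any HTML or XML document.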
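
As a minimal sketch of the jQuery-style scraping described above, the snippet below parses a static HTML string and collects its links. It assumes the `cheerio` package is installed (`npm install cheerio`). The function name `extractLinks` and the sample markup are made up for illustration.

```javascript
// Sketch: parse static HTML with Cheerio and query it with jQuery-style selectors.
// `extractLinks` and `sampleHtml` are hypothetical names used only in this example.
function extractLinks(html) {
  // Required here so the dependency is only loaded when the function is called.
  const cheerio = require('cheerio');

  // load() parses the markup and returns a jQuery-like query function.
  const $ = cheerio.load(html);

  // map() builds a Cheerio collection; get() converts it to a plain array.
  return $('a')
    .map((i, el) => ({
      text: $(el).text().trim(),
      href: $(el).attr('href'),
    }))
    .get();
}

const sampleHtml = `
  <ul>
    <li><a href="/docs">Docs</a></li>
    <li><a href="/blog"> Blog </a></li>
  </ul>`;
```

Calling `extractLinks(sampleHtml)` should return one object per anchor, each with the trimmed link text and its `href` attribute.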
2. Puppeteer:
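Puppeteer is widely used to automate browser tasks and is built primarily to drive Google Chrome or Chromium, usually in headless mode. Puppeteer can also be used for web scraping, and for that purpose it is far more powerful than Cheerio, offering many features that the cheerio module does not have.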
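
The following sketch shows what driving a real browser can look like. It opens a page, waits for an element that may be rendered by JavaScript, reads its text, and saves a screenshot and a PDF. It assumes the `puppeteer` package is installed (`npm install puppeteer`). The function name `capturePage` and the output file names are placeholders, not part of any official example.

```javascript
// Sketch: load a page in headless Chrome, read rendered text, save a screenshot and a PDF.
// `capturePage`, 'page.png' and 'page.pdf' are hypothetical names chosen for this example.
async function capturePage(url, selector) {
  // Required here so the heavy dependency is only loaded when the function is called.
  const puppeteer = require('puppeteer');

  // launch() starts the browser; it runs headless unless configured otherwise.
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Wait until network activity settles so client-side rendering can finish.
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.waitForSelector(selector);

    // $eval() runs the callback inside the page against the first matching element.
    const text = await page.$eval(selector, (el) => el.textContent.trim());

    await page.screenshot({ path: 'page.png', fullPage: true });
    await page.pdf({ path: 'page.pdf', format: 'A4' });
    return text;
  } finally {
    // Always close the browser, even if navigation or extraction fails.
    await browser.close();
  }
}
```

A call such as `capturePage('https://example.com', 'h1')` returns a promise for the element's text; the URL and selector here are only illustrative.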
Difference between Cheerio and Puppeteer:
S.No. | Cheerio | Puppeteer |
---|---|---|
1. | It is developed and maintained by the open-source CheerioJS project. | It is developed and maintained by Google. |
2. | It cannot execute JavaScript. | It can execute JavaScript. |
3. | Websites that render their content client-side with React or Angular cannot be scraped with it. | Websites built with React or Angular can be scraped with it. |
4. | It cannot take screenshots or generate PDFs. | It can take screenshots and save pages as PDF. |
5. | It is faster than Puppeteer. | It is slower than Cheerio. |
6. | Cheerio is just a DOM parser for HTML and XML. | Puppeteer drives a whole browser engine. |
7. | Cheerio is a good fit for scraping static pages. | Puppeteer is mostly used for browser automation. |
8. | Cheerio does not need a browser; it runs in Node.js on the markup it is given. | Puppeteer requires Chrome or Chromium to run its scripts. By default it launches the browser in headless mode. |
9. | Cheerio works only with raw HTML or XML markup. | It works with raw HTML and XML and can also execute JavaScript. |
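
Rows 2, 3 and 9 of the table can be illustrated with a small sketch. Cheerio parses the markup it is given but never runs the scripts inside it, so content that a page fills in with JavaScript stays empty. The example assumes the `cheerio` package is installed. `readAppText` and the sample markup are hypothetical.

```javascript
// Sketch: Cheerio sees only the static markup; embedded scripts are parsed but not executed.
// `readAppText` and `clientRenderedHtml` are hypothetical names used only in this example.
function readAppText(html) {
  // Required here so the dependency is only loaded when the function is called.
  const cheerio = require('cheerio');
  const $ = cheerio.load(html);

  // Returns the text currently inside #app in the parsed document.
  return $('#app').text();
}

// A page whose visible content would be filled in by JavaScript in a real browser.
const clientRenderedHtml = `
  <div id="app"></div>
  <script>
    document.getElementById('app').textContent = 'Rendered by JS';
  </script>`;
```

Here `readAppText(clientRenderedHtml)` returns an empty string, because the script that would populate `#app` is never run. Loading the same page in Puppeteer and reading `#app` after rendering would be one way to get the populated text.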