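1. Puppeteer:
It is a tool developed by Google for automating the browser. Puppeteer is very powerful and, at the same time, very easy to use. Unlike Beautifulsoup, it makes the whole browser engine API available, so it can be used for many advanced tasks beyond web scraping.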
2. Beautifulsoup:
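It is a library written in Python. It is most useful, and quick, for web scraping work that navigates HTML tags and CSS selectors (it has no built-in XPath support; XPath queries need a separate library such as lxml). It parses HTML and XML documents.
Difference between Puppeteer and Beautifulsoup: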
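As a minimal sketch of what "parses HTML" means in practice, the snippet below feeds a small, made-up HTML string to Beautifulsoup and reads values out by tag and by CSS selector. The markup, the `id` and `class` names, and the variable names are illustrative assumptions; it assumes the `bs4` package is installed and uses Python's built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for illustration.
html = """
<html><body>
  <h1 id="title">Sample page</h1>
  <ul class="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# "html.parser" is the parser bundled with Python; no extra install needed.
soup = BeautifulSoup(html, "html.parser")

# Look up a single element by tag name and attribute.
title = soup.find("h1", id="title").get_text(strip=True)

# Look up several elements with a CSS selector.
hrefs = [a["href"] for a in soup.select("ul.links a")]

print(title)
print(hrefs)
```

In a real scraper the `html` string would typically come from an HTTP client rather than a literal, but the parsing step stays the same.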
S.No. | Puppeteer | Beautifulsoup |
---|---|---|
1. | It is developed and maintained by Google. | It was created by Leonard Richardson. |
2. | It is written in JavaScript (TypeScript in current releases). | This library is written in Python. |
3. | It exposes the whole browser engine API. | It only parses HTML and XML documents. |
4. | It is slower than Beautifulsoup because it has to launch and drive a real browser, although the difference is often negligible. | It is faster than Puppeteer because it only parses markup that has already been fetched. |
5. | It is used for browser automation and scraping work. | It is mainly used for scraping data and not for making complex automations. |
6. | It provides a high-level API to control Chrome or Chromium over the DevTools Protocol. | It does not control a browser and has no connection to the DevTools Protocol. |
7. | It can execute JavaScript, so it can scrape pages whose content is rendered dynamically. | It cannot execute JavaScript; it sees only the markup it is given. |
8. | It is a Node.js library or module. | It is a Python library. |
9. | It primarily targets Chrome and Chromium. | It does not depend on any browser; it works on markup obtained from any source. |
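The row about JavaScript can be illustrated with a small sketch. Here a made-up page contains one element filled in by the server and one that a script would fill in inside a browser. Beautifulsoup only parses the markup, so the script is never run and the second element stays empty. The HTML and the variable names are assumptions for illustration, and the `bs4` package is assumed to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the second <div> would only be filled in by a browser
# that actually runs the <script>.
html = """
<html><body>
  <div id="static">Rendered by the server</div>
  <div id="dynamic"></div>
  <script>
    document.getElementById("dynamic").textContent = "Rendered by JavaScript";
  </script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Content present in the raw markup is available.
static_text = soup.find(id="static").get_text(strip=True)

# The script is never executed, so this element is still empty.
dynamic_text = soup.find(id="dynamic").get_text(strip=True)

# The script body is visible only as plain text, not as its result.
script_source = soup.find("script").string

print(repr(static_text))
print(repr(dynamic_text))
```

A browser automation tool such as Puppeteer would load the same page in a real browser, run the script, and then expose the filled-in element.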