📜  Faster scraping - code example

📅  Last modified: 2022-03-11 14:56:05.607000             🧑  Author: Mango

Code example 1
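import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://news.ycombinator.com/"
STORY_LINKS = []

# Fetch 10 listing pages (p=0 through p=9), one request at a time.
for i in range(10):
    resp = requests.get(f"{BASE_URL}news?p={i}")
    soup = BeautifulSoup(resp.content, "html.parser")
    # Story titles are anchors carrying the "storylink" class.
    stories = soup.find_all("a", attrs={"class": "storylink"})
    # Keep only hrefs containing "http", which skips relative links.
    links = [x["href"] for x in stories if "http" in x["href"]]
    STORY_LINKS += links
    # Pause briefly between requests to avoid hammering the server.
    time.sleep(0.25)

# Report how many links were collected, then show the first three.
print(len(STORY_LINKS))

for url in STORY_LINKS[:3]:
    print(url)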
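
The loop above fetches pages strictly one after another, so most of its running time is spent waiting on the network. One possible way to speed it up, sketched here as an assumption and not as the author's method, is to fetch the pages concurrently with a thread pool. To stay dependency-free, this sketch swaps `requests` and `BeautifulSoup` for the standard library's `urllib` and `html.parser`. The names `LinkParser`, `fetch_page`, `extract_links`, `scrape`, and the `max_workers=5` default are hypothetical and do not come from the original snippet.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen

BASE_URL = "https://news.ycombinator.com/"


class LinkParser(HTMLParser):
    """Collect hrefs of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href") or ""
        # Same filter as the original: matching class, href contains "http".
        if tag == "a" and self.css_class in classes and "http" in href:
            self.links.append(href)


def fetch_page(page):
    """Download one listing page and return its HTML as text."""
    with urlopen(f"{BASE_URL}news?p={page}", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_links(html, css_class="storylink"):
    """Return story links found in one page of HTML."""
    parser = LinkParser(css_class)
    parser.feed(html)
    return parser.links


def scrape(pages, fetch=fetch_page, max_workers=5):
    """Fetch pages concurrently and return all links in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `pages`.
        pages_html = pool.map(fetch, pages)
        return [link for html in pages_html for link in extract_links(html)]


if __name__ == "__main__":
    story_links = scrape(range(10))
    print(len(story_links))
    for url in story_links[:3]:
        print(url)
```

Threads suit this job because the work is I/O-bound. A small `max_workers` value keeps the number of simultaneous requests modest, which takes over the role of the `time.sleep(0.25)` pause in the sequential version. The `fetch` parameter exists so that the download step can be replaced, for example with a stub that returns canned HTML.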