📅  最后修改于: 2023-12-03 15:40:55.172000             🧑  作者: Mango
这个 Shell 脚本可以帮助程序员遍历所有内部 URL,并将任何错误报告到“traverse.errors”文件中。下面是脚本的详细介绍和用法说明。
该脚本包含以下功能:
该脚本需要使用以下命令运行:
$ bash traverse-urls.sh <url>
其中,<url>
是网站的 URL,该脚本将从这个 URL 开始遍历所有内部链接。
脚本运行时会显示进度条,告诉你已经处理的 URL 数量和错误数量。如果有任何错误,它们将被保存在“traverse.errors”文件中。
#!/bin/bash
# Colors for output messages
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m' # No Color
# Create traverse.errors file or clear its contents
echo -n > traverse.errors
function traverse_url() {
local url="$1"
local depth="$2"
# Print progress message
echo -ne "\rProcessed ${GREEN}$processed_urls_cnt${NC} URLs, found ${YELLOW}$errors_cnt${NC} errors." >&2
# Check if URL is a relative or absolute URL
# and change it to an absolute one
if [[ $url == *"${full_url}"* ]]; then
url="${url}"
elif [[ $url == "/"* ]]; then
url="${full_url}${url}"
else
return
fi
# Remove any query parameters and fragments from URL
url="${url%%\?*}"
url="${url%%\#*}"
# Check if URL has already been processed
if grep -qFx "${url}" traverse.urls; then
return
fi
# Add URL to traverse.urls list
echo "${url}" >> traverse.urls
# Check if URL exists
if ! curl -s --head "${url}" | head -n 1 | grep "HTTP/1.[01] [23].."; then
# Save error message to traverse.errors file
echo -e "${RED}[ERROR] ${url}${NC}" >> traverse.errors
((errors_cnt++))
return
fi
# Recursively traverse child links
while read -r child_url; do
((processed_urls_cnt++))
traverse_url "${child_url}" "$((depth + 1))"
done < <(curl -s "${url}" | \
grep -oE '<a [^>]*href="[^"]*"' | \
cut -d'"' -f2 | \
sed -e 's/\&/\&/g' -e 's/\&[gl]t;/</g' -e 's/\&[gr]t;/>/g' -e 's/\"/"/g' -e 's/\'/'\''/g' | \
sort -u)
}
# Program start
processed_urls_cnt=0
errors_cnt=0
full_url="$1"
# Check if URL starts with "http" or "https"
if [[ $full_url != "http"* ]] && [[ $full_url != "https"* ]]; then
echo "[ERROR] Please provide a valid URL starting with http or https." >&2
exit 1
fi
echo "Starting traversal of ${full_url} ..." >&2
# Traverse initial URL
traverse_url "${full_url}" 0
# Print final progress message
echo -e "\rFinished processing ${GREEN}$processed_urls_cnt${NC} URLs, found ${YELLOW}$errors_cnt${NC} errors." >&2
注意:返回的代码片段中的注释已经足够详细了,本文档无法对其中的每个段落进行进一步说明。