📅  Last modified: 2023-12-03 15:03:04.009000             🧑  Author: Mango
Munshi Premchand was an Indian writer famous for his works in Hindi literature. He is considered one of the pioneers of modern Hindi literature.
In this article, we will explore how we can use Python to analyze some of Munshi Premchand's works.
To start the analysis, we first need to collect data. We can use web scraping to collect data from websites such as premkahani.in and hindisamay.com.
import requests
from bs4 import BeautifulSoup

# URL of the website to be scraped
url = 'https://premkahani.in/'

# Send request to the website and stop early on an HTTP error
response = requests.get(url, timeout=30)
response.raise_for_status()

# Parse HTML content with BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Collect all links that point back to the same website, skipping duplicates
links = []
for link in soup.find_all('a'):
    href = link.get('href')
    if href and href.startswith(url) and href not in links:
        links.append(href)

# Visit each link and collect the text content
texts = []
for link in links:
    response = requests.get(link, timeout=30)
    soup = BeautifulSoup(response.content, 'html.parser')
    texts.append(soup.get_text())

# Save the text content to a file
with open('premchand.txt', 'w', encoding='utf-8') as f:
    f.write('\n\n'.join(texts))
The code above collects the links on the site's home page that point back to the same site, then visits each one and collects its text content. It saves the collected text to a file named premchand.txt.
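One thing the prefix check on absolute URLs can miss is relative links such as /stories/idgah. As a minimal sketch, assuming only the standard library (html.parser and urllib.parse stand in for BeautifulSoup here), the link-filtering step could resolve each href against the base URL before comparing hosts. The names LinkCollector and same_site_links, and the sample HTML, are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


def same_site_links(html, base_url):
    """Return absolute, de-duplicated links on the same host as base_url."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    result = []
    for href in collector.hrefs:
        # Resolve relative links such as '/stories/idgah' against the base URL
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == base_host and absolute not in result:
            result.append(absolute)
    return result


# Hypothetical page fragment, used only for illustration
sample_html = '''
<a href="/stories/idgah">Idgah</a>
<a href="https://premkahani.in/stories/idgah">Idgah again</a>
<a href="https://example.com/other">Elsewhere</a>
<a>No href</a>
'''
print(same_site_links(sample_html, 'https://premkahani.in/'))
```

Comparing hosts rather than string prefixes also filters out mailto: links and links to other sites, since neither resolves to the base host.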
Now that we have collected the data, we can use Python to analyze it. Let's start by calculating the word frequency of the collected text.
import re

# Load text content from file
with open('premchand.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Clean the text by replacing punctuation and special characters with spaces.
# Devanagari (U+0900-U+097F) is kept so that Hindi words survive; the danda
# marks (U+0964, U+0965) are left out of the range because they are punctuation.
cleaned_text = re.sub(r'[^a-zA-Z0-9\u0900-\u0963\u0966-\u097F\s]', ' ', text)

# Split the text into words
words = cleaned_text.split()

# Calculate word frequency (lowercasing only affects Latin-script words)
word_frequency = {}
for word in words:
    key = word.lower()
    word_frequency[key] = word_frequency.get(key, 0) + 1

# Print the 10 most frequent words
for word, frequency in sorted(word_frequency.items(), key=lambda x: x[1], reverse=True)[:10]:
    print(f'{word}: {frequency}')
The code above loads the text from the file and cleans it by replacing punctuation and special characters with spaces. It then splits the text into words, counts how often each word occurs, and prints the 10 most frequent words.
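The same counting step can be written more compactly with collections.Counter. The sketch below is one possible variant, not the article's original method: it assumes the text is mostly Hindi in Devanagari script, so the hypothetical TOKEN_PATTERN matches runs of Latin letters, ASCII digits, and Devanagari characters while excluding the danda punctuation marks. The top_words helper and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Assumed token pattern: Latin letters, ASCII digits, and Devanagari characters
# (U+0900-U+097F), excluding the danda marks U+0964 and U+0965.
TOKEN_PATTERN = re.compile(r'[a-zA-Z0-9\u0900-\u0963\u0966-\u097F]+')


def top_words(text, n=10):
    """Return the n most frequent tokens in text as (word, count) pairs."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return Counter(tokens).most_common(n)


# Made-up sample text, used only for illustration
sample = 'ईदगाह की कहानी। हामिद की दादी, हामिद का चिमटा। Idgah idgah'
for word, count in top_words(sample, 3):
    print(f'{word}: {count}')
```

Matching tokens directly with findall avoids the separate clean-then-split steps. On real text the top of the list will likely be dominated by common function words, so filtering a stop-word list before counting is usually worthwhile.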
In this article, we have seen how we can use Python to collect and analyze data related to Munshi Premchand's works. There are many more things that can be done with the collected data, such as sentiment analysis and text summarization. Python offers many tools and libraries to make these tasks easier and faster.