使用Python从文本中提取位置
在本文中,我们将了解如何使用Python从文本中提取位置。
在处理文本时,要求可以是检测接收文本中的城市、地区、州和国家以及它们之间的关系。这对地理研究非常有用。在本文中,我们将使用locationtagger库。
需要一些基于语法的规则和统计建模方法的文本挖掘通常使用NER(命名实体识别)算法进行。从 NER 中提取的实体可以是个人、地点、组织或产品的名称。 locationtagger 库是从所有其他存在的实体中进一步标记和过滤地点的副产品。
安装:
要安装此模块,请在终端中键入以下命令。
pip install locationtagger
安装完成后,需要使用代码下载几个nltk模块。
Python3
import nltk
import spacy
# essential entity models downloads
nltk.downloader.download('maxent_ne_chunker')
nltk.downloader.download('words')
nltk.downloader.download('treebank')
nltk.downloader.download('maxent_treebank_pos_tagger')
nltk.downloader.download('punkt')
nltk.download('averaged_perceptron_tagger')
Python3
import locationtagger
# initializing sample text
sample_text = "India has very rich and vivid culture\
widely spread from Kerala to Nagaland to Haryana to Maharashtra. " \
"Delhi being capital with Mumbai financial capital.\
Can be said better than some western cities such as " \
" Munich, London etc. Pakistan and Bangladesh share its borders"
# extracting entities.
place_entity = locationtagger.find_locations(text = sample_text)
# getting all countries
print("The countries in text : ")
print(place_entity.countries)
# getting all states
print("The states in text : ")
print(place_entity.regions)
# getting all cities
print("The cities in text : ")
print(place_entity.cities)
Python3
import locationtagger
# initializing sample text
sample_text = "India has very rich and vivid culture widely\
spread from Kerala to Nagaland to Haryana to Maharashtra. " \
"Mumbai being financial capital can be said better\
than some western cities such as " \
" Lahore, Canberra etc. Pakistan and Nepal share its borders"
# extracting entities.
place_entity = locationtagger.find_locations(text = sample_text)
# getting all country regions
print("The countries regions in text : ")
print(place_entity.country_regions)
# getting all country cities
print("The countries cities in text : ")
print(place_entity.country_cities)
# getting all other countries
print("All other countries in text : ")
print(place_entity.other_countries)
# getting all region cities
print("The region cities in text : ")
print(place_entity.region_cities)
# getting all other regions
print("All other regions in text : ")
print(place_entity.other_regions)
# getting all other entities
print("All other entities in text : ")
print(place_entity.other)
同样从命令行:
python -m spacy download en_core_web_sm
示例 1:从文本打印国家、城市和地区。
各种函数可用于从文本中获取城市、国家、地区等。
使用的功能:
- locationtagger.find_location(text) :返回带有位置信息的实体。 “text”参数将文本作为输入。
- entity.countries :提取文本中的所有国家。
- entity.regions :提取文本中的所有状态。
- entity.cities :提取文本中的所有城市。
代码:
蟒蛇3
import locationtagger
# initializing sample text
sample_text = "India has very rich and vivid culture\
widely spread from Kerala to Nagaland to Haryana to Maharashtra. " \
"Delhi being capital with Mumbai financial capital.\
Can be said better than some western cities such as " \
" Munich, London etc. Pakistan and Bangladesh share its borders"
# extracting entities.
place_entity = locationtagger.find_locations(text = sample_text)
# getting all countries
print("The countries in text : ")
print(place_entity.countries)
# getting all states
print("The states in text : ")
print(place_entity.regions)
# getting all cities
print("The cities in text : ")
print(place_entity.cities)
输出 :
示例 2:提取位置关系
在这个例子中,讨论了执行获取城市、地区和州之间关系的任务的各种函数。
使用的功能:
- entity.country_regions :提取在文本中找到区域的国家/地区。
- entity.country_cities :提取在文本中找到城市的国家。
- entity.other_countries :提取其地区或城市出现在文本中的所有国家/地区列表。
- entity.region_cities :提取在文本中找到其城市的区域。
- entity.other_regions :提取其城市出现在文本中的所有区域列表。
- entity.other :所有未被识别为地名的实体,都被提取到这个。
蟒蛇3
import locationtagger
# initializing sample text
sample_text = "India has very rich and vivid culture widely\
spread from Kerala to Nagaland to Haryana to Maharashtra. " \
"Mumbai being financial capital can be said better\
than some western cities such as " \
" Lahore, Canberra etc. Pakistan and Nepal share its borders"
# extracting entities.
place_entity = locationtagger.find_locations(text = sample_text)
# getting all country regions
print("The countries regions in text : ")
print(place_entity.country_regions)
# getting all country cities
print("The countries cities in text : ")
print(place_entity.country_cities)
# getting all other countries
print("All other countries in text : ")
print(place_entity.other_countries)
# getting all region cities
print("The region cities in text : ")
print(place_entity.region_cities)
# getting all other regions
print("All other regions in text : ")
print(place_entity.other_regions)
# getting all other entities
print("All other entities in text : ")
print(place_entity.other)
输出: