Python | Pandas dataframe.filter()

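Python is a great language for data analysis, primarily because of its rich ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.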

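The Pandas dataframe.filter() function subsets the rows or columns of a dataframe according to the labels in the specified index. Note that this routine does not filter a dataframe on its contents; the filter is applied only to the labels of the index.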
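
To make the label-versus-content distinction concrete, here is a minimal sketch. The small dataframe, its row labels and the variable names are made up for illustration and do not come from the CSV file used later.

```python
import pandas as pd

# Hypothetical toy data, not taken from nba.csv.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy"], "Age": [25, 31, 22]},
    index=["r1", "r2", "r3"],
)

# filter() looks only at labels: keep the column labelled "Age".
# Every row is kept, whatever values it holds.
by_label = df.filter(items=["Age"])

# Selecting rows by their values needs boolean indexing instead.
by_value = df[df["Age"] > 24]

print(by_label)
print(by_value)
```

Here filter() returns all three rows with a single column, while the boolean mask is what actually drops the row whose Age is 22.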

Syntax: DataFrame.filter(items=None, like=None, regex=None, axis=None)

Parameters:
items : List of labels to keep along the chosen axis (not all of them need to be present; labels that are missing are ignored)
like : Keep labels for which “like in label == True”
regex : Keep labels for which re.search(regex, label) == True
axis : The axis to filter on. By default this is the info axis: ‘index’ for Series, ‘columns’ for DataFrame

Returns : same type as input object
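
The parameters above can be tried one at a time on a small dataframe. This is only a sketch: the column names, row labels and result variable names are hypothetical and chosen so that each call keeps a different set of labels.

```python
import pandas as pd

# Hypothetical toy dataframe with string labels on both axes.
df = pd.DataFrame(
    {"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]},
    index=["mouse", "rabbit", "cat"],
)

# items: keep the listed column labels; "missing" is not a column and is ignored.
cols_items = df.filter(items=["one", "three", "missing"])

# like: keep column labels that contain the substring "t".
cols_like = df.filter(like="t")

# regex: keep column labels for which re.search("e$", label) finds a match.
cols_regex = df.filter(regex="e$")

# axis=0: apply the same kind of test to the row labels instead.
rows_like = df.filter(like="a", axis=0)

for result in (cols_items, cols_like, cols_regex, rows_like):
    print(result, end="\n\n")
```

Each call returns a new dataframe; df itself is not modified.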


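The items, like, and regex parameters are enforced to be mutually exclusive. axis defaults to the info axis that is used when indexing with [].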
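
A short sketch of both points follows, using a hypothetical two-column dataframe. It assumes a reasonably recent pandas release, in which passing more than one of the three parameters raises a TypeError; the exact wording of the message may differ between versions.

```python
import pandas as pd

# Hypothetical two-column dataframe.
df = pd.DataFrame({"alpha": [1], "beta": [2]})

# Passing two of items / like / regex together is rejected.
try:
    df.filter(items=["alpha"], like="a")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With axis omitted, a DataFrame is filtered on its columns,
# the same axis that df["alpha"] indexes.
default_axis = df.filter(like="alpha")
explicit_axis = df.filter(like="alpha", axis="columns")
print(default_axis.equals(explicit_axis))
```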

The examples below read their data from a CSV file named nba.csv.

Example #1: Use the filter() function to select any three columns of the dataframe.

# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.read_csv("nba.csv")
  
# Print the dataframe
df

Now select the "Name", "College" and "Salary" columns.

# applying filter function 
df.filter(["Name", "College", "Salary"])

Output: a dataframe that contains only the Name, College and Salary columns.
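
If nba.csv is not at hand, the same call can be tried on a stand-in dataframe. The rows below are invented placeholders, not real data from the file, and only the three column names used above plus a hypothetical "Team" column are assumed. The sketch also compares filter() with plain [] selection.

```python
import pandas as pd

# Hypothetical stand-in rows; the real nba.csv is not reproduced here.
df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Team": ["Team X", "Team Y"],
    "College": ["College 1", "College 2"],
    "Salary": [1000000.0, 2500000.0],
})

# Same call as in the example: a list passed positionally is taken as items.
subset = df.filter(["Name", "College", "Salary"])
same_as_brackets = subset.equals(df[["Name", "College", "Salary"]])

# With a label that does not exist, filter() skips it,
# whereas [] selection raises a KeyError.
tolerant = df.filter(["Name", "Height"])
try:
    df[["Name", "Height"]]
    bracket_failed = False
except KeyError:
    bracket_failed = True

print(subset)
print(same_as_brackets, list(tolerant.columns), bracket_failed)
```

When every requested label exists, the two forms give the same result; filter() is the more forgiving choice when some labels may be absent.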

Example #2: Use the filter() function to subset all columns of the dataframe whose names contain the letter 'a' or 'A'.

Note: the filter() function also accepts a regular expression as one of its parameters.

# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.read_csv("nba.csv")
  
# Using a regular expression to extract all
# columns that have the letter 'a' or 'A' in their name.
df.filter(regex='[aA]')

Output:

The regular expression '[aA]' matches every column whose name contains an 'a' or an 'A'.
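
The matching can be checked without the CSV file by filtering an empty dataframe that has only column labels. The labels below are assumptions for illustration: "Name", "College" and "Salary" appear in the examples above, while the others are hypothetical. The sketch also contrasts regex with like, which is a case-sensitive substring test.

```python
import pandas as pd

# Assumed column labels; no data rows are needed to test label matching.
columns = ["Name", "Team", "Age", "Height", "College", "Salary"]
df = pd.DataFrame(columns=columns)

# '[aA]' matches a label that contains either letter anywhere.
regex_cols = list(df.filter(regex="[aA]").columns)

# like="a" is a plain substring test, so "Age" (capital A only) is not kept.
like_cols = list(df.filter(like="a").columns)

# Anchors work as usual: '^S' keeps labels that start with "S".
anchored_cols = list(df.filter(regex="^S").columns)

print(regex_cols)     # ['Name', 'Team', 'Age', 'Salary']
print(like_cols)      # ['Name', 'Team', 'Salary']
print(anchored_cols)  # ['Salary']
```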