📅  Last modified: 2023-12-03 14:45:31.167000             🧑  Author: Mango
Pandas is a powerful and easy-to-use open-source data manipulation and analysis tool for Python. With pandas, you can easily import and export data from a variety of sources, clean and transform data, and perform complex data analysis tasks.
Pandas can be installed using pip, the Python package manager. To install pandas using pip, open your command prompt and type:
pip install pandas
To use pandas in your Python program, you need to import it using the import statement:
import pandas as pd
In this case, we use the alias "pd" to refer to pandas. This is a common convention in the Python community.
One of the core features of pandas is the DataFrame. A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or a SQL table.
You can create a DataFrame in several ways. The most common approaches are to build it from a 2-dimensional array or list, from a dictionary of one-dimensional arrays or lists, or from a CSV file.
import pandas as pd
# create a DataFrame from a 2-dimensional list (a list of rows)
data = [['John', 23], ['Mary', 21], ['Tom', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# create a DataFrame from a dictionary
data = {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]}
df = pd.DataFrame(data)
# create a DataFrame from a CSV file (assumes data.csv exists in the working directory)
df = pd.read_csv('data.csv')
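The CSV example above only runs if a file named data.csv is present. As a minimal sketch that runs without any file on disk, the same read_csv call can be pointed at an in-memory text buffer. The CSV content and the variable names csv_text and df_csv are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the data.csv file
csv_text = "Name,Age\nJohn,23\nMary,21\nTom,25\n"

# read_csv accepts a file path or any file-like object, such as StringIO
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)   # number of rows and columns
print(df_csv.dtypes)  # column types inferred by pandas
```

The first line of the text is treated as the header row, so the columns are named Name and Age, and the Age column is parsed as integers.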
There are many ways to access data in a DataFrame. You can access a column by its name or integer position, and you can access a row by its index label (with loc) or its integer position (with iloc).
import pandas as pd
# create a sample DataFrame
data = {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]}
df = pd.DataFrame(data)
# access a column by its name
print(df['Name'])
# access a column by its position
print(df.iloc[:, 0])
# access a row by its index label (the default labels here are 0, 1, 2)
print(df.loc[0])
# access a row by its position
print(df.iloc[0])
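In the example above, loc[0] and iloc[0] return the same row only because the default index labels happen to match the row positions. The sketch below uses a custom index to show the difference. The labels 10, 20, 30 and the variable names are chosen purely for illustration.

```python
import pandas as pd

# Same data as above, but with custom index labels instead of 0, 1, 2
people = pd.DataFrame(
    {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]},
    index=[10, 20, 30],
)

# loc selects by index label: the row labeled 10
row_by_label = people.loc[10]

# iloc selects by integer position: the first row
row_by_position = people.iloc[0]

# loc also accepts a row label and a column name together
age_of_mary = people.loc[20, 'Age']

print(row_by_label['Name'], row_by_position['Name'], age_of_mary)

# There is no row labeled 0 here, so loc[0] raises a KeyError
try:
    people.loc[0]
except KeyError:
    print('no row with label 0')
```

A reasonable rule of thumb is to use loc when you know the label and iloc when you know the position, and not to rely on the two coinciding.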
Pandas provides many built-in methods for manipulating data, such as filtering, sorting, grouping, and aggregating. Here are some examples:
import pandas as pd
# create a sample DataFrame
data = {'Name': ['John', 'Mary', 'Tom'], 'Age': [23, 21, 25]}
df = pd.DataFrame(data)
# filter rows by a condition
df2 = df[df['Age'] > 22]
# sort rows by a column
df3 = df.sort_values('Age', ascending=False)
# group rows by a column and calculate the mean of another column
# (every name is unique in this sample, so each group holds a single row)
df4 = df.groupby('Name')['Age'].mean()
# aggregate rows by a column and calculate multiple statistics
df5 = df.groupby('Name').agg({'Age': ['min', 'max', 'mean']})
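Grouping is easier to see when the grouping column has repeated values. The sketch below adds a hypothetical City column and a fourth person (neither is part of the sample data above) so that each group contains more than one row.

```python
import pandas as pd

# Sample data with a hypothetical 'City' column that repeats across rows
people = pd.DataFrame({
    'Name': ['John', 'Mary', 'Tom', 'Anna'],
    'City': ['Paris', 'Paris', 'Rome', 'Rome'],
    'Age': [23, 21, 25, 31],
})

# Mean age per city: one value for each distinct city
mean_age = people.groupby('City')['Age'].mean()

# Several statistics per city in one call
stats = people.groupby('City')['Age'].agg(['min', 'max', 'mean'])

print(mean_age)
print(stats)
```

Here the mean age is 22.0 for Paris and 28.0 for Rome, and stats has one row per city with the columns min, max, and mean.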
Pandas is a flexible data analysis tool for Python. With the DataFrame at its core, it lets you load data, access rows and columns, and filter, sort, group, and aggregate in a few lines of code.