📅  Last modified: 2023-12-03 15:14:40.620000             🧑  Author: Mango
Pandas is a popular data manipulation library for Python, and its core data structure is the DataFrame, conventionally assigned to a variable named df. A DataFrame is a two-dimensional, size-mutable table with labeled axes (rows and columns), and each column can hold its own data type. It is similar to a spreadsheet or a SQL table.
We can create a DataFrame from different data sources, such as a CSV file, a SQL database, or a Python dictionary. Here is an example of creating a DataFrame from a dictionary:
import pandas as pd
data = {'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
        'age': [25, 30, 35, 40],
        'country': ['USA', 'Canada', 'UK', 'France']}
df = pd.DataFrame(data)
print(df)
This will output:
      name  age country
0    Alice   25     USA
1      Bob   30  Canada
2  Charlie   35      UK
3     Dave   40  France
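The same table can also be loaded from CSV data. The following is a minimal sketch, assuming the CSV content shown here (it is made up to mirror the dictionary example) and using an in-memory buffer so that no file on disk is needed. With a real file, a path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content that mirrors the dictionary example.
csv_text = """name,age,country
Alice,25,USA
Bob,30,Canada
Charlie,35,UK
Dave,40,France
"""

# pd.read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)          # (4, 3): four rows, three columns
print(list(df_csv.columns))  # ['name', 'age', 'country']
print(df_csv['age'].sum())   # 130
```

The first line of the CSV is used as the column header, and the row labels default to 0, 1, 2, 3, just as in the dictionary example.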
There are several ways to access data in a DataFrame. We can select specific columns, rows, or subsets of data using indexing or boolean masks.
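We can select a single column by passing its name, which returns a Series, or several columns by passing a list of column names, which returns a DataFrame. Selecting a single column looks like this: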
names = df['name']
print(names)
0      Alice
1        Bob
2    Charlie
3       Dave
Name: name, dtype: object
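Passing a list of column names inside the brackets, rather than a single name, returns a DataFrame instead of a Series. A short sketch of the difference, rebuilding the same example data so the block runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'age': [25, 30, 35, 40],
    'country': ['USA', 'Canada', 'UK', 'France'],
})

# A single name returns a one-dimensional Series.
names = df['name']

# A list of names returns a two-dimensional DataFrame,
# even when the list contains only one name.
name_age = df[['name', 'age']]
only_name = df[['name']]

print(type(names).__name__)      # Series
print(type(name_age).__name__)   # DataFrame
print(name_age.shape)            # (4, 2)
print(only_name.shape)           # (4, 1)
```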
We can select rows by label using the .loc accessor, or by integer position using the .iloc accessor:
row = df.loc[0]
print(row)
name       Alice
age           25
country      USA
Name: 0, dtype: object
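With the default 0, 1, 2, ... index, labels and positions coincide, so .loc and .iloc appear to behave the same. The sketch below assumes a hypothetical string index ('a' to 'd', not part of the original example) to make the difference visible, including the fact that label slices include the end label while position slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
        'age': [25, 30, 35, 40],
    },
    index=['a', 'b', 'c', 'd'],  # hypothetical row labels for illustration
)

by_label = df.loc['b']     # the row whose label is 'b'
by_position = df.iloc[1]   # the second row, counting from 0

print(by_label['name'])               # Bob
print(by_label.equals(by_position))   # True

label_slice = df.loc['a':'c']    # end label is included: rows a, b, c
position_slice = df.iloc[0:2]    # end position is excluded: rows a, b

print(len(label_slice), len(position_slice))   # 3 2
```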
We can select a subset of data by combining the above indexing methods with boolean masks:
subset = df.loc[df['age'] > 30, ['name', 'country']]
print(subset)
      name country
2  Charlie      UK
3     Dave  France
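Boolean masks can also be combined. The sketch below, which rebuilds the example data so it runs on its own, uses & for "and" and | for "or"; each condition needs its own parentheses because these operators bind more tightly than comparisons. The particular conditions are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'age': [25, 30, 35, 40],
    'country': ['USA', 'Canada', 'UK', 'France'],
})

# Both conditions must hold: older than 25 AND not from the UK.
mask_and = (df['age'] > 25) & (df['country'] != 'UK')
print(df.loc[mask_and, 'name'].tolist())   # ['Bob', 'Dave']

# Either condition may hold: from USA/Canada OR at least 40 years old.
in_north_america = df['country'].isin(['USA', 'Canada'])
mask_or = in_north_america | (df['age'] >= 40)
print(df.loc[mask_or, 'name'].tolist())    # ['Alice', 'Bob', 'Dave']
```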
We can modify data in a DataFrame by assigning new values to specific cells, columns, or rows:
# Change the age of Bob from 30 to 31
df.at[1, 'age'] = 31
# Add a new column 'gender'
df['gender'] = ['F', 'M', 'M', 'M']
# Delete the 'country' column
df = df.drop(columns='country')
print(df)
      name  age gender
0    Alice   25      F
1      Bob   31      M
2  Charlie   35      M
3     Dave   40      M
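Beyond single cells and whole columns, .loc can also update several cells at once through a boolean mask, or overwrite an entire row. The following is a minimal sketch starting from the state printed above; the new values (the extra year and the replacement row) are hypothetical and serve only as illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
    'age': [25, 31, 35, 40],
    'gender': ['F', 'M', 'M', 'M'],
})

# Conditional update: add one year to everyone older than 30.
df.loc[df['age'] > 30, 'age'] += 1
print(df['age'].tolist())    # [25, 32, 36, 41]

# Overwrite a whole row by label, one value per column
# (hypothetical replacement values).
df.loc[3] = ['David', 45, 'M']
print(df['name'].tolist())   # ['Alice', 'Bob', 'Charlie', 'David']
print(df['age'].tolist())    # [25, 32, 36, 45]

# Delete a row by label; drop returns a new DataFrame.
df = df.drop(index=0)
print(df['name'].tolist())   # ['Bob', 'Charlie', 'David']
```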
The pandas DataFrame is a powerful tool for data manipulation and analysis. It provides a flexible and intuitive way to work with tabular data, and it integrates well with other Python libraries such as Matplotlib and scikit-learn, as well as with tools such as Jupyter Notebook.