df (1) - Mango Docs

Last modified: 2023-12-03 15:14:40.620000 | Author: Mango

Pandas DataFrame (df)

Pandas is a popular data manipulation library for Python, and its core data structure is the DataFrame, conventionally assigned to a variable named df. A DataFrame is a 2-dimensional, size-mutable, tabular data structure with labeled axes (rows and columns). Each column can hold a different data type, such as integers, floats, or strings. It is essentially a table with rows and columns, similar to a spreadsheet or SQL table.

Creating a DataFrame

We can create a DataFrame from different data sources, such as a CSV file, a SQL database, or a Python dictionary. Here is an example of creating a DataFrame from a dictionary:

import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
        'age': [25, 30, 35, 40],
        'country': ['USA', 'Canada', 'UK', 'France']}
df = pd.DataFrame(data)
print(df)

This will output:

      name  age country
0    Alice   25     USA
1      Bob   30  Canada
2  Charlie   35      UK
3     Dave   40  France
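The same table can also be loaded from CSV data. The following is a minimal sketch, assuming the CSV content is available as an in-memory string (the csv_text variable is hypothetical; with a real file you would pass its path to pd.read_csv instead):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """name,age,country
Alice,25,USA
Bob,30,Canada
Charlie,35,UK
Dave,40,France
"""

# pd.read_csv accepts a path or any file-like object, so StringIO works here.
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv.shape)   # (number of rows, number of columns)
print(df_from_csv.dtypes)  # data type inferred for each column
```

The first line of the CSV is used as the column names by default, and the data type of each column is inferred from its values.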

Accessing Data in a DataFrame

There are several ways to access data in a DataFrame. We can select specific columns, rows, or subsets of data using indexing or boolean masks.

Selecting Columns

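We can select a single column by passing its name in square brackets, which returns a Series. Passing a list of column names instead returns a DataFrame with just those columns. Here we select the 'name' column: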

names = df['name']
print(names)

0      Alice
1        Bob
2    Charlie
3       Dave
Name: name, dtype: object
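To select several columns at once, pass a list of names. The sketch below rebuilds the example DataFrame so it can run on its own and contrasts the two return types:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                   'age': [25, 30, 35, 40],
                   'country': ['USA', 'Canada', 'UK', 'France']})

single = df['name']             # one name -> Series
multiple = df[['name', 'age']]  # list of names -> DataFrame

print(type(single).__name__)
print(type(multiple).__name__)
print(multiple)
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is the Python list of column names.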

Selecting Rows

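We can select rows by index label using the .loc accessor, or by integer position using the .iloc accessor. With the default integer index the two coincide, so df.loc[0] and df.iloc[0] both return the first row: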

row = df.loc[0]
print(row)

name       Alice
age           25
country      USA
Name: 0, dtype: object
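The difference between .loc and .iloc is easier to see when the index is not the default 0, 1, 2, ... The sketch below assumes a hypothetical string index ('a' to 'd') purely for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Charlie', 'Dave'], 'age': [25, 30, 35, 40]},
    index=['a', 'b', 'c', 'd'],  # hypothetical string labels
)

by_label = df.loc['b']         # row whose index label is 'b'
by_position = df.iloc[1]       # second row, whatever its label is

label_slice = df.loc['a':'c']  # label slices include the end label
position_slice = df.iloc[0:2]  # position slices exclude the end position

print(by_label['name'], by_position['name'])
print(len(label_slice), len(position_slice))
```

Label-based slices include both endpoints, while position-based slices follow the usual Python convention of excluding the stop position.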

Selecting Subsets of Data

We can select a subset of data by passing a boolean mask as the row selector and a list of column names as the column selector to .loc:

subset = df.loc[df['age'] > 30, ['name', 'country']]
print(subset)

      name country
2  Charlie      UK
3     Dave  France
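Masks can also be combined. As a minimal sketch on the same example data, conditions are joined with & (and) or | (or), each wrapped in parentheses, and isin tests membership in a list of values:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                   'age': [25, 30, 35, 40],
                   'country': ['USA', 'Canada', 'UK', 'France']})

# Parentheses are required because & binds more tightly than > and !=.
mask = (df['age'] > 25) & (df['country'] != 'UK')
subset = df.loc[mask, ['name', 'age']]

# Rows whose country is one of the listed values.
in_set = df[df['country'].isin(['USA', 'France'])]

print(subset)
print(in_set)
```

Python's own and/or keywords do not work element-wise on a Series, which is why the bitwise operators are used here.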

Modifying Data in a DataFrame

We can modify data in a DataFrame by assigning new values to specific cells, columns, or rows:

# Change the age of Bob from 30 to 31
df.at[1, 'age'] = 31

# Add a new column 'gender'
df['gender'] = ['F', 'M', 'M', 'M']

# Delete the 'country' column
df = df.drop(columns='country')

print(df)

      name  age gender
0    Alice   25      F
1      Bob   31      M
2  Charlie   35      M
3     Dave   40      M
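Values can also be updated for every row that matches a condition, and new columns can be derived from existing ones. The sketch below rebuilds a small version of the example; the age_in_months column is a hypothetical name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave'],
                   'age': [25, 30, 35, 40]})

# Update 'age' for every row where the mask is True (here, only Dave).
df.loc[df['name'] == 'Dave', 'age'] = 41

# Derive a new column from an existing one; the arithmetic is element-wise.
df['age_in_months'] = df['age'] * 12

print(df)
```

Using a single .loc call with both the row mask and the column name assigns directly into df, whereas chaining two indexing steps may end up modifying a temporary copy.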

Summary

The Pandas DataFrame is a powerful tool for data manipulation and analysis. It provides a flexible and intuitive way to work with tabular data, and integrates well with other Python libraries and tools such as Matplotlib, scikit-learn, and Jupyter Notebook.