panda (1) - Mango Docs

📅 Last modified: 2023-12-03 15:18:13.495000 🧑 Author: Mango

Introduction to Pandas

Pandas is a Python library for data manipulation and analysis. It provides a powerful and flexible data structure called DataFrame, which is similar to a table in SQL or a spreadsheet in Excel. Pandas allows you to easily import, export, clean, transform, and analyze data from various sources, including CSV files, SQL databases, and web APIs.

Getting Started

To start using Pandas, first install it with pip, the standard package installer for Python:

pip install pandas

Once it is installed, import Pandas in your Python script or Jupyter notebook. By convention, it is imported under the alias pd:

import pandas as pd
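A quick way to confirm that the installation worked is to print the installed version. The exact string you see depends on your environment.

```python
import pandas as pd

# pd.__version__ holds the installed pandas version as a string
print(pd.__version__)
```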

Creating a DataFrame

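You can create a DataFrame directly from a dictionary of columns or a list of rows, or by reading a file such as a CSV. Here's an example of creating a DataFrame from a dictionary, where each key becomes a column name and each list holds that column's values: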

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)
print(df)

Output:

      name  age  salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
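The same table can also be built from a list of rows. A minimal sketch of two common row-oriented forms: a list of lists with explicit column names, and a list of dictionaries where each dictionary is one row.

```python
import pandas as pd

# One inner list per row; column names are supplied separately
rows = [['Alice', 25, 50000],
        ['Bob', 30, 60000],
        ['Charlie', 35, 70000],
        ['David', 40, 80000]]
df_from_rows = pd.DataFrame(rows, columns=['name', 'age', 'salary'])

# One dict per row, keyed by column name; the keys become the columns
records = [{'name': 'Alice', 'age': 25, 'salary': 50000},
           {'name': 'Bob', 'age': 30, 'salary': 60000},
           {'name': 'Charlie', 'age': 35, 'salary': 70000},
           {'name': 'David', 'age': 40, 'salary': 80000}]
df_from_records = pd.DataFrame(records)

# Both forms yield identical DataFrames: 4 rows x 3 columns
print(df_from_rows.equals(df_from_records))  # True
print(df_from_rows.shape)  # (4, 3)
```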

Importing and Exporting Data

Pandas can read and write many formats, including CSV, Excel, and JSON, and can also read from and write to SQL databases. Here's an example of importing a CSV file; head() displays the first five rows:

df = pd.read_csv('data.csv')
print(df.head())

Output:

   id     name  age  salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000
3   4    David   40   80000
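Since data.csv itself is not shown, here is a self-contained sketch that reads the same data from an in-memory buffer. It assumes the file's contents match the output above; read_csv accepts any file-like object as well as a path, so io.StringIO stands in for the file.

```python
import io
import pandas as pd

# Assumed contents of data.csv, reconstructed from the output shown above
csv_text = """id,name,age,salary
1,Alice,25,50000
2,Bob,30,60000
3,Charlie,35,70000
4,David,40,80000
"""

# read_csv works with paths and with file-like objects such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())  # at most the first 5 rows (here, all 4)
print(df.dtypes)  # types are inferred from the text; the numeric columns become int64
```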

You can also export a DataFrame to a CSV file. Passing index=False leaves out the row index, so the file contains only the data columns:

df.to_csv('output.csv', index=False)
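To see what index=False changes, note that to_csv returns the CSV text as a string when no path is given. This small sketch compares the two outputs.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
# ,name,age
# 0,Alice,25
# 1,Bob,30

print(without_index)
# name,age
# Alice,25
# Bob,30
```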

Data Cleaning and Transformation

Pandas provides various functions for cleaning and transforming data, including removing duplicates, filling missing values, grouping data, and more. Here's an example of removing duplicates:

df = df.drop_duplicates()
print(df)

Output:

   id     name  age  salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000
3   4    David   40   80000
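The sample data contains no repeated rows, so the output above is unchanged. Here is a sketch with hypothetical data that does contain duplicates. By default, a row counts as a duplicate only if every column matches an earlier row; subset restricts the comparison to the listed columns, and keep chooses which occurrence survives.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 1 exactly; "Alice" appears twice with different ages
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Bob', 'Alice'],
                   'age': [25, 30, 30, 26]})

# Default: compare all columns and keep the first occurrence
print(df.drop_duplicates())
#     name  age
# 0  Alice   25
# 1    Bob   30
# 3  Alice   26

# Compare only 'name' and keep the last occurrence of each name
print(df.drop_duplicates(subset=['name'], keep='last'))
#     name  age
# 2    Bob   30
# 3  Alice   26
```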

You can also fill missing values with a default value:

df = df.fillna(0)
print(df)

Output:

   id     name  age  salary
0   1    Alice   25   50000
1   2      Bob   30   60000
2   3  Charlie   35   70000
3   4    David   40   80000
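The sample data has no missing values either, so the output is again unchanged. Note that df.fillna(0) fills every column, including text columns. Passing a dictionary lets each column get its own fill value. This sketch uses hypothetical data with gaps and fills the missing age with the column mean (an illustrative choice, not a rule).

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one missing salary
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, np.nan, 35],
                   'salary': [50000, 60000, np.nan]})

# Count missing values per column before filling
print(df.isna().sum())

# Map each column to its own fill value; columns not listed are left untouched
filled = df.fillna({'age': df['age'].mean(), 'salary': 0})
print(filled)
```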

Data Analysis

Pandas provides various functions for analyzing data, including filtering rows, sorting data, calculating aggregates, and more. Here's an example of filtering rows based on a condition:

df = df[df['salary'] > 60000]
print(df)

Output:

   id     name  age  salary
2   3  Charlie   35   70000
3   4    David   40   80000
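The expression df['salary'] > 60000 produces a boolean Series, and indexing with it keeps only the rows marked True. Multiple conditions can be combined with & (and) and | (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. DataFrame.query offers an equivalent string form. A short sketch using the names from the earlier example:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'age': [25, 30, 35, 40],
                   'salary': [50000, 60000, 70000, 80000]})

# The comparison yields one True/False per row
mask = df['salary'] > 60000
print(mask.tolist())  # [False, False, True, True]

# Combine conditions; the parentheses around each comparison are required
high_earners_under_40 = df[(df['salary'] > 60000) & (df['age'] < 40)]
print(high_earners_under_40)

# The same filter expressed as a query string
print(df.query('salary > 60000 and age < 40'))
```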

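You can also sort by one or more columns. The ascending argument takes one flag per column, so here rows are sorted by age in ascending order, with ties broken by salary in descending order: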

df = df.sort_values(by=['age', 'salary'], ascending=[True, False])
print(df)

Output:

   id     name  age  salary
2   3  Charlie   35   70000
3   4    David   40   80000
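Calculating aggregates is usually done on a whole column or per group with groupby. In the sketch below, the 'dept' column is hypothetical and exists only so there is something to group by.

```python
import pandas as pd

# Hypothetical data: the 'dept' column is invented for this example
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'dept': ['Sales', 'IT', 'Sales', 'IT'],
                   'salary': [50000, 60000, 70000, 80000]})

# Aggregate over a whole column
print(df['salary'].mean())  # 65000.0

# Split rows by department, then compute several aggregates per group
summary = df.groupby('dept')['salary'].agg(['mean', 'max', 'count'])
print(summary)
```

Group keys are sorted by default, so 'IT' comes before 'Sales' in the result.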

Conclusion

Pandas is a powerful library for data manipulation and analysis in Python. It provides a rich set of functions for importing, exporting, cleaning, transforming, and analyzing data, making it an essential tool for data scientists and analysts.