📅  Last modified: 2023-12-03 15:33:25.010000             🧑  Author: Mango
Pandas is a popular data manipulation library for Python. It provides data structures for efficiently storing and manipulating large datasets. One of the key data structures in pandas is the DataFrame, which is similar to a spreadsheet or SQL table.
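To install pandas, use pip, the Python package installer. The leading ! runs the command from a Jupyter notebook cell; omit it when typing the command in a terminal: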
!pip install pandas
You can create a DataFrame from a dictionary:
import pandas as pd
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'gender': ['F', 'M', 'M']
}
df = pd.DataFrame(data)
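A DataFrame can also be built row by row from a list of dictionaries, which is handy when the data arrives as records. The sketch below is only an illustration; the name `df_records` is made up here and the rows repeat the sample data above.

```python
import pandas as pd

# Each dict is one row; the keys become column names
records = [
    {'name': 'Alice', 'age': 25, 'gender': 'F'},
    {'name': 'Bob', 'age': 30, 'gender': 'M'},
    {'name': 'Charlie', 'age': 35, 'gender': 'M'},
]
df_records = pd.DataFrame(records)

print(df_records.shape)          # (number of rows, number of columns)
print(list(df_records.columns))  # column names in insertion order
```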
You can also create a DataFrame from a CSV file:
df = pd.read_csv('data.csv')
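The contents of data.csv are not shown here, so the following sketch reads from an in-memory string instead of a file. The CSV text and the name `df_csv` are assumptions made up for illustration; `io.StringIO` simply stands in for the file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """name,age,gender
Alice,25,F
Bob,30,M
Charlie,35,M
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)   # (rows, columns)
print(df_csv.dtypes)  # column types inferred from the text
```

The first line of the text is treated as the header row, so its values become the column names.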
You can display the top and bottom rows of a DataFrame using the head() and tail() methods:
print(df.head())
print(df.tail())
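Both methods take an optional row count n, which defaults to 5. A small sketch, assuming a ten-row DataFrame named `numbers` that exists only for this illustration:

```python
import pandas as pd

# Illustrative data: ten rows holding the values 0 through 9
numbers = pd.DataFrame({'value': range(10)})

first_three = numbers.head(3)  # the first three rows
last_two = numbers.tail(2)     # the last two rows

print(first_three)
print(last_two)
```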
You can also select columns and rows using indexing:
# selecting a single column
print(df['name'])
# selecting multiple columns
print(df[['name', 'age']])
# selecting rows by condition
print(df[df['age'] > 30])
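Conditions can be combined with & and |, as long as each one is wrapped in parentheses, and .loc and .iloc select rows and columns in a single step. A minimal sketch using the same sample data as earlier; the variable names are made up for illustration:

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'gender': ['F', 'M', 'M'],
})

# rows where both conditions hold; each condition needs its own parentheses
older_men = people[(people['age'] > 25) & (people['gender'] == 'M')]

# label-based selection: filter rows and pick columns together
names_over_30 = people.loc[people['age'] > 30, ['name', 'age']]

# position-based selection: first two rows of the first column
first_names = people.iloc[:2, 0]

print(older_men)
print(names_over_30)
print(first_names)
```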
Pandas provides many methods for manipulating DataFrames. Here are a few examples:
# adding a new column (the list needs one value per row)
df['city'] = ['New York', 'Chicago', 'Los Angeles']
# renaming a column
df = df.rename(columns={'gender': 'sex'})
# dropping rows with missing values
df = df.dropna()
# grouping by a column and aggregating
print(df.groupby('sex').agg({'age': 'mean'}))
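To see how renaming, dropping missing values, and grouping fit together, here is a small end-to-end sketch. The sample data, including the deliberately missing age for a hypothetical fourth person, is an assumption for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [25, 30, 35, None],  # Dana's age is missing on purpose
    'gender': ['F', 'M', 'M', 'F'],
})

people = people.rename(columns={'gender': 'sex'})
people = people.dropna()  # removes Dana's row, the only one with a missing value

# mean age per group, computed on the remaining rows
mean_age = people.groupby('sex').agg({'age': 'mean'})
print(mean_age)
```

Note that dropna() runs before the aggregation here, so the incomplete row is removed entirely rather than just skipped in the mean.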
Pandas covers the core tasks of working with tabular data in Python: loading, selecting, cleaning, and aggregating. The DataFrame ties these operations together and offers a convenient, efficient way to manipulate large datasets, which makes pandas a standard tool for data scientists and programmers who work with data.