pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
you can import it to your program by typing:
import pandas as pd
Creating a Series by passing a list of values, letting pandas create a default integer index.
examples for code:
s = pd.Series([1, 4, np.nan, 6, 8])
output
0 1.0
1 4.0
2 NaN
3 6.0
4 8.0
dtype: float64
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))
output
DataFrame.to_numpy() gives a NumPy representation of the underlying data. Note that this can be an expensive operation when your DataFrame has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column.
code examples:
df.to_numpy()
output
array([[ 0.4691, -0.2829, -1.5091, -1.1356],
[ 1.2121, -0.1732, 0.1192, -1.0442],
[-0.8618, -2.1046, -0.4949, 1.0718],
[ 0.7216, -0.7068, -1.0396, 0.2719],
[-0.425 , 0.567 , 0.2762, -1.0874],
[-0.6737, 0.1136, -1.4784, 0.525 ]])`
Note that DataFrame.to_numpy() does not include the index or column labels in the output.
df.loc[dates[0]]
Output
A 0.469112
B -0.282863
C -1.509059
D -1.135632
Name: 2013-01-01 00:00:00, dtype: float64
Select via the position of the passed integers:
df.iloc[3]
Output
A 0.721555
B -0.706771
C -1.039575
D 0.271860
Name: 2013-01-04 00:00:00, dtype: float64
pandas primarily uses the value np.nan to represent missing data. It is by default not included in computations.
Operations in general exclude missing data.
Applying functions to the data:
df.apply(np.cumsum)
output
A B C D F
2013-01-01 0.000000 0.000000 -1.509059 5 NaN
2013-01-02 1.212112 -0.173215 -1.389850 10 1.0
2013-01-03 0.350263 -2.277784 -1.884779 15 3.0
2013-01-04 1.071818 -2.984555 -2.924354 20 6.0
2013-01-05 0.646846 -2.417535 -2.648122 25 10.0
2013-01-06 -0.026844 -2.303886 -4.126549 30 15.0
Series is equipped with a set of string processing methods in the str attribute that make it easy to operate on each element of the array, as in the code snippet below. Note that pattern-matching in str generally uses regular expressions by default (and in some cases always uses them).
pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
Output
C bar foo
A B
one A 2.395985 -1.202872
B 1.395433 -1.814470
C -0.392670 -0.055224 three A -0.595447 NaN
B NaN 1.928123
C 0.166599 NaN two A NaN 0.007207
B 1.552825 NaN
C NaN 1.018601