What’s new in 1.3.0 (??)

These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.

Warning

When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to read_excel() will now result in using the openpyxl engine in all cases when the option io.excel.xlsx.reader is set to "auto". Previously, some cases would use the xlrd engine instead. See What’s new 1.2.0 for background on this change.

Enhancements

Custom HTTP(s) headers when reading csv or json files

When reading from a remote URL that is not handled by fsspec (i.e. HTTP and HTTPS), the dictionary passed to storage_options will be used to create the headers included in the request. This can be used to control the User-Agent header or send other custom headers (GH36688). For example:

In [1]: headers = {"User-Agent": "pandas"}

In [2]: df = pd.read_csv(
   ...:     "https://download.bls.gov/pub/time.series/cu/cu.item",
   ...:     sep="\t",
   ...:     storage_options=headers
   ...: )
   ...: 

Read and write XML documents

We added I/O support to read and render shallow versions of XML documents with pandas.read_xml() and DataFrame.to_xml(). With lxml as the parser, both XPath 1.0 and XSLT 1.0 are available. (GH27554)

In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...:  </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>

For more, see io.xml in the user guide on IO tools.
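
As a minimal sketch of the same round trip without lxml, the following assumes the standard-library parser (parser="etree"), which supports only a limited XPath subset and no XSLT. The document is wrapped in StringIO, and the XPath expression ".//row" is chosen for this illustration only.

```python
from io import StringIO

import pandas as pd

# Same kind of shallow document as in the example above (illustrative data).
xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides/></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# parser="etree" relies on the standard library instead of lxml.
# xpath selects the repeating elements that become DataFrame rows.
df = pd.read_xml(StringIO(xml), xpath=".//row", parser="etree")

# index=False leaves the <index> element out of every written row.
out = df.to_xml(index=False, parser="etree")
print(out)
```

The empty sides element is read back as a missing value, and it is written out again as an empty element.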

Styler Upgrades

We provided some focused development on Styler, including altering methods to accept more universal CSS language for arguments, such as 'color:red;' instead of [('color', 'red')] (GH39564). This is also added to the built-in methods to allow custom CSS highlighting instead of default background coloring (GH40242). Enhancements to other built-in methods include extending the Styler.background_gradient() method to shade elements based on a given gradient map and not be restricted only to values in the DataFrame (GH39930 GH22727 GH28901). Additional built-in methods such as Styler.highlight_between() and Styler.highlight_quantile() have been added (GH39821 and GH40926).

Styler.apply() now consistently allows functions with ndarray output, to allow more flexible development of UDFs when axis is None, 0 or 1 (GH39393).

Styler.set_tooltips() is a new method that allows adding on hover tooltips to enhance interactive displays (GH35643). Styler.set_td_classes(), which was recently introduced in v1.2.0 (GH36159) to allow adding specific CSS classes to data cells, has been made as performant as Styler.apply() and Styler.applymap() (GH40453), if not more performant in some cases. The overall performance of HTML render times has been considerably improved to match DataFrame.to_html() (GH39952 GH37792 GH40425).

Styler.format() has been upgraded to easily format missing data and precision, and to perform HTML escaping (GH40437 GH40134). There have been numerous other bug fixes to properly format HTML and eliminate some inconsistencies (GH39942 GH40356 GH39807 GH39889 GH39627).

Styler has also been made compatible with a non-unique index or columns for most features; the remaining features are only partially compatible (GH41269).

Documentation has also seen major revisions in light of new features (GH39720 GH39317 GH40493).

DataFrame constructor honors copy=False with dict

When passing a dictionary to DataFrame with copy=False, a copy will no longer be made (GH32960).

In [3]: arr = np.array([1, 2, 3])

In [4]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)

In [5]: df
Out[5]: 
   A  B
0  1  1
1  2  2
2  3  3

df["A"] remains a view on arr:

In [6]: arr[0] = 0

In [7]: assert df.iloc[0, 0] == 0

The default behavior when not passing copy will remain unchanged, i.e. a copy will be made.

Centered Datetime-Like Rolling Windows

When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used (GH38780). For example:

In [8]: df = pd.DataFrame(
   ...:     {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
   ...: )
   ...: 

In [9]: df
Out[9]: 
            A
2020-01-01  0
2020-01-02  1
2020-01-03  2
2020-01-04  3
2020-01-05  4

In [10]: df.rolling("2D", center=True).mean()
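
To make the effect of center=True concrete, here is a small sketch that computes the same "2D" window twice on the frame above, once with the default trailing placement and once centered. The variable names trailing and centered are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
)

# Default placement: the window ends at each row's timestamp and looks back.
trailing = df.rolling("2D").mean()

# center=True: the window is placed around each row's timestamp instead,
# so later observations can contribute to a row's result.
centered = df.rolling("2D", center=True).mean()

# Show both results side by side for comparison.
print(pd.concat({"trailing": trailing["A"], "centered": centered["A"]}, axis=1))
```

This requires pandas 1.3.0 or later; earlier versions raise NotImplementedError for centered datetime-like windows.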

Other enhancements

  • Rolling and Expanding now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See window.overview in the user guide for performance and functional benefits (GH15095, GH38995)

  • Added MultiIndex.dtypes() (GH37062)

  • Added end and end_day options for origin in DataFrame.resample() (GH37804)

  • Improve error message when usecols and names do not match for read_csv() and engine="c" (GH29042)

  • Improved consistency of error message when passing an invalid win_type argument in Window (GH15969)

  • pandas.read_sql_query() now accepts a dtype argument to cast the columnar data from the SQL database based on user input (GH10285)

  • Improved integer type mapping from pandas to SQLAlchemy when using DataFrame.to_sql() (GH35076)

  • to_numeric() now supports downcasting of nullable ExtensionDtype objects (GH33013)

  • Add support for dict-like names in MultiIndex.set_names and MultiIndex.rename (GH20421)

  • pandas.read_excel() can now auto detect .xlsb files (GH35416)

  • pandas.ExcelWriter now accepts an if_sheet_exists parameter to control the behaviour of append mode when writing to existing sheets (GH40230)

  • Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(), ExponentialMovingWindow.mean(), Rolling.median(), Expanding.median(), Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now support Numba execution with the engine keyword (GH38895, GH41267)

  • DataFrame.apply() can now accept NumPy unary operators as strings, e.g. df.apply("sqrt"), which was already the case for Series.apply() (GH39116)

  • DataFrame.apply() can now accept non-callable DataFrame properties as strings, e.g. df.apply("size"), which was already the case for Series.apply() (GH39116)

  • DataFrame.applymap() can now accept kwargs to pass on to func (GH39987)

  • Disallow DataFrame indexer for iloc for Series.__getitem__() and DataFrame.__getitem__() (GH39004)

  • Series.apply() can now accept list-like or dictionary-like arguments that aren’t lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])), which was already the case for DataFrame.apply() (GH39140)

  • DataFrame.plot.scatter() can now accept a categorical column as the argument to c (GH12380, GH31357)

  • Styler.set_tooltips() allows on hover tooltips to be added to styled HTML dataframes (GH35643, GH21266, GH39317, GH39708, GH40284)

  • Styler.set_table_styles() amended to optionally allow certain css-string input arguments (GH39564)

  • Styler.apply() now more consistently accepts ndarray function returns, i.e. in all cases for axis is 0, 1 or None (GH39359)

  • Styler.apply() and Styler.applymap() now raise errors if wrong format CSS is passed on render (GH39660)

  • Styler.format() adds keyword argument escape for optional HTML escaping (GH40437)

  • Styler.background_gradient() now allows the ability to supply a specific gradient map (GH22727)

  • Styler.clear() now clears Styler.hidden_index and Styler.hidden_columns as well (GH40484)

  • Builtin highlighting methods in Styler have a more consistent signature and css customisability (GH40242)

  • Styler.highlight_between() added to list of builtin styling methods (GH39821)

  • Series.loc.__getitem__() and Series.loc.__setitem__() with MultiIndex now raising helpful error message when indexer has too many dimensions (GH35349)

  • pandas.read_stata() and StataReader support reading data from compressed files.

  • Add support for parsing ISO 8601-like timestamps with negative signs to pandas.Timedelta() (GH37172)

  • Add support for unary operators in FloatingArray (GH38749)

  • RangeIndex can now be constructed by passing a range object directly e.g. pd.RangeIndex(range(3)) (GH12067)

  • round() is now enabled for the nullable integer and floating dtypes (GH38844)

  • pandas.read_csv() and pandas.read_json() expose the argument encoding_errors to control how encoding errors are handled (GH39450)

  • GroupBy.any() and GroupBy.all() use Kleene logic with nullable data types (GH37506)

  • GroupBy.any() and GroupBy.all() return a BooleanDtype for columns with nullable data types (GH33449)

  • GroupBy.rank() now supports object-dtype data (GH38278)

  • Constructing a DataFrame or Series with the data argument being a Python iterable of NumPy scalars (one that is not a NumPy ndarray) will now result in a dtype whose precision is the maximum of the precisions of the NumPy scalars; this was already the case when data is a NumPy ndarray (GH40908)

  • Add keyword sort to pivot_table() to allow non-sorting of the result (GH39143)

  • Add keyword dropna to DataFrame.value_counts() to allow counting rows that include NA values (GH41325)

  • Series.replace() will now cast results to PeriodDtype where possible instead of object dtype (GH41526)
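
A few of the smaller additions in the list above can be tried together. The sketch below covers only RangeIndex construction from a range, the dropna keyword of DataFrame.value_counts(), and the sort keyword of pivot_table(); the data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# RangeIndex built directly from a range object.
idx = pd.RangeIndex(range(3))

df = pd.DataFrame({"key": ["x", "x", np.nan], "val": [1.0, 1.0, 2.0]}, index=idx)

# dropna=False keeps rows that contain NA values in the counts;
# the default drops them.
with_na = df.value_counts(dropna=False)
without_na = df.value_counts()

# sort=False stops pivot_table from sorting the result by the group keys,
# so groups appear in order of first occurrence.
sales = pd.DataFrame({"store": ["b", "a", "b"], "units": [1, 2, 3]})
unsorted = sales.pivot_table(
    index="store", values="units", aggfunc="sum", sort=False
)

print(with_na)
print(unsorted)
```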

Notable bug fixes

These are bug fixes that might have notable behavior changes.

Categorical.unique now always maintains same dtype as original

Previously, when calling unique() with categorical data, unused categories in the new array would be removed, meaning that the dtype of the new array would be different than the original, if some categories are not present in the unique array (GH18291)

As an example of this, given:

In [11]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)

In [12]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)

In [13]: original = pd.Series(cat)

In [14]: unique = original.unique()

pandas < 1.3.0:

In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False

pandas >= 1.3.0:

In [15]: unique
Out[15]: 
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']

In [16]: original.dtype == unique.dtype
Out[16]: True

Preserve dtypes in combine_first()

combine_first() will now preserve dtypes (GH7509)

In [17]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])

In [18]: df1
Out[18]: 
   A  B
0  1  1
1  2  2
2  3  3

In [19]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])

In [20]: df2
Out[20]: 
   B  C
2  4  1
3  5  2
4  6  3

In [21]: combined = df1.combine_first(df2)

pandas 1.2.x

In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object

pandas 1.3.0

In [22]: combined.dtypes
Out[22]: 
A    float64
B      int64
C    float64
dtype: object

Group by methods agg and transform no longer change return dtype for callables

Previously the methods DataFrameGroupBy.aggregate(), SeriesGroupBy.aggregate(), DataFrameGroupBy.transform(), and SeriesGroupBy.transform() might cast the result dtype when the argument func is callable, possibly leading to undesirable results (GH21240). The cast would occur if the result is numeric and casting back to the input dtype does not change any values as measured by np.allclose. Now no such casting occurs.

In [23]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})

In [24]: df
Out[24]: 
   key      a     b
0    1   True  True
1    1  False  True

pandas 1.2.x

In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
        a  b
key
1    True  2

pandas 1.3.0

In [25]: df.groupby('key').agg(lambda x: x.sum())
Out[25]: 
     a  b
key      
1    1  2

Try operating inplace when setting values with loc and iloc

When setting an entire column using loc or iloc, pandas will try to insert the values into the existing data rather than create an entirely new array.

In [26]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [27]: values = df.values

In [28]: new = np.array([5, 6, 7], dtype="int64")

In [29]: df.loc[[0, 1, 2], "A"] = new

In both the new and old behavior, the data in values is overwritten, but in the old behavior the dtype of df["A"] changed to int64.

pandas 1.2.x

In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False

In pandas 1.3.0, df continues to share data with values

pandas 1.3.0

In [30]: df.dtypes
Out[30]: 
A    float64
dtype: object

In [31]: np.shares_memory(df["A"], new)
Out[31]: False

In [32]: np.shares_memory(df["A"], values)
Out[32]: True

Never Operate Inplace When Setting frame[keys] = values

When setting multiple columns using frame[keys] = values new arrays will replace pre-existing arrays for these keys, which will not be over-written (GH39510). As a result, the columns will retain the dtype(s) of values, never casting to the dtypes of the existing arrays.

In [33]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")

In [34]: df[["A"]] = 5

In the old behavior, 5 was cast to float64 and inserted into the existing array backing df:

pandas 1.2.x

In [1]: df.dtypes
Out[1]:
A    float64

In the new behavior, we get a new array, and retain an integer-dtyped 5:

pandas 1.3.0

In [35]: df.dtypes
Out[35]: 
A    int64
dtype: object
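
The two setting paths described in this section and the previous one can be contrasted directly. This is a minimal sketch using the same frame as above; the names df_loc and df_keys are introduced here only to keep the two cases apart.

```python
import numpy as np
import pandas as pd

new = np.array([5, 6, 7], dtype="int64")

# Setting through loc writes into the existing float64 array,
# so the column keeps its original dtype.
df_loc = pd.DataFrame(range(3), columns=["A"], dtype="float64")
df_loc.loc[[0, 1, 2], "A"] = new

# Setting through frame[keys] replaces the array with a new one,
# so the column takes the dtype of the assigned value.
df_keys = pd.DataFrame(range(3), columns=["A"], dtype="float64")
df_keys[["A"]] = 5

print(df_loc.dtypes["A"], df_keys.dtypes["A"])
```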

Consistent Casting With Setting Into Boolean Series

Setting non-boolean values into a Series with dtype=bool now consistently casts to dtype=object (GH38709)

In [36]: orig = pd.Series([True, False])

In [37]: ser = orig.copy()

In [38]: ser.iloc[1] = np.nan

In [39]: ser2 = orig.copy()

In [40]: ser2.iloc[1] = 2.0

pandas 1.2.x

In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

pandas 1.3.0

In [41]: ser
Out[41]: 
0    True
1     NaN
dtype: object

In [42]: ser2
Out[42]: 
0    True
1     2.0
dtype: object

GroupBy.rolling no longer returns grouped-by column in values

The group-by column will now be dropped from the result of a groupby.rolling operation (GH32262)

In [43]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})

In [44]: df
Out[44]: 
   A  B
0  1  0
1  1  1
2  2  2
3  3  3

Previous behavior:

In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
       A    B
A
1 0  NaN  NaN
1    2.0  1.0
2 2  NaN  NaN
3 3  NaN  NaN

New behavior:

In [45]: df.groupby("A").rolling(2).sum()
Out[45]: 
       B
A       
1 0  NaN
  1  1.0
2 2  NaN
3 3  NaN

Removed artificial truncation in rolling variance and standard deviation

core.window.Rolling.std() and core.window.Rolling.var() will no longer artificially truncate results that are less than ~1e-8 and ~1e-15 respectively to zero (GH37051, GH40448, GH39872).

However, floating point artifacts may now exist in the results when rolling over larger values.

In [46]: s = pd.Series([7, 5, 5, 5])

In [47]: s.rolling(3).var()
Out[47]: 
0         NaN
1         NaN
2    1.333333
3    0.000000
dtype: float64

GroupBy.rolling with MultiIndex no longer drops levels in the result

core.window.rolling.RollingGroupby will no longer drop levels of a DataFrame with a MultiIndex in the result. This can lead to a perceived duplication of levels in the resulting MultiIndex, but this change restores the behavior that was present in version 1.1.3 (GH38787, GH38523).

In [48]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])

In [49]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)

In [50]: df
Out[50]: 
               a  b
label1 label2      
idx1   idx2    1  2

Previous behavior:

In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
          a    b
label1
idx1    1.0  2.0

New behavior:

In [51]: df.groupby('label1').rolling(1).sum()
Out[51]: 
                        a    b
label1 label1 label2          
idx1   idx1   idx2    1.0  2.0

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package          Minimum Version  Required  Changed
numpy            1.17.3           X         X
pytz             2017.3           X
python-dateutil  2.7.3            X
bottleneck       1.2.1
numexpr          2.6.8
pytest (dev)     6.0                        X
mypy (dev)       0.800                      X
setuptools       38.6.0                     X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package         Minimum Version  Changed
beautifulsoup4  4.6.0
fastparquet     0.4.0            X
fsspec          0.7.4
gcsfs           0.6.0
lxml            4.3.0
matplotlib      2.2.3
numba           0.46.0
openpyxl        3.0.0            X
pyarrow         0.17.0           X
pymysql         0.8.1            X
pytables        3.5.1
s3fs            0.4.0
scipy           1.2.0
sqlalchemy      1.2.8
tabulate        0.8.7            X
xarray          0.12.0
xlrd            1.2.0
xlsxwriter      1.0.2
xlwt            1.3.0
pandas-gbq      0.12.0

See Dependencies and Optional dependencies for more.

Other API changes

  • Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects.

  • Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on a Series now raise an AttributeError. Previously a NotImplementedError was raised (GH38782)

  • Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to support other future “SQL engines”. Currently we still only use SQLAlchemy under the hood, but more engines are planned to be supported such as turbodbc (GH36893)

Build

  • Documentation in .pptx and .pdf formats is no longer included in wheels or source distributions. (GH30741)

Deprecations

  • Deprecated allowing scalars to be passed to the Categorical constructor (GH38433)

  • Deprecated allowing subclass-specific keyword arguments in the Index constructor, use the specific subclass directly instead (GH14093, GH21311, GH22315, GH26974)

  • Deprecated astype of datetimelike (timedelta64[ns], datetime64[ns], Datetime64TZDtype, PeriodDtype) to integer dtypes, use values.view(...) instead (GH38544)

  • Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use MultiIndex.is_monotonic_increasing() instead (GH32259)

  • Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if desired (GH38836)

  • Deprecated comparison of Timestamp object with datetime.date objects. Instead of e.g. ts <= mydate use ts <= pd.Timestamp(mydate) or ts.date() <= mydate (GH36131)

  • Deprecated Rolling.win_type returning "freq" (GH38963)

  • Deprecated Rolling.is_datetimelike (GH38963)

  • Deprecated DataFrame indexer for Series.__setitem__() and DataFrame.__setitem__() (GH39004)

  • Deprecated core.window.ewm.ExponentialMovingWindow.vol() (GH39220)

  • Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version, use obj.tz_localize or obj.dt.tz_localize instead (GH38622)

  • Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH39767)

  • Deprecated Styler.set_na_rep() and Styler.set_precision() in favour of Styler.format() with na_rep and precision as existing and new input arguments respectively (GH40134, GH40425)

  • Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like and raises anything but TypeError; func raising anything but a TypeError will raise in a future version (GH40211)

  • Deprecated support for np.ma.mrecords.MaskedRecords in the DataFrame constructor, pass {name: data[name] for name in data.dtype.names} instead (GH40363)

  • Deprecated using merge() or join() on a different number of levels (GH34862)

  • Deprecated the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH40430)

  • Deprecated the level keyword for DataFrame and Series aggregations; use groupby instead (GH39983)

  • The inplace parameter of Categorical.remove_categories(), Categorical.add_categories(), Categorical.reorder_categories(), Categorical.rename_categories(), Categorical.set_categories() is deprecated and will be removed in a future version (GH37643)

  • Deprecated merge() producing duplicated columns through the suffixes keyword and already existing columns (GH22818)

  • Deprecated setting Categorical._codes, create a new Categorical with the desired codes instead (GH40606)

  • Deprecated behavior of DatetimeIndex.union() with mixed timezones; in a future version both will be cast to UTC instead of object dtype (GH39328)

  • Deprecated using usecols with out of bounds indices for read_csv with engine="c" (GH25623)

  • Deprecated passing arguments as positional (except for "method") in DataFrame.interpolate() and Series.interpolate() (GH41485)
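
Several of the deprecations above name a replacement. The sketch below illustrates three of them under made-up data: comparing a Timestamp with a date, attaching a timezone with tz_localize, and reassigning the result of a Categorical method rather than passing inplace=True. Reassignment is shown as one straightforward alternative, not as the only one.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2021-05-01 12:00")
mydate = datetime.date(2021, 5, 1)

# Compare like with like: either promote the date to a Timestamp
# (midnight of that day) ...
as_timestamps = ts <= pd.Timestamp(mydate)
# ... or reduce the Timestamp to a date.
as_dates = ts.date() <= mydate

# Attach a timezone with tz_localize rather than astype.
naive = pd.Series(pd.date_range("2021-01-01", periods=2))
aware = naive.dt.tz_localize("UTC")

# Categorical methods return a new object; reassign it instead of
# relying on the deprecated inplace parameter.
cat = pd.Categorical(["a", "b", "a"])
cat = cat.add_categories(["c"])

print(as_timestamps, as_dates, aware.dt.tz, list(cat.categories))
```

Note that the two comparisons answer different questions: the first compares noon with midnight of the same day, while the second compares calendar dates only.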

Performance improvements

  • Performance improvement in IntervalIndex.isin() (GH38353)

  • Performance improvement in Series.mean() for nullable data types (GH34814)

  • Performance improvement in Series.isin() for nullable data types (GH38340)

  • Performance improvement in DataFrame.fillna() with method="pad|backfill" for nullable floating and nullable integer dtypes (GH39953)

  • Performance improvement in DataFrame.corr() for method=kendall (GH28329)

  • Performance improvement in core.window.rolling.Rolling.corr() and core.window.rolling.Rolling.cov() (GH39388)

  • Performance improvement in core.window.rolling.RollingGroupby.corr(), core.window.expanding.ExpandingGroupby.corr() and core.window.expanding.ExpandingGroupby.cov() (GH39591)

  • Performance improvement in unique() for object data type (GH37615)

  • Performance improvement in pd.json_normalize() for basic cases (including separators) (GH40035 GH15621)

  • Performance improvement in core.window.rolling.ExpandingGroupby aggregation methods (GH39664)

  • Performance improvement in Styler where render times are more than 50% reduced (GH39972 GH39952)

  • Performance improvement in core.window.ewm.ExponentialMovingWindow.mean() with times (GH39784)

  • Performance improvement in GroupBy.apply() when requiring the python fallback implementation (GH40176)

  • Performance improvement in the conversion of pyarrow boolean array to a pandas nullable boolean array (GH41051)

  • Performance improvement for concatenation of data with type CategoricalDtype (GH40193)

  • Performance improvement in GroupBy.cummin() and GroupBy.cummax() with nullable data types (GH37493)

  • Performance improvement in Series.nunique() with nan values (GH40865)

  • Performance improvement in DataFrame.transpose(), Series.unstack() with DatetimeTZDtype (GH40149)

Bug fixes

Categorical

  • Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH38614)

  • Bug in CategoricalIndex.reindex failing when passed an Index whose elements are all in the categories (GH28690)

  • Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH38552)

  • Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH38857)

  • Bug in DataFrame.reindex() was throwing IndexError when new index contained duplicates and old index was CategoricalIndex (GH38906)

  • Bug in setting categorical values into an object-dtype column in a DataFrame (GH39136)

Datetimelike

  • Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data, with dtype=datetime64[ns] (resp. timedelta64[ns]) (GH38032)

  • Bug in DataFrame.first() and Series.first() returning two months for offset one month when first day is last calendar day (GH29623)

  • Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice-versa, failing to raise TypeError (GH38575, GH38764, GH38792)

  • Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH38792, GH38965)

  • Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38741)

  • Bug in Series.where() incorrectly casting datetime64 values to int64 (GH37682)

  • Bug in Categorical incorrectly typecasting datetime object to Timestamp (GH38878)

  • Bug in comparisons between Timestamp object and datetime64 objects just outside the implementation bounds for nanosecond datetime64 (GH39221)

  • Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values near the implementation bounds of Timestamp (GH39244)

  • Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values near the implementation bounds of Timedelta (GH38964)

  • Bug in date_range() incorrectly creating DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH24124)

  • Bug in infer_freq() incorrectly failing to infer ‘H’ frequency of DatetimeIndex if the latter has a timezone and crosses DST boundaries (GH39556)

Timedelta

  • Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH38965)

  • Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH39462)

  • Bug in constructing Timedelta from an input string with only symbols and no digits failing to raise an error (GH39710)

  • Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH40008)
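
The string-parsing entry above (GH39710) can be sketched as follows. The helper parses_as_timedelta is hypothetical and exists only for this example, and the input "+" is an assumed instance of a string with symbols but no digits.

```python
import pandas as pd


def parses_as_timedelta(text):
    """Return True if pd.Timedelta accepts the string, False if it raises ValueError."""
    try:
        pd.Timedelta(text)
    except ValueError:
        return False
    return True


# A string made only of symbols is expected to be rejected.
symbol_only = parses_as_timedelta("+")
# A string that contains a number and a unit is expected to parse.
with_digits = parses_as_timedelta("1 days")

print(symbol_only, with_digits)
```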

Timezones

  • Bug in different tzinfo objects representing UTC not being treated as equivalent (GH39216)

  • Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other UTC-representing tzinfos (GH39276)
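
One way to observe the UTC equivalence described above is to compare two timezone-aware dtypes built from different UTC objects. This is a sketch under the assumption that dtype equality reflects the same tzinfo comparison the entries refer to; the variable names are illustrative.

```python
from datetime import timezone

import pandas as pd

# The same UTC zone expressed through two different tzinfo sources.
stdlib_utc = pd.DatetimeTZDtype(tz=timezone.utc)
named_utc = pd.DatetimeTZDtype(tz="UTC")

# The two dtypes are expected to compare equal.
dtypes_match = stdlib_utc == named_utc
print(dtypes_match)
```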

Numeric

  • Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH38351)

  • Bug in DataFrame.sort_values() raising an IndexError for empty by (GH40258)

  • Bug in DataFrame.select_dtypes() with include=np.number not retaining numeric ExtensionDtype columns (GH35340)

  • Bug in DataFrame.mode() and Series.mode() not keeping consistent integer Index for empty input (GH33321)

  • Bug in DataFrame.rank() with np.inf and mixture of np.nan and np.inf (GH32593)

  • Bug in DataFrame.rank() with axis=0 and columns holding incomparable types raising IndexError (GH38932)

  • Bug in rank method for Series, DataFrame, DataFrameGroupBy, and SeriesGroupBy treating the most negative int64 value as missing (GH32859)

  • Bug in select_dtypes() different behavior between Windows and Linux with include="int" (GH36569)

  • Bug in DataFrame.apply() and DataFrame.agg() when passed argument func="size" would operate on the entire DataFrame instead of rows or columns (GH39934)

  • Bug in DataFrame.transform() would raise SpecificationError when passed a dictionary and columns were missing; will now raise a KeyError instead (GH40004)

  • Bug in DataFrameGroupBy.rank() giving incorrect results with pct=True and equal values between consecutive groups (GH40518)

  • Bug in Series.count() would result in an int32 result on 32-bit platforms when argument level=None (GH40908)

  • Bug in Series and DataFrame reductions with methods any and all not returning boolean results for object data (GH12863, GH35450, GH27709)

  • Bug in Series.clip() would fail if series contains NA values and has nullable int or float as a data type (GH40851)
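
The Series.clip() entry above (GH40851) can be illustrated with a nullable integer Series that contains a missing value. This is a minimal sketch assuming pandas 1.3.0 or later; the data is made up for the example.

```python
import pandas as pd

# Nullable integer data with one missing value.
values = pd.Series([1, pd.NA, 5], dtype="Int64")

# Clip to the closed range [2, 4]; the missing value should stay missing.
clipped = values.clip(lower=2, upper=4)
print(clipped)
```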

Conversion

  • Bug in Series.to_dict() with orient='records' not returning Python native types (GH25969)

  • Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH39788)

  • Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH40121)

  • Bug in DataFrame failing to raise TypeError when constructing from a frozenset (GH40163)

  • Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH21311)

  • Bug in StringArray.astype() falling back to numpy and raising when converting to dtype='categorical' (GH40450)

  • Bug in factorize() where, when given an array with a numeric numpy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH41132)

  • Bug in DataFrame construction with a dictionary containing an arraylike with ExtensionDtype and copy=True failing to make a copy (GH38939)

  • Bug in qcut() raising error when taking Float64Dtype as input (GH40730)
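
The factorize() entry above (GH41132) can be sketched with a small int8 array. The array contents are illustrative; the point is only that the unique values keep the input dtype.

```python
import numpy as np
import pandas as pd

# A numeric numpy array with a dtype narrower than int64.
small_ints = np.array([3, 1, 3, 2], dtype=np.int8)

# codes index into uniques; uniques should remain int8.
codes, uniques = pd.factorize(small_ints)
print(codes, uniques, uniques.dtype)
```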

Strings

  • Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH41040)

  • Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDtype data (GH41333, GH35977)

  • Bug in Series.str.extract() with StringArray returning object dtype for empty DataFrame (GH41441)
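
A minimal sketch of the regex replacement entry above, assuming pandas 1.3.0 or later. The pattern and data are invented for the example.

```python
import pandas as pd

# String-dtype data; the pattern "ab." matches the first two values.
names = pd.Series(["abc", "abd", "xyz"], dtype="string")

# With regex=True the pattern is treated as a regular expression.
replaced = names.replace("ab.", "X", regex=True)
print(replaced)
```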

Interval

  • Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38653, GH38741)

  • Bug in IntervalIndex.intersection() returning duplicates when at least one of the Indexes has duplicates which are present in the other (GH38743)

  • IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising TypeError when operating with another IntervalIndex with incompatible dtype (GH39267)

  • PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with incompatible dtype (GH??)
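
The duplicate-handling entry for IntervalIndex.intersection() (GH38743) can be sketched as follows. The interval values are illustrative; the left operand deliberately repeats one interval that also appears in the right operand.

```python
import pandas as pd

# (0, 1] appears twice on the left and once on the right.
with_duplicates = pd.IntervalIndex.from_tuples([(0, 1), (0, 1), (1, 2)])
other = pd.IntervalIndex.from_tuples([(0, 1), (2, 3)])

# The shared interval is expected to appear only once in the result.
common = with_duplicates.intersection(other)
print(common)
```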

Indexing

  • Bug in Index.union() dropping duplicate Index values when Index was not monotonic or sort was set to False (GH36289, GH31326, GH40862)

  • Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH38372)

  • Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH38380)

  • Bug in DataFrame.__setitem__() raising ValueError when setting multiple values to duplicate columns (GH15695)

  • Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for non-monotonic DatetimeIndex for string slices (GH33146)

  • Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes raising TypeError for method="ffill" and method="bfill" and specified tolerance (GH38566)

  • Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH39755)

  • Bug in DataFrame.__setitem__() raising ValueError with empty DataFrame and specified columns for string indexer and non empty DataFrame to set (GH38831)

  • Bug in DataFrame.loc.__setitem__() raising ValueError when expanding unique column for DataFrame with duplicate columns (GH38521)

  • Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH38335)

  • Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError for boolean Iterator indexer (GH39614)

  • Bug in Series.iloc() and DataFrame.iloc() raising KeyError for Iterator indexer (GH39614)

  • Bug in DataFrame.__setitem__() not raising ValueError when right hand side is a DataFrame with wrong number of columns (GH38604)

  • Bug in Series.__setitem__() raising ValueError when setting a Series with a scalar indexer (GH38303)

  • Bug in DataFrame.loc() dropping levels of MultiIndex when DataFrame used as input has only one row (GH10521)

  • Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing an Index with milliseconds using existing strings (GH33589)

  • Bug in setting timedelta64 or datetime64 values into numeric Series failing to cast to object dtype (GH39086, GH39619)

  • Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH39120)

  • Bug in setting datetime64 values into a Series with integer-dtype incorrectly casting the datetime64 values to integers (GH39266)

  • Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH39769)

  • Bug in Index.get_loc() not raising KeyError when method is specified for NaN value when NaN is not in Index (GH39382)

  • Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH39769)

  • Bug in incorrectly raising in Index.insert(), when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index() instead of casting to a compatible dtype (GH39068)

  • Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH39401)

  • Bug in RangeIndex.astype() where when converting to CategoricalIndex, the categories became an Int64Index instead of a RangeIndex (GH41263)

  • Bug in setting numpy.timedelta64 values into an object-dtype Series using a boolean indexer (GH39488)

  • Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object-dtype (GH39582)

  • Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row-slice and setting a list as values (GH40440)

  • Bug in DataFrame.loc() not raising KeyError when key was not found in MultiIndex when levels contain more values than used (GH41170)

  • Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contains duplicates (GH40096)

  • Bug in DataFrame.loc() incorrectly matching non-boolean index elements (GH20432)

  • Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH40386)

  • Bug in DataFrame.__setitem__() raising TypeError when using a str subclass as the column name with a DatetimeIndex (GH37366)
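
The CategoricalIndex.get_indexer() entry above (GH38372) can be sketched like this, assuming pandas 1.3.0 or later. The index contents are illustrative.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# "a" appears twice, so the index is not unique.
non_unique = pd.CategoricalIndex(["a", "a", "b"])

try:
    non_unique.get_indexer(["a"])
    raised = False
except InvalidIndexError:
    # get_indexer requires a unique index to return one position per label.
    raised = True

print(raised)
```

For non-unique indexes, Index.get_indexer_non_unique() is the method intended for this lookup.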

Missing

  • Bug in Grouper now correctly propagates dropna argument and DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH35612)

  • Bug in isna(), and Series.isna(), Index.isna(), DataFrame.isna() (and the corresponding notna functions) not recognizing Decimal("NaN") objects (GH39409)

  • Bug in DataFrame.fillna() not accepting dictionary for downcast keyword (GH40809)

  • Bug in isna() not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (GH40935)
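
A minimal sketch of the Decimal("NaN") entry above (GH39409), assuming pandas 1.3.0 or later. The values are invented for the example.

```python
from decimal import Decimal

import pandas as pd

# Object-dtype data holding Decimal values, one of which is a decimal NaN.
amounts = pd.Series([Decimal("NaN"), Decimal("1.5")], dtype=object)

# Both the method and the top-level function should flag the decimal NaN.
missing = amounts.isna()
scalar_missing = pd.isna(Decimal("NaN"))
print(missing.tolist(), scalar_missing)
```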

MultiIndex

  • Bug in DataFrame.drop() raising TypeError when MultiIndex is non-unique and level is not provided (GH36293)

  • Bug in MultiIndex.intersection() duplicating NaN in result (GH38623)

  • Bug in MultiIndex.equals() incorrectly returning True when comparing MultiIndex objects containing NaN even when they are differently ordered (GH38439)

  • Bug in MultiIndex.intersection() always returning empty when intersecting with CategoricalIndex (GH38653)

  • Bug in MultiIndex.reindex() raising ValueError with empty MultiIndex and indexing only a specific level (GH41170)
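
The MultiIndex.equals() entry above (GH38439) can be sketched with two indexes that hold the same tuples in a different order. The values are illustrative.

```python
import numpy as np
import pandas as pd

# Same entries, opposite order; both contain NaN.
first = pd.MultiIndex.from_tuples([(81.0, np.nan), (np.nan, np.nan)])
second = pd.MultiIndex.from_tuples([(np.nan, np.nan), (81.0, np.nan)])

# Order matters for equals(), so these should not compare equal.
differently_ordered = first.equals(second)
# A copy with identical order should compare equal.
same_order = first.equals(first.copy())
print(differently_ordered, same_order)
```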

I/O

  • Bug in Index.__repr__() when display.max_seq_items=1 (GH38415)

  • Bug in read_csv() not recognizing scientific notation if decimal is set for engine="python" (GH31920)

  • Bug in read_csv() interpreting NA value as comment when the NA value contains the comment string for engine="python" (GH34002)

  • Bug in read_csv() raising IndexError with multiple header columns and index_col specified when file has no data rows (GH38292)

  • Bug in read_csv() not accepting usecols with different length than names for engine="python" (GH16469)

  • Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH35873)

  • Bug in read_csv() raising TypeError when names and parse_dates is specified for engine="c" (GH33699)

  • Bug in read_clipboard(), DataFrame.to_clipboard() not working in WSL (GH38527)

  • Allow custom error values for parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH35185)

  • Bug in to_hdf() raising KeyError when trying to apply for subclasses of DataFrame or Series (GH33748)

  • Bug in put() raising a wrong TypeError when saving a DataFrame with non-string dtype (GH34274)

  • Bug in json_normalize() resulting in the first element of a generator object not being included in the returned DataFrame (GH35923)

  • Bug in read_csv() applying thousands separator to date columns when column should be parsed for dates and usecols is specified for engine="python" (GH39365)

  • Bug in read_excel() forward filling MultiIndex names with multiple header and index columns specified (GH34673)

  • read_excel() now respects set_option() (GH34252)

  • Bug in read_csv() not switching true_values and false_values for nullable boolean dtype (GH34655)

  • Bug in read_json() when orient="split" does not maintain numeric string index (GH28556)

  • read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. Now returns a generator with a single empty DataFrame (GH34411)

  • Bug in read_hdf() returning unexpected records when filtering on categorical string columns using where parameter (GH39189)

  • Bug in read_sas() raising ValueError when datetimes were null (GH39725)

  • Bug in read_excel() dropping empty values from single-column spreadsheets (GH39808)

  • Bug in read_excel() loading trailing empty rows/columns for some filetypes (GH41167)

  • Bug in read_excel() raising AttributeError with MultiIndex header followed by two empty rows and no index, and bug affecting read_excel(), read_csv(), read_table(), read_fwf(), and read_clipboard() where one blank row after a MultiIndex header with no index would be dropped (GH40442)

  • Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH40907)

  • Bug in read_orc() always raising AttributeError (GH40918)

  • Bug in read_csv() and read_table() silently ignoring prefix if names and prefix are defined, now raising ValueError (GH39123)

  • Bug in read_csv() and read_excel() not respecting dtype for duplicated column name when mangle_dupe_cols is set to True (GH35211)

  • Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been previously called (GH41069)

  • Bug in the conversion from pyarrow to pandas (e.g. for reading Parquet) with nullable dtypes and a pyarrow array whose data buffer size is not a multiple of dtype size (GH40896)
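
The json_normalize() entry above (GH35923) can be sketched by passing a generator of records. The field names are illustrative, and the sketch assumes pandas 1.3.0 or later.

```python
import pandas as pd

# A generator expression rather than a list; it can only be consumed once.
records = ({"id": i, "name": f"item-{i}"} for i in range(3))

# Every record, including the first one, should end up in the frame.
frame = pd.json_normalize(records)
print(frame)
```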

Period

  • Comparisons of Period objects or Index, Series, or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons, returning False for equals, True for not-equal, and raising TypeError for inequality checks (GH39274)
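
The equality half of the entry above can be sketched with two scalar Period objects of different frequency. Only == and != are exercised here; ordering comparisons are left out of the sketch.

```python
import pandas as pd

# Periods with mismatched frequencies (monthly versus daily).
monthly = pd.Period("2021-01", freq="M")
daily = pd.Period("2021-01-01", freq="D")

# Mismatched-type comparison semantics: never equal, always not-equal.
is_equal = monthly == daily
is_not_equal = monthly != daily
print(is_equal, is_not_equal)
```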

Plotting

  • Bug in scatter_matrix() raising when 2d ax argument passed (GH16253)

  • Prevent warnings when matplotlib’s constrained_layout is enabled (GH25261)

  • Bug in DataFrame.plot() was showing the wrong colors in the legend if the function was called repeatedly and some calls used yerr while others didn’t (partial fix of GH39522)

  • Bug in DataFrame.plot() was showing the wrong colors in the legend if the function was called repeatedly and some calls used secondary_y and others use legend=False (GH40044)

  • Bug in DataFrame.plot.box() where caps or min/max markers were not visible when the dark_background theme was selected (GH40769)

Groupby/resample/rolling

  • Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH38254)

  • Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical series were not tallied (GH38672)

  • Bug in SeriesGroupBy.value_counts() where error was raised on an empty series (GH39172)

  • Bug in GroupBy.indices() would contain non-existent indices when null values were present in the groupby keys (GH9304)

  • Fixed bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() causing loss of precision through using Kahan summation (GH38778)

  • Fixed bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean() and SeriesGroupBy.mean() causing loss of precision through using Kahan summation (GH38934)

  • Bug in Resampler.aggregate() and DataFrame.transform() raising TypeError instead of SpecificationError when missing keys had mixed dtypes (GH39025)

  • Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH38733)

  • Bug in Series.resample() would raise when the index was a PeriodIndex consisting of NaT (GH39227)

  • Bug in core.window.rolling.RollingGroupby.corr() and core.window.expanding.ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing other that was longer than each group (GH39591)

  • Bug in core.window.expanding.ExpandingGroupby.corr() and core.window.expanding.ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing other that was longer than each group (GH39591)

  • Bug in GroupBy.mean(), GroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH28283)

  • Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when window is an offset and dates are in descending order (GH40002)

  • Bug in SeriesGroupBy and DataFrameGroupBy on an empty Series or DataFrame would lose index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew or using them through apply, aggregate, or resample (GH26411)

  • Bug in DataFrameGroupBy.apply() where a MultiIndex would be created instead of an Index if a core.window.rolling.RollingGroupby object was created (GH39732)

  • Bug in DataFrameGroupBy.sample() where error was raised when weights was specified and the index was an Int64Index (GH39927)

  • Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() would sometimes raise SpecificationError when passed a dictionary and columns were missing; will now always raise a KeyError instead (GH40004)

  • Bug in DataFrameGroupBy.sample() where column selection was not applied to sample result (GH39928)

  • Bug in core.window.ewm.ExponentialMovingWindow when calling __getitem__ would incorrectly raise a ValueError when providing times (GH40164)

  • Bug in core.window.ewm.ExponentialMovingWindow when calling __getitem__ would not retain com, span, alpha or halflife attributes (GH40164)

  • core.window.ewm.ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False due to an incorrect calculation (GH40098)

  • Bug in core.window.ewm.ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH40951)

  • Bug in core.window.ewm.ExponentialMovingWindowGroupby.mean() where the wrong times were used in case of multiple groups (GH40951)

  • Bug in core.window.ewm.ExponentialMovingWindowGroupby where the times vector and values became out of sync for non-trivial groups (GH40951)

  • Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index is not sorted (GH39805)

  • Bug in aggregation functions for DataFrame not respecting numeric_only argument when level keyword was given (GH40660)

  • Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index causes an incorrect Index shape (GH40014)

  • Bug in core.window.RollingGroupby where as_index=False argument in groupby was ignored (GH39433)

  • Bug in GroupBy.any() and GroupBy.all() raising ValueError when using with nullable type columns holding NA even with skipna=True (GH40585)

  • Bug in GroupBy.cummin() and GroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH40767)

  • Bug in GroupBy.rank() with nullable dtypes incorrectly raising TypeError (GH41010)

  • Bug in GroupBy.cummin() and GroupBy.cummax() computing wrong result with nullable data types too large to roundtrip when casting to float (GH37493)

  • Bug in DataFrame.rolling() returning mean zero for all NaN window with min_periods=0 if calculation is not numerically stable (GH41053)

  • Bug in DataFrame.rolling() returning sum not zero for all NaN window with min_periods=0 if calculation is not numerically stable (GH41053)

  • Bug in SeriesGroupBy.agg() failing to retain ordered CategoricalDtype on order-preserving aggregations (GH41147)

  • Bug in DataFrameGroupBy.min() and DataFrameGroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising ValueError (GH41111)

  • Bug in DataFrameGroupBy.rank() with the GroupBy object’s axis=0 and the rank method’s keyword axis=1 (GH41320)

  • Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of DataFrameGroupBy (GH41427)

  • Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising AttributeError (GH41427)

  • Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH41445)
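
The DataFrameGroupBy.aggregate() entry above (GH40004) can be sketched by aggregating with a dictionary that names a column the frame does not have. The column names are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

try:
    # "missing" is not a column of df.
    df.groupby("key").agg({"missing": "sum"})
    error_name = None
except KeyError as err:
    # Record which exception type was raised.
    error_name = type(err).__name__

print(error_name)
```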

Reshaping

  • Bug in merge() raising error when performing an inner join with partial index and right_index when no overlap between indices (GH33814)

  • Bug in DataFrame.unstack() with missing levels led to incorrect index names (GH37510)

  • Bug in merge_asof() propagating the right Index with left_index=True and right_on specification instead of left Index (GH33463)

  • Bug in join() over MultiIndex returning wrong result when one of the two indexes had only one level (GH36909)

  • merge_asof() raises ValueError instead of cryptic TypeError in case of non-numerical merge columns (GH29130)

  • Bug in DataFrame.join() not assigning values correctly when having MultiIndex where at least one dimension is from dtype Categorical with non-alphabetically sorted categories (GH38502)

  • Series.value_counts() and Series.mode() return consistent keys in original order (GH12679, GH11227 and GH39007)

  • Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH39481)

  • Bug in DataFrame.apply() giving incorrect results when used with a string argument and axis=1 when the axis argument was not supported; now raises a ValueError instead (GH39211)

  • Bug in DataFrame.sort_values() not reshaping index correctly after sorting on columns, when ignore_index=True (GH39464)

  • Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH39454)

  • Bug in DataFrame.append() returning incorrect dtypes with combinations of datetime64 and timedelta64 dtypes (GH39574)

  • Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH13483)

  • Allow Index to be passed to the numpy.all() function (GH40180)

  • Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH36991)

  • Bug in to_datetime() raising error when input sequence contains unhashable items (GH39756)

  • Bug in Series.explode() preserving index when ignore_index was True and values were scalars (GH40487)

  • Bug in to_datetime() raising ValueError when Series contains None and NaT and has more than 50 elements (GH39882)
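
The Series.explode() entry above (GH40487) can be sketched with a Series of scalars and a non-default index. The labels and values are illustrative.

```python
import pandas as pd

# Scalar values only, so there is nothing to expand.
scalars = pd.Series([10, 20, 30], index=["x", "y", "z"])

# ignore_index=True should still reset the index to 0..n-1.
exploded = scalars.explode(ignore_index=True)
print(exploded)
```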

Sparse

  • Bug in DataFrame.sparse.to_coo() raising KeyError with columns that are a numeric Index without a 0 (GH18414)

  • Bug in SparseArray.astype() with copy=False producing incorrect results when going from integer dtype to floating dtype (GH34456)

  • Implemented SparseArray.max() and SparseArray.min() (GH40921)
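
A short sketch of the new SparseArray.max() and SparseArray.min() methods, assuming pandas 1.3.0 or later. The array contents are illustrative.

```python
import pandas as pd

# Zeros are the fill value for an integer SparseArray by default.
sparse = pd.arrays.SparseArray([0, 0, 3, -1])

largest = sparse.max()
smallest = sparse.min()
print(largest, smallest)
```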

ExtensionArray

  • Bug in DataFrame.where() when other is a Series with ExtensionArray dtype (GH38729)

  • Fixed bug where Series.idxmax(), Series.idxmin() and argmax/min fail when the underlying data is ExtensionArray (GH32749, GH33719, GH36566)

  • Fixed a bug where some properties of subclasses of PandasExtensionDtype were improperly cached (GH40329)

  • Bug in DataFrame.mask() where masking a DataFrame with an ExtensionArray dtype raises ValueError (GH40941)
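
The Series.idxmax() and Series.idxmin() entry above can be sketched with a Series backed by a nullable integer ExtensionArray. The labels and values are invented for the example.

```python
import pandas as pd

# "Int64" (capital I) is the nullable integer ExtensionArray dtype.
scores = pd.Series(pd.array([1, 3, 2], dtype="Int64"), index=["a", "b", "c"])

# Labels of the largest and smallest values.
best = scores.idxmax()
worst = scores.idxmin()
print(best, worst)
```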

Styler

  • Bug in Styler where subset arg in methods raised an error for some valid multiindex slices (GH33562)

  • Minor alterations to Styler rendered HTML output to support w3 good code standard (GH39626)

  • Bug in Styler where rendered HTML was missing a column class identifier for certain header cells (GH39716)

  • Bug in Styler.background_gradient() where text-color was not determined correctly (GH39888)

  • Bug in Styler where multiple elements in CSS-selectors were not correctly added to table_styles (GH39942)

  • Bug in Styler where copying from Jupyter dropped top left cell and misaligned headers (GH12147)

  • Bug in Styler.where where kwargs were not passed to the applicable callable (GH40845)

  • Bug in Styler which caused CSS to duplicate on multiple renders (GH39395, GH40334)

Other

  • Bug in Index constructor sometimes silently ignoring a specified dtype (GH38879)

  • Bug in pandas.api.types.infer_dtype() not recognizing Series, Index or array with a period dtype (GH23553)

  • Bug in pandas.api.types.infer_dtype() raising an error for general ExtensionArray objects. It will now return "unknown-array" instead of raising (GH37367)

  • Bug in constructing a Series from a list and a PandasDtype (GH39357)

  • inspect.getmembers(Series) no longer raises an AbstractMethodError (GH38782)

  • Bug in Series.where() with numeric dtype and other = None not casting to nan (GH39761)

  • Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH39412)

  • Bug in pandas.testing.assert_series_equal(), pandas.testing.assert_frame_equal(), pandas.testing.assert_index_equal() and pandas.testing.assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH39461)

  • Bug in pandas.testing.assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH41263)

  • Bug in DataFrame.equals(), Series.equals(), Index.equals() with object-dtype containing np.datetime64("NaT") or np.timedelta64("NaT") (GH39650)

  • Bug in pandas.util.show_versions() where console JSON output was not proper JSON (GH39701)

  • Bug in DataFrame.convert_dtypes() incorrectly raised ValueError when called on an empty DataFrame (GH40393)

  • Bug in DataFrame.clip() not interpreting missing values as no threshold (GH40420)

  • Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array’s freq to None (GH41425)
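
The equivalence stated in the Index.where() entry above (GH39412) can be checked directly. This is a minimal sketch with an illustrative integer index and mask.

```python
import numpy as np
import pandas as pd

index = pd.Index([1, 2, 3, 4])
mask = np.array([True, False, True, False])

# where() keeps values where mask is True and substitutes elsewhere;
# putmask() substitutes where its mask is True, hence the inversion.
via_where = index.where(mask, 0)
via_putmask = index.putmask(~mask, 0)
print(via_where.equals(via_putmask))
```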

Contributors