What’s new in 1.3.0 (??)¶
These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.
Warning
When reading new Excel 2007+ (.xlsx) files, the default argument
engine=None to read_excel() will now result in using the
openpyxl engine in all cases
when the option io.excel.xlsx.reader is set to "auto".
Previously, some cases would use the
xlrd engine instead. See
What’s new 1.2.0 for background on this change.
Enhancements¶
Custom HTTP(s) headers when reading csv or json files¶
When reading from a remote URL that is not handled by fsspec (i.e. HTTP and
HTTPS) the dictionary passed to storage_options will be used to create the
headers included in the request. This can be used to control the User-Agent
header or send other custom headers (GH36688).
For example:
In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
...: "https://download.bls.gov/pub/time.series/cu/cu.item",
...: sep="\t",
...: storage_options=headers
...: )
...:
Read and write XML documents¶
We added I/O support to read and render shallow versions of XML documents with
pandas.read_xml() and DataFrame.to_xml(). Using lxml as parser,
both XPath 1.0 and XSLT 1.0 are available (GH27554).
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
...: <data>
...: <row>
...: <shape>square</shape>
...: <degrees>360</degrees>
...: <sides>4.0</sides>
...: </row>
...: <row>
...: <shape>circle</shape>
...: <degrees>360</degrees>
...: <sides/>
...: </row>
...: <row>
...: <shape>triangle</shape>
...: <degrees>180</degrees>
...: <sides>3.0</sides>
...: </row>
...: </data>"""
In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
shape degrees sides
0 square 360 4.0
1 circle 360 NaN
2 triangle 180 3.0
In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<shape>square</shape>
<degrees>360</degrees>
<sides>4.0</sides>
</row>
<row>
<index>1</index>
<shape>circle</shape>
<degrees>360</degrees>
<sides/>
</row>
<row>
<index>2</index>
<shape>triangle</shape>
<degrees>180</degrees>
<sides>3.0</sides>
</row>
</data>
For more, see io.xml in the user guide on IO tools.
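The round trip above can also be sketched without the optional lxml dependency. The snippet below is a minimal illustration, assuming pandas 1.3 or later: it selects the standard-library parser with parser="etree" and wraps the document in io.StringIO so it does not depend on how a given pandas version treats literal XML strings. The sample data and variable names are illustrative only.

```python
import io

import pandas as pd

xml = """<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides/></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# Parse each <row> element into one DataFrame row using the stdlib parser.
df = pd.read_xml(io.StringIO(xml), xpath="./row", parser="etree")

# Serialize back to XML, leaving out the <index> element.
xml_out = df.to_xml(index=False, parser="etree")

print(df)
print(xml_out)
```

The empty sides element is read as a missing value, so that column is parsed as floating point.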
Styler Upgrades¶
We provided some focused development on Styler, including altering methods
to accept more universal CSS language for arguments, such as 'color:red;' instead of
[('color', 'red')] (GH39564). This is also added to the built-in methods
to allow custom CSS highlighting instead of default background coloring (GH40242).
Enhancements to other built-in methods include extending the Styler.background_gradient()
method to shade elements based on a given gradient map and not be restricted only to
values in the DataFrame (GH39930 GH22727 GH28901). Additional
built-in methods such as Styler.highlight_between() and Styler.highlight_quantile()
have been added (GH39821 and GH40926).
Styler.apply() now consistently allows functions with ndarray output, which
allows more flexible development of UDFs when axis is None, 0 or 1 (GH39393).
Styler.set_tooltips() is a new method that allows adding on-hover tooltips to
enhance interactive displays (GH35643). Styler.set_td_classes(), which was recently
introduced in v1.2.0 (GH36159) to allow adding specific CSS classes to data cells, has
been made as performant as Styler.apply() and Styler.applymap() (GH40453),
if not more performant in some cases. The overall performance of HTML
render times has been considerably improved to
match DataFrame.to_html() (GH39952 GH37792 GH40425).
Styler.format() has been upgraded to easily format missing data and
precision, and to perform HTML escaping (GH40437 GH40134). There have been numerous other bug fixes to
properly format HTML and eliminate some inconsistencies (GH39942 GH40356 GH39807 GH39889 GH39627).
Styler is now also compatible with a non-unique index or columns for as many features as can fully support them; other features are only partially compatible (GH41269).
Documentation has also seen major revisions in light of new features (GH39720 GH39317 GH40493).
DataFrame constructor honors copy=False with dict¶
When passing a dictionary to DataFrame with copy=False,
a copy will no longer be made (GH32960)
In [3]: arr = np.array([1, 2, 3])
In [4]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)
In [5]: df
Out[5]:
A B
0 1 1
1 2 2
2 3 3
df["A"] remains a view on arr:
In [6]: arr[0] = 0
In [7]: assert df.iloc[0, 0] == 0
The default behavior when not passing copy will remain unchanged, i.e.
a copy will be made.
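One way to check whether a column is a view or a copy is np.shares_memory. The snippet below is a small sketch under the assumption of pandas 1.3 or later; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# copy=False asks the constructor to reuse the array's memory.
no_copy = pd.DataFrame({"A": arr}, copy=False)

# Without the keyword, dict input is copied as before.
default = pd.DataFrame({"A": arr})

shares_no_copy = np.shares_memory(no_copy["A"].to_numpy(), arr)
shares_default = np.shares_memory(default["A"].to_numpy(), arr)
print(shares_no_copy, shares_default)
```

With the behavior described above, the first check should report shared memory and the second should not.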
Centered Datetime-Like Rolling Windows¶
When performing rolling calculations on DataFrame and Series
objects with a datetime-like index, a centered datetime-like window can now be
used (GH38780).
For example:
In [8]: df = pd.DataFrame(
...: {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
...: )
...:
In [9]: df
Out[9]:
A
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
In [10]: df.rolling("2D", center=True).mean()
Out[10]:
A
2020-01-01 0.5
2020-01-02 1.5
2020-01-03 2.5
2020-01-04 3.5
2020-01-05 4.0
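To see what centering changes, it can help to put the trailing and centered results side by side. This is a minimal sketch that assumes pandas 1.3 or later and the default closed="right" window; the comparison frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
)

# Trailing window: each row averages itself and the previous day.
trailing = df.rolling("2D").mean()

# Centered window: each row averages itself and the following day.
centered = df.rolling("2D", center=True).mean()

comparison = pd.DataFrame({"trailing": trailing["A"], "centered": centered["A"]})
print(comparison)
```

With a "2D" window and right-closed bounds, the centered window for a timestamp t covers (t - 1 day, t + 1 day], which is why the last row only sees its own value.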
Other enhancements¶
- Rolling and Expanding now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See window.overview for performance and functional benefits (GH15095, GH38995)
- Added MultiIndex.dtypes() (GH37062)
- Added end and end_day options for origin in DataFrame.resample() (GH37804)
- Improve error message when usecols and names do not match for read_csv() and engine="c" (GH29042)
- Improved consistency of error message when passing an invalid win_type argument in Window (GH15969)
- pandas.read_sql_query() now accepts a dtype argument to cast the columnar data from the SQL database based on user input (GH10285)
- Improved integer type mapping from pandas to SQLAlchemy when using DataFrame.to_sql() (GH35076)
- to_numeric() now supports downcasting of nullable ExtensionDtype objects (GH33013)
- Add support for dict-like names in MultiIndex.set_names and MultiIndex.rename (GH20421)
- pandas.read_excel() can now auto detect .xlsb files (GH35416)
- pandas.ExcelWriter now accepts an if_sheet_exists parameter to control the behaviour of append mode when writing to existing sheets (GH40230)
- Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(), ExponentialMovingWindow.mean(), Rolling.median(), Expanding.median(), Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now support Numba execution with the engine keyword (GH38895, GH41267)
- DataFrame.apply() can now accept NumPy unary operators as strings, e.g. df.apply("sqrt"), which was already the case for Series.apply() (GH39116)
- DataFrame.apply() can now accept non-callable DataFrame properties as strings, e.g. df.apply("size"), which was already the case for Series.apply() (GH39116)
- DataFrame.applymap() can now accept kwargs to pass on to func (GH39987)
- Disallow DataFrame indexer for iloc for Series.__getitem__() and DataFrame.__getitem__() (GH39004)
- Series.apply() can now accept list-like or dictionary-like arguments that aren’t lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])), which was already the case for DataFrame.apply() (GH39140)
- DataFrame.plot.scatter() can now accept a categorical column as the argument to c (GH12380, GH31357)
- Styler.set_tooltips() allows on hover tooltips to be added to styled HTML dataframes (GH35643, GH21266, GH39317, GH39708, GH40284)
- Styler.set_table_styles() amended to optionally allow certain css-string input arguments (GH39564)
- Styler.apply() now more consistently accepts ndarray function returns, i.e. in all cases for axis is 0, 1 or None (GH39359)
- Styler.apply() and Styler.applymap() now raise errors if wrong format CSS is passed on render (GH39660)
- Styler.format() adds keyword argument escape for optional HTML escaping (GH40437)
- Styler.background_gradient() now allows the ability to supply a specific gradient map (GH22727)
- Styler.clear() now clears Styler.hidden_index and Styler.hidden_columns as well (GH40484)
- Builtin highlighting methods in Styler have a more consistent signature and css customisability (GH40242)
- Styler.highlight_between() added to list of builtin styling methods (GH39821)
- Series.loc.__getitem__() and Series.loc.__setitem__() with MultiIndex now raising helpful error message when indexer has too many dimensions (GH35349)
- pandas.read_stata() and StataReader support reading data from compressed files.
- Add support for parsing ISO 8601-like timestamps with negative signs to pandas.Timedelta() (GH37172)
- Add support for unary operators in FloatingArray (GH38749)
- RangeIndex can now be constructed by passing a range object directly e.g. pd.RangeIndex(range(3)) (GH12067)
- round() being enabled for the nullable integer and floating dtypes (GH38844)
- pandas.read_csv() and pandas.read_json() expose the argument encoding_errors to control how encoding errors are handled (GH39450)
- GroupBy.any() and GroupBy.all() use Kleene logic with nullable data types (GH37506)
- GroupBy.any() and GroupBy.all() return a BooleanDtype for columns with nullable data types (GH33449)
- GroupBy.rank() now supports object-dtype data (GH38278)
- Constructing a DataFrame or Series with the data argument being a Python iterable that is not a NumPy ndarray consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when data is a NumPy ndarray (GH40908)
- Add keyword sort to pivot_table() to allow non-sorting of the result (GH39143)
- Add keyword dropna to DataFrame.value_counts() to allow counting rows that include NA values (GH41325)
- Series.replace() will now cast results to PeriodDtype where possible instead of object dtype (GH41526)
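A few of the smaller additions listed here can be tried in a handful of lines. The following is a sketch only, assuming pandas 1.3 or later; the sample frames and variable names are made up for illustration, and MultiIndex.dtypes is accessed as an attribute.

```python
import numpy as np
import pandas as pd

# RangeIndex built directly from a range object.
idx = pd.RangeIndex(range(3))

# Dict-like names in MultiIndex.set_names, plus the per-level dtypes.
mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)], names=["letter", "number"])
renamed = mi.set_names({"number": "n"})
level_dtypes = renamed.dtypes

# Count rows that contain missing values instead of dropping them.
frame = pd.DataFrame({"x": ["a", "a", None], "y": [1.0, 1.0, np.nan]})
counts_with_na = frame.value_counts(dropna=False)
counts_without_na = frame.value_counts()

# Keep groups in order of first appearance rather than sorted order.
sales = pd.DataFrame({"key": ["b", "a", "b"], "value": [1, 2, 3]})
unsorted = pd.pivot_table(
    sales, index="key", values="value", aggfunc="sum", sort=False
)

print(idx, renamed.names, level_dtypes, counts_with_na, unsorted, sep="\n")
```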
Notable bug fixes¶
These are bug fixes that might have notable behavior changes.
Categorical.unique now always maintains same dtype as original¶
Previously, when calling unique() with categorical data, unused categories in the new array
would be removed, meaning that the dtype of the new array would be different than the
original, if some categories are not present in the unique array (GH18291)
As an example of this, given:
In [11]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)
In [12]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)
In [13]: original = pd.Series(cat)
In [14]: unique = original.unique()
pandas < 1.3.0:
In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False
pandas >= 1.3.0
In [15]: unique
Out[15]:
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']
In [16]: original.dtype == unique.dtype
Out[16]: True
Preserve dtypes in combine_first()¶
combine_first() will now preserve dtypes (GH7509)
In [17]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])
In [18]: df1
Out[18]:
A B
0 1 1
1 2 2
2 3 3
In [19]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])
In [20]: df2
Out[20]:
B C
2 4 1
3 5 2
4 6 3
In [21]: combined = df1.combine_first(df2)
pandas 1.2.x
In [1]: combined.dtypes
Out[1]:
A float64
B float64
C float64
dtype: object
pandas 1.3.0
In [22]: combined.dtypes
Out[22]:
A float64
B int64
C float64
dtype: object
Group by methods agg and transform no longer change return dtype for callables¶
Previously the methods DataFrameGroupBy.aggregate(),
SeriesGroupBy.aggregate(), DataFrameGroupBy.transform(), and
SeriesGroupBy.transform() might cast the result dtype when the argument func
is callable, possibly leading to undesirable results (GH21240). The cast would
occur if the result is numeric and casting back to the input dtype does not change any
values as measured by np.allclose. Now no such casting occurs.
In [23]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})
In [24]: df
Out[24]:
key a b
0 1 True True
1 1 False True
pandas 1.2.x
In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
a b
key
1 True 2
pandas 1.3.0
In [25]: df.groupby('key').agg(lambda x: x.sum())
Out[25]:
a b
key
1 1 2
Try operating inplace when setting values with loc and iloc¶
When setting an entire column using loc or iloc, pandas will try to
insert the values into the existing data rather than create an entirely new array.
In [26]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [27]: values = df.values
In [28]: new = np.array([5, 6, 7], dtype="int64")
In [29]: df.loc[[0, 1, 2], "A"] = new
In both the new and old behavior, the data in values is overwritten, but in
the old behavior the dtype of df["A"] changed to int64.
pandas 1.2.x
In [1]: df.dtypes
Out[1]:
A int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False
In pandas 1.3.0, df continues to share data with values:
pandas 1.3.0
In [30]: df.dtypes
Out[30]:
A float64
dtype: object
In [31]: np.shares_memory(df["A"], new)
Out[31]: False
In [32]: np.shares_memory(df["A"], values)
Out[32]: True
Never Operate Inplace When Setting frame[keys] = values¶
When setting multiple columns using frame[keys] = values, new arrays will
replace pre-existing arrays for these keys, which will not be over-written
(GH39510). As a result, the columns will retain the dtype(s) of values,
never casting to the dtypes of the existing arrays.
In [33]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [34]: df[["A"]] = 5
In the old behavior, 5 was cast to float64 and inserted into the existing
array backing df:
pandas 1.2.x
In [1]: df.dtypes
Out[1]:
A float64
In the new behavior, we get a new array, and retain an integer-dtyped 5:
pandas 1.3.0
In [35]: df.dtypes
Out[35]:
A int64
dtype: object
Consistent Casting With Setting Into Boolean Series¶
Setting non-boolean values into a Series with dtype=bool now consistently
casts to dtype=object (GH38709)
In [36]: orig = pd.Series([True, False])
In [37]: ser = orig.copy()
In [38]: ser.iloc[1] = np.nan
In [39]: ser2 = orig.copy()
In [40]: ser2.iloc[1] = 2.0
pandas 1.2.x
In [1]: ser
Out[1]:
0 1.0
1 NaN
dtype: float64
In [2]: ser2
Out[2]:
0 True
1 2.0
dtype: object
pandas 1.3.0
In [41]: ser
Out[41]:
0 True
1 NaN
dtype: object
In [42]: ser2
Out[42]:
0 True
1 2.0
dtype: object
GroupBy.rolling no longer returns grouped-by column in values¶
The group-by column will now be dropped from the result of a
groupby.rolling operation (GH32262)
In [43]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
In [44]: df
Out[44]:
A B
0 1 0
1 1 1
2 2 2
3 3 3
Previous behavior:
In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
A B
A
1 0 NaN NaN
1 2.0 1.0
2 2 NaN NaN
3 3 NaN NaN
New behavior:
In [45]: df.groupby("A").rolling(2).sum()
Out[45]:
B
A
1 0 NaN
1 1.0
2 2 NaN
3 3 NaN
Removed artificial truncation in rolling variance and standard deviation¶
core.window.Rolling.std() and core.window.Rolling.var() will no longer
artificially truncate results that are less than ~1e-8 and ~1e-15 respectively to
zero (GH37051, GH40448, GH39872).
However, floating point artifacts may now exist in the results when rolling over larger values.
In [46]: s = pd.Series([7, 5, 5, 5])
In [47]: s.rolling(3).var()
Out[47]:
0 NaN
1 NaN
2 1.333333
3 0.000000
dtype: float64
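Because tiny non-zero values are no longer forced to zero, code that tests a rolling variance for exact equality with zero may want to use a tolerance instead. The sketch below shows one way to do that; the 1e-12 threshold is an arbitrary assumption and should be chosen to suit the scale of the data.

```python
import numpy as np
import pandas as pd

s = pd.Series([7, 5, 5, 5])
variance = s.rolling(3).var()

# Compare against zero with a tolerance instead of testing for exact equality.
# NaN entries (incomplete windows) compare as not close.
is_constant_window = np.isclose(variance.to_numpy(), 0.0, atol=1e-12)
print(is_constant_window)
```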
GroupBy.rolling with MultiIndex no longer drops levels in the result¶
core.window.rolling.RollingGroupby will no longer drop levels of a DataFrame
with a MultiIndex in the result. This can lead to a perceived duplication of levels in the resulting
MultiIndex, but this change restores the behavior that was present in version 1.1.3 (GH38787, GH38523).
In [48]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])
In [49]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)
In [50]: df
Out[50]:
a b
label1 label2
idx1 idx2 1 2
Previous behavior:
In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
a b
label1
idx1 1.0 2.0
New behavior:
In [51]: df.groupby('label1').rolling(1).sum()
Out[51]:
a b
label1 label1 label2
idx1 idx1 idx2 1.0 2.0
Increased minimum versions for dependencies¶
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.17.3 | X | X |
| pytz | 2017.3 | X | |
| python-dateutil | 2.7.3 | X | |
| bottleneck | 1.2.1 | | |
| numexpr | 2.6.8 | | |
| pytest (dev) | 6.0 | X | |
| mypy (dev) | 0.800 | X | |
| setuptools | 38.6.0 | X | |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
| beautifulsoup4 | 4.6.0 | |
| fastparquet | 0.4.0 | X |
| fsspec | 0.7.4 | |
| gcsfs | 0.6.0 | |
| lxml | 4.3.0 | |
| matplotlib | 2.2.3 | |
| numba | 0.46.0 | |
| openpyxl | 3.0.0 | X |
| pyarrow | 0.17.0 | X |
| pymysql | 0.8.1 | X |
| pytables | 3.5.1 | |
| s3fs | 0.4.0 | |
| scipy | 1.2.0 | |
| sqlalchemy | 1.2.8 | |
| tabulate | 0.8.7 | X |
| xarray | 0.12.0 | |
| xlrd | 1.2.0 | |
| xlsxwriter | 1.0.2 | |
| xlwt | 1.3.0 | |
| pandas-gbq | 0.12.0 | |
See Dependencies and Optional dependencies for more.
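To compare an environment against these minimums, a small script along the following lines may help. It is a sketch rather than anything pandas ships: the helper names (version_tuple, is_at_least, check_minimums) are hypothetical, only a few packages from the tables are included, and the version parsing is deliberately simple (it keeps leading integers and ignores pre-release suffixes).

```python
from importlib import metadata
from itertools import takewhile

# A few minimums copied from the tables above; extend as needed.
MINIMUM_VERSIONS = {
    "numpy": "1.17.3",
    "pytz": "2017.3",
    "python-dateutil": "2.7.3",
    "openpyxl": "3.0.0",
    "pyarrow": "0.17.0",
}


def version_tuple(version):
    """Turn '1.17.3' or '1.0rc1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums=MINIMUM_VERSIONS):
    """Return a {package: status} report for the current environment."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = is_at_least(installed, minimum)
        report[name] = "ok" if ok else f"{installed} is older than {minimum}"
    return report


if __name__ == "__main__":
    for package, status in check_minimums().items():
        print(f"{package}: {status}")
```

For anything beyond a quick check, a dedicated version parser would be a more robust choice than this hand-rolled comparison.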
Other API changes¶
- Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects.
- Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on a Series now raise an AttributeError. Previously a NotImplementedError was raised (GH38782)
- Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to support other future “SQL engines”. Currently we still only use SQLAlchemy under the hood, but more engines are planned to be supported such as turbodbc (GH36893)
Deprecations¶
- Deprecated allowing scalars to be passed to the Categorical constructor (GH38433)
- Deprecated allowing subclass-specific keyword arguments in the Index constructor, use the specific subclass directly instead (GH14093, GH21311, GH22315, GH26974)
- Deprecated astype of datetimelike (timedelta64[ns], datetime64[ns], DatetimeTZDtype, PeriodDtype) to integer dtypes, use values.view(...) instead (GH38544)
- Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use MultiIndex.is_monotonic_increasing() instead (GH32259)
- Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if desired (GH38836)
- Deprecated comparison of Timestamp object with datetime.date objects. Instead of e.g. ts <= mydate use ts <= pd.Timestamp(mydate) or ts.date() <= mydate (GH36131)
- Deprecated Rolling.win_type returning "freq" (GH38963)
- Deprecated Rolling.is_datetimelike (GH38963)
- Deprecated DataFrame indexer for Series.__setitem__() and DataFrame.__setitem__() (GH39004)
- Deprecated core.window.ewm.ExponentialMovingWindow.vol() (GH39220)
- Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version, use obj.tz_localize or obj.dt.tz_localize instead (GH38622)
- Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH39767)
- Deprecated Styler.set_na_rep() and Styler.set_precision() in favour of Styler.format() with na_rep and precision as existing and new input arguments respectively (GH40134, GH40425)
- Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like and raises anything but TypeError; func raising anything but a TypeError will raise in a future version (GH40211)
- Deprecated support for np.ma.mrecords.MaskedRecords in the DataFrame constructor, pass {name: data[name] for name in data.dtype.names} instead (GH40363)
- Deprecated using merge() or join() on a different number of levels (GH34862)
- Deprecated the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH40430)
- Deprecated the level keyword for DataFrame and Series aggregations; use groupby instead (GH39983)
- The inplace parameter of Categorical.remove_categories(), Categorical.add_categories(), Categorical.reorder_categories(), Categorical.rename_categories(), Categorical.set_categories() is deprecated and will be removed in a future version (GH37643)
- Deprecated merge() producing duplicated columns through the suffixes keyword and already existing columns (GH22818)
- Deprecated setting Categorical._codes, create a new Categorical with the desired codes instead (GH40606)
- Deprecated behavior of DatetimeIndex.union() with mixed timezones; in a future version both will be cast to UTC instead of object dtype (GH39328)
- Deprecated using usecols with out of bounds indices for read_csv with engine="c" (GH25623)
- Deprecated passing arguments as positional (except for "method") in DataFrame.interpolate() and Series.interpolate() (GH41485)
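Several of these deprecations name a replacement spelling. The snippet below sketches a few of them side by side; the sample values and variable names are illustrative assumptions, not taken from the release notes.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2021-06-01 12:00")
mydate = datetime.date(2021, 6, 1)

# Two explicit replacements for `ts <= mydate`; they are not equivalent.
as_timestamp = ts <= pd.Timestamp(mydate)  # compares against midnight
as_date = ts.date() <= mydate  # compares calendar dates only

# Attach a timezone with tz_localize rather than astype.
naive = pd.Series(pd.date_range("2021-01-01", periods=2))
localized = naive.dt.tz_localize("UTC")

# Aggregate by index level through groupby instead of the `level` keyword.
index = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1)])
values = pd.Series([1, 2, 3], index=index)
per_group = values.groupby(level=0).sum()

# Reassign the result instead of passing inplace=True to Categorical methods.
cat = pd.Categorical(["a", "b"])
cat = cat.add_categories(["c"])

print(as_timestamp, as_date, localized.dt.tz, per_group.tolist(), cat.categories)
```

The two Timestamp comparisons can give different answers for the same inputs, since one compares against midnight and the other ignores the time of day, so the choice between them depends on the intended semantics.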
Performance improvements¶
- Performance improvement in IntervalIndex.isin() (GH38353)
- Performance improvement in Series.mean() for nullable data types (GH34814)
- Performance improvement in Series.isin() for nullable data types (GH38340)
- Performance improvement in DataFrame.fillna() with method="pad|backfill" for nullable floating and nullable integer dtypes (GH39953)
- Performance improvement in DataFrame.corr() for method=kendall (GH28329)
- Performance improvement in core.window.rolling.Rolling.corr() and core.window.rolling.Rolling.cov() (GH39388)
- Performance improvement in core.window.rolling.RollingGroupby.corr(), core.window.expanding.ExpandingGroupby.corr() and core.window.expanding.ExpandingGroupby.cov() (GH39591)
- Performance improvement in unique() for object data type (GH37615)
- Performance improvement in pd.json_normalize() for basic cases (including separators) (GH40035 GH15621)
- Performance improvement in core.window.rolling.ExpandingGroupby aggregation methods (GH39664)
- Performance improvement in Styler where render times are more than 50% reduced (GH39972 GH39952)
- Performance improvement in core.window.ewm.ExponentialMovingWindow.mean() with times (GH39784)
- Performance improvement in GroupBy.apply() when requiring the python fallback implementation (GH40176)
- Performance improvement in the conversion of pyarrow boolean array to a pandas nullable boolean array (GH41051)
- Performance improvement for concatenation of data with type CategoricalDtype (GH40193)
- Performance improvement in GroupBy.cummin() and GroupBy.cummax() with nullable data types (GH37493)
- Performance improvement in Series.nunique() with nan values (GH40865)
- Performance improvement in DataFrame.transpose(), Series.unstack() with DatetimeTZDtype (GH40149)
Bug fixes¶
Categorical¶
- Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH38614)
- Bug in CategoricalIndex.reindex failed when Index passed with elements all in category (GH28690)
- Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH38552)
- Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH38857)
- Bug in DataFrame.reindex() was throwing IndexError when new index contained duplicates and old index was CategoricalIndex (GH38906)
- Bug in setting categorical values into an object-dtype column in a DataFrame (GH39136)
Datetimelike¶
- Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data, with dtype=datetime64[ns] (resp. timedelta64[ns]) (GH38032)
- Bug in DataFrame.first() and Series.first() returning two months for offset one month when first day is last calendar day (GH29623)
- Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice-versa, failing to raise TypeError (GH38575, GH38764, GH38792)
- Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH38792, GH38965)
- Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38741)
- Bug in Series.where() incorrectly casting datetime64 values to int64 (GH37682)
- Bug in Categorical incorrectly typecasting datetime object to Timestamp (GH38878)
- Bug in comparisons between Timestamp object and datetime64 objects just outside the implementation bounds for nanosecond datetime64 (GH39221)
- Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values near the implementation bounds of Timestamp (GH39244)
- Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values near the implementation bounds of Timedelta (GH38964)
- Bug in date_range() incorrectly creating DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH24124)
- Bug in infer_freq() incorrectly fails to infer ‘H’ frequency of DatetimeIndex if the latter has a timezone and crosses DST boundaries (GH39556)
Timedelta¶
- Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH38965)
- Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH39462)
- Bug in constructing Timedelta from input string with only symbols and no digits failed to raise an error (GH39710)
- Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH40008)
Timezones¶
Numeric¶
- Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH38351)
- Bug in DataFrame.sort_values() raising an IndexError for empty by (GH40258)
- Bug in DataFrame.select_dtypes() with include=np.number now retains numeric ExtensionDtype columns (GH35340)
- Bug in DataFrame.mode() and Series.mode() not keeping consistent integer Index for empty input (GH33321)
- Bug in DataFrame.rank() with np.inf and mixture of np.nan and np.inf (GH32593)
- Bug in DataFrame.rank() with axis=0 and columns holding incomparable types raising IndexError (GH38932)
- Bug in rank method for Series, DataFrame, DataFrameGroupBy, and SeriesGroupBy treating the most negative int64 value as missing (GH32859)
- Bug in select_dtypes() different behavior between Windows and Linux with include="int" (GH36569)
- Bug in DataFrame.apply() and DataFrame.agg() when passed argument func="size" would operate on the entire DataFrame instead of rows or columns (GH39934)
- Bug in DataFrame.transform() would raise SpecificationError when passed a dictionary and columns were missing; will now raise a KeyError instead (GH40004)
- Bug in DataFrameGroupBy.rank() giving incorrect results with pct=True and equal values between consecutive groups (GH40518)
- Bug in Series.count() would result in an int32 result on 32-bit platforms when argument level=None (GH40908)
- Bug in Series and DataFrame reductions with methods any and all not returning boolean results for object data (GH12863, GH35450, GH27709)
- Bug in Series.clip() would fail if series contains NA values and has nullable int or float as a data type (GH40851)
Conversion¶
- Bug in Series.to_dict() with orient='records' now returns python native types (GH25969)
- Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH39788)
- Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH40121)
- Bug in DataFrame failing to raise TypeError when constructing from a frozenset (GH40163)
- Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH21311)
- Bug in StringArray.astype() falling back to numpy and raising when converting to dtype='categorical' (GH40450)
- Bug in factorize() where, when given an array with a numeric numpy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH41132)
- Bug in DataFrame construction with a dictionary containing an arraylike with ExtensionDtype and copy=True failing to make a copy (GH38939)
- Bug in qcut() raising error when taking Float64DType as input (GH40730)
Strings¶
Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH41040)
Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDtype data (GH41333, GH35977)
Bug in Series.str.extract() with StringArray returning object dtype for empty DataFrame (GH41441)
Interval¶
Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH38653, GH38741)
Bug in IntervalIndex.intersection() returning duplicates when at least one of the Indexes has duplicates which are present in the other (GH38743)
IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising TypeError when operating with another IntervalIndex with incompatible dtype (GH39267)
PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with incompatible dtype (GH??)
Indexing¶
Bug in Index.union() dropping duplicate Index values when Index was not monotonic or sort was set to False (GH36289, GH31326, GH40862)
Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH38372)
Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH38380)
Bug in DataFrame.__setitem__() raising ValueError when setting multiple values to duplicate columns (GH15695)
Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for non-monotonic DatetimeIndex for string slices (GH33146)
Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes raising TypeError for method="ffill" and method="bfill" and specified tolerance (GH38566)
Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH39755)
Bug in DataFrame.__setitem__() raising ValueError with empty DataFrame and specified columns for string indexer and non empty DataFrame to set (GH38831)
Bug in DataFrame.loc.__setitem__() raising ValueError when expanding unique column for DataFrame with duplicate columns (GH38521)
Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH38335)
Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError for boolean Iterator indexer (GH39614)
Bug in Series.iloc() and DataFrame.iloc() raising KeyError for Iterator indexer (GH39614)
Bug in DataFrame.__setitem__() not raising ValueError when right hand side is a DataFrame with wrong number of columns (GH38604)
Bug in Series.__setitem__() raising ValueError when setting a Series with a scalar indexer (GH38303)
Bug in DataFrame.loc() dropping levels of MultiIndex when DataFrame used as input has only one row (GH10521)
Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with existing strings where the Index has milliseconds (GH33589)
Bug in setting timedelta64 or datetime64 values into numeric Series failing to cast to object dtype (GH39086, GH39619)
Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH39120)
Bug in setting datetime64 values into a Series with integer-dtype incorrectly casting the datetime64 values to integers (GH39266)
Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH39769)
Bug in Index.get_loc() not raising KeyError when method is specified for NaN value when NaN is not in Index (GH39382)
Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH39769)
Bug in incorrectly raising in Index.insert(), when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index() instead of casting to a compatible dtype (GH39068)
Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH39401)
Bug in RangeIndex.astype() where when converting to CategoricalIndex, the categories became an Int64Index instead of a RangeIndex (GH41263)
Bug in setting numpy.timedelta64 values into an object-dtype Series using a boolean indexer (GH39488)
Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object-dtype (GH39582)
Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row-slice and setting a list as values (GH40440)
Bug in DataFrame.loc() not raising KeyError when key was not found in MultiIndex when levels contain more values than used (GH41170)
Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contains duplicates (GH40096)
Bug in DataFrame.loc() incorrectly matching non-boolean index elements (GH20432)
Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH40386)
Bug in DataFrame.__setitem__() raising TypeError when using a str subclass as the column name with a DatetimeIndex (GH37366)
Missing¶
Bug in Grouper now correctly propagates dropna argument and DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH35612)
Bug in isna(), and Series.isna(), Index.isna(), DataFrame.isna() (and the corresponding notna functions) not recognizing Decimal("NaN") objects (GH39409)
Bug in DataFrame.fillna() not accepting dictionary for downcast keyword (GH40809)
Bug in isna() not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (GH40935)
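A minimal sketch of the two isna() fixes above, assuming a hand-built object Series and a small nullable array. The variable names and values are illustrative only.

```python
from decimal import Decimal

import pandas as pd

# Object-dtype Series mixing a regular Decimal, a Decimal NaN and None.
values = pd.Series([Decimal("1.5"), Decimal("NaN"), None], dtype=object)
mask = values.isna()

# For nullable arrays, the returned mask is expected to be a copy:
# mutating it should leave the original array untouched.
nullable = pd.array([1, None], dtype="Int64")
nullable_mask = nullable.isna()
nullable_mask[0] = True
mask_after_mutation = nullable.isna().tolist()

print(mask.tolist())
print(mask_after_mutation)
```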
MultiIndex¶
Bug in DataFrame.drop() raising TypeError when MultiIndex is non-unique and level is not provided (GH36293)
Bug in MultiIndex.intersection() duplicating NaN in result (GH38623)
Bug in MultiIndex.equals() incorrectly returning True when MultiIndex containing NaN even when they are differently ordered (GH38439)
Bug in MultiIndex.intersection() always returning empty when intersecting with CategoricalIndex (GH38653)
Bug in MultiIndex.reindex() raising ValueError with empty MultiIndex and indexing only a specific level (GH41170)
I/O¶
Bug in Index.__repr__() when display.max_seq_items=1 (GH38415)
Bug in read_csv() not recognizing scientific notation if decimal is set for engine="python" (GH31920)
Bug in read_csv() interpreting NA value as comment, when NA does contain the comment string fixed for engine="python" (GH34002)
Bug in read_csv() raising IndexError with multiple header columns and index_col specified when file has no data rows (GH38292)
Bug in read_csv() not accepting usecols with different length than names for engine="python" (GH16469)
Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH35873)
Bug in read_csv() raising TypeError when names and parse_dates is specified for engine="c" (GH33699)
Bug in read_clipboard(), DataFrame.to_clipboard() not working in WSL (GH38527)
Allow custom error values for parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH35185)
Bug in to_hdf() raising KeyError when trying to apply for subclasses of DataFrame or Series (GH33748)
Bug in put() raising a wrong TypeError when saving a DataFrame with non-string dtype (GH34274)
Bug in json_normalize() resulting in the first element of a generator object not being included in the returned DataFrame (GH35923)
Bug in read_csv() applying thousands separator to date columns when column should be parsed for dates and usecols is specified for engine="python" (GH39365)
Bug in read_excel() forward filling MultiIndex names with multiple header and index columns specified (GH34673)
read_excel() now respects set_option() (GH34252)
Bug in read_csv() not switching true_values and false_values for nullable boolean dtype (GH34655)
Bug in read_json() when orient="split" does not maintain numeric string index (GH28556)
read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. Now returns a generator with a single empty dataframe (GH34411)
Bug in read_hdf() returning unexpected records when filtering on categorical string columns using where parameter (GH39189)
Bug in read_sas() raising ValueError when datetimes were null (GH39725)
Bug in read_excel() dropping empty values from single-column spreadsheets (GH39808)
Bug in read_excel() loading trailing empty rows/columns for some filetypes (GH41167)
Bug in read_excel() raising AttributeError with MultiIndex header followed by two empty rows and no index, and bug affecting read_excel(), read_csv(), read_table(), read_fwf(), and read_clipboard() where one blank row after a MultiIndex header with no index would be dropped (GH40442)
Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH40907)
Bug in read_orc() always raising AttributeError (GH40918)
Bug in read_csv() and read_table() silently ignoring prefix if names and prefix are defined, now raising ValueError (GH39123)
Bug in read_csv() and read_excel() not respecting dtype for duplicated column name when mangle_dupe_cols is set to True (GH35211)
Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been previously called (GH41069)
Bug in the conversion from pyarrow to pandas (e.g. for reading Parquet) with nullable dtypes and a pyarrow array whose data buffer size is not a multiple of dtype size (GH40896)
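The read_sql() change for empty results with chunksize can be sketched as follows, assuming an in-memory SQLite database. The table name "items" and its columns are hypothetical and exist only for this example.

```python
import sqlite3

import pandas as pd

# In-memory database with an empty, hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")

# With chunksize set and no matching rows, the generator is expected to
# yield a single empty DataFrame instead of yielding nothing at all.
chunks = list(pd.read_sql("SELECT id, name FROM items", conn, chunksize=10))
conn.close()

print(len(chunks))
print(list(chunks[0].columns))
```

Code that iterates over the chunks can then rely on seeing at least one frame carrying the column names, even when the query matches no rows.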
Period¶
Comparisons of Period objects or Index, Series, or DataFrame with mismatched PeriodDtype now behave like other mismatched-type comparisons, returning False for equals, True for not-equal, and raising TypeError for inequality checks (GH39274)
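A small sketch of the comparison behavior described above, using two Period scalars with different frequencies. The dates are arbitrary. The ordering comparison is wrapped in a deliberately broad except clause because the exact exception class raised for scalars is treated here as an assumption rather than a guarantee.

```python
import pandas as pd

monthly = pd.Period("2021-01", freq="M")
daily = pd.Period("2021-01-01", freq="D")

# Equality checks across mismatched frequencies return plain booleans.
is_equal = monthly == daily
is_not_equal = monthly != daily

# Ordering comparisons across mismatched frequencies are expected to raise.
try:
    monthly < daily
    ordering_raised = False
except (TypeError, ValueError):
    ordering_raised = True

print(is_equal, is_not_equal, ordering_raised)
```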
Plotting¶
Bug in scatter_matrix() raising when 2d ax argument passed (GH16253)
Prevent warnings when matplotlib’s constrained_layout is enabled (GH25261)
Bug in DataFrame.plot() was showing the wrong colors in the legend if the function was called repeatedly and some calls used yerr while others didn’t (partial fix of GH39522)
Bug in DataFrame.plot() was showing the wrong colors in the legend if the function was called repeatedly and some calls used secondary_y and others used legend=False (GH40044)
Bug in DataFrame.plot.box() in box plot when dark_background theme was selected, caps or min/max markers for the plot were not visible (GH40769)
Groupby/resample/rolling¶
Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH38254)
Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical series were not tallied (GH38672)
Bug in SeriesGroupBy.value_counts() where error was raised on an empty series (GH39172)
Bug in GroupBy.indices() would contain non-existent indices when null values were present in the groupby keys (GH9304)
Fixed bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() causing loss of precision through using Kahan summation (GH38778)
Fixed bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean() and SeriesGroupBy.mean() causing loss of precision through using Kahan summation (GH38934)
Bug in Resampler.aggregate() and DataFrame.transform() raising TypeError instead of SpecificationError when missing keys had mixed dtypes (GH39025)
Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH38733)
Bug in Series.resample() would raise when the index was a PeriodIndex consisting of NaT (GH39227)
Bug in core.window.rolling.RollingGroupby.corr() and core.window.expanding.ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing other that was longer than each group (GH39591)
Bug in core.window.expanding.ExpandingGroupby.corr() and core.window.expanding.ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing other that was longer than each group (GH39591)
Bug in GroupBy.mean(), GroupBy.median() and DataFrame.pivot_table() not propagating metadata (GH28283)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when window is an offset and dates are in descending order (GH40002)
Bug in SeriesGroupBy and DataFrameGroupBy on an empty Series or DataFrame would lose index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew or using them through apply, aggregate, or resample (GH26411)
Bug in DataFrameGroupBy.apply() where a MultiIndex would be created instead of an Index if a core.window.rolling.RollingGroupby object was created (GH39732)
Bug in DataFrameGroupBy.sample() where error was raised when weights was specified and the index was an Int64Index (GH39927)
Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() would sometimes raise SpecificationError when passed a dictionary and columns were missing; will now always raise a KeyError instead (GH40004)
Bug in DataFrameGroupBy.sample() where column selection was not applied to sample result (GH39928)
Bug in core.window.ewm.ExponentialMovingWindow when calling __getitem__ would incorrectly raise a ValueError when providing times (GH40164)
Bug in core.window.ewm.ExponentialMovingWindow when calling __getitem__ would not retain com, span, alpha or halflife attributes (GH40164)
core.window.ewm.ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False due to an incorrect calculation (GH40098)
Bug in core.window.ewm.ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH40951)
Bug in core.window.ewm.ExponentialMovingWindowGroupby.mean() where the wrong times were used in case of multiple groups (GH40951)
Bug in core.window.ewm.ExponentialMovingWindowGroupby where the times vector and values became out of sync for non-trivial groups (GH40951)
Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index is not sorted (GH39805)
Bug in aggregation functions for DataFrame not respecting numeric_only argument when level keyword was given (GH40660)
Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index causes an incorrect Index shape (GH40014)
Bug in core.window.RollingGroupby where as_index=False argument in groupby was ignored (GH39433)
Bug in GroupBy.any() and GroupBy.all() raising ValueError when using with nullable type columns holding NA even with skipna=True (GH40585)
Bug in GroupBy.cummin() and GroupBy.cummax() incorrectly rounding integer values near the int64 implementation bounds (GH40767)
Bug in GroupBy.rank() with nullable dtypes incorrectly raising TypeError (GH41010)
Bug in GroupBy.cummin() and GroupBy.cummax() computing wrong result with nullable data types too large to roundtrip when casting to float (GH37493)
Bug in DataFrame.rolling() returning mean zero for all NaN window with min_periods=0 if calculation is not numerically stable (GH41053)
Bug in DataFrame.rolling() returning sum not zero for all NaN window with min_periods=0 if calculation is not numerically stable (GH41053)
Bug in SeriesGroupBy.agg() failing to retain ordered CategoricalDtype on order-preserving aggregations (GH41147)
Bug in DataFrameGroupBy.min() and DataFrameGroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising ValueError (GH41111)
Bug in DataFrameGroupBy.rank() with the GroupBy object’s axis=0 and the rank method’s keyword axis=1 (GH41320)
Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of DataFrameGroupBy (GH41427)
Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising AttributeError (GH41427)
Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH41445)
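To illustrate the error-type change for dictionary aggregation with missing columns, here is a minimal sketch on a made-up frame. The column names "key", "value" and "missing" are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})

# Aggregating a column that does not exist is expected to raise KeyError.
try:
    df.groupby("key").agg({"missing": "sum"})
    raised = None
except KeyError as err:
    raised = type(err).__name__

# The same call on an existing column works as usual.
totals = df.groupby("key").agg({"value": "sum"})

print(raised)
print(totals)
```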
Reshaping¶
Bug in merge() raising error when performing an inner join with partial index and right_index when no overlap between indices (GH33814)
Bug in DataFrame.unstack() with missing levels led to incorrect index names (GH37510)
Bug in merge_asof() propagating the right Index with left_index=True and right_on specification instead of left Index (GH33463)
Bug in join() over MultiIndex returned wrong result, when one of both indexes had only one level (GH36909)
merge_asof() raises ValueError instead of cryptic TypeError in case of non-numerical merge columns (GH29130)
Bug in DataFrame.join() not assigning values correctly when having MultiIndex where at least one dimension is from dtype Categorical with non-alphabetically sorted categories (GH38502)
Series.value_counts() and Series.mode() return consistent keys in original order (GH12679, GH11227 and GH39007)
Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH39481)
Bug in DataFrame.apply() would give incorrect results when used with a string argument and axis=1 when the axis argument was not supported and now raises a ValueError instead (GH39211)
Bug in DataFrame.sort_values() not reshaping index correctly after sorting on columns, when ignore_index=True (GH39464)
Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH39454)
Bug in DataFrame.append() returning incorrect dtypes with combinations of datetime64 and timedelta64 dtypes (GH39574)
Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH13483)
Allow Index to be passed to the numpy.all() function (GH40180)
Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH36991)
Bug in to_datetime() raising error when input sequence contains unhashable items (GH39756)
Bug in Series.explode() preserving index when ignore_index was True and values were scalars (GH40487)
Bug in to_datetime() raising ValueError when Series contains None and NaT and has more than 50 elements (GH39882)
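The Series.explode() fix listed above can be sketched with a Series of scalars and a custom index. The labels and values are arbitrary.

```python
import pandas as pd

# Scalar values with a non-default index.
series = pd.Series([10, 20], index=["x", "y"])

# With ignore_index=True, the result is expected to get a fresh 0..n-1 index
# even though there is nothing list-like to explode.
exploded = series.explode(ignore_index=True)

print(exploded.index.tolist())
print(exploded.tolist())
```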
Sparse¶
Bug in DataFrame.sparse.to_coo() raising KeyError with columns that are a numeric Index without a 0 (GH18414)
Bug in SparseArray.astype() with copy=False producing incorrect results when going from integer dtype to floating dtype (GH34456)
Implemented SparseArray.max() and SparseArray.min() (GH40921)
ExtensionArray¶
Bug in DataFrame.where() when other is a Series with ExtensionArray dtype (GH38729)
Fixed bug where Series.idxmax(), Series.idxmin() and argmax/min fail when the underlying data is ExtensionArray (GH32749, GH33719, GH36566)
Fixed a bug where some properties of subclasses of PandasExtensionDtype were improperly cached (GH40329)
Bug in DataFrame.mask() where masking a DataFrame with an ExtensionArray dtype raises ValueError (GH40941)
Styler¶
Bug in Styler where subset arg in methods raised an error for some valid multiindex slices (GH33562)
Styler rendered HTML output minor alterations to support w3 good code standard (GH39626)
Bug in Styler where rendered HTML was missing a column class identifier for certain header cells (GH39716)
Bug in Styler.background_gradient() where text-color was not determined correctly (GH39888)
Bug in Styler where multiple elements in CSS-selectors were not correctly added to table_styles (GH39942)
Bug in Styler where copying from Jupyter dropped top left cell and misaligned headers (GH12147)
Bug in Styler.where where kwargs were not passed to the applicable callable (GH40845)
Bug in Styler which caused CSS to duplicate on multiple renders. (GH39395, GH40334)
Other¶
Bug in Index constructor sometimes silently ignoring a specified dtype (GH38879)
Bug in pandas.api.types.infer_dtype() not recognizing Series, Index or array with a period dtype (GH23553)
Bug in pandas.api.types.infer_dtype() raising an error for general ExtensionArray objects. It will now return "unknown-array" instead of raising (GH37367)
Bug in constructing a Series from a list and a PandasDtype (GH39357)
inspect.getmembers(Series) no longer raises an AbstractMethodError (GH38782)
Bug in Series.where() with numeric dtype and other = None not casting to nan (GH39761)
Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH39412)
Bug in pandas.testing.assert_series_equal(), pandas.testing.assert_frame_equal(), pandas.testing.assert_index_equal() and pandas.testing.assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH39461)
Bug in pandas.testing.assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH41263)
Bug in DataFrame.equals(), Series.equals(), Index.equals() with object-dtype containing np.datetime64("NaT") or np.timedelta64("NaT") (GH39650)
Bug in pandas.util.show_versions() where console JSON output was not proper JSON (GH39701)
Bug in DataFrame.convert_dtypes() incorrectly raised ValueError when called on an empty DataFrame (GH40393)
Bug in DataFrame.clip() not interpreting missing values as no threshold (GH40420)
Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array’s freq to None (GH41425)
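The Index.where() / Index.putmask() equivalence mentioned above can be checked with a short sketch. The index values, the mask and the replacement value 0 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

index = pd.Index([1, 2, 3, 4])
mask = np.array([True, False, True, False])

# where() keeps values where mask is True and replaces the rest;
# putmask() replaces values where its mask is True, hence the inversion.
via_where = index.where(mask, 0)
via_putmask = index.putmask(~mask, 0)

print(via_where.tolist())
print(via_putmask.tolist())
```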