Filling in date gaps in MultiIndex Pandas Dataframe

You can make a new multi index based on the Cartesian product of the levels of the existing multi index. Then, re-index your data frame using the new index.

new_index = pd.MultiIndex.from_product(df.index.levels)
new_df = df.reindex(new_index)

# Optional: convert missing values to zero, and convert the data back
# to integers. See explanation below.
new_df = new_df.fillna(0).astype(int)

That’s it! The new data frame has all the possible index values. The existing data is indexed correctly.

Read on for a more detailed explanation.


Explanation

Set up sample data

import pandas as pd

df = pd.DataFrame({'A': ['loc_a'] * 12 + ['loc_b'],
                   'B': ['group_a'] * 7 + ['group_b'] * 3 + ['group_c'] * 2 + ['group_a'],
                   'Date': ["2013-06-11",
                           "2013-07-02",
                           "2013-07-09",
                           "2013-07-30",
                           "2013-08-06",
                           "2013-09-03",
                           "2013-10-01",
                           "2013-07-09",
                           "2013-08-06",
                           "2013-09-03",
                           "2013-07-09",
                           "2013-09-03",
                           "2013-10-01"],
                    'Value': [22, 35, 14,  9,  4, 40, 18, 4, 2, 5, 1, 2, 3]})

df.Date = pd.to_datetime(df.Date)

df = df.set_index(['A', 'B', 'Date'])

Here’s what the sample data looks like

                          Value
A     B       Date
loc_a group_a 2013-06-11     22
              2013-07-02     35
              2013-07-09     14
              2013-07-30      9
              2013-08-06      4
              2013-09-03     40
              2013-10-01     18
      group_b 2013-07-09      4
              2013-08-06      2
              2013-09-03      5
      group_c 2013-07-09      1
              2013-09-03      2
loc_b group_a 2013-10-01      3

Make new index

Using from_product we can make a new multi index. This new index is the Cartesian product of all the values from all the levels of the old index.

new_index = pd.MultiIndex.from_product(df.index.levels)

Reindex

Use the new index to reindex the existing data frame.

new_df = df.reindex(new_index)

All the possible combinations are now present. The missing values are null (NaN).

The expanded, re-indexed data frame looks like this:

                          Value
loc_a group_a 2013-06-11   22.0
              2013-07-02   35.0
              2013-07-09   14.0
              2013-07-30    9.0
              2013-08-06    4.0
              2013-09-03   40.0
              2013-10-01   18.0
      group_b 2013-06-11    NaN
              2013-07-02    NaN
              2013-07-09    4.0
              2013-07-30    NaN
              2013-08-06    2.0
              2013-09-03    5.0
              2013-10-01    NaN
      group_c 2013-06-11    NaN
              2013-07-02    NaN
              2013-07-09    1.0
              2013-07-30    NaN
              2013-08-06    NaN
              2013-09-03    2.0
              2013-10-01    NaN
loc_b group_a 2013-06-11    NaN
              2013-07-02    NaN
              2013-07-09    NaN
              2013-07-30    NaN
              2013-08-06    NaN
              2013-09-03    NaN
              2013-10-01    3.0
      group_b 2013-06-11    NaN
              2013-07-02    NaN
              2013-07-09    NaN
              2013-07-30    NaN
              2013-08-06    NaN
              2013-09-03    NaN
              2013-10-01    NaN
      group_c 2013-06-11    NaN
              2013-07-02    NaN
              2013-07-09    NaN
              2013-07-30    NaN
              2013-08-06    NaN
              2013-09-03    NaN
              2013-10-01    NaN

Nulls in integer column

You can see that the data in the new data frame has been converted from ints to floats. Pandas can’t have nulls in an integer column. Optionally, we can convert all the nulls to 0, and cast the data back to integers.

new_df = new_df.fillna(0).astype(int)

Result

                          Value
loc_a group_a 2013-06-11     22
              2013-07-02     35
              2013-07-09     14
              2013-07-30      9
              2013-08-06      4
              2013-09-03     40
              2013-10-01     18
      group_b 2013-06-11      0
              2013-07-02      0
              2013-07-09      4
              2013-07-30      0
              2013-08-06      2
              2013-09-03      5
              2013-10-01      0
      group_c 2013-06-11      0
              2013-07-02      0
              2013-07-09      1
              2013-07-30      0
              2013-08-06      0
              2013-09-03      2
              2013-10-01      0
loc_b group_a 2013-06-11      0
              2013-07-02      0
              2013-07-09      0
              2013-07-30      0
              2013-08-06      0
              2013-09-03      0
              2013-10-01      3
      group_b 2013-06-11      0
              2013-07-02      0
              2013-07-09      0
              2013-07-30      0
              2013-08-06      0
              2013-09-03      0
              2013-10-01      0
      group_c 2013-06-11      0
              2013-07-02      0
              2013-07-09      0
              2013-07-30      0
              2013-08-06      0
              2013-09-03      0
              2013-10-01      0

Leave a Comment