How to provide a reproducible copy of your DataFrame with to_clipboard()

First: Do not post images of data, text only please

Second: Do not paste data in the comments section or as an answer, edit your question instead


How to quickly provide sample data from a pandas DataFrame

  • There is more than one way to do this. This answer is not exhaustive; it covers only the simplest method.
  • For the curious, there are other more verbose solutions provided on Stack Overflow.
  1. Provide a link to a shareable dataset (maybe on GitHub or a shared file on Google). This is particularly useful if it’s a large dataset and the objective is to optimize some method. The drawback is that the data may no longer be available in the future, which reduces the benefit of the post.
    • Data must be provided in the question, but can be accompanied by a link to a more extensive dataset.
    • Do not post only a link or an image of the data.
  2. Provide the output of df.head(10).to_clipboard(sep=',', index=True)

Code:

Provide the output of pandas.DataFrame.to_clipboard

df.head(10).to_clipboard(sep=',', index=True)
  • If the DataFrame has a MultiIndex, add a note stating which columns are the index levels.
  • Note: when the previous line of code is executed, no output will appear.
    • The result of the code is now on the clipboard.
  • Paste the clipboard into a code block in your Stack Overflow question
,a,b
2020-07-30,2,4
2020-07-31,1,5
2020-08-01,2,2
2020-08-02,9,8
2020-08-03,4,0
2020-08-04,3,3
2020-08-05,7,7
2020-08-06,7,0
2020-08-07,8,4
2020-08-08,3,2
  • Someone trying to answer your question can copy this text to the clipboard and then run:
df = pd.read_clipboard(sep=',')
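The round trip can be sketched without a real clipboard. This is a minimal illustration, not part of the original answer: it assumes a three-row sample frame, uses to_csv and an io.StringIO buffer as stand-ins for the clipboard, and passes index_col=0 and parse_dates=True (options not used above) so that the first column comes back as a datetime index instead of an "Unnamed: 0" column. Because pd.read_clipboard forwards extra keyword arguments to the CSV parser, the same two options should also work there.

```python
import io

import pandas as pd

# Hypothetical sample frame, shaped like the pasted text above.
df = pd.DataFrame(
    {"a": [2, 1, 2], "b": [4, 5, 2]},
    index=pd.to_datetime(["2020-07-30", "2020-07-31", "2020-08-01"]),
)

# Stand-in for the clipboard: comma-separated text with the index included.
text = df.to_csv(sep=",", index=True)

# Stand-in for pd.read_clipboard(sep=',').
# index_col=0 turns the first column back into the index,
# parse_dates=True parses that index as datetimes.
restored = pd.read_csv(io.StringIO(text), sep=",", index_col=0, parse_dates=True)

print(restored)
```

Without index_col=0, the dates are read as an ordinary column of strings, which the person answering then has to convert by hand.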

Locations of the dataframe other than .head(10)

  • Specify a section of the dataframe with the .iloc property
  • The following example selects rows 3 through 11 (the stop value, 12, is excluded) and all the columns
df.iloc[3:12, :].to_clipboard(sep=',')
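To check what a slice will contain before copying it, something like the following may help. It is a small sketch on a hypothetical 15-row frame (the frame and the names subset and subset_a are made up for illustration).

```python
import pandas as pd

# Hypothetical 15-row frame so the slice has something to select.
df = pd.DataFrame({"a": range(15), "b": range(15, 30)})

# .iloc is position-based and excludes the stop value,
# so 3:12 keeps positions 3 through 11 (nine rows).
subset = df.iloc[3:12, :]

# Rows and columns can be narrowed together, e.g. only the first column.
subset_a = df.iloc[3:12, [0]]

print(subset.shape)
print(subset_a.columns.tolist())
```

Either subset can then be passed to .to_clipboard(sep=',') in the same way as df.head(10).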

Additional References for pd.read_clipboard

  • Specify Multi-Level columns using pd.read_clipboard?
  • How do you handle column names having spaces in them when using pd.read_clipboard?
  • How to handle custom named index when copying a dataframe using pd.read_clipboard?

Google Colab Users

  • .to_clipboard() won’t work
  • Use .to_dict() to copy your dataframe
# if you have a datetime column, convert it to a str
df['date'] = df['date'].astype('str')

# if you have a datetime index, convert it to a str
df.index = df.index.astype('str')

# output to a dict
df.head(10).to_dict(orient="index")

# which will look like
{'2020-07-30': {'a': 2, 'b': 4},
 '2020-07-31': {'a': 1, 'b': 5},
 '2020-08-01': {'a': 2, 'b': 2},
 '2020-08-02': {'a': 9, 'b': 8},
 '2020-08-03': {'a': 4, 'b': 0},
 '2020-08-04': {'a': 3, 'b': 3},
 '2020-08-05': {'a': 7, 'b': 7},
 '2020-08-06': {'a': 7, 'b': 0},
 '2020-08-07': {'a': 8, 'b': 4},
 '2020-08-08': {'a': 3, 'b': 2}}

# copy the previous dict and paste into a code block on SO
# the dict can be converted to a dataframe with 
# df = pd.DataFrame.from_dict(d, orient="index")  # d is the name of the dict
# convert the datetime column or index back to datetime, e.g. with pd.to_datetime
  • For a more thorough answer using .to_dict(), see:
    • How to efficiently build and share a sample dataframe?
    • How to make good reproducible pandas examples
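For the person answering, rebuilding the frame from the pasted dict might look like the sketch below. It assumes the dict has been assigned to a variable named d (shortened here to three rows) and that the string keys were originally a datetime index.

```python
import pandas as pd

# The dict as pasted from the question (shortened to three rows).
d = {
    "2020-07-30": {"a": 2, "b": 4},
    "2020-07-31": {"a": 1, "b": 5},
    "2020-08-01": {"a": 2, "b": 2},
}

# Outer keys become the index, inner keys become the columns.
df = pd.DataFrame.from_dict(d, orient="index")

# The index is still made of strings at this point; convert it back.
df.index = pd.to_datetime(df.index)

print(df)
```

If the dates were stored in a column rather than in the index, the same conversion would apply to that column, for example df['date'] = pd.to_datetime(df['date']).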
