You should use the s3fs module as proposed by yjk21. However, calling ParquetDataset gives you a pyarrow.parquet.ParquetDataset object, not a DataFrame. To get the Pandas DataFrame, apply .read_pandas().to_pandas() to it:
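import pyarrow.parquet as pq
import s3fs

# Credentials are resolved the usual AWS way (environment variables, config files, ...)
s3 = s3fs.S3FileSystem()
# ParquetDataset only describes the files; read_pandas().to_pandas() loads them into a DataFrame
pandas_dataframe = pq.ParquetDataset('s3://your-bucket/', filesystem=s3).read_pandas().to_pandas()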