
Draft: POC read file with pandas instead of dask

Closed: Jeremie Hallal requested to merge POC_read_with_pandas into master.
1 unresolved thread

Merge request reports

Merge request pipeline #100613 failed for commit e25b1d89.

Approval is optional

Closed by YannickYannick on Nov 9, 2022, 9:28am UTC.

Merge details

  • The changes were not merged into master.

Activity

@@ -171,5 +171,7 @@
 
         # read all chunk for requested columns
         def read_parquet_files(f):
-            """ read all chunk for requested columns """
-            return read_with_dask(f.paths, columns=f.labels, storage_options=self._parameters.storage_options)
+            dfs = [pd.read_parquet(pq_file, engine='pyarrow', columns=f.labels,
+                                   storage_options=self._parameters.storage_options)
+                   for pq_file in f.paths]
+            return pd.concat(dfs)
  • closed
