
A few notes for Pandas and BigQuery

2021-01-22 10:29

1. Get the memory size of a Pandas DataFrame
df.memory_usage(deep=True).sum()
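The `deep=True` flag matters when the DataFrame contains object (string) columns: without it, Pandas only counts the 8-byte object pointers, not the Python strings they point to. A minimal sketch with made-up data:

```python
import pandas as pd

# A small DataFrame with one numeric and one string (object) column.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "yy", "zzz"]})

shallow = df.memory_usage(deep=False).sum()  # counts only pointers for "b"
deep = df.memory_usage(deep=True).sum()      # also counts the string payloads

print(shallow, deep)
```

On a DataFrame with object columns, the deep total is strictly larger than the shallow one, so `deep=True` is the honest number to use when sizing uploads.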

2. Upload a large Pandas DataFrame to a BigQuery table

If your DataFrame is too big, the upload will fail with "UDF out of memory":

google.api_core.exceptions.BadRequest: 400 Resources exceeded during query execution: UDF out of memory.; Failed to read Parquet file [...]. This might happen if the file contains a row that is too large, or if the total size of the pages loaded for the queried columns is too large.

The solution is as simple as splitting the DataFrame and uploading the chunks one by one:

import numpy as np
from google.cloud import bigquery

client = bigquery.Client()
# Split the DataFrame into 10 chunks and append each one to the target table.
for df_chunk in np.array_split(df, 10):
    job_config = bigquery.LoadJobConfig()
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    job = client.load_table_from_dataframe(df_chunk, table_id, job_config=job_config)
    job.result()  # wait for this load job to finish before sending the next

3. Restore a table in BigQuery

How do you recover a deleted table in BigQuery? Just use the bq command with a snapshot decorator:

bq cp dataset.table@1577833205000 dataset.new_table

If your <timestamp> is not valid, the bq command will tell you which <timestamp> values are valid for this table; rerun the command with one of those.
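The snapshot decorator after `@` is a Unix timestamp in milliseconds. A small sketch of how to compute one for a moment before the table was deleted (the date below is hypothetical):

```python
from datetime import datetime, timezone

# Pick a UTC moment known to be before the deletion (hypothetical here).
dt = datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# BigQuery snapshot decorators are milliseconds since the Unix epoch.
ts_ms = int(dt.timestamp() * 1000)
print(ts_ms)  # 1577836800000
```

You would then use the printed value as `dataset.table@1577836800000` in the `bq cp` command.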


