Aggregating the best IT technical articles from China, sharing IT technical highlights, and helping IT professionals grow

  • 3914 views

    Get the number of rows for a parquet file

    We were using Pandas to get the number of rows for a Parquet file:

        import pandas as pd
        df = pd.read_parquet("my.parquet")
        print(df.shape[0])

    This is easy but will cost a lot of ti...

    Category: Technical articles  Posted: 2021-12-17 09:52

  • 4261 views

    An old bug about PyArrow

    To save memory in my program that uses Pandas, I changed the types of some columns from string to category, following the reference. df[["os_type", "cpu_type", "chip_brand"]] = d...

    Category: Technical articles  Posted: 2021-02-05 10:25
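The excerpt above is truncated before the assignment finishes and before the PyArrow bug is described, so this sketch covers only the memory-saving step it mentions. It assumes the string columns hold a small number of repeated values; the helper name `to_category` is hypothetical, and it uses `DataFrame.astype` with a per-column mapping rather than the exact assignment form in the excerpt.

```python
import pandas as pd


def to_category(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with the given columns converted to category dtype."""
    return df.astype({name: "category" for name in columns})


def memory_bytes(df: pd.DataFrame) -> int:
    """Total memory used by df, counting the actual string contents."""
    return int(df.memory_usage(deep=True).sum())
```

For the columns named in the excerpt, usage would look like `df = to_category(df, ["os_type", "cpu_type", "chip_brand"])`, and comparing `memory_bytes(df)` before and after shows how much the conversion saves on a given dataset.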