I have a large object that I want to save to disk because it takes a minute to generate. The OOM killer kills the process while pickle.dump is writing the object.

It’s a tuple of dicts of tuples of array.array.

Can pickle dump in chunks? If not, is there another technique I can use?

  • alehc
    1 year ago

    You mean numpy arrays? I think the most efficient way to store them is via np.save. You could try creating a new directory and storing all of your arrays there, with clever file naming so you can rebuild the dictionary structure later.

    Alternatively, if you are up for trying PyTorch, you can convert the arrays to tensors and use torch.save to save the entire dictionary in one file. Installing PyTorch just for this might be a bit overkill tho, as it is a >1GB installation.
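
The directory-of-files idea above does not strictly need numpy. A rough sketch using only the standard library, assuming the data is a tuple of dicts of tuples of array.array as the original post says, and assuming the dict keys are strings (the thread does not say). The names save_arrays, load_arrays and manifest.json are made up for illustration. Only one array is written at a time, so no second full copy of the object should be built in memory.

```python
import json
from array import array
from pathlib import Path


def save_arrays(data, directory):
    """Write every array.array to its own raw file, plus a JSON manifest."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    manifest = []  # one entry per dict: {key: [[filename, typecode], ...]}
    for i, d in enumerate(data):
        entry = {}
        for k, (key, arrays) in enumerate(d.items()):
            files = []
            for j, arr in enumerate(arrays):
                # File name encodes dict index, key index and tuple position.
                name = f"{i}_{k}_{j}.bin"
                with open(directory / name, "wb") as f:
                    arr.tofile(f)  # raw machine values, no pickling involved
                files.append([name, arr.typecode])
            entry[key] = files  # assumption: keys are strings (JSON needs that)
        manifest.append(entry)
    (directory / "manifest.json").write_text(json.dumps(manifest))


def load_arrays(directory):
    """Rebuild the tuple of dicts of tuples of arrays from the directory."""
    directory = Path(directory)
    manifest = json.loads((directory / "manifest.json").read_text())
    result = []
    for entry in manifest:
        d = {}
        for key, files in entry.items():
            arrays = []
            for name, typecode in files:
                arr = array(typecode)
                arr.frombytes((directory / name).read_bytes())
                arrays.append(arr)
            d[key] = tuple(arrays)
        result.append(d)
    return tuple(result)
```

The raw files hold machine-dependent values (byte order and item size), so this sketch is only meant for reading the data back on the same kind of machine that wrote it.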

    • lntl (OP)
      1 year ago

      It’s a tuple of dicts of tuples of array.array, no numpy or torch :(

      • alehc
        1 year ago

        So python standard library lists?

  • radarsat1
    1 year ago

    If you’re pickling that much data you should definitely consider using a more appropriate data format. Maybe a database or HDF5?
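
As one possible reading of "a database", here is a small sketch with the standard library's sqlite3 module, storing each array.array as a BLOB. The table layout and the function names save_to_sqlite and load_array are assumptions for illustration, and so is the string type of the dict keys. HDF5 would need a third-party package, so it is not shown.

```python
import sqlite3
from array import array


def save_to_sqlite(data, path):
    """Store each array as one row: (dict index, key, position, typecode, bytes)."""
    con = sqlite3.connect(path)
    try:
        con.execute(
            "CREATE TABLE IF NOT EXISTS arrays ("
            "dict_idx INTEGER, key TEXT, pos INTEGER, typecode TEXT, payload BLOB, "
            "PRIMARY KEY (dict_idx, key, pos))"
        )
        with con:  # commits if the block succeeds, rolls back on error
            for i, d in enumerate(data):
                for key, arrays in d.items():
                    for j, arr in enumerate(arrays):
                        # tobytes() copies a single array, not the whole object.
                        con.execute(
                            "INSERT OR REPLACE INTO arrays VALUES (?, ?, ?, ?, ?)",
                            (i, key, j, arr.typecode, arr.tobytes()),
                        )
    finally:
        con.close()


def load_array(path, dict_idx, key, pos):
    """Fetch a single array without loading anything else."""
    con = sqlite3.connect(path)
    try:
        row = con.execute(
            "SELECT typecode, payload FROM arrays "
            "WHERE dict_idx = ? AND key = ? AND pos = ?",
            (dict_idx, key, pos),
        ).fetchone()
    finally:
        con.close()
    if row is None:
        raise KeyError((dict_idx, key, pos))
    arr = array(row[0])
    arr.frombytes(row[1])
    return arr
```

A side benefit of this layout is that a single array can be read back on its own. SQLite limits the size of one BLOB (about 1 GB by default), so very large individual arrays would need to be split.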

    • lntl (OP)
      1 year ago

      Agreed. When I started, things were much simpler. Trying not to revise too much code but I can if there’s no other option.

  • Biorix@lemmy.fmhy.ml
    1 year ago

    How are you dumping it?

    Can you show us the code?

    Have you tried splitting your tuple and saving each dict of the tuple separately?
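
A minimal sketch of that splitting idea, assuming the object is a tuple of dicts as described in the original post. pickle.dump can be called several times on the same open file, and pickle.load reads the records back one per call. The function names dump_in_chunks and load_in_chunks are made up for illustration. Whether this is enough to stay under the memory limit depends on how large each individual dict is.

```python
import pickle
from array import array


def dump_in_chunks(data, path):
    """Write each dict of the tuple as its own pickle record in one file."""
    with open(path, "wb") as f:
        # Store the record count first so the loader knows how many follow.
        pickle.dump(len(data), f, protocol=pickle.HIGHEST_PROTOCOL)
        for d in data:
            # Each call pickles one dict only, with its own fresh memo.
            pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_in_chunks(path):
    """Read the records back and rebuild the original tuple of dicts."""
    with open(path, "rb") as f:
        count = pickle.load(f)
        return tuple(pickle.load(f) for _ in range(count))
```

If a single dict is still too large, the same pattern could be applied one level down, writing one record per key.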

  • McWizard@feddit.de
    1 year ago

    A colleague of mine replaced pickle as the internal storage format with JSON, iirc. It was like 10x faster. Not exactly sure how you do that, but I can check if you want to go that way.
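
This is not the colleague's code, just one guess at what a JSON version could look like for the structure in the original post: one JSON line per dict, with each array.array stored as its typecode plus a list of values. It assumes string keys and numeric typecodes, and the names dump_jsonl and load_jsonl are made up. Note that tolist() briefly builds a Python list that is much larger than the array it came from, so this may or may not help with the memory problem, and the speed claim above is not something this sketch verifies.

```python
import json
from array import array


def dump_jsonl(data, path):
    """Write one JSON document per line, one line per dict of the tuple."""
    with open(path, "w", encoding="utf-8") as f:
        for d in data:
            # array.array is not JSON serializable, so store typecode + values.
            record = {
                key: [[arr.typecode, arr.tolist()] for arr in arrays]
                for key, arrays in d.items()
            }
            f.write(json.dumps(record) + "\n")


def load_jsonl(path):
    """Rebuild the tuple of dicts of tuples of arrays from the JSON lines."""
    result = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            result.append({
                key: tuple(array(tc, values) for tc, values in arrays)
                for key, arrays in record.items()
            })
    return tuple(result)
```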