October 2017 Meeting - Data serialization battle


I was thinking we could do a comparison of data serialization options for the October meeting: say, JSON/YAML, pickle, and then something more efficient, like msgpack or capnproto.

Sound interesting to anyone? I can do an intro and cover one option. Any takers for the other two?
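For a taste of the stdlib side of the comparison, here's a minimal sketch that round-trips the same made-up record through json and pickle and prints the encoded sizes. msgpack and capnproto are third-party packages (and YAML needs PyYAML), so they're left out of this one.

```python
import json
import pickle

# A small, made-up record, just for comparison purposes.
record = {"id": 42, "name": "DesertPy", "tags": ["json", "pickle"], "score": 3.5}

# JSON: text-based and language-neutral, but limited to basic types
# (dict, list, str, int, float, bool, None).
json_bytes = json.dumps(record).encode("utf-8")

# pickle: Python-only binary format that handles most Python objects.
# Never unpickle data from an untrusted source.
pickle_bytes = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

print(f"JSON:   {len(json_bytes)} bytes")
print(f"pickle: {len(pickle_bytes)} bytes")

# Type fidelity differs: JSON turns a tuple into a list, pickle keeps the tuple.
point = (1, 2)
print(type(json.loads(json.dumps(point))).__name__)      # list
print(type(pickle.loads(pickle.dumps(point))).__name__)  # tuple
```

Swapping in msgpack (`msgpack.packb` / `msgpack.unpackb`) would be the natural third column for the talk.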



I'd also be interested to hear about the built-in serialization methods in numpy and pandas, and how they compare to the stdlib ones!


I’m down! I can do msgpack and structs.
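Assuming "structs" here means the stdlib `struct` module, a minimal sketch of what that looks like: a fixed binary layout packed and unpacked with `struct.Struct`. The format string and field names are hypothetical. Unlike JSON or msgpack, no field names or types go into the bytes, so the reader and writer have to agree on the format string ahead of time.

```python
import struct

# Hypothetical fixed layout: "<" = little-endian, standard sizes, no padding.
#   I  -> unsigned 32-bit int (id)
#   d  -> 64-bit float (reading)
#   8s -> 8-byte bytes field, null-padded (name)
RECORD = struct.Struct("<Id8s")

packed = RECORD.pack(7, 98.6, b"sensor")
print(RECORD.size)  # 20 bytes per record: 4 + 8 + 8

rec_id, reading, name = RECORD.unpack(packed)
name = name.rstrip(b"\x00")  # strip the null padding added by "8s"

# Records can be concatenated and read back in one pass with iter_unpack().
rows = [(1, 20.5, b"a"), (2, 21.0, b"b")]
blob = b"".join(RECORD.pack(*row) for row in rows)
decoded = [(i, v, n.rstrip(b"\x00")) for i, v, n in RECORD.iter_unpack(blob)]
```

The fixed record size is what makes this compact and fast to scan, at the cost of zero self-description.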


Awesome, thanks @wtolson! BTW, I'm talking about the 4th Tuesday meeting here; I think the DesertPy Downtown topic will be pytest.

@alkasm I'm not exactly sure what you mean, but I think I've looked at that and used it. Is the thing I'm doing at the end of this presentation what you're talking about?

  • Austin


There’s ndarray.tofile(), numpy.save(file, arr), and numpy.savez(file, *arrays). The latter two save to special numpy file formats, .npy and .npz respectively. PyTables also seems pretty popular for saving large datasets in the HDF5 format (e.g. with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables).


@wtolson, you going to be in town for this on Oct 24th?


Or anyone else for that matter?


In the UK now but I’ll be back before the 24th!


My trip was canceled so I’m available.


Sweet! Would either (or BOTH) of you mind throwing together a lightning talk on the serialization topic of your choice? I'll pick whatever is left over.

  • Austin