October 2017 Meeting - Data serialization battle


#1

I was thinking we could do a comparison of data serialization options for the October meeting. Say, comparing JSON/YAML, pickle, and then something more efficient, such as msgpack or capnproto.
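
As a rough idea of what such a comparison could look like, here is a minimal sketch using only the stdlib json and pickle modules. The sample record is made up for illustration, and encoded size is just one axis to compare (speed, readability, and safety matter too).

```python
import json
import pickle

# Hypothetical sample record, invented purely for this illustration.
record = {"name": "DesertPy", "year": 2017, "tags": ["json", "pickle"], "ratio": 0.5}

# JSON produces text, so encode it to bytes for a fair size comparison.
as_json = json.dumps(record).encode("utf-8")

# pickle produces bytes directly; use the newest protocol available.
as_pickle = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# Both formats should round-trip this simple record unchanged.
assert json.loads(as_json.decode("utf-8")) == record
assert pickle.loads(as_pickle) == record

sizes = {"json": len(as_json), "pickle": len(as_pickle)}
print(sizes)
```

One caveat worth keeping in mind for any comparison: unpickling data from an untrusted source can execute arbitrary code, while parsing JSON cannot.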

Sound interesting to anyone? I can do an intro and cover one option. Any takers for the other two?

Austin


#2

I'd also be interested to hear about the built-in serialization methods in
numpy and pandas and how they compare to the stdlib ones!


#3

I’m down, I can do msgpack and structs.


#4

Awesome, thanks @wtolson … this is the 4th Tuesday meeting I am talking about here BTW. I think DesertPy Downtown will be PyTest.

@alkasm I am not exactly sure what you mean but I think I’ve looked at that and used it … is the thing I am doing at the end of this presentation what you’re talking about?

  • Austin

#5

There’s ndarray.tofile(), numpy.save(file, arr), and
numpy.savez(file, *args). The latter two save to special numpy file
formats, .npy and .npz respectively. I think pytables seems pretty
popular for saving large datasets in the HDF5 format too (e.g. with
pandas http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables).
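
To make the difference between those three concrete, here is a small sketch, assuming numpy is installed. The array contents and file names are arbitrary choices for illustration. The main thing it tries to show is that tofile() writes only the raw bytes, so the dtype and shape have to be supplied again when reading, while .npy and .npz files carry that information themselves.

```python
import os
import tempfile

import numpy as np

# Arbitrary example array: 3 x 4 float64 values.
arr = np.arange(12, dtype=np.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "arr.bin")
    npy_path = os.path.join(tmp, "arr.npy")
    npz_path = os.path.join(tmp, "arrs.npz")

    # tofile() stores raw bytes only; dtype and shape must be restored by hand.
    arr.tofile(raw_path)
    from_raw = np.fromfile(raw_path, dtype=np.float64).reshape(3, 4)

    # save() stores one array along with a header recording dtype and shape.
    np.save(npy_path, arr)
    from_npy = np.load(npy_path)

    # savez() bundles several arrays, addressed by keyword name, in one file.
    np.savez(npz_path, first=arr, doubled=arr * 2)
    with np.load(npz_path) as bundle:
        from_npz = {name: bundle[name] for name in bundle.files}

    raw_size = os.path.getsize(raw_path)
    npy_size = os.path.getsize(npy_path)

print(raw_size, npy_size, sorted(from_npz))
```

The .npy file comes out slightly larger than the raw dump because of its header, which is the price of not having to remember the dtype and shape separately.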


#6

@wtolson, you going to be in town for this on Oct 24th?


#7

Or anyone else for that matter?


#8

In the UK now but I’ll be back before the 24th!


#9

My trip was canceled so I’m available.


#10

Sweet! Would either (or BOTH) of you mind throwing together a lightning talk on the serialization topic of your choice? I will pick whatever is left over.

  • Austin