4 minutes
You haven’t heard a lot from us in the last weeks and that’s because we’ve been busy. Busy with probably the largest refactor in the history of your favorite spreadsheet, Neptyne. Doing large refactors is something you ideally want to avoid. It’s much nicer to take small steps all the while unlocking stuff that makes the life of your users better. But sometimes it is hard to avoid. Let’s start with a little bit of history to explain why we think this was one of those cases.
When we started out with Neptyne, we had a pretty simple model. There’s a jupyter kernel that executes your code and this is almost a stock kernel you know from your notebook setup. And we have a client that has a code editor and a spreadsheet. Between the two sits a server component that maintains the spreadsheet and converts anything happening in the spreadsheet into python code the kernel can run.
Both the kernel and the server component kept information about the spreadsheet. The kernel needed to have access to the actual values of the spreadsheet to be able to run python. The server component maintained the formulas, which cells depend on what others and cell attributes (is this cell bold?). This worked ok, but started to chafe as we built out functionality. You want to be able to make a cell bold from code that runs in the kernel. And who maintains the name of the sheets? The sheet order?
Most of this was doable by sending messages between the kernel and the server component but this was getting more and more complicated. Renaming a sheet from code involved two round trips for example. So we decided to do away with the distinction and move the entire spreadsheet engine into the kernel. How hard could it be?
Turns out, pretty hard. Giving every cell that now was going to live in the kernel its own metadata works, but becomes slow and memory intensive for large spreadsheets. So we needed to carefully decide when to create that metadata and when just having a cell was enough - or even when just having metadata was the best call (a cell can have a format but no value).
And then when it is time to save the contents of the spreadsheet, we need to put it all together again to keep the same database format (let’s not change everything!). But that operation became really quite slow, which makes the spreadsheet stop processing at seemingly random intervals. So we moved the putting-it-all-together code to the server component again where it could live in its own thread. That in turn led to all kinds of “interesting” threading issues in non-entrant libraries.
But it’s all done and since May 31st we are running this new code on our main server. We tested it quite extensively but a bug or two might have slipped through. Let us know if you spot something.
The new stack should allow us to roll out a bunch of often requested features: delete/insert row from code. Full access to cell attributes, both getting and setting. And maybe most importantly, this should allow us to scale up the size and increase processing speed!