Monday, May 27, 2013

Redo-style signals

Let's say I have signals a, b, c, d with dependencies

Inputs ~> a ~> b ~> c ~> d ~> Real World

The signal 'a' occasionally changes (due to inputs), which in turn changes the valid values of 'b', 'c' and 'd'. The value of 'd' is occasionally read and used to update some real-world object that the user can see. Maybe 'a' is set more often than 'd' is read; maybe it's the other way around; we're not sure.

The question is, when should the computations of 'b' 'c' and 'd' be performed? Should it be done in "push style", ie, every time 'a' changes, or "pull style", every time 'd' is read? In the first case we may waste a lot of computation if 'a' is set more frequently than 'd' is read. In the second, we do not waste computation, but we require a full check of the dependency tree every time we read 'd', just to see if something has changed in the meantime.

Well it turns out that the conceptual solution to this problem was already solved by Avery Pennarun in his program redo:

Dependencies are tracked in a persistent .redo database so that redo can check them later. If a file needs to be rebuilt, it re-executes the whatever.do script and regenerates the dependencies. If a file doesn't need to be rebuilt, redo can calculate that just using its persistent .redo database, without re-running the script. And it can do that check just once right at the start of your project build.

redo is based on the following simple insight: you don't actually care what the dependencies are before you build the target; if the target doesn't exist, you obviously need to build it. Then, the build script itself can provide the dependency information however it wants; unlike in make, you don't need a special dependency syntax at all. You can even declare some of your dependencies after building, which makes C-style autodependencies much simpler.

A signal's value is either up to date or obsolete. When the value is requested, if it is up to date, it returns that value immediately, not checking the dependency tree. If it's obsolete, it does a full recalculation.

Then, when a source value like 'a' changes, it simply asks the question: Who is currently relying on the correct value of this signal? In the case of 'a', that would be 'b', so 'a' notifies 'b' that the value is now obsolete (but without providing the new value), who notifies 'c', who notifies 'd'. After that notification chain is sent, no one is relying on the correct value for anything, so the next time 'a' changes, nothing happens; no notifications are done.

There is another benefit to this system. Say the dependency tree actually looks like

  d
 / \
b   c
 \ /
  a

With push-style signals, every time 'a' changes, 'd' gets updated twice, which is wasteful. With pull-style signals, 'd', would be updated once, but a would be consulted twice every time d is read. Redo-style signals get the benefit of push-style signals without having to consult the whole tree at each update.

Code

No comments:

Post a Comment