When building applications for communities, we often need to aggregate information across many transactions and make the resulting statistics available on-chain, so that smart contracts can use them in computations. First, let’s look at two typical examples of statistics being calculated and used on-chain. Then we’ll summarize the functionality they have in common.
Suppose a company launched an Investor Token to raise money for a project. Having sold over 50% of the tokens to investors, they decided against listing it on secondary exchanges, because of regulatory risks and also because they worried the spot price would tank. Instead, they chose to focus their effort on selling a Utility Token and automatically distributing the proceeds from sales (ETH, USDT, etc.) in the form of dividends to the holders of the Investor Token.
The dividend paid to each investor in any given period would be the proceeds from sales in that period, multiplied by the proportion of the Investor Token that investor held.
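As a minimal sketch of this rule, assuming integer token units and integer division (as on-chain arithmetic typically requires), the calculation for one investor and one period might look like this. The function and parameter names are illustrative only.

```python
def dividend_for_period(proceeds: int, holder_balance: int, total_supply: int) -> int:
    """Share of one period's proceeds owed to a single holder.

    Multiplies before dividing so that integer division loses as little
    precision as possible; any remainder simply stays undistributed.
    """
    if total_supply <= 0:
        raise ValueError("total_supply must be positive")
    return proceeds * holder_balance // total_supply


# A holder with 25 of 100 tokens, in a period with 1000 units of proceeds.
print(dividend_for_period(1000, 25, 100))
```

Multiplying first matters here: dividing `holder_balance` by `total_supply` first would round the proportion down to zero for every holder who owns less than the whole supply.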
Consumer Price Index
One interesting application is implementing a consumer price index of various goods and services. For instance, a university might convince its alumni to crowdfund a daily stipend for its student body (a form of universal basic income). But how much UBI should be distributed? To answer this question, the university could tag various vendor addresses with “food”, “books”, “clothing”, etc. and start calculating statistics about the amount of money being spent in each category. Then, the alumni themselves would be able to vote on how much to subsidize each category.
The stipend given out in any given period would be the average amount spent on each category, multiplied by the average proportion the alumni voted to subsidize for that category in that period, and summed over all categories.
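A small sketch of the stipend calculation for a single period follows. It assumes, purely for illustration, that subsidy proportions are stored as integers in basis points (10,000 = 100%) and that a category nobody voted on gets no subsidy. The names are hypothetical.

```python
SCALE = 10_000  # assumption: subsidy proportions are stored in basis points


def stipend_for_period(avg_spent: dict[str, int], subsidy_bps: dict[str, int]) -> int:
    """Sum over categories of (average spent) x (voted subsidy proportion)."""
    total = 0
    for category, spent in avg_spent.items():
        # Categories without a vote contribute nothing.
        total += spent * subsidy_bps.get(category, 0)
    # Divide once at the end to limit rounding loss.
    return total // SCALE


avg_spent = {"food": 2000, "books": 500, "clothing": 300}
subsidy_bps = {"food": 5_000, "books": 10_000}  # 50% of food, 100% of books
print(stipend_for_period(avg_spent, subsidy_bps))
```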
In both cases, we are looking to take the dot product of two different time series statistics. For the dividends, we’re multiplying the proceeds of sales by the proportion of investor tokens held. For the stipend, we’re multiplying the average spent on each category by the average subsidy voted for that category.
Whenever an investor, or a student, looks to withdraw money that has accumulated, the smart contract looks at these products and takes a sum over all the time intervals since the last time money was withdrawn. The result is the total that can be disbursed.
Because global consensus makes every computation on a blockchain expensive, we want to avoid large loops in our smart contract code. Therefore, we group our statistics into buckets before taking the dot product. A bucket can represent a day, a week, a month, a year and so forth. The latest bucket is typically still being filled, so all results (such as the sum of the dot products) are calculated up to and including the second-to-last bucket. For example, dividends could be claimed as of yesterday or as of last week, depending on the bucket size.
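One simple way to realize such buckets, sketched here under the assumption of fixed-width buckets measured in seconds since the Unix epoch, is to derive the bucket index from the timestamp by integer division. The constant and function names are illustrative.

```python
BUCKET_SECONDS = 24 * 60 * 60  # assumption: one-day buckets


def bucket_index(timestamp: int, bucket_seconds: int = BUCKET_SECONDS) -> int:
    """Index of the bucket that contains the given timestamp."""
    return timestamp // bucket_seconds


def last_complete_bucket(now: int, bucket_seconds: int = BUCKET_SECONDS) -> int:
    """The bucket before the one currently being filled.

    Results are only calculated up to and including this bucket.
    """
    return bucket_index(now, bucket_seconds) - 1


now = 3 * BUCKET_SECONDS + 5  # five seconds into the fourth day
print(bucket_index(now), last_complete_bucket(now))
```

A larger bucket means fewer loop iterations when outputs are calculated, at the cost of a longer wait before the most recent activity becomes claimable.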
The statistics themselves are calculated from the inputs. Sometimes we record a monotonically increasing sum, so that the amount accrued between any two timestamps is simply the difference between the values recorded at those timestamps. Other times we want to update a running average, a scalar value weighted towards the more recent transactions. In either case, the smart contract updates the statistic as the inputs come in. It updates the statistics in the current bucket, which cuts down on loop iterations when the time comes to calculate outputs.
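The sketch below illustrates both kinds of statistic. It is an assumption-laden example rather than a reference design: the running average is implemented as an exponential moving average (one common way to weight recent inputs more heavily), weights are in basis points, and inputs are assumed to arrive in non-decreasing timestamp order. The class and method names are hypothetical.

```python
import bisect


class BucketStats:
    def __init__(self, bucket_seconds: int, alpha_bps: int = 2_000):
        self.bucket_seconds = bucket_seconds
        self.alpha_bps = alpha_bps  # weight of the newest input, in basis points
        self.buckets = []           # indices of buckets that received input, ascending
        self.totals = []            # cumulative sum at the end of each such bucket
        self.average = None         # running average, unset until the first input

    def record(self, timestamp: int, amount: int) -> None:
        """Update the current bucket as an input arrives (no loops needed)."""
        bucket = timestamp // self.bucket_seconds
        total = (self.totals[-1] if self.totals else 0) + amount
        if self.buckets and self.buckets[-1] == bucket:
            self.totals[-1] = total          # still filling the same bucket
        else:
            self.buckets.append(bucket)      # first input of a new bucket
            self.totals.append(total)
        if self.average is None:
            self.average = amount
        else:
            old_weight = 10_000 - self.alpha_bps
            self.average = (self.alpha_bps * amount + old_weight * self.average) // 10_000

    def cumulative_through(self, bucket: int) -> int:
        """Monotonically increasing sum as of the end of the given bucket."""
        i = bisect.bisect_right(self.buckets, bucket)
        return self.totals[i - 1] if i else 0


stats = BucketStats(bucket_seconds=100)
stats.record(10, 5)
stats.record(50, 7)
stats.record(250, 3)
# Amount accrued after bucket 0, through bucket 2, as a difference of two sums.
print(stats.cumulative_through(2) - stats.cumulative_through(0))
```

Because the sum only ever grows, the total for any range of buckets needs two lookups rather than a loop over every input in the range.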
Outputs are typically calculated by some sort of dot product, summing over the buckets that have not yet been processed. Once the output is generated and funds are claimed, the “cursor” is updated to point to the latest bucket, so next time we need to calculate an output, the loops and summations would start from there.
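To make the cursor concrete, here is a minimal sketch for the dividend example. It assumes per-bucket proceeds and per-holder proportions (in basis points) have already been recorded, and that the bucket currently being filled is excluded from claims. All names are hypothetical.

```python
class DividendLedger:
    def __init__(self):
        self.proceeds = {}   # bucket index -> proceeds recorded in that bucket
        self.share_bps = {}  # (holder, bucket index) -> holder's proportion, in basis points
        self.cursor = {}     # holder -> first bucket not yet claimed

    def claim(self, holder: str, current_bucket: int) -> int:
        """Sum the per-bucket products from the cursor up to the last complete bucket."""
        start = self.cursor.get(holder, 0)
        owed = 0
        # range() stops before current_bucket, which is still being filled.
        for bucket in range(start, current_bucket):
            owed += self.proceeds.get(bucket, 0) * self.share_bps.get((holder, bucket), 0)
        # Move the cursor so the next claim starts where this one stopped.
        self.cursor[holder] = current_bucket
        return owed // 10_000


ledger = DividendLedger()
ledger.proceeds = {0: 1000, 1: 2000, 2: 500}
ledger.share_bps = {("alice", 0): 2_500, ("alice", 1): 5_000, ("alice", 2): 5_000}
print(ledger.claim("alice", current_bucket=2))  # covers buckets 0 and 1
print(ledger.claim("alice", current_bucket=2))  # nothing new to claim
```

The loop length is bounded by the number of buckets since the holder's last claim, not by the number of transactions, which is the point of bucketing.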