Invoicing with SQL

Table of Contents

1 What

Spreadsheet programs like Microsoft Excel and Google Sheets are often used to perform ad-hoc data analysis and reporting, especially in a business context. In this article I explore the feasibility of using relational databases with SQL as an alternative. The use case is the apportionment of utility costs—specifically, the water bill—for residents within a multi-use dwelling. In this case, the water usage for the entire building is invoiced just to one resident who pays this bill and then seeks remittance from his neighbors. Consequently, raw billing data must be used to derive costs for individual residents in the face of fluctuating prices and usage.

2 Why

Spreadsheet programs are simple, easy, and flexible, but these virtues come with trade-offs. They provide little in the way of structuring data, beyond worksheets, and their programming model is far too limited. For instance, the fundamental unit of programmability (cell formulas) is inherently scalar in its output. While cell formulas can operate over a set of cells they typically cannot project results into a set of cells. So, while they can perform reduce operations, map operations usually require copying cell formulas. This is tedious and error-prone.

Relational databases with SQL are also flexible, and much more powerful than spreadsheets. The relational model is very good at structuring data, with extensible data types and data integrity constraints. And, the programming model is much richer, supported by the relational algebra and its set-based operations. The trade-off is less ease-of-use, or so it probably is believed.

I wanted to test this belief—and also hope to escape the nightmare of spreadsheets—by using a relational database with SQL to calculate the water bills for my neighbors. I report and reflect on this experience in the rest of this article.

3 How

I use SQLite as the relational database engine that supports this exercise. The actual exercise itself comprises SQL code embedded in this article, combined with data read in from an external file (not presented here). First, a preamble sets certain useful features with the database engine. Second, the Model is defined using database tables. Third, the View is defined using database views. The article continues with an example of the results that can be obtained, then finishes with a summary of my reflections.

3.1 Preamble

I want to benefit from relational integrity constraints, but these are not enabled by default in SQLite. The pragma command sends meta-data to the SQLite database engine for a wide variety of features, and here is used to enable foreign-key support.

PRAGMA foreign_keys = ON;

3.2 Tables

The data model (Model in Model-View-Controller parlance) is defined using database tables.

3.2.1 Product

Each water bill details charges for different individual line-items. Some, such as "water service charge" are independent of usage. Others, such as "water consumption charge" and "wastewater charge" depend on usage, and even have a tiered pricing structure. I decided to model these as individual products in a product table. SQLite identifies the id integer primary key column definition and automatically supplies data for it on insert. The description column is a human-readable label. Since product prices fluctuate based on market conditions, the price cannot be included in this table. Consequently, no other columns are needed.

create table product (
id integer primary key,
description text not null);

3.2.2 Unit

Water bills are apportioned among residents in a multi-unit building, and each is represented by a row in the unit table. The SQLite id integer primary key auto-numbered column is used again (as it is in every other table) and then a human-readable label address is included. No other columns are needed.

create table unit (
id integer primary key,
address text not null);

3.2.3 Occupancy Events

Water bill apportionment is a dynamic processes owing to fluctuations both in the unit price (for some quantities) and in the usage within each residence. There is only one physical meter, which aggregates water usage for the whole building. Therefore, the number of people within each residence, or the occupancy of each residence, serves as a proxy for water usage in an individual residence. Occupancy itself fluctuates. For instance, some owners put their homes into the short-term rental market such that multiple guests will simultaneously occupy a residence for an extended period of time. Rather than record the occupancy of a residence for every single day, I record only the changes of occupancy level in an occupancy_events table. The unit_id column is a foreign key to the unit table. The effective_time column records the real-world timestamp of the occupancy change. The occupancy column records the occupancy at that time. Note that a "delta" of occupancy is another possible choice. The event_time column records the timestamp of when the event was recorded, which can occur any time after the effective_time. It defaults to the current time. Finally, constraints demand that occupancy be positive and that table entries be unique.

create table occupancy_events (
id integer primary key,
unit_id integer,
effective_time timestamp not null,
occupancy real not null,
event_time timestamp not null default current_timestamp,
constraint positive_definite check (occupancy > 0),
constraint unique_entries unique (unit_id, effective_time),
foreign key (unit_id) references unit (id));

3.2.4 Bill

The utility company sends a bill to one resident (me) essentially every month. The bill table represents this fact, recording each bill's invoice_date and start_date. The start_date corresponds to a billing period, and a billing period ends at the next start_date of any other bill. The start_date need not be the same as the invoice_date though typically they are the same or at least are near to each other. Finally, constraints impose uniqueness and not null on the dates, along with proper ordering between the two dates.

create table bill (
id integer primary key,
invoice_date date not null,
start_date date not null,
constraint invoice_date_unique unique (invoice_date),
constraint start_date_unique unique (start_date),
constraint invoice_date_type check (date(invoice_date) is not null),
constraint start_date_type check (date(start_date) is not null),
constraint date_ordering check (date(invoice_date) >= date(start_date)));

3.2.5 Bill Detail

Each utility bill contains detailed information about usage level and charges for the products consumed in that period, and this is tracked in the bill_detail table. The bill_id column is a foreign key to the bill table, while the product_id column is a foreign key to the product table. The units and unit_price columns record the dynamic consumption and pricing figures.

create table bill_detail (
id integer primary key,
bill_id integer not null,
product_id integer not null,
units integer not null,
unit_price real not null,
foreign key (bill_id) references bill (id),
foreign key (product_id) references product (id));

This completes the definition of the core data model. All that remains is to "analyze" these data using SQL views. These are the analog of the cell formulas in the original spreadsheet, but are much more powerful.

3.3 Views

The user interface (View in Model-View-Controller parlance) is defined using database views. I build these up step-wise from smaller to larger pieces, using composition.

3.3.1 Occupancy Intervals View

First, I transform the occupancy_event data into occupancy_intervals. Intervals are very useful in subsequent calculations, but involve mildly complicated expressions with datetime functions. Therefore, I guard against tedious and error-prone repetition by factoring these calculations into a view. Note that the occupancy intervals calculated here correspond to each bill.

create view occupancy_intervals as
select a.unit_id, 
       a.occupancy, 
       a.effective_time start_time, 
       date(min(ifnull(date(b.effective_time), date())), '-1 day') end_time
from occupancy_events a
left outer join occupancy_events b on 
  b.unit_id = a.unit_id and 
  date(b.effective_time) > date(a.effective_time)
group by a.unit_id, a.effective_time;

3.3.2 Occupancy per Bill View

Second, each bill-interval is augmented with the occupancy for that interval, using the occupancy_per_bill view.

create view occupancy_per_bill as
select bill.id as bill_id, 
       unit.id as unit_id, 
       occupancy
from 
   bill, 
   unit inner join occupancy_intervals b on 
     b.unit_id = unit.id and 
     date(start_date) between date(start_time) and date(end_time);

3.3.3 Total Occupancy per Bill View

Third, now armed with the occupancy per bill per unit I calculate the total occupancy per bill. This will be useful in a subsequent calculation, for determining the relative financial obligation per resident, for each bill.

create view total_occupancy_per_bill as
select bill_id, 
       sum(occupancy) as total_occupancy
from occupancy_per_bill
group by bill_id;

3.3.4 Billing Report

Fourth, the aforementioned tables and views are assembled into a "top-level" view called billing_report, which summarizes the bills, bill details, and portions owed by each residence on a bill-by-bill basis.

create view billing_report as
select invoice_date, 
       start_date, 
       description, 
       address, 
       round(units*unit_price*occupancy/total_occupancy/100.0, 2) as portion
from bill_detail bd
  inner join bill b on bd.bill_id = b.id
  inner join occupancy_per_bill opb on opb.bill_id = b.id
  inner join total_occupancy_per_bill topb on topb.bill_id = b.id
  inner join unit u on opb.unit_id = u.id
  inner join product p on bd.product_id = p.id;

3.4 Schema

The schema that I have now built is summarized by the following unified list of tables and views.

#+RESULTS:
| bill                      occupancy_events          product                  |
| bill_detail               occupancy_intervals       total_occupancy_per_bill |
| billing_report            occupancy_per_bill        unit                     |

3.5 Data

The data—for products, units, occupancy events, bills, and bill details—are read in from a separate file, which is not shown here partly for privacy reasons.

.read data.sql

3.6 Results

Finally, "invoicing" my neighbors for their portions of the water bills is as easy as a simple ad-hoc SQL query.

select address as "Address", 
       sum(portion) as "Amount Due" 
from billing_report 
group by address;
Address Amount Due
701 1235.94
703 706.28
705 1412.39
707 1188.91