MACHINE LEARNING IN MINERAL WELLS

City Uses Novel Techniques to Determine Stormwater Utility Rates

By Tak Makino

West of the Dallas-Fort Worth metroplex is a small city of approximately 15,000 people called Mineral Wells. Over the last decade, the city has been battered by repeated flooding events that have strained its existing stormwater infrastructure. Facing a budget shortfall and an urgent need to upgrade that infrastructure, the City contracted Lockwood, Andrews & Newnam, Inc. (LAN), a national planning, engineering, and program management firm, and NewGen Solutions, a management consulting firm, to perform a stormwater utility fee study. Based on the study, the city wanted to set new stormwater utility rates to fund the expansion of its stormwater infrastructure.

Stormwater Utility Fee Components

Three major components go into a stormwater utility fee study:

  1. Utility billing
  2. Parcel data
  3. Impervious surface coverage

Utility billing data provides a convenient existing framework upon which to build the stormwater utility fee. The stormwater fee can be added as an additional charge item to existing utility customers. This avoids the creation of a new billing system for the stormwater utility fee and allows both customers and the city to use a billing system with which they are familiar.

Comparison between full color imagery (left) and color infrared (right). Infrared data provides a fourth variable for analysis. Photo: LAN

The second essential component is the parcel data. For the Mineral Wells study, LAN obtained the parcel data from the Palo Pinto and Parker County Appraisal Districts. The parcel GIS files define the geographic zone of responsibility for each utility billing customer. Utility billing customers are responsible for paying for the runoff their property contributes to the stormwater utility system. In the case of multiple utility billing customers on a single parcel (e.g. apartments or duplexes), the total parcel impervious surface is divided by the number of utility billing customers such that each billing customer pays for an equitable share of the runoff that enters the stormwater system.
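The equal split described above is simple arithmetic. A minimal sketch, assuming a hypothetical function name and invented figures for a duplex:

```python
def impervious_share_per_customer(parcel_impervious_sqft, customer_count):
    """Split a parcel's impervious area evenly among its billing customers."""
    if customer_count < 1:
        raise ValueError("a billed parcel needs at least one customer")
    return parcel_impervious_sqft / customer_count

# Hypothetical duplex: 5,200 sq ft of impervious surface shared by two accounts.
duplex_share = impervious_share_per_customer(5200.0, 2)
```

Each of the two accounts in this invented example would be responsible for half of the parcel's impervious area.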

The last, and perhaps most critical, piece is the impervious surface coverage. Aerial imagery from the National Agriculture Imagery Program (NAIP) formed the basis of the impervious surface analysis. NAIP imagery provides four-band (red, green, blue, and near-infrared) aerial imagery at 0.5 m resolution, meaning that each pixel in the aerial imagery represents a 0.5 m x 0.5 m area on the ground. Within each pixel, four values are stored – one each for the red, green, blue, and near-infrared wavelengths (see figure 1). Luminosity in the red, green, and blue wavelengths produces a true-color composite image, much like what the eye sees. The near-infrared band provides a fourth dimension that allows for the discrimination of otherwise spectrally similar land cover classes. For example, trees and grass – both green – can appear similar in a red, green, blue composite image. With a statistical, value-based method of analysis (rather than visual observation), it would be difficult, if not impossible, to discriminate between trees and grass from three-band aerial imagery alone. However, trees and grass reflect the infrared wavelengths differently, allowing for a statistically meaningful, value-based separation of the two.
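To illustrate why the fourth band helps, the sketch below measures how far apart two pixels sit when compared on three bands versus four. The band values are invented for illustration and are not taken from the Mineral Wells imagery; the function name is likewise hypothetical.

```python
import math

# Hypothetical 8-bit band values in the order (red, green, blue, near-infrared).
tree_pixel = (60, 110, 55, 120)
grass_pixel = (65, 115, 60, 200)

def spectral_distance(a, b, bands):
    """Euclidean distance between two pixels over the chosen band indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in bands))

# Three-band comparison: the two pixels are nearly indistinguishable.
rgb_distance = spectral_distance(tree_pixel, grass_pixel, bands=(0, 1, 2))

# Four-band comparison: the near-infrared difference separates them.
four_band_distance = spectral_distance(tree_pixel, grass_pixel, bands=(0, 1, 2, 3))
```

With these made-up values the four-band distance is several times larger than the three-band distance, which is the kind of separation a value-based classifier can work with.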

These three components – the utility billing, land parcels, and aerial imagery – need to be combined into a single database, one that reports the amount of impervious surface for which each utility billing customer is responsible. A combination of methods was used to achieve this goal. The first step taken was to identify areas of impervious surface coverage using machine learning algorithms.

Machine Learning

Supervised machine learning is a technique in which an operator provides a training dataset to the computer, in this case user-defined areas of different types of land cover, and the algorithms then learn what characteristics define the classes in the training dataset. For instance, the operator supplies numerous examples of paved surfaces, each labeled as pavement, until the computer is able to correctly identify a never-before-seen area of pavement as pavement. In other words, given enough examples, the computer learns what characteristics define pavement.
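The idea can be sketched with one of the simplest supervised methods, a nearest-centroid classifier. This is an illustrative stand-in only: the article does not name the algorithm LAN used, and the labeled pixel values below are invented.

```python
import math

def train_centroids(training_samples):
    """Average the band values of the labeled examples for each class."""
    sums, counts = {}, {}
    for label, pixel in training_samples:
        acc = sums.setdefault(label, [0.0] * len(pixel))
        for i, value in enumerate(pixel):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}

def classify(pixel, centroids):
    """Assign the class whose average pixel is closest to this pixel."""
    return min(centroids, key=lambda label: math.dist(pixel, centroids[label]))

# Hypothetical operator-labeled pixels: (red, green, blue, near-infrared).
training_samples = [
    ("pavement", (120, 120, 125, 90)),
    ("pavement", (130, 128, 131, 100)),
    ("grass", (65, 115, 60, 200)),
    ("grass", (70, 120, 62, 210)),
]
centroids = train_centroids(training_samples)

# A pixel that was not in the training data.
unseen_label = classify((125, 122, 127, 95), centroids)
```

Here the unseen pixel lands closest to the pavement average, so it is labeled as pavement. Production classifiers are more sophisticated, but the train-then-predict pattern is the same.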

For the Mineral Wells study, a supervised machine learning algorithm was used to create an initial, first-pass impervious surface dataset. Areas representative of land cover classes were manually defined from aerial imagery. Seven different land cover classes were used in this first pass. All seven land cover classes were eventually reclassified as either pervious or impervious surface coverage.
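The final reclassification step amounts to a lookup from land cover class to one of two categories. In the sketch below the seven class names are hypothetical, since the article does not list the classes used in the study.

```python
# Hypothetical class names; the study's actual seven classes are not listed.
IMPERVIOUS_CLASSES = {"pavement", "concrete", "rooftop"}
PERVIOUS_CLASSES = {"tree", "grass", "shrub", "bare_dirt"}

def reclassify(label):
    """Collapse a detailed land cover class into pervious or impervious."""
    if label in IMPERVIOUS_CLASSES:
        return "impervious"
    if label in PERVIOUS_CLASSES:
        return "pervious"
    raise ValueError(f"unknown land cover class: {label}")

classified_row = ["grass", "pavement", "rooftop", "tree"]
binary_row = [reclassify(label) for label in classified_row]
```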

The resultant image exhibited unresolved areas where the algorithm was unable to definitively discriminate between land cover classes. The classified dataset also exhibited notable areas of false positives on impervious surface coverage. This initial, first-pass impervious surface dataset provided a near-adequate level of resolution and was temporarily set aside while an improved dataset was developed.

To address both unresolved areas and areas of false positives, three additional machine learning training datasets were made to better discriminate between spectrally similar land cover classes. Each training set consisted of two of the initial seven land cover classes and a third class composed of all remaining land cover classes, which provided a contrasting backdrop against the classes of interest.

By limiting the number of classes of interest in each training dataset (two rather than seven) the algorithms that create the statistical descriptions of land cover classes are able to do so in relative isolation, without the interfering influence of other land cover classes. For example, it can be difficult to discriminate between dirt and pavement. Both land cover classes can appear grey or brownish to the eye and have relatively high infrared reflectivity. In the full, seven-class training set, the machine learning algorithm must successfully discriminate between dirt and pavement while simultaneously discriminating between five other land cover classes. By reducing the number of classes, the algorithm is better able to discriminate dirt from pavement because there is no need to separately identify other land cover classes.
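One way to derive such a focused training set from the full one is to keep the labels of the two classes of interest and relabel everything else as a single backdrop class. The sketch below assumes this approach; the class names, the "other" label, and the pixel values are hypothetical.

```python
def focus_training_set(samples, classes_of_interest):
    """Keep two classes of interest; merge every other class into 'other'."""
    keep = set(classes_of_interest)
    return [(label if label in keep else "other", pixel)
            for label, pixel in samples]

# Hypothetical labeled pixels from a full, multi-class training set.
full_training_set = [
    ("bare_dirt", (140, 120, 100, 150)),
    ("pavement", (125, 124, 128, 145)),
    ("grass", (65, 115, 60, 200)),
    ("rooftop", (180, 175, 170, 110)),
]

dirt_vs_pavement = focus_training_set(full_training_set, ("bare_dirt", "pavement"))
focused_labels = [label for label, _ in dirt_vs_pavement]
```

The resulting three-class set can then be fed to the same training routine as the full set.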

The resultant classifications from these three training datasets were combined into a single composite dataset. The composite dataset was reclassified to reflect either pervious or impervious surface coverage. Compared to the initial, first-pass classification dataset, the composite classification reduces the number of false positives on impervious surface at the expense of additional areas of uncertainty.

The composite classification, which reflected ground conditions more accurately, formed the basis of the ultimate impervious surface dataset. Values from the initial, first-pass dataset were used to fill gaps in areas of uncertainty, producing a product that capitalized on the best parts of both products. Smoothing algorithms and building footprints were applied to the impervious dataset to further refine the product.
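The gap-filling step can be sketched as a cell-by-cell merge: take the composite value wherever one exists and fall back to the first-pass value elsewhere. Representing uncertain cells as None and rasters as nested lists are assumptions made for this illustration.

```python
def fill_gaps(composite, first_pass):
    """Prefer the composite value; use the first-pass value where it is None."""
    return [
        [c if c is not None else f for c, f in zip(comp_row, first_row)]
        for comp_row, first_row in zip(composite, first_pass)
    ]

# Hypothetical 2 x 2 rasters; None marks an area of uncertainty.
composite = [["impervious", None],
             [None, "pervious"]]
first_pass = [["pervious", "impervious"],
              ["pervious", "impervious"]]

merged = fill_gaps(composite, first_pass)
```

Note that the first-pass values are used only in the two uncertain cells; where the composite has a value, it wins even when the two datasets disagree.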

Data Integration

Access to County Appraisal District parcel data allowed the tabulation of impervious surface data at the parcel level. This data helped determine how much impervious surface each parcel contained.
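A minimal sketch of the tabulation, assuming the parcel boundaries have been rasterized onto the same 0.5 m grid as the impervious surface dataset (the study itself worked from parcel GIS files, and its actual tooling is not described). Each impervious pixel contributes 0.25 square meters to its parcel.

```python
from collections import Counter

PIXEL_AREA_M2 = 0.5 * 0.5   # each pixel covers 0.5 m x 0.5 m
SQFT_PER_M2 = 10.7639       # square feet per square meter

def impervious_sqft_by_parcel(parcel_ids, impervious_mask):
    """Sum impervious pixel area per parcel; None marks cells outside any parcel."""
    counts = Counter()
    for id_row, mask_row in zip(parcel_ids, impervious_mask):
        for parcel_id, is_impervious in zip(id_row, mask_row):
            if parcel_id is not None and is_impervious:
                counts[parcel_id] += 1
    return {pid: n * PIXEL_AREA_M2 * SQFT_PER_M2 for pid, n in counts.items()}

# Hypothetical 2 x 3 grids: parcel IDs and a 1/0 impervious mask.
parcel_ids = [["A", "A", None],
              ["B", "A", "B"]]
impervious_mask = [[1, 0, 1],
                   [1, 1, 0]]

area_by_parcel = impervious_sqft_by_parcel(parcel_ids, impervious_mask)
```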

Integrating the parcel-level impervious surface dataset into the existing utility billing scheme required careful execution of automated processes. The two datasets were large enough that manual integration was impractical. Consequently, automated integration methods were developed.

Addresses were the only common identifier between the two datasets. Unfortunately, there wasn’t a perfect match between utility billing addresses and parcel addresses. For example, 123 N Example Street and 123 N Example St are not perfect matches. While most human operators would probably recognize that these two addresses refer to the same property, a computer looking for a perfect, one-for-one match would reject the pairing.

An address similarity tool was created to address and resolve this conflict. In both the billing and parcel datasets, addresses were separated into their constituent components. An address standardizing tool was used to ensure that any direction or street suffix conflicts were resolved.
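A rough sketch of the splitting and standardizing step follows. The lookup tables and the parsing rules are simplified assumptions for illustration, not the tool used in the study, and real addresses would need many more cases (unit numbers, multi-word suffixes, and so on).

```python
# Simplified, hypothetical lookup tables.
DIRECTIONS = {"N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
              "E": "E", "EAST": "E", "W": "W", "WEST": "W"}
SUFFIXES = {"ST": "ST", "STREET": "ST", "AVE": "AVE", "AVENUE": "AVE",
            "RD": "RD", "ROAD": "RD", "DR": "DR", "DRIVE": "DR"}

def parse_address(raw):
    """Split an address into number, direction, name, and suffix."""
    tokens = raw.upper().replace(".", "").split()
    number = tokens.pop(0) if tokens and tokens[0].isdigit() else ""
    direction = DIRECTIONS[tokens.pop(0)] if tokens and tokens[0] in DIRECTIONS else ""
    # Only treat the last token as a suffix if a street name remains before it.
    suffix = SUFFIXES[tokens.pop()] if len(tokens) > 1 and tokens[-1] in SUFFIXES else ""
    return {"number": number, "direction": direction,
            "name": " ".join(tokens), "suffix": suffix}

billing = parse_address("123 N Example Street")
parcel = parse_address("123 N Example St")
```

After standardization the two spellings from the earlier example parse to identical components, so the suffix difference no longer blocks a match.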

By standardizing addresses and breaking them out into their components, a constituent-level comparison could be made. Every utility billing account address was compared to every parcel address, and each pairing returned a similarity score from 0 (no similarity) to 1 (perfect match). No tolerance for deviation was allowed in the street number or street direction. While similar to 123 N Example St, 125 N Example St is a different address, so the difference does not represent a matching error. Similarly, 123 N Example St and 123 S Example St are different addresses that may be quite geographically distant from each other. Tolerance for errors in the street name and street suffix was allowed. For example, the pair of 123 N Example St and 123 N Example would return a high similarity score, as would 123 N Example St and 123 N Exmple St.
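The comparison rules can be sketched as follows. The scoring formula is an assumption: it weights the four components equally and uses Python's difflib for the fuzzy parts, so it will not necessarily reproduce the scores produced by the study's tool.

```python
from difflib import SequenceMatcher

def text_similarity(a, b):
    """Fuzzy similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def address_similarity(a, b):
    """Compare two standardized (number, direction, name, suffix) tuples."""
    # Street number and direction must match exactly (no tolerance).
    if a[0] != b[0] or a[1] != b[1]:
        return 0.0
    # Name and suffix tolerate small differences; all four parts weigh equally.
    name_score = text_similarity(a[2], b[2])
    suffix_score = text_similarity(a[3], b[3])
    return (2.0 + name_score + suffix_score) / 4.0

reference = ("123", "N", "EXAMPLE", "ST")
different_number = address_similarity(reference, ("125", "N", "EXAMPLE", "ST"))
different_direction = address_similarity(reference, ("123", "S", "EXAMPLE", "ST"))
missing_suffix = address_similarity(reference, ("123", "N", "EXAMPLE", ""))
misspelled_name = address_similarity(reference, ("123", "N", "EXMPLE", "ST"))
```

Under this weighting, a different street number or direction scores zero, while a missing suffix or a one-letter misspelling still scores high.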

The address similarity tool ran comparisons between every utility billing account address and every parcel address. This raw comparison produced roughly 70 million address match combinations. It was assumed that the highest-scoring match for each utility billing address represented the correct parcel match. An arbitrary similarity score threshold of 0.75 was set to separate correctly matched (≥0.75) and incorrectly matched (<0.75) pairs. The 0.75 threshold represents a conservative cutoff point for matching acceptance. A match of 123 N Example St and 123 N Exmple St would return a score of 0.916, and a match of 123 N Example St and 123 N Example would return a score of 0.75. Utility billing accounts that fell below the threshold were matched manually through a variety of methods, including examining commercial Facebook and Yelp accounts, browsing Google Street View, and consulting other online records.
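The selection rule described above (keep the highest-scoring parcel for each account, accept it at 0.75 or higher, otherwise flag it for manual matching) can be sketched as follows. The data layout, function name, and identifiers are hypothetical.

```python
MATCH_THRESHOLD = 0.75  # acceptance cutoff stated in the article

def select_matches(scores):
    """scores maps billing_id -> {parcel_id: similarity score}."""
    accepted, needs_manual_review = {}, []
    for billing_id, candidates in scores.items():
        parcel_id, score = max(candidates.items(),
                               key=lambda item: item[1],
                               default=(None, 0.0))
        if parcel_id is not None and score >= MATCH_THRESHOLD:
            accepted[billing_id] = parcel_id
        else:
            needs_manual_review.append(billing_id)
    return accepted, needs_manual_review

# Hypothetical precomputed scores for three billing accounts.
scores = {
    "acct-1": {"parcel-10": 0.916, "parcel-11": 0.0},
    "acct-2": {"parcel-10": 0.0, "parcel-12": 0.75},
    "acct-3": {"parcel-13": 0.60},
}
accepted, needs_manual_review = select_matches(scores)
```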

The address similarity tool was validated with the use of a third-party geospatial analysis tool. The results of the validation indicated an error rate of approximately 0.06 percent. In total, three address matches above the 0.75 threshold were made in error and were corrected manually. 

Fee Calculation

The three additional machine learning datasets used to improve classification quality. Photo: LAN

The Equivalent Residential Unit (ERU) is a value that represents the amount of impervious surface found on a hypothetical typical single-family residential home in a community. The U.S. EPA estimates that 80 percent of stormwater utility fees are based on the ERU concept. Zoning information allowed for the identification of single-family residential parcels in Mineral Wells. When examined, the typical single-family parcel (excluding undeveloped areas zoned as single-family residential) exhibited approximately 2,600 square feet of impervious surface. Thus, the ERU for the City of Mineral Wells was determined to be 2,600 square feet of impervious surface.
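As a sketch of the ERU calculation, the snippet below averages impervious area over developed single-family parcels. Using the mean and treating a parcel with no impervious surface as undeveloped are assumptions, and the parcel values are invented to illustrate the arithmetic.

```python
def equivalent_residential_unit(single_family_impervious_sqft):
    """Mean impervious area of developed single-family parcels."""
    developed = [area for area in single_family_impervious_sqft if area > 0]
    if not developed:
        raise ValueError("no developed single-family parcels")
    return sum(developed) / len(developed)

# Hypothetical parcels; the 0.0 entry is an undeveloped lot and is excluded.
sample_parcels_sqft = [2400.0, 2800.0, 2600.0, 0.0]
eru_sqft = equivalent_residential_unit(sample_parcels_sqft)
```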

The composite dataset created by combining the three machine learning datasets. Each color represents a unique combination of results from the three input datasets. Photo: LAN

NewGen Solutions took the impervious surface data developed by LAN and financial data from the city and determined that a per-ERU fee of $3.89 would fully fund the stormwater utility over the next five years. If every utility billing customer pays $3.89 per ERU on their property, the city will be able to recover the funds necessary to repair, maintain, and expand its stormwater system and to ensure that high-quality stormwater services are provided to residents. A $3.89 per ERU fee is within the $2.50 – $6.50 range of fees seen in similar communities in Texas. Under this proposed billing scheme, all single-family residential utility billing customers would pay $3.89 to the stormwater utility, regardless of actual impervious surface coverage. This simplification reduces administrative burdens on the City and is common practice in stormwater utilities. All non-single-family utility billing customers would pay on a true ERU-based schedule at a rate of $3.89 per ERU.
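The proposed schedule can be expressed as a short calculation. In this sketch the function name is hypothetical, fractional ERUs are billed without rounding up to a whole unit, and the result is rounded to the cent; the article does not say how the City would handle either detail.

```python
RATE_PER_ERU = 3.89   # proposed fee per ERU, in dollars
ERU_SQFT = 2600.0     # one ERU in square feet of impervious surface

def stormwater_fee(impervious_sqft, is_single_family):
    """Fee under the proposed scheme for one utility billing customer."""
    if is_single_family:
        # Single-family customers pay for exactly 1 ERU, whatever their coverage.
        return RATE_PER_ERU
    # Everyone else pays in proportion to impervious area (assumed unrounded).
    return round(impervious_sqft / ERU_SQFT * RATE_PER_ERU, 2)

home_fee = stormwater_fee(4000.0, is_single_family=True)
# Hypothetical commercial site with 26,000 sq ft of impervious surface (10 ERUs).
commercial_fee = stormwater_fee(26000.0, is_single_family=False)
```

The hypothetical home pays the single-family rate even though its impervious area exceeds one ERU, while the commercial site pays for ten ERUs.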

The City currently bills a $2.50 flat-rate stormwater fee to all utility billing customers, regardless of impervious surface or property type. A jump from a flat rate of $2.50 to an ERU-based billing schedule at $3.89 per ERU would increase the amount that most utility customers pay. A few commercial properties with relatively little impervious surface would see their fee fall below the current $2.50 per month rate. To ease the transition to an ERU-based system, the City may begin with a $2.50 per ERU rate. As all single-family residential customers are billed at 1 ERU, single-family residential customers would see no change to their billing rates. This introductory rate may increase over time to the $3.89 per ERU rate. By using a publicly announced rate schedule that slowly trends towards the fully funded $3.89 per ERU rate, the City of Mineral Wells can allow utility billing customers to plan and budget in advance of any rate changes. Ultimately, the decisions of City Council will determine the rate-setting method.
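The point at which a non-single-family customer would pay less than today's flat fee follows directly from the figures above. The sketch below works out that break-even point; it assumes fractional ERUs are billed as-is, which the article does not confirm, and the function name is hypothetical.

```python
CURRENT_FLAT_FEE = 2.50    # current flat monthly fee, in dollars
FULL_RATE_PER_ERU = 3.89   # fully funded rate per ERU, in dollars
ERU_SQFT = 2600.0          # one ERU in square feet of impervious surface

# ERUs at which the full-rate fee equals the current flat fee.
break_even_eru = CURRENT_FLAT_FEE / FULL_RATE_PER_ERU
break_even_sqft = break_even_eru * ERU_SQFT

def pays_less_than_flat_fee(impervious_sqft, rate_per_eru=FULL_RATE_PER_ERU):
    """True if a non-single-family customer's ERU-based fee is under $2.50."""
    return impervious_sqft / ERU_SQFT * rate_per_eru < CURRENT_FLAT_FEE
```

At the full $3.89 rate, the break-even point works out to roughly 0.64 ERU, or about 1,670 square feet of impervious surface. At an introductory $2.50 per ERU rate, any non-single-family property with less than one ERU would pay less than the current flat fee.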


Tak M. Makino, CFM, is a Flood Mitigation Manager at Lockwood, Andrews & Newnam, Inc. (LAN), a national planning, engineering and program management firm. He can be reached at tmmakino@lan-inc.com.