Exponential Approximation with Least Squares in Python

Recently I have occasionally needed to do simple data analysis at work. The task itself is straightforward: fit a curve to measured data and obtain an approximate formula.

You can do that in Excel by plotting the data and asking it to add a trendline formula, but creating a graph every time is tedious. Once you start thinking about evaluating the fitted formula automatically, it becomes hard not to want to script the whole process.

When that happens, I usually write a Python script.

This time I wrote a script that uses least squares in SciPy to obtain an approximate formula, but I ran into a problem: the numbers were slightly different from the equation produced by Excel. I got stuck on that for about half a day, so I am leaving a note here.

The conclusion is simple: the problem was applying least squares directly to the exponential function.

Suppose the function we want to fit is y = a * exp(b * x). Then the problem is to find (a, b) from measured data (**x**, **y**) so that the sum of squared errors of y = a * exp(b * x) is minimized.

You can solve that problem directly, but Excel seems to take the logarithm of both sides and then fit the transformed equation below instead.

log(y) = log( a * exp( b * x ) )
log(y) = ( log(a) + b * x )

When I compared fitting the original form and the transformed form in Python against Excel, the log-transformed version matched the equation from Excel.

Here is the Python code I used.

#! /usr/bin/env python
# -*- coding: utf-8 -*-

import numpy
import scipy.optimize
import matplotlib.pyplot as plt

def fit_exp(parameter, x, y):
    a = parameter[0]
    b = parameter[1]
    residual = y - (a * numpy.exp(b * x))
    return residual

def fit_exp_linear(parameter, x, y):
    a = parameter[0]
    b = parameter[1]
    residual = numpy.log(y) - (numpy.log(a) + b * x)
    return residual

def exp_string(a, b):
    return "$y = %0.4f e^{ %0.4f x}$" % (a, b)

if __name__ == "__main__":

    x = numpy.array([6.559112404564946264e-01,
                     6.013845740818111185e-01,
                     4.449591514877473397e-01,
                     3.557250387126167923e-01,
                     3.798882550532960423e-01,
                     3.206955701106445344e-01,
                     2.600880460776140990e-01,
                     2.245379618606005157e-01])

    y = numpy.array([1.397354195522357567e-01,
                     1.001406990711011247e-01,
                     5.173231204524778720e-02,
                     3.445520251689743879e-02,
                     3.801366557283047953e-02,
                     2.856782588754304408e-02,
                     2.036328213585812327e-02,
                     1.566228252276009869e-02])

    parameter0 = [1, 1]

    r1 = scipy.optimize.leastsq(fit_exp, parameter0, args=(x, y))
    r2 = scipy.optimize.leastsq(fit_exp_linear, parameter0, args=(x, y))

    model_func = lambda a, b, x: a * numpy.exp(b * x)

    fig = plt.figure()

    ax1 = fig.add_subplot(2, 1, 1)
    ax2 = fig.add_subplot(2, 1, 2)

    ax1.plot(x, y, "ro")
    ax2.plot(x, y, "ro")

    xx = numpy.arange(0.7, 0.2, -0.01)

    ax1.plot(xx, model_func(r1[0][0], r1[0][1], xx))
    ax2.plot(xx, model_func(r2[0][0], r2[0][1], xx))

    ax1.legend(("Sample Data", "Fitted Function:\n" + exp_string(r1[0][0], r1[0][1])),
                "upper left")
    ax2.legend(("Sample Data", "Fitted Function:\n" + exp_string(r2[0][0], r2[0][1])),
                "upper left")

    ax1.set_title("Non-linear fit")
    ax2.set_title("Linear fit")

    ax1.grid(True)
    ax2.grid(True)

    plt.show()

It had also been a while since I last had to think about logarithms, and I realized I had forgotten more of the math than I expected. I should probably study mathematics again.

Reference:

http://stackoverflow.com/questions/3938042/fitting-exponential-decay-with-no-initial-guessing