# Multivariate linear regression on Iris Dataset

We will use Gorgonia to create a linear regression model.

The goal is, to predict the species of the Iris flowers given the characteristics:

• sepal_length
• sepal_width
• petal_length
• petal_width

The species we want to predict are:

• setosa
• virginica
• versicolor

The goal of this tutorial is to use Gorgonia to find the correct values of $\Theta$ given the iris dataset, in order to write a CLI utility that would look like this:

./iris
sepal length: 5
sepal width: 3.5
petal length: 1.4
sepal length: 0.2

It is probably a setosa

### Mathematical representation

We will consider that the species of Iris if a function of its sepal length and width as well as its petal length and width.

Therefore, if we consider that $y$ is the value of the species, we the equation we would like to solve is:

$$y = \theta_0 + \theta_1 * sepal\_length + \theta_2 * sepal\_width + \theta_3 * petal\_length + \theta_4 * petal\_width$$

Let’s consider the vectors $x$ and $\Theta$ such as:

$$x = \begin{bmatrix} sepal\_length & sepal\_width & petal\_length & petal\_width & 1\end{bmatrix}$$

$$\Theta = \begin{bmatrix} \theta_4 \theta_3 \theta_2 \theta_1 \theta_0 \end{bmatrix}$$

We have

$$y = x\cdot\Theta$$

### Linear regression

To find the correct values, we will use a linear regression. We will encode the data (the true facts from observation of different flowers) into a matrix $X$ containing 5 columns (sepal length, sepal width, petal length, petal width and 1 for the bias). A row of the matrix represent a flower.

The we will encode the corresponding species into a column vector $Y$ with float values.

• setosa = 1.0
• virginica = 2.0
• versicolor = 3.0

In the learning phase, the cost is expressed like this:

$cost = \dfrac{1}{m} \sum_{i=1}^m(X^{(i)}\cdot\Theta-Y^{(i)})^2$

We will use the gradient descent to lower the cost and get the accurate values for $\Theta$

## Generate the training set with gota (dataframe)

First, let’s generate the training data. We use a dataframe to do this smoothly.

func getXYMat() (*mat.Dense, *mat.Dense) {
f, err := os.Open("iris.csv")
if err != nil {
log.Fatal(err)
}
defer f.Close()
xDF := df.Drop("species")

toValue := func(s series.Series) series.Series {
records := s.Records()
floats := make([]float64, len(records))
for i, r := range records {
switch r {
case "setosa":
floats[i] = 1
case "virginica":
floats[i] = 2
case "versicolor":
floats[i] = 3
default:
log.Fatalf("unknown iris: %v\n", r)
}
}
return series.Floats(floats)
}

yDF := df.Select("species").Capply(toValue)
numRows, _ := xDF.Dims()
xDF = xDF.Mutate(series.New(one(numRows), series.Float, "bias"))
fmt.Println(xDF.Describe())
fmt.Println(yDF.Describe())

return mat.DenseCopyOf(&matrix{xDF}), mat.DenseCopyOf(&matrix{yDF})
}

This returns two matrices we can use in Gorgonia.

### Create the expression graph

The equation $X\cdot\Theta$ is represented as an ExprGraph:

func getXY() (*tensor.Dense, *tensor.Dense) {
x, y := getXYMat()

xT := tensor.FromMat64(x)
yT := tensor.FromMat64(y)
// Get rid of the last dimension to create a vector
s := yT.Shape()
yT.Reshape(s)
return xT, yT
}

func main() {
xT, yT := getXY()
g := gorgonia.NewGraph()
x := gorgonia.NodeFromAny(g, xT, gorgonia.WithName("x"))
y := gorgonia.NodeFromAny(g, yT, gorgonia.WithName("y"))
theta := gorgonia.NewVector(
g,
gorgonia.Float64,
gorgonia.WithName("theta"),
gorgonia.WithShape(xT.Shape()),
gorgonia.WithInit(gorgonia.Uniform(0, 1)))

pred := must(gorgonia.Mul(x, theta))
// Saving the value for later use
var predicted gorgonia.Value
gorgonia.Read(pred, &predicted)

We will use Gorgonia’s Symbolic differentiation capability.

First, we will create the cost function, and use a solver to perform a gradient descent to lower the cost.

### Create the node holding the cost

We complete the exprgraph by adding the cost ($cost = \dfrac{1}{m} \sum_{i=1}^m(X^{(i)}\cdot\Theta-Y^{(i)})^2$)

squaredError := must(gorgonia.Square(must(gorgonia.Sub(pred, y))))
cost := must(gorgonia.Mean(squaredError))

We want to lower this cost, so we evaluate the gradient wrt to $\Theta$:

if _, err := gorgonia.Grad(cost, theta); err != nil {
log.Fatalf("Failed to backpropagate: %v", err)
}

We are using the mechanism of the gradient descent. This means that we use the gradient to modulate the parameters $\Theta$ step by step.

The basic gradient descent is implemented by Vanilla Solver of Gorgonia. We set the learning rate $\gamma$ to be 0.001.

solver := gorgonia.NewVanillaSolver(gorgonia.WithLearnRate(0.001))

And at each step, we will ask the solver to update the $\Theta$ parameters thanks to its gradient. Therefore, we set an update variable that we will pass to the solver at each iteration

In our case, we want to optimize $\Theta$ and ask the solver will update its value like this:

$\Theta^{(k+1)}=\Theta^{(k)}-\gamma \nabla f\left(\Theta^{(k)}\right)$

To do so, we need to pass $\Theta$ to the Step method of the Solver:

model := []gorgonia.ValueGrad{theta}
// ...
if err = solver.Step(model); err != nil {
log.Fatal(err)
}

#### The learning iterations

Now that we have the principle, we need to run the computation with a vm several times so the gradient descent’s magic can happen.

Let’s create a vm to execute the graph (and do the gradient computation):

machine := gorgonia.NewTapeMachine(g, gorgonia.BindDualValues(theta))
defer machine.Close()

Now let’s create the loop and execute the graph at each step; the machine will learn!

iter := 1000000
var err error
for i := 0; i < iter; i++ {
if err = machine.RunAll(); err != nil {
fmt.Printf("Error during iteration: %v: %v\n", i, err)
break
}

if err = solver.Step(model); err != nil {
log.Fatal(err)
}
machine.Reset() // Reset is necessary in a loop like this
}

#### Getting some infos

We can dump some info about the learning process by using this call

fmt.Printf("theta: %2.2f  Iter: %v Cost: %2.3f Accuracy: %2.2f \r",
theta.Value(),
i,
cost.Value(),
accuracy(predicted.Data().([]float64), y.Value().Data().([]float64)))

with accuracy defined like this:

func accuracy(prediction, y []float64) float64 {
var ok float64
for i := 0; i < len(prediction); i++ {
if math.Round(prediction[i]-y[i]) == 0 {
ok += 1.0
}
}
return ok / float64(len(y))
}

This will display a line like this during the learning process:

theta: [ 0.26  -0.41   0.44  -0.62   0.83]  Iter: 26075 Cost: 0.339 Accuracy: 0.61

### Save the weights

Once the training is done, we save the values of $\Theta$ to be able to do some predictions:

func save(value gorgonia.Value) error {
f, err := os.Create("theta.bin")
if err != nil {
return err
}
defer f.Close()
enc := gob.NewEncoder(f)
err = enc.Encode(value)
if err != nil {
return err
}
return nil
}

## Create a simple CLI for predictions

First, let’s load the parameters from the training phase:

func main() {
f, err := os.Open("theta.bin")
if err != nil {
log.Fatal(err)
}
defer f.Close()
dec := gob.NewDecoder(f)
var thetaT *tensor.Dense
err = dec.Decode(&thetaT)
if err != nil {
log.Fatal(err)
}

Then, let’s create the model (the exprgraph) like we did before:

g := gorgonia.NewGraph()
theta := gorgonia.NodeFromAny(g, thetaT, gorgonia.WithName("theta"))
values := make([]float64, 5)
xT := tensor.New(tensor.WithBacking(values))
x := gorgonia.NodeFromAny(g, xT, gorgonia.WithName("x"))
y, err := gorgonia.Mul(x, theta)

Then enter a for loop that will get info from stdin, do the computation and display the result:

machine := gorgonia.NewTapeMachine(g)
values = 1.0
for {
values = getInput("sepal length")
values = getInput("sepal width")
values = getInput("petal length")
values = getInput("petal width")

if err = machine.RunAll(); err != nil {
log.Fatal(err)
}
switch math.Round(y.Value().Data().(float64)) {
case 1:
fmt.Println("It is probably a setosa")
case 2:
fmt.Println("It is probably a virginica")
case 3:
fmt.Println("It is probably a versicolor")
default:
fmt.Println("unknown iris")
}
machine.Reset()
}

This is a helper function to get the input:

func getInput(s string) float64 {
fmt.Printf("%v: ", s)
text = strings.Replace(text, "\n", "", -1)

input, err := strconv.ParseFloat(text, 64)
if err != nil {
log.Fatal(err)
}
return input
}

Now we can go build or go run the code, and voilà! We have a fully autonomous CLI that can predict the iris species regarding its features:

\$ go run main.go
sepal length: 4.4
sepal widt: 2.9
petal length: 1.4
petal width: 0.2
It is probably a setosa
sepal length: 5.9
sepal widt: 3.0
petal length: 5.1
petal width: 1.8
It is probably a virginica

# Conclusion

This is a step by step example. You can now play with the initialization values of theta, or change to solver to see how thing goes within Gorgonia.

The full code can be found in the example of the Gorgonia project.

### Bonus: visual representation

It is possible to visualize the dataset using the Gonum plotter libraries. Here is a simple example on how to achieve it: import (
"gonum.org/v1/plot"
"gonum.org/v1/plot/plotter"
"gonum.org/v1/plot/plotutil"
"gonum.org/v1/plot/vg"
"gonum.org/v1/plot/vg/draw"
)

func plotData(x []float64, a []float64) []byte {
p, err := plot.New()
if err != nil {
log.Fatal(err)
}

p.Title.Text = "sepal length & width"
p.X.Label.Text = "length"
p.Y.Label.Text = "width"

l := len(x) / len(a)
for k := 1; k <= 3; k++ {
data0 := make(plotter.XYs, 0)
for i := 0; i < len(a); i++ {
if k != int(a[i]) {
continue
}
x1 := x[i*l+0] // sepal_length
y1 := x[i*l+1] // sepal_width
data0 = append(data0, plotter.XY{X: x1, Y: y1})
}
data, err := plotter.NewScatter(data0)
if err != nil {
log.Fatal(err)
}
data.GlyphStyle.Color = plotutil.Color(k - 1)
data.Shape = &draw.PyramidGlyph{}
}