This howto explains how to create a tensor from a dataframe using gota.
The goal is to read a CSV file and create a *tensor.Dense
with shape (n, 4), where n is the number of data rows in the file and 4 is the number of numeric columns.
Consider a csv file with the following content:
sepal_length,sepal_width,petal_length,petal_width,species
5.1 ,3.5 ,1.4 ,0.2 ,setosa
4.9 ,3.0 ,1.4 ,0.2 ,setosa
4.7 ,3.2 ,1.3 ,0.2 ,setosa
4.6 ,3.1 ,1.5 ,0.2 ,setosa
5.0 ,3.6 ,1.4 ,0.2 ,setosa
...
This is an extract from the Iris flower data set. A copy of the full data set is available online.
We want to create a tensor with all values but the species.
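To make the target concrete, here is a minimal sketch of the data we want to end up with: the numeric values in row-major order, plus a shape. It deliberately uses only the standard library (encoding/csv) instead of gota, so it illustrates the goal rather than the method of this howto. The names sample and parseFeatures are hypothetical, and the sketch assumes that the column to drop is the last one and that padded values need trimming before parsing.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strconv"
	"strings"
)

// sample holds the first two data rows of the excerpt, padding included.
const sample = `sepal_length,sepal_width,petal_length,petal_width,species
5.1 ,3.5 ,1.4 ,0.2 ,setosa
4.9 ,3.0 ,1.4 ,0.2 ,setosa
`

// parseFeatures skips the header line, drops the last column (species) and
// returns the remaining values in row-major order together with the shape.
func parseFeatures(data string) (vals []float64, rows, cols int, err error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, 0, 0, err
	}
	if len(records) < 2 {
		return nil, 0, 0, fmt.Errorf("no data rows")
	}
	cols = len(records[0]) - 1 // every column except the last one
	for _, rec := range records[1:] {
		for _, field := range rec[:cols] {
			// Trim the padding around each value before parsing it.
			v, convErr := strconv.ParseFloat(strings.TrimSpace(field), 64)
			if convErr != nil {
				return nil, 0, 0, convErr
			}
			vals = append(vals, v)
		}
		rows++
	}
	return vals, rows, cols, nil
}

func main() {
	vals, rows, cols, err := parseFeatures(sample)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(rows, cols, vals)
}
```

For the two sample rows this yields a shape of (2, 4) and eight values. The rest of this howto obtains the same kind of result through gota and gonum, without manual parsing.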
gota’s dataframe package has a function ReadCSV
that takes an io.Reader as argument.
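f, err := os.Open("iris.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
df := dataframe.ReadCSV(f)
// ReadCSV does not return an error; it is stored in the Err field.
if df.Err != nil {
	log.Fatal(df.Err)
}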
df is a DataFrame that contains all the data present in the file.
gota uses the first line of the CSV file as the column names of the dataframe.
Let’s remove the species column:
xDF := df.Drop("species")
To make things easier, we will convert our dataframe into a Matrix as defined by gonum (see the matrix godoc).
Matrix is an interface, and gota's DataFrame does not fulfill it. As described in gota's documentation,
we create a wrapper around DataFrame that adds the missing methods and so fulfills the Matrix interface.
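// matrix wraps a DataFrame so that it satisfies gonum's mat.Matrix interface.
// Dims is promoted from the embedded DataFrame.
type matrix struct {
	dataframe.DataFrame
}

// At returns the element at row i, column j as a float64.
func (m matrix) At(i, j int) float64 {
	return m.Elem(i, j).Float()
}

// T returns the transpose of the matrix as a view.
func (m matrix) T() mat.Matrix {
	return mat.Transpose{Matrix: m}
}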
Now we can create a *Dense tensor with the function tensor.FromMat64 by wrapping the dataframe in the matrix structure.
xT := tensor.FromMat64(mat.DenseCopyOf(&matrix{xDF}))