RAME.R

Yet another RAME application. rame.r provides several regression tools. Because it is based on the RAME optimization, rame.r can deal with highly flexible regression formulas that are difficult to solve with conventional techniques. For example, it supports cubic regression and Gaussian regression.

If you are interested in the background of rame.r, the Concept section at the bottom of this page may be helpful. Otherwise, just follow the Quick Start section, which covers most operations of rame.r. Other materials, such as Usage, are available on this page, too.

Quick Start

Some symbols used in this document, such as T(), M(), and si, are explained in the Concept section. In rame.r, a dataset is represented as a single text file. The following is a 4-dimensional dataset:

1 1:-0.666667 2:-0.0833334 3:-0.830508 4:-1
1 1:-0.388889 2:0.416667 3:-0.830508 4:-0.916667
2 1:0.0555554 2:-0.25 3:0.118644 4:-4.03573e-08
2 1:-0.555556 2:-0.583333 3:-0.322034 4:-0.166667

In this file, each row represents an instance si, composed of the leading function value f(si) followed by a list of index:value pairs.
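The row format described above can be read with a few lines of Python. This is only a minimal sketch and not part of rame.r: the helper names `parse_line` and `to_dense` are hypothetical, and treating an index that is absent from a row as 0.0 is an assumption, since the page does not say whether indices may be omitted.

```python
def parse_line(line):
    """Parse one dataset row into (function_value, {index: value})."""
    fields = line.split()
    target = float(fields[0])          # leading function value f(si)
    features = {}
    for pair in fields[1:]:
        index, value = pair.split(":", 1)
        features[int(index)] = float(value)
    return target, features


def to_dense(features, dim):
    """Turn an {index: value} mapping into a dense list (indices are 1-based).

    Assumption: an index missing from the row is treated as 0.0.
    """
    return [features.get(i, 0.0) for i in range(1, dim + 1)]


# Two rows copied from the example dataset above.
rows = [
    "1 1:-0.666667 2:-0.0833334 3:-0.830508 4:-1",
    "2 1:0.0555554 2:-0.25 3:0.118644 4:-4.03573e-08",
]
dataset = [parse_line(r) for r in rows]
```

Each entry of `dataset` is then a pair of f(si) and the feature mapping of si, and `to_dense(features, 4)` gives the 4-dimensional vector.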

rame.r currently provides two types of transformation functions. The first one is:

In this transformation function, users should specify α and β.

Concept

Suppose there is a dataset { s1, s2, …, sn } in which each instance has a function value, giving { f(s1), f(s2), …, f(sn) }. You can regard each sample si as a d-dimensional vector <f1, f2, …, fd>. In general, Multiple Linear Regression transforms a d-dimensional dataset into a 1-dimensional dataset to fit the corresponding function values. As the following figure shows, there are 4 samples { s1, s2, s3, s4 } on a 2-dimensional plane.

We can use a linear transformation function T() to transform these points, that is, T(si) = T(<f1, f2>) = w0 + w1f1 + w2f2. The goal of most regression tools is to determine { w0, w1, w2 } so as to maximize the correlation between { f(s1), f(s2), f(s3), f(s4) } and { T(s1), T(s2), T(s3), T(s4) }. We can use a measure function M() to see how well {f(si)} and {T(si)} fit each other. One typical measure function is the Root Mean Square Deviation, as shown in the next figure.
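As a rough illustration of the two roles described above, the sketch below writes the linear T() and a Root Mean Square Deviation M() in plain Python. The function names and the sample and target values are made up for the example and do not come from rame.r. Note that RMSD is a deviation, so a lower value means a better fit.

```python
import math


def linear_transform(weights, sample):
    """T(s) = w0 + w1*f1 + ... + wd*fd, with weights = [w0, w1, ..., wd]."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], sample))


def rmsd(targets, predictions):
    """Root mean square deviation between {f(si)} and {T(si)}."""
    n = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)


# Four made-up samples on a 2-dimensional plane with made-up function values.
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 3.0, 4.0, 6.0]

weights = [1.0, 2.0, 3.0]  # w0, w1, w2
predictions = [linear_transform(weights, s) for s in samples]
score = rmsd(targets, predictions)
```

With these particular weights every T(si) equals f(si), so the deviation is zero; any other choice of { w0, w1, w2 } gives a larger value.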

So we have an optimization problem: determine the variables in the transformation function T() that optimize the measure function M(). Conventional techniques assume that T() and M() have good properties (e.g., differentiability). These assumptions make the optimization process easier, faster, and (probably) deterministic. However, they also impose limitations on T() and M(). That is why we introduce rame.r, which can support any T() and M(), i.e., it imposes no such limitations.

For example, all transformation functions in rame.r can transform a d-dimensional point into a d'-dimensional one, where d' is given by the user. Therefore, T(si) is a d'-dimensional vector, and some measure functions in rame.r can estimate the correlation between one scalar f(si) and one vector T(si). In other words, the measure functions in rame.r can be as complicated as another regression tool.

rame.r regards regression as a general optimization problem and uses RAME as its core engine. rame.r is especially suitable when either T() or M() has to be very complicated. If you have any chance to make both T() and M() well-behaved (e.g., differentiable), please don't use rame.r, since it is non-deterministic.
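To make the idea of regression as a general optimization problem concrete, here is a small sketch that fits a non-differentiable T() using nothing but evaluations of M(). It uses a plain random local search as a stand-in and is not the RAME algorithm, which this page does not describe. All names (`random_search`, `abs_transform`) and the data are hypothetical.

```python
import math
import random


def rmsd(targets, predictions):
    """Root mean square deviation, used here as the measure function M()."""
    n = len(targets)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n)


def random_search(transform, measure, samples, targets, n_params,
                  iterations=2000, step=0.5, seed=None):
    """Minimize measure() over the parameters of transform() by random perturbation.

    Only function evaluations are used, so neither T() nor M() needs to be
    differentiable. This is a stand-in optimizer, not RAME.
    """
    rng = random.Random(seed)
    best = [0.0] * n_params
    best_score = measure(targets, [transform(best, s) for s in samples])
    for _ in range(iterations):
        candidate = [w + rng.gauss(0.0, step) for w in best]
        score = measure(targets, [transform(candidate, s) for s in samples])
        if score < best_score:  # keep a candidate only if it fits better
            best, best_score = candidate, score
    return best, best_score


def abs_transform(params, sample):
    """A T() with a kink: a + b * |f1 - c| is not differentiable at f1 = c."""
    a, b, c = params
    return a + b * abs(sample[0] - c)


# Made-up 1-dimensional data generated from a = 1, b = 2, c = 0.5.
samples = [[x / 2.0] for x in range(-4, 5)]
targets = [1.0 + 2.0 * abs(s[0] - 0.5) for s in samples]

best, best_score = random_search(abs_transform, rmsd, samples, targets,
                                 n_params=3, seed=0)
```

Running the search with a different seed, or with no seed, can return different parameters. That is the kind of non-determinism the paragraph above warns about, and the reason a conventional method is preferable whenever T() and M() are well-behaved.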

rame.r.txt · Last modified: 2011/01/28 12:15 by dirty