Performance in R

Here is a function that computes the (false positive rate, true positive rate) pair, given the response, the ground truth, the two classes and a threshold:

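# response is, e.g., predict(model, type = "response"): one score per observation
# classes[1] is the negative class, classes[2] the positive class
calc2=function(response, groundTruth, classes, threshold)
{
	type1 = classes[1]
	type2 = classes[2]
	n=length(response)
	tpden = 0	# number of actual positives (type2)
	fpden = 0	# number of actual negatives (type1)
	fp = 0
	tp = 0
	for(i in seq_len(n))	# seq_len() avoids iterating over c(1, 0) when n == 0
	{
		# classify one observation at a time
		predicted = ifelse(response[i]<threshold,type1,type2)
		actual = groundTruth[i]
		if (actual == type1)
		{
			fpden = fpden + 1
			if (predicted != actual)
			{
				fp = fp + 1	# negative predicted as positive
			}
		}
		else
		{
			tpden = tpden + 1
			if (predicted == actual)
			{
				tp = tp + 1	# positive predicted as positive
			}
		}
	}
	fp = fp/fpden	# false positive rate
	tp = tp/tpden	# true positive rate
	return(c(fp,tp))
}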
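To make the returned pair concrete, here is a small hand-checkable example. It uses a hypothetical helper, fp_tp(), that restates the same computation with mean() over logical vectors; the labels "no"/"yes" and the six scores are made up for illustration.

```r
# Hypothetical helper restating calc2's result in vectorized form
fp_tp <- function(response, groundTruth, classes, threshold) {
  # Scores below the threshold get classes[1], the rest classes[2]
  predicted <- ifelse(response < threshold, classes[1], classes[2])
  neg <- groundTruth == classes[1]  # actual negatives
  pos <- groundTruth == classes[2]  # actual positives
  c(fp = mean(predicted[neg] != classes[1]),  # false positive rate
    tp = mean(predicted[pos] == classes[2]))  # true positive rate
}

response    <- c(0.1, 0.4, 0.7, 0.2, 0.8, 0.9)
groundTruth <- c("no", "no", "no", "yes", "yes", "yes")

# One of three "no" cases scores >= 0.5 (fp = 1/3);
# two of three "yes" cases score >= 0.5 (tp = 2/3)
fp_tp(response, groundTruth, c("no", "yes"), 0.5)
```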

This function is roughly 100x slower than the vectorized version below, which computes the same pair:

calc=function(response, groundTruth, classes, threshold)
{
	type1 = classes[1]
	type2 = classes[2]
	# classify all observations at once
	predicted = as.factor(ifelse(response<threshold,type1,type2))
	# false positive rate: share of actual type1 cases predicted as type2
	I = which(groundTruth==type1)
	fp = length(which(predicted[I] != type1)) / length(I)
	# true positive rate: share of actual type2 cases predicted as type2
	I = which(groundTruth==type2)
	tp = length(which(predicted[I] == type2)) / length(I)
	return(c(fp,tp))
}
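Most of the gap comes from doing per-element work in interpreted R code: calc2 runs one loop iteration, including one ifelse() call, for every observation, while calc applies ifelse(), == and which() once to whole vectors. The minimal sketch below isolates that effect on a simple counting task; count_loop() and count_vec() are hypothetical names, and the timings you see will depend on your machine.

```r
# Count occurrences of `value` with an explicit R loop:
# every iteration is executed by the R interpreter
count_loop <- function(x, value) {
  k <- 0
  for (v in x) {
    if (v == value) k <- k + 1
  }
  k
}

# Same count, vectorized: == and sum() each process the whole vector in compiled code
count_vec <- function(x, value) sum(x == value)

set.seed(1)
x <- sample(c("no", "yes"), 1e5, replace = TRUE)

stopifnot(count_loop(x, "yes") == count_vec(x, "yes"))  # same result either way
system.time(count_loop(x, "yes"))
system.time(count_vec(x, "yes"))
```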

To benchmark, install the rbenchmark package (install.packages("rbenchmark")), load it with library(rbenchmark), and call benchmark() as follows:

> benchmark(calc(response,groundTruth,classes,0.5), calc2(response,groundTruth,classes,0.5), replications=10)
                                        test replications elapsed relative user.self sys.self user.child sys.child
1  calc(response, groundTruth, classes, 0.5)           10    0.14    1.000      0.14        0         NA        NA
2 calc2(response, groundTruth, classes, 0.5)           10   13.06   93.286     13.05        0         NA        NA
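The benchmark above assumes response, groundTruth and classes already exist. As a sketch, assuming a logistic regression fitted with glm() on simulated data (the variable names x, y and model, the sample size and the seed are all illustrative), inputs of the kind described in calc2's comment can be produced like this:

```r
# Simulated two-class data (illustrative only)
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- factor(ifelse(x + rnorm(n) > 0, "yes", "no"))  # levels: "no", "yes"

# Logistic regression; response = predicted probability of "yes"
model <- glm(y ~ x, family = binomial)
response <- predict(model, type = "response")

groundTruth <- y
classes <- levels(y)  # classes[1] = "no" (negative), classes[2] = "yes" (positive)
```

Because glm() treats the first factor level as failure, predict(model, type = "response") returns the probability of the second level, which matches the rule in both functions of assigning classes[2] to scores at or above the threshold.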

This entry was posted in Software.