How do I replace numeric codes with value labels from a lookup table?


9

This question is related to this question, but not quite the same.

Say I have this data frame,

df <- data.frame(
                id = c(1:6),
                profession = c(1, 5, 4, NA, 0, 5))

and a named vector with human-readable labels for the profession codes. Say,

profession.code <- c(
                     Optometrists=1, Accountants=2, Veterinarians=3, 
                     `Financial analysts`=4,  Nurses=5)

Now, I'm looking for the easiest way to replace the values in df$profession with the text found in profession.code. Preferably without use of special libraries, unless it shortens the code significantly.

I would like my end result to be

df <- data.frame(
                id = c(1:6),
                profession = c("Optometrists", "Nurses", 
                "Financial analysts", NA, 0, "Nurses"))

Any help would be greatly appreciated.

Thanks, Eric

2012-04-03 22:45
by Eric Fail


10

You can do it this way:

df <- data.frame(id = c(1:6),
                 profession = c(1, 5, 4, NA, 0, 5))

profession.code <- c(`0` = 0, Optometrists=1, Accountants=2, Veterinarians=3, 
                     `Financial analysts`=4,  Nurses=5)

df$profession.str <- names(profession.code)[match(df$profession, profession.code)]
df
#   id profession     profession.str
# 1  1          1       Optometrists
# 2  2          5             Nurses
# 3  3          4 Financial analysts
# 4  4         NA               <NA>
# 5  5          0                  0
# 6  6          5             Nurses

Note that I had to add a 0 entry in your profession.code vector to account for those zeroes.
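
To make the mechanics visible, here is a minimal sketch of the two steps hidden in that one-liner. The intermediate names codes and idx are only for illustration; the data are the same as above.

```r
profession.code <- c(`0` = 0, Optometrists = 1, Accountants = 2, Veterinarians = 3,
                     `Financial analysts` = 4, Nurses = 5)
codes <- c(1, 5, 4, NA, 0, 5)

# Step 1: match() returns, for each code, its position in profession.code
# (NA when the code is not found in the table)
idx <- match(codes, profession.code)
idx
# [1]  2  6  5 NA  1  6

# Step 2: indexing the names by position yields the labels;
# an NA position simply stays NA
labels <- names(profession.code)[idx]
labels
# gives: Optometrists, Nurses, Financial analysts, NA, 0, Nurses
```

Any code that is missing from profession.code would also come back as NA at step 1, which is why the 0 entry had to be added here.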

EDIT: here is an updated solution to account for Eric's comment below that the data may contain any number of profession codes for which there are no corresponding descriptions:

match.idx <- match(df$profession, profession.code)
df$profession.str <- ifelse(is.na(match.idx),
                            df$profession,
                            names(profession.code)[match.idx])
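
If this has to be applied to several columns, the same logic could be wrapped in a small function. This is only a sketch: label_codes is a hypothetical helper name (not part of base R), and the code 7 is made up to stand in for an unexpected value coming back from the database.

```r
# label_codes() is a hypothetical helper, not a base R function
label_codes <- function(codes, lookup) {
  idx <- match(codes, lookup)
  # keep the original code (as text) wherever no label exists
  ifelse(is.na(idx), as.character(codes), names(lookup)[idx])
}

# the original lookup, without the extra 0 entry
profession.code <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                     `Financial analysts` = 4, Nurses = 5)

# 0 and 7 have no label and are passed through; NA stays NA
result <- label_codes(c(1, 5, 4, NA, 0, 7), profession.code)
result
# gives: Optometrists, Nurses, Financial analysts, NA, 0, 7
```

Note that the result is a character vector, so unlabeled codes come back as "0" and "7" rather than as numbers.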
2012-04-03 23:02
by flodel
Thank you for providing a solution. My problem is that the data gets passed to me from a database, and sometimes unexpected numbers are sent back to me, so I could get any number, not only 0. I have to account for that in some way - Eric Fail 2012-04-04 00:18
That's not a problem, I'll provide an update - flodel 2012-04-04 00:59


3

I played around with it and this is my current solution using the car package.

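library(car)
# Build one recode term per code, e.g. "1='Optometrists';"
# (paste0 avoids the stray spaces that paste would put inside the labels)
pLoop <- function(v) paste0(profession.code[v], "='", names(profession.code)[v], "';")
df$profession <- recode(df$profession, paste(sapply(seq_along(profession.code), pLoop), collapse = ""))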

df
# id           profession
#  1         Optometrists 
#  2               Nurses 
#  3   Financial analysts 
#  4                 <NA>
#  5                    0
#  6               Nurses 

I'm still interested in whether anyone has other suggestions for a solution. I would prefer to do it using only base R functions.
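
For comparison, one base-only possibility is to turn the lookup around and index it by name. This is a sketch of a different technique than recode, not a drop-in equivalent; labels.by.code, key and lab are names I made up for the example.

```r
df <- data.frame(id = 1:6, profession = c(1, 5, 4, NA, 0, 5))
profession.code <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                     `Financial analysts` = 4, Nurses = 5)

# Invert the lookup: the labels become the values, the codes (as text) the names
labels.by.code <- setNames(names(profession.code), profession.code)

# Look each code up by name; unknown codes and NA both give NA
key <- as.character(df$profession)
lab <- unname(labels.by.code[key])

# Fall back to the original code where no label was found
df$profession <- ifelse(is.na(lab), key, lab)
df$profession
# gives: Optometrists, Nurses, Financial analysts, NA, 0, Nurses
```

As with recode, the column ends up as character, so the unlabeled 0 is stored as "0".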

2012-04-04 01:25
by Eric Fail


1

I personally like the way the arules package deals with this problem, using the decode function. From the documentation:

library(arules)
data("Adult")

## Example 1: Manual decoding
## get code
iLabels <- itemLabels(Adult)
head(iLabels)

## get undecoded list and decode in a second step
list <- LIST(Adult[1:5], decode = FALSE)
list

decode(list, itemLabels = iLabels)

An advantage is that the package also offers the functions encode and recode, whose purposes should be clear from their names.

2014-02-19 11:36
by ATN