Importing Edgelists into RSiena

So, importing data into RSiena is a bit of a pain. The GUI has some support for importing Pajek files, for example, but I've been working mostly from the command line, and with .R files, which are what the manual covers.

For my current project, I have CSV files in a very common edgelist format, something like -

sourceID,receiverID,weight,wave 

I think it should be simple to import these into RSiena, but it isn't.

RSiena accepts either adjacency matrices - which are matrices with a 0 or 1 in each cell, one cell for each ordered pair of nodes - or sparse matrices. These are similar to edgelists, but they have to be in the dgTMatrix class. As you can tell by reading the documentation, it's not exactly obvious how to get the data into that format.
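To make the two formats concrete, here is a small sketch with a made-up three-node network (the edges and variable names are purely illustrative, and the row = sender, column = receiver layout is just the convention chosen for this toy example). It builds the same network once as a dense 0/1 matrix and once as a sparse triplet matrix with the Matrix package.

```r
library(Matrix)

# Hypothetical toy edgelist: 1 -> 2, 2 -> 3, 3 -> 1
senders   <- c(1, 2, 3)
receivers <- c(2, 3, 1)
n <- 3

# Dense adjacency matrix: one cell per (sender, receiver) pair
dense <- matrix(0, nrow = n, ncol = n)
dense[cbind(senders, receivers)] <- 1

# Sparse version: only the non-zero cells are stored.
# Supplying numeric x values gives a numeric sparse matrix;
# converting to TsparseMatrix gives the triplet (dgTMatrix) form.
compressed <- sparseMatrix(i = senders, j = receivers,
                           x = rep(1, length(senders)), dims = c(n, n))
sparse <- as(compressed, "TsparseMatrix")

is(sparse, "dgTMatrix")
```

Both objects describe the same network; the sparse one just skips the zeros, which matters once the node count gets large.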

I started by trying the Matrix() function, then I found the sparseMatrix() function. I realized that weight didn't matter, so at first I simply left the weight column out. That creates a sparse matrix of class "ngCMatrix", a "pattern matrix" that only records which cells are filled, and I couldn't coerce it to a dgTMatrix.
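The class difference is easy to reproduce on a toy example (the indices below are made up). This sketch only shows which class each call returns; how coercion between these classes behaves may depend on the installed version of the Matrix package.

```r
library(Matrix)

i <- c(1, 2, 3)  # hypothetical row indices
j <- c(2, 3, 1)  # hypothetical column indices

# No x argument: a pattern matrix that records positions only
pattern <- sparseMatrix(i = i, j = j, dims = c(3, 3))
is(pattern, "ngCMatrix")

# Numeric x argument: a numeric sparse matrix in column-compressed form
valued <- sparseMatrix(i = i, j = j, x = rep(1, 3), dims = c(3, 3))
is(valued, "dgCMatrix")
```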

So, eventually, I ended up creating a new weight column with everything set to 1, and then resetting any cell back to 1 where duplicate entries in the data had pushed it higher.
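The reset step matters because sparseMatrix() adds up the x values of repeated (i, j) pairs by default, so a tie listed twice comes out as 2 rather than 1. A minimal sketch with a made-up duplicated edge:

```r
library(Matrix)

# Hypothetical edgelist in which the tie 1 -> 2 appears twice
i <- c(1, 1, 2)
j <- c(2, 2, 3)

m <- sparseMatrix(i = i, j = j, x = rep(1, length(i)), dims = c(3, 3))
before <- m[1, 2]   # the two duplicate entries were summed

# Re-binarize: any cell above 1 goes back to 1
m[m > 1] <- 1
after <- m[1, 2]

c(before = before, after = after)
```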

My current code is below:

library(Matrix)
library(RSiena)

# Note: nodeCount (the total number of nodes) is not defined in this snippet;
# it must already be set in the calling environment.
edgeListToAdj <- function(x, waveID) {
  # Remove entries that are not connected to anyone (NomineeID == 0), that have
  # an out-of-range NomineeID, or that do not belong to the current wave
  tempNet <- x[x$NomineeID > 0 & x$NomineeID <= nodeCount & x$Wave == waveID, ]
  # Create a binary column for weights (since RSiena doesn't use weights).
  tempNet$Weight <- 1
  # Convert the edgelist to a sparse adjacency matrix
  adjacencyMat <- sparseMatrix(tempNet$NomineeID, tempNet$RespondentID,
                               x = tempNet$Weight, dims = c(nodeCount, nodeCount))
  # If any items appear more than once, re-binarize them.
  # Yes, binarize is a real word.
  adjacencyMat[adjacencyMat > 1] <- 1
  # Convert to a dgTMatrix, since this is what RSiena expects
  return(as(adjacencyMat, "dgTMatrix"))
}

createNetwork <- function(fileName, numWaves) {
  print(fileName)
  # Read the CSV file into a data frame
  netDF <- read.csv(fileName)
  # Create a list of adjacency matrices, one per wave
  net <- lapply(1:numWaves, function(w) edgeListToAdj(netDF, w))
  # Change this into an RSiena network
  RSienaObj <- sienaDependent(net)
  return(RSienaObj)
}