I parsed data from a website. The result looks like this and goes on for 10,000 observations:
[1] "REMARK 700 SHEET"
[2] "SHEET 1 A 6 ALA A 39 LEU A 42 0 "
[3] "SHEET 2 A 6 GLU A 57 ASP A 61 1 O GLN A 59 N LEU A 42 "
[4] "SHEET 3 A 6 ARG A 72 VAL A 75 1 O THR A 74 N ILE A 60 "
[5] "SHEET 4 A 6 ALA A 89 VAL A 92 1 O GLU A 91 N VAL A 75"
[6] "SHEET 5 A 6 GLN A 104 LEU A 107 1 O ARG A 106 N VAL A 92"
The code I used for this is:
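library(XML)
library(RCurl)

# Base URL that the file name is appended to
protein_webpage <- "http://ift.tt/1kDNVOL"

# Download a PDB file and return its contents as a character vector (one element per line)
webpage <- function(id) {
  web_combined <- paste0(protein_webpage, id)
  protein.data <- getURL(web_combined)
  protein.tc <- textConnection(protein.data)
  on.exit(close(protein.tc))  # release the connection when the function exits
  protein_readlines <- readLines(protein.tc)
  protein_readlines  # return visibly; a bare assignment would return invisibly
}

protein_lines <- webpage("2N3D.pdb")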
The class of the data is "character".
Problem
I am not able to manipulate this data set. How can I convert it into a data frame that contains only the amino acids?
Amino acids are represented by the three-letter codes (ALA, GLN, LEU, etc.).
I've tried converting the lines into vectors and into a matrix, but without success. Thanks in advance.
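For illustration, the result described above could be produced along these lines. This is only a minimal sketch in base R, not a definitive solution. It assumes that the lines look exactly like the sample output shown earlier, that only SHEET records are of interest, and that any token matching one of the 20 standard three-letter codes is an amino acid. The names `amino_codes`, `sheet_lines`, `residues`, and `amino_df` are made up for the example. Real PDB files use fixed column positions, so splitting on whitespace is a simplification.

```r
# Sample lines copied from the output above (whitespace as displayed there)
protein_lines <- c(
  "REMARK 700 SHEET",
  "SHEET 1 A 6 ALA A 39 LEU A 42 0 ",
  "SHEET 2 A 6 GLU A 57 ASP A 61 1 O GLN A 59 N LEU A 42 "
)

# The 20 standard amino-acid three-letter codes (assumption: no non-standard residues)
amino_codes <- c("ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
                 "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL")

# Keep only the records that start with SHEET
sheet_lines <- grep("^SHEET", protein_lines, value = TRUE)

# Split each record on whitespace and keep the tokens that are amino-acid codes
tokens <- strsplit(trimws(sheet_lines), "\\s+")
residues <- lapply(tokens, function(x) x[x %in% amino_codes])

# One row per amino acid, with the index of the SHEET record it came from
amino_df <- data.frame(
  record = rep(seq_along(residues), lengths(residues)),
  amino_acid = unlist(residues),
  stringsAsFactors = FALSE
)

print(amino_df)
```

To run this on the downloaded file, `protein_lines` would be replaced by the character vector returned by `webpage("2N3D.pdb")`.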