Sunday, 20 December 2015

Manipulating data parsed from the web in R?

I parsed data from a website. It looks like the sample below and runs to about 10,000 observations:

[1] "REMARK 700 SHEET                                                                "
[2] "SHEET    1   A 6 ALA A  39  LEU A  42  0                                        "
[3] "SHEET    2   A 6 GLU A  57  ASP A  61  1  O  GLN A  59   N  LEU A  42           "
[4] "SHEET    3   A 6 ARG A  72  VAL A  75  1  O  THR A  74   N  ILE A  60           "
[5] "SHEET    4   A 6 ALA A  89  VAL A  92  1  O  GLU A  91   N  VAL A  75           "
[6] "SHEET    5   A 6 GLN A 104  LEU A 107  1  O  ARG A 106   N  VAL A  92           "

The code I used for this is:

library(RCurl)

protein_webpage <- "http://ift.tt/1kDNVOL"

# Download a PDB file and return its contents as a character vector,
# one element per line.
webpage <- function(id) {
  web_combined <- paste0(protein_webpage, id)
  protein_data <- getURL(web_combined)
  protein_tc <- textConnection(protein_data)
  on.exit(close(protein_tc))  # release the connection when the function exits
  readLines(protein_tc)       # last expression, so this is the return value
}

protein_lines <- webpage("2N3D.pdb")

The class of the result is "character".

Problem

I am not able to manipulate this data set. How can I convert it into a data frame that contains only the amino acids?

Amino acids are represented by the three-letter codes (ALA, GLN, LEU, etc.).

I've tried converting the lines into vectors and into a matrix, but I am still unsuccessful. Thanks in advance.
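For illustration, here is a minimal sketch of one way this could be done, assuming the lines are already in a character vector. The names protein_lines, amino_acids and amino_df are hypothetical, and the sample lines are copied from the output above so the snippet runs without a download. It keeps only the SHEET records, splits each one on whitespace, and keeps the tokens that match one of the 20 standard three-letter codes. Non-standard residue codes would be dropped under that assumption.

```r
# Sample lines copied from the question; in practice this would be the
# character vector returned by webpage("2N3D.pdb").
protein_lines <- c(
  "REMARK 700 SHEET",
  "SHEET    1   A 6 ALA A  39  LEU A  42  0",
  "SHEET    2   A 6 GLU A  57  ASP A  61  1  O  GLN A  59   N  LEU A  42",
  "SHEET    3   A 6 ARG A  72  VAL A  75  1  O  THR A  74   N  ILE A  60"
)

# The 20 standard three-letter amino acid codes (assumption: anything
# outside this set is not wanted in the result).
amino_acids <- c("ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY",
                 "HIS", "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER",
                 "THR", "TRP", "TYR", "VAL")

# Keep only SHEET records; this drops the "REMARK 700 SHEET" header line.
sheet_lines <- grep("^SHEET ", protein_lines, value = TRUE)

# Split each record on whitespace and keep the tokens that are amino acids.
tokens <- strsplit(trimws(sheet_lines), "\\s+")
residues <- lapply(tokens, function(x) x[x %in% amino_acids])

# One row per amino acid, tagged with the SHEET record it came from.
amino_df <- data.frame(
  record = rep(seq_along(residues), lengths(residues)),
  amino_acid = unlist(residues),
  stringsAsFactors = FALSE
)

print(amino_df)
```

Splitting on whitespace rather than on fixed column positions is a deliberate simplification here: it tolerates lines whose padding was altered when they were copied, at the cost of relying on the lookup table to tell residue names apart from other tokens such as the chain ID "A".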



