I am new to Python and programming. I am trying to extract PubChem IDs from a database called IMPPAT (https://cb.imsc.res.in/imppat/home). I have a list of chemical IDs from the database for a herb; following each chemical ID's hyperlink opens a page with details including its PubChem ID and SMILES data.
I have written a Python script that is meant to take each chemical ID as input, find the PubChem ID in the corresponding HTML page by web scraping, and write the output to a text file.
I am finding it difficult to get all the data as output. I am fairly sure there is an error in the for loop, because it prints only the first output many times instead of a different output for each input.
Please help with this.
Also, I don't know how to save the results to a file with each input and its corresponding output side by side. Please help.

    import requests
    import xmltodict
    from pprint import pprint
    import time
    from bs4 import BeautifulSoup
    import json
    import pandas as pd
    import os
    from pathlib import Path
    from tqdm.notebook import tqdm

    cids = 'output.txt'
    df = pd.read_csv(cids, sep='\t')
    df

    data = []
    for line in df.iterrows():
        out = requests.get(f'https://cb.imsc.res.in/imppat/Phytochemical-detailedpage-auth/CID%{line}')
        soup = BeautifulSoup(out.text, "html.parser")
        if soup.status_code == 200:
            script_data = soup.find('div', {'class': 'views-field views-field-Pubchem-id'}).find('span', {'class': 'field-content'}).find('h3')
            #print(script_data.text)
            for text in script_data:
                texts = script_data.get_text()
                print(text)
                data.append(text)
    print(data)

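
A few things in the loop look like likely causes, offered here as a sketch rather than a tested fix: `df.iterrows()` yields `(index, row)` tuples, so `line` is not the ID string; `status_code` belongs to the `requests` response, not to the `BeautifulSoup` object; and the inner `for` appends while looping over the same tag. The version below iterates over the ID strings directly and collects one (input, output) pair per ID. The names `BASE_URL`, `extract_pubchem_id` and `scrape_pubchem_ids`, the 30-second timeout, and the one-second pause are assumptions for illustration; the URL pattern and the HTML class name are copied from the script above and have not been checked against the live site.

```python
import time

import requests
from bs4 import BeautifulSoup

# URL pattern copied from the original script. The IDs in the input file
# already start with "3A", so "CID%" + "3A155934" gives "CID%3A155934".
BASE_URL = "https://cb.imsc.res.in/imppat/Phytochemical-detailedpage-auth/CID%{cid}"


def extract_pubchem_id(html):
    """Return the text of the PubChem field, or None if it is not found.

    The class name is taken from the original script (unverified).
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="views-field-Pubchem-id")
    if div is None:
        return None
    h3 = div.find("h3")
    return h3.get_text(strip=True) if h3 is not None else None


def scrape_pubchem_ids(cids, delay=1.0):
    """Fetch each detail page and return a list of (cid, pubchem_id) pairs."""
    results = []
    for cid in cids:
        response = requests.get(BASE_URL.format(cid=cid), timeout=30)
        # status_code lives on the response object, not on the soup.
        if response.status_code == 200:
            results.append((cid, extract_pubchem_id(response.text)))
        else:
            results.append((cid, None))
        time.sleep(delay)  # assumed pause to avoid hammering the server
    return results
```

With the DataFrame from the script above, and assuming its column is named `cids`, this could be called as `scrape_pubchem_ids(df['cids'].astype(str))`.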
The input file consists of:

         cids
    0    3A155934
    1    3A117235
    2    3A12312921
    3    3A12303662
    4    3A225688
    5    3A440966
    6    3A443160
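
For saving each input next to its output, one option is a tab-separated file with one row per chemical ID. The sketch below uses only the standard `csv` module; the function name `save_pairs` and the column headers `cid` and `pubchem_id` are hypothetical choices, not anything defined by IMPPAT.

```python
import csv


def save_pairs(pairs, path):
    """Write (input, output) pairs as two tab-separated columns."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["cid", "pubchem_id"])  # header row
        for cid, pubchem_id in pairs:
            # Leave the second column empty when nothing was found.
            writer.writerow([cid, "" if pubchem_id is None else pubchem_id])
```

A call such as `save_pairs(pairs, 'pubchem_ids.tsv')`, with `pairs` holding one `(chemical ID, PubChem ID)` tuple per input and the file name chosen freely, produces a file that can be read back with `pd.read_csv('pubchem_ids.tsv', sep='\t')`.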