How to extract HTML text output as a list for each input from a list using Python web scraping: my code gives only the first entry's output

Tuesday, 27 July 2021

I am new to Python and programming. I am trying to extract PubChem IDs from a database called IMPPAT (https://cb.imsc.res.in/imppat/home). I have a list of chemical IDs from the database for one herb; following each chemical ID's hyperlink leads to a page with its PubChem ID and SMILES data.

I have written a Python script that is meant to take each chemical ID as input, look up the PubChem ID on the corresponding HTML page, and write the result to a text file, using web scraping.

I am having trouble getting all the data as output. I am fairly sure there is an error in the for loop, because it prints only the first output many times instead of a different output for each input.

Please help with this.

Also, I don't know how to save the results to a file in which each input and its corresponding output appear side by side. Please help.

import requests
import xmltodict
from pprint import pprint
import time
from bs4 import BeautifulSoup
import json
import pandas as pd
import os
from pathlib import Path
from tqdm.notebook import tqdm

cids = 'output.txt'

df = pd.read_csv(cids, sep='\t')
df

data = []

for line in df.iterrows():

    out = requests.get(f'https://cb.imsc.res.in/imppat/Phytochemical-detailedpage-auth/CID%{line}')

    soup = BeautifulSoup(out.text, "html.parser")

    if soup.status_code == 200:
        script_data = soup.find('div', {'class': 'views-field views-field-Pubchem-id'}).find('span', {'class': 'field-content'}).find('h3')
    #print(script_data.text)

    for text in script_data:

        texts = script_data.get_text()

        print(text)

    data.append(text)

print(data)
    

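One possible reading of why the loop above repeats the first result, offered as a guess rather than a confirmed diagnosis: `df.iterrows()` yields `(index, row)` tuples, so `{line}` formats a whole tuple into the URL instead of a single ID. In addition, `status_code` belongs to the `requests` response, not to the `BeautifulSoup` object, where attribute access is treated as a tag lookup and evaluates to `None`. The `if` branch would then never run, and in a notebook `script_data` can keep a value left over from an earlier cell. The sketch below loops over the plain ID values and checks the status on the response. The helper names `extract_pubchem_id` and `scrape_pubchem_ids` and the `fetch` and `delay` parameters are illustrative assumptions; the URL pattern and class names are copied from the question and have not been verified against the live site.

```python
import time

import requests
from bs4 import BeautifulSoup

# URL pattern copied from the question; "%3A" is the URL-encoded ":" and the
# IDs in the input file already start with "3A".
BASE_URL = "https://cb.imsc.res.in/imppat/Phytochemical-detailedpage-auth/CID%"


def extract_pubchem_id(html):
    """Return the text of the PubChem field of one detail page, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Class names taken from the question (assumed to match the real page).
    field = soup.find("div", class_="views-field-Pubchem-id")
    if field is None:
        return None
    content = field.find("span", class_="field-content")
    heading = content.find("h3") if content is not None else None
    return heading.get_text(strip=True) if heading is not None else None


def scrape_pubchem_ids(cids, fetch=requests.get, delay=1.0):
    """Return a list of (chemical ID, PubChem ID or None), one per input."""
    results = []
    for cid in cids:  # each item is a single ID value, not an (index, row) tuple
        response = fetch(BASE_URL + str(cid))
        pubchem_id = None
        if response.status_code == 200:  # the status lives on the response
            pubchem_id = extract_pubchem_id(response.text)
        results.append((cid, pubchem_id))
        time.sleep(delay)  # short pause between requests to the server
    return results
```

With the DataFrame from the question, and assuming the column is named `cids` as in the input file, `scrape_pubchem_ids(df["cids"])` would return a list of `(chemical ID, PubChem ID)` tuples, with `None` wherever a page could not be fetched or parsed.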
The input file consists of:

cids
0   3A155934
1   3A117235
2   3A12312921
3   3A12303662
4   3A225688
5   3A440966
6   3A443160

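For the second part of the question, saving each input next to its output, one simple option is a two-column tab-separated file. This is a minimal sketch that assumes the results have already been collected as `(chemical ID, PubChem ID)` pairs; the function name `save_pairs`, the column headers, and the file name in the usage note are hypothetical choices, not anything defined in the question.

```python
import csv


def save_pairs(pairs, path):
    """Write (chemical ID, PubChem ID) pairs as two tab-separated columns."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["cid", "pubchem_id"])  # header row
        for cid, pubchem_id in pairs:
            # Leave the second column empty when no PubChem ID was found.
            writer.writerow([cid, "" if pubchem_id is None else pubchem_id])
```

Calling `save_pairs(pairs, "pubchem_ids.tsv")` would produce a file that opens directly in a spreadsheet or can be read back with `pd.read_csv("pubchem_ids.tsv", sep="\t")`, with one row per input ID.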