Can't scrape an HTML website for details using Python because of an embedded API

Thursday, November 18, 2021

I'm having an issue with a KuCoin page again. I've decided to use the main RSS feed to get the new crypto names and the URLs of their listing pages. I have successfully put each name and link into a nested list, and I have a loop that iterates through them and calls requests.get(URL). I am trying to parse each page for two key pieces of information: the issue date and the issue price. However, it seems that these pages also load their content through some sort of API.

import json

import feedparser
import requests

rss_url = "https://www.kucoin.com/rss/news?lang=en"  # KuCoin RSS feed.
feed = feedparser.parse(rss_url)  # Parse the feed into entries.
Coins_To_Check = []  # List of [coin, link] pairs.
for post in feed.entries:  # Iterate through all feed entries.
    if ") Gets Listed on KuCoin" in post.title:  # Keep only listing announcements.
        x = post.title.index("(") + 1  # Position just after '(' in the title.
        y = post.title.index(")")  # Position of ')' in the title.
        coin = post.title[x:y]  # Coin name is the text between x and y.
        link = post['link']  # Link to the listing page.
        RSS_Coins = [coin, link]  # Temporary list with coin name and URL.
        Coins_To_Check.append(RSS_Coins)  # Append it to the main list of lists.
for eachURL in Coins_To_Check:  # Iterate through the list of lists.
    CoinURL = eachURL[1]  # Assign CoinURL the current URL.
    CoinURLresponse = requests.get(CoinURL)  # GET the coin's listing page.

    print(CoinURLresponse)
    # Prints: <Response [200]>

    print(json.dumps(CoinURLresponse, indent=2))
    # Raises:     raise TypeError(f'Object of type {o.__class__.__name__} '
    #             TypeError: Object of type Response is not JSON serializable
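
The TypeError above comes from passing the whole Response object to json.dumps, which only accepts plain types such as dicts, lists, strings, and numbers. A minimal sketch of one way around it is to copy the interesting fields into a dict first. `summarize_response` is a hypothetical helper, not part of requests, and the stand-in object below is made up so the example runs without network access. It only relies on the `url`, `status_code`, `headers`, and `text` attributes that a requests Response provides.

```python
import json
from types import SimpleNamespace


def summarize_response(resp, preview_chars=200):
    """Build a JSON-serializable dict from a requests-style response object."""
    return {
        "url": resp.url,
        "status_code": resp.status_code,
        "content_type": resp.headers.get("Content-Type"),
        # Only keep the start of the body; full pages can be very large.
        "body_preview": resp.text[:preview_chars],
    }


# Stand-in for requests.get(CoinURL); the values here are illustrative only.
fake_response = SimpleNamespace(
    url="https://www.kucoin.com/news/en-earthfund-1earth-gets-listed-on-kucoin",
    status_code=200,
    headers={"Content-Type": "text/html"},
    text="<html><body><div id='root'></div></body></html>",
)

summary_json = json.dumps(summarize_response(fake_response), indent=2)
print(summary_json)
```

Inside the loop, the same helper could be called as `summarize_response(CoinURLresponse)`. If the body preview shows little more than an empty root element, that would suggest the details are filled in later by JavaScript.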

Listings page: https://www.kucoin.com/news/categories/listing

Sample Link: https://www.kucoin.com/news/en-earthfund-1earth-gets-listed-on-kucoin

KuCoin listing page example, showing the items I'm looking to save to a list and eventually add to each sub-list within Coins_To_Check

Example of the F12 details, with a URL that changes for each coin

I have tried F12 to get the XPath, but it does not show up in the HTML of the site. I have also tried to get the API link from the Network > Headers section, but it seems to be different for each coin, so it cannot be static, and I don't know how to get it through code.
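
If the per-coin API link differs only by the last part of the listing URL, it might be possible to build it from the link already stored in Coins_To_Check. The sketch below assumes exactly that, which would need to be checked against the Network tab. `listing_slug` and `build_api_url` are hypothetical helpers, and `API_TEMPLATE` is a placeholder, not a real KuCoin endpoint.

```python
from urllib.parse import urlparse


def listing_slug(url):
    """Return the last path segment of a listing URL."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def build_api_url(template, url):
    """Fill a '{slug}' placeholder in the template with the slug from the URL."""
    return template.format(slug=listing_slug(url))


sample_link = "https://www.kucoin.com/news/en-earthfund-1earth-gets-listed-on-kucoin"
slug = listing_slug(sample_link)

# Placeholder only: replace with the pattern actually seen in the Network tab.
API_TEMPLATE = "https://example.invalid/api/articles/{slug}"
api_url = build_api_url(API_TEMPLATE, sample_link)

print(slug)
print(api_url)
```

If the request in the Network tab returns JSON, calling requests.get on the built URL and then `.json()` on the result would give a dict to search for the date and price.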

The XPath seems to be the same every time:

  List Date:    //*[@id="root"]/div/div/div[3]/div/div[2]/div[1]/div/div[2]/div/div/ul/li[2]/span/text()
                
  List Price:   //*[@id="root"]/div/div/div[3]/div/div[2]/div[1]/div/div[2]/div/div/table/tbody/tr[4]/td[3]/span

I could just try brute-force parsing:

    for post in feed.entries:  # Iterate through all feed entries.
        if ") Gets Listed on KuCoin" in post.title:  # Keep only listing announcements.
            content = post['content'].pop(0).value  # Content sub-section: has the actual coin/listing details (a LOT of details).
            # Placeholder: some smart marker to find the price, such as '$'.
            # This sometimes does not work when more than one '$' exists.
            price_pos = content.index("$")
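
One way to cope with several '$' signs is to collect every dollar amount together with a bit of the text before it, then choose the right one by its context. This is only a sketch: `find_dollar_amounts` is a hypothetical helper, and the sample string is made up rather than taken from a real feed entry, so the pattern may need adjusting to the actual content.

```python
import html
import re

# A '$', optional space, then digits with optional thousands separators and decimals.
PRICE_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")
TAG_RE = re.compile(r"<[^>]+>")


def find_dollar_amounts(content, context_chars=40):
    """Return (amount, context) pairs for every '$' amount found in the content."""
    # Crude tag stripping, then decode entities such as &amp;.
    text = html.unescape(TAG_RE.sub(" ", content))
    results = []
    for match in PRICE_RE.finditer(text):
        start = max(0, match.start() - context_chars)
        context = " ".join(text[start:match.end()].split())
        amount = float(match.group(1).replace(",", ""))
        results.append((amount, context))
    return results


# Made-up sample content, for illustration only.
sample = "<p>Total supply: 1,000,000 tokens</p><p>Price: $0.05 per token, hard cap $2,500,000</p>"
amounts = find_dollar_amounts(sample)
for amount, context in amounts:
    print(amount, "|", context)
```

In the loop above, `find_dollar_amounts(content)` would replace the single `index` call, and the context string could be checked for a word such as "Price" to decide which amount to keep.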

Any help is much appreciated! I've been at this for hours 😓.
