Tuesday, 26 September 2017

Python web scraping on div class="ng-scope"

I am new to Python and I am trying to get the song names from my favorite radio station's website, but whatever I do, I cannot get into div ui-view="main.header" class="ng-scope" to get the song names.

With my code I can read only the first level of divs from the text file, nothing deeper:

<div id="audio-player" style="width: 0px; height: 0px"></div>
<div id="fb-root"></div>
<div ui-view="main.header"></div>
<div ui-view="main.content"></div>
<div ui-view="main.footer"></div>
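The ng-scope class and the ng-* and ui-view attributes are AngularJS markup. AngularJS fills those ui-view placeholders in the browser with JavaScript, so a plain HTTP download most likely receives them empty, which would explain why nothing deeper shows up. One way to check is to count what is nested inside each ui-view div of the downloaded HTML. A minimal sketch using Python 3's standard html.parser instead of BeautifulSoup, with the five lines above pasted in as a string; UiViewInspector is a hypothetical helper name:

```python
from html.parser import HTMLParser

# The first-level markup the original script found (copied from the snippet above).
FETCHED_HTML = """
<div id="audio-player" style="width: 0px; height: 0px"></div>
<div id="fb-root"></div>
<div ui-view="main.header"></div>
<div ui-view="main.content"></div>
<div ui-view="main.footer"></div>
"""


class UiViewInspector(HTMLParser):
    """Counts the start tags nested inside each <div ui-view="..."> element."""

    def __init__(self):
        super().__init__()
        self.children = {}  # ui-view name -> number of nested start tags
        self.stack = []     # ui-view name (or None) for every currently open <div>

    def handle_starttag(self, tag, attrs):
        # Credit this tag to every ui-view div that is currently open.
        for name in self.stack:
            if name is not None:
                self.children[name] += 1
        if tag == "div":
            name = dict(attrs).get("ui-view")
            if name is not None:
                self.children[name] = 0
            self.stack.append(name)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()


inspector = UiViewInspector()
inspector.feed(FETCHED_HTML)
print(inspector.children)
```

If every count is 0, the song list simply is not in the HTML that urllib downloads, and no amount of deeper looping over it will find the songs.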

The song list refreshes every 10 s. Is that area blocked from scraping because of that? I have also tried div1 = soup.findAll('div'), with no success.
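If the goal is to keep a log of songs while the list refreshes every 10 s, the scraper has to poll repeatedly and keep only tracks it has not seen before. A minimal sketch of that bookkeeping, assuming a hypothetical get_latest_tracks() callable that returns the current list of "Artist - Title" strings, however it is obtained; no real rockfm.ro endpoint is implied:

```python
import time


def new_tracks(latest, seen):
    """Return the tracks in `latest` that are not in `seen`, and record them."""
    fresh = [t for t in latest if t not in seen]
    seen.update(fresh)
    return fresh


def poll(get_latest_tracks, interval=10, rounds=3):
    """Call the hypothetical get_latest_tracks() every `interval` seconds."""
    seen = set()
    log = []
    for i in range(rounds):
        log.extend(new_tracks(get_latest_tracks(), seen))
        if i < rounds - 1:  # no need to wait after the last round
            time.sleep(interval)
    return log
```

Because seen is a set, a song that is played again later is logged only once; that is a deliberate simplification for the sketch.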

You can see the full website source at www.rockfm.ro.

The HTML I want to parse:

<head></head>
<body ng-class="bodyClass">
    <script src="https://www.youtube.com/iframe_api" data-remove="false"></script>
    <script src="http://ift.tt/om8mte" data-remove="false"></script>
    <script src="http://ift.tt/2hugToZ" data-remove="false"></script>
    <script data-remove="false">
    <script data-remove="false">
    <div id="audio-player" style="width: 0px; height: 0px">
    <div id="fb-root" class=" fb_reset">
    <!-- uiView: main.header -->
    <div ui-view="main.header" class="ng-scope">
        <div id="topnav" ng-controller="HeaderCtrl" class="ng-scope">
            <div class="container top-stripe">
                </div>
            <div class="container menu-expand" ng-class="{'show-expand':isMenuOpen}">
                <div class="col-md-3">
                <div class="col-md-6">
                <div class="col-md-3 menu-expand-latest-tracks">
                    <div class="latest-tracks ng-isolate-scope" track-list="trackList.lista">
                        <h4>Ultimele 10 piese</h4>
                            <ul>
                                <!-- ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Steve Stevens - Top Gun Anthem</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Boston - More Than A Feeling</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Rammstein - Mein Hertz Brennt</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Inxs - Never Tear Us Apart</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Nirvana - Smells Like Teen Spirit</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Rockfm - Stiri</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Phoenix - Nunta</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Survivor - Burning Heart</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Holograf - Banii Vorbesc</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                                <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">It Rocks</li>
                                <!-- end ngRepeat: track in trackList.lista -->
                            </ul>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
</body>
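Once rendered markup like the above is available (the ng-binding classes suggest it was copied from the browser after AngularJS had run), the track names are the text of the li elements whose ng-repeat attribute is "track in trackList.lista". A minimal sketch with Python 3's standard html.parser; TrackListParser is a hypothetical name and the sample string is an excerpt of the markup above:

```python
from html.parser import HTMLParser

# An excerpt of the rendered markup shown above.
RENDERED_HTML = """
<ul>
  <!-- ngRepeat: track in trackList.lista -->
  <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Steve Stevens - Top Gun Anthem</li>
  <!-- end ngRepeat: track in trackList.lista -->
  <li ng-repeat="track in trackList.lista" class="ng-binding ng-scope">Boston - More Than A Feeling</li>
  <!-- end ngRepeat: track in trackList.lista -->
</ul>
"""


class TrackListParser(HTMLParser):
    """Collects the text of every <li ng-repeat="track in trackList.lista">."""

    def __init__(self):
        super().__init__()
        self.tracks = []
        self._in_track = False

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("ng-repeat") == "track in trackList.lista":
            self._in_track = True
            self.tracks.append("")

    def handle_endtag(self, tag):
        if tag == "li" and self._in_track:
            self.tracks[-1] = self.tracks[-1].strip()
            self._in_track = False

    def handle_data(self, data):
        # Only text inside a track <li> is kept; the ngRepeat comments are ignored.
        if self._in_track:
            self.tracks[-1] += data


parser = TrackListParser()
parser.feed(RENDERED_HTML)
print(parser.tracks)
```

With BeautifulSoup the equivalent lookup would be soup.findAll('li', attrs={'ng-repeat': 'track in trackList.lista'}), but either way it only works on HTML that already contains the rendered list.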

This is my code:

from urllib.request import urlopen  # Python 3; the original used Python 2's urllib.urlopen
from bs4 import BeautifulSoup       # bs4; the original imported BeautifulSoup 3

url = "http://www.rockfm.ro"
html = urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")

# findAll(True) returns all descendant tags, not just direct children
div1 = soup.findAll(True)

# Walk down the tag tree and log each level to rock.txt (file opened once)
with open('rock.txt', 'a') as file:
    for div2 in div1:
        print("Level 1: " + str(div2))
        file.write("Level 1: " + str(div2) + "\n")
        div3 = div2.findAll(True)

        for div4 in div3:
            print("Level 2: " + str(div4))
            file.write("Level 2: " + str(div4) + "\n")
            div5 = div4.findAll(True)

            for div6 in div5:
                print("Level 3: " + str(div6))
                file.write("Level 3: " + str(div6) + "\n")
                div7 = div6.findAll(True)

                for div8 in div7:
                    print("Level 4: " + str(div8))
                    file.write("Level 4: " + str(div8) + "\n")
                    div9 = div8.findAll(True)

                    for div10 in div9:
                        print("Level 5: " + str(div10))
                        file.write("Level 5: " + str(div10) + "\n")
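The nested loops stop at a fixed depth, and since each findAll(True) call searches a whole subtree, the same tags get printed many times. Tracking the depth with a counter reports each tag once, at its real level, for any depth. A minimal sketch using Python 3's standard html.parser rather than BeautifulSoup (TagOutline and VOID_TAGS are hypothetical names); it assumes reasonably well-formed HTML, since a missing closing tag shifts every later level:

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they must not increase depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagOutline(HTMLParser):
    """Records (depth, tag, attributes) for every start tag; depth 1 = top level."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth + 1, tag, dict(attrs)))
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are recorded but never opened.
        self.outline.append((self.depth + 1, tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


outline = TagOutline()
outline.feed("<div ui-view='main.header'><div id='topnav'></div></div>")
for depth, tag, attrs in outline.outline:
    print("Level %d: %s %s" % (depth, tag, attrs))
```

Run on the HTML that urllib downloads, this outline would show directly whether anything sits below the ui-view divs.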

Thank you very much in advance!
