I need to find an h2 tag with a given text and extract all the text that follows it, up to the next h2 tag or the end of the page. For example, if the page is
<h1 id="DDPSupport-InternalResources"><span style="color: rgb(0,51,102);"><strong>Internal Resources</strong></span></h1>
<h2 id="DDPSupport-GeneralInformation">General Information</h2>
<ul><li><a href="/display/ladtechtme/DDP+overview">DDP overview</a></li>
<li><a href="/display/ladtechtme/DDP+Configuration+guide">DDP Config guide</a></li>
<li><a href="/pages/viewpage.action?pageId=1338281922">Custom DPR</a></li>
<li><a href="/display/ladtechtme/Build+custom+package">Build custom package</a></li>
<li><a href="/display/ladtechtme/Unit+testing">Unit testing</a></li>
<li><a href="/display/ladtechtme/FAQ">FAQ </a></li>
<li><a href="/display/ladtechtme/Misc+BKMs">Misc BKMs</a></li></ul>
<h2 id="DDPSupport-UseCases">Use Cases</h2>
<ul><li><a href="/pages/viewpage.action?pageId=1338281922">Custom DPR </a></li>...
then the expected output is:
DDP overview
DDP Config guide
Custom DPR
Build custom package
Unit testing
FAQ
Misc BKMs
I am using the following code:
for head in response.xpath("//div[@class='wiki-content']/h2"):
    if sub == 'General Information':
        lines = head.xpath("//following-sibling::*[count(following-sibling::h2)=1]//text()").extract()
        print(str(lines))
I get a result, but not the desired one: my output consists of the text of the next h2 tag. Any help would be appreciated.
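One thing that stands out in the posted expression is the leading `//`: in XPath that searches from the document root rather than relative to `head`, so the match is no longer tied to the heading being looped over. To make the intended behaviour concrete, here is a minimal sketch that swaps Scrapy selectors for Python's standard-library `html.parser`, so it runs without any dependencies. The names `SectionText`, `section_text` and `SAMPLE` are hypothetical, and `SAMPLE` is a shortened copy of the markup above with the truncated last list closed by hand.

```python
from html.parser import HTMLParser


class SectionText(HTMLParser):
    """Collect text that appears after a given <h2> and before the next <h2>."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading   # heading text to look for
        self.in_h2 = False       # True while inside an <h2> element
        self.h2_text = []        # text pieces of the <h2> being read
        self.capturing = False   # True while inside the wanted section
        self.lines = []          # collected, stripped text pieces

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.h2_text = []
            self.capturing = False  # any new <h2> ends the current section

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
            # Start capturing only if this heading's text matches.
            self.capturing = "".join(self.h2_text).strip() == self.heading

    def handle_data(self, data):
        if self.in_h2:
            self.h2_text.append(data)
        elif self.capturing and data.strip():
            self.lines.append(data.strip())


def section_text(html, heading):
    """Return the text pieces between the matching <h2> and the next <h2>."""
    parser = SectionText(heading)
    parser.feed(html)
    parser.close()
    return parser.lines


# Shortened copy of the page from the question (assumption: lists closed by hand).
SAMPLE = """
<h1 id="DDPSupport-InternalResources"><strong>Internal Resources</strong></h1>
<h2 id="DDPSupport-GeneralInformation">General Information</h2>
<ul><li><a href="/display/ladtechtme/DDP+overview">DDP overview</a></li>
<li><a href="/display/ladtechtme/FAQ">FAQ </a></li>
<li><a href="/display/ladtechtme/Misc+BKMs">Misc BKMs</a></li></ul>
<h2 id="DDPSupport-UseCases">Use Cases</h2>
<ul><li><a href="/pages/viewpage.action?pageId=1338281922">Custom DPR </a></li></ul>
"""

for line in section_text(SAMPLE, "General Information"):
    print(line)
```

On the shortened sample this prints DDP overview, FAQ and Misc BKMs, one per line. Inside the Scrapy loop, a relative expression along the lines of `head.xpath("./following-sibling::*[not(self::h2)][preceding-sibling::h2[1][. = 'General Information']]//text()")` may give the same result; it keeps only the siblings whose nearest preceding h2 is the wanted heading, but it has not been tested against the real page.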