What I want is to get all the links from a webpage and add them to a list, but once 5 minutes have passed I want it to stop and keep the list as it is.
I can get all the links, but I can't make it stop.
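The goal above (collect for a fixed time, then keep whatever has been gathered) maps onto a common java.util.concurrent pattern: run the collector on a worker thread and wait for it with `Future.get(timeout, unit)`. The sketch below is one possible shape, not the asker's code. `TimeBoxedCollector`, `collect`, and the idea of handing the producer a shared set are hypothetical. Interrupting the worker only stops it if it checks for interruption or its reads time out, but the caller gets a snapshot of the results either way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical helper: runs a producer for at most the given time and returns what it found.
class TimeBoxedCollector {

    static List<String> collect(Consumer<Set<String>> producer, long timeout, TimeUnit unit) {
        // Insertion-ordered and de-duplicated; synchronized because two threads touch it.
        Set<String> found = Collections.synchronizedSet(new LinkedHashSet<>());
        // Daemon thread, so a worker stuck in a blocking read cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "link-collector");
            t.setDaemon(true);
            return t;
        });
        Future<?> task = executor.submit(() -> producer.accept(found));
        try {
            task.get(timeout, unit);
        } catch (TimeoutException e) {
            // Out of time: interrupt the worker. It stops only if it notices the interrupt.
            task.cancel(true);
        } catch (InterruptedException e) {
            task.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        } catch (ExecutionException e) {
            throw new IllegalStateException("link collection failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
        // Copy under the set's lock so later additions by a still-running worker are excluded.
        synchronized (found) {
            return new ArrayList<>(found);
        }
    }
}
```

For the 5-minute limit this would be called as `collect(..., 5, TimeUnit.MINUTES)`. Using a set instead of `list.contains` also avoids a linear scan for every link.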
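import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// list (List<String>) and timer (StoppingThread) are fields of the enclosing class.
public void fillList(String path) throws Exception
{
    list = new ArrayList<String>();
    Reader r = null;
    timer = new StoppingThread();
    timer.start();
    try
    {
        URL u = new URL(path);
        InputStream in = u.openStream();
        r = new InputStreamReader(in);
        ParserDelegator hp = new ParserDelegator();
        hp.parse(r, new HTMLEditorKit.ParserCallback()
        {
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
            {
                if (t == HTML.Tag.A)
                {
                    // Stop collecting once the timer has run out
                    if (!timer.isActive())
                        return;
                    Enumeration<?> attrNames = a.getAttributeNames();
                    while (attrNames.hasMoreElements())
                    {
                        if (!timer.isActive())
                            return;
                        Object key = attrNames.nextElement();
                        if ("href".equals(key.toString()))
                        {
                            String href = a.getAttribute(key).toString();
                            // Keep only https:// links that are not already in the list
                            if (!list.contains(href) && href.startsWith("https://"))
                            {
                                list.add(href);
                            }
                        }
                    }
                }
            }
        }, true);
    }
    finally
    {
        // Closing the reader also closes the underlying URL stream
        if (r != null)
        {
            r.close();
        }
    }
}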
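Returning from `handleStartTag` only ends that one callback invocation; `ParserDelegator.parse` keeps reading and tokenizing the rest of the page and calls the callback again for every later tag, so the total run time does not change. One way to end the parse early, sketched here under my own assumptions, is to make the input itself end: wrap the `Reader` in a `FilterReader` that reports end-of-stream (`-1`) once a stop condition is true. `StopAwareReader`, `LinkExtractor`, and the `BooleanSupplier` stop condition are hypothetical names, and `HTML.Attribute.HREF` is used instead of looping over the attribute names. If the underlying stream is blocked waiting for network data, the check cannot run until that read returns, so a read timeout is still useful.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical Reader that reports end-of-input as soon as shouldStop returns true.
class StopAwareReader extends FilterReader {
    private final BooleanSupplier shouldStop;

    StopAwareReader(Reader in, BooleanSupplier shouldStop) {
        super(in);
        this.shouldStop = shouldStop;
    }

    @Override
    public int read() throws IOException {
        return shouldStop.getAsBoolean() ? -1 : super.read();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        return shouldStop.getAsBoolean() ? -1 : super.read(cbuf, off, len);
    }
}

// Hypothetical variant of fillList that parses only until shouldStop becomes true.
class LinkExtractor {
    static List<String> extractLinks(Reader source, BooleanSupplier shouldStop) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t != HTML.Tag.A || shouldStop.getAsBoolean()) {
                    return; // skips this tag only; the reader is what ends the parse
                }
                Object href = a.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    String url = href.toString();
                    if (url.startsWith("https://") && !links.contains(url)) {
                        links.add(url);
                    }
                }
            }
        };
        // When the reader returns -1 the parser treats it as end of document and returns.
        new ParserDelegator().parse(new StopAwareReader(source, shouldStop), callback, true);
        return links;
    }
}
```

With the existing timer, the stop condition would be `() -> !timer.isActive()`.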
What I tried was a simple thread acting as a timer: when it finishes, a boolean becomes false and the callback returns early. But that return doesn't seem to do anything; if anything, it sometimes takes even longer with the timer on.
Here is the thread:
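public class StoppingThread extends Thread
{
    // volatile so the parsing thread sees the change; starts true so that
    // isActive() is valid even before run() has begun
    private volatile boolean active = true;

    public void run()
    {
        try
        {
            sleep(1000L * 60 * 5); // 5 minutes
        }
        catch (InterruptedException e)
        {
            // Interrupted early: treat it as "time is up"
        }
        active = false;
    }

    public boolean isActive()
    {
        return active;
    }
}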
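A thread that only sleeps and then flips a flag needs a whole extra thread, and the flag has to be shared safely between threads. A simpler alternative, sketched here as an assumption rather than a fix to the code above, is to compute a deadline once with `System.nanoTime()` and compare against it whenever needed. `Deadline`, `after`, and `remainingMillis` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical replacement for StoppingThread: no extra thread, just a fixed deadline.
// The only field is final, so instances can be read from any thread without volatile.
final class Deadline {
    private final long deadlineNanos;

    private Deadline(long deadlineNanos) {
        this.deadlineNanos = deadlineNanos;
    }

    static Deadline after(long amount, TimeUnit unit) {
        return new Deadline(System.nanoTime() + unit.toNanos(amount));
    }

    // Same meaning as StoppingThread.isActive(): true until the time budget is used up.
    boolean isActive() {
        // Compare by subtraction so the check stays correct even if nanoTime() wraps around.
        return System.nanoTime() - deadlineNanos < 0;
    }

    // Time left in milliseconds, never negative.
    long remainingMillis() {
        return Math.max(0, TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime()));
    }
}
```

In `fillList`, `timer = Deadline.after(5, TimeUnit.MINUTES);` could replace creating and starting the thread (with the `timer` field's type changed to `Deadline`), and the existing `timer.isActive()` checks would keep working.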
I'm also using Apache's commons-io-2.4.jar for this. Can anyone tell me what I'm doing wrong, or how to do it right?
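One more reason a time check inside the callback may never fire: `URL.openStream()` opens the connection with the default timeouts, which are 0 (wait indefinitely), so a stalled server can leave the parser blocked inside a read. A minimal sketch that opens the same stream with explicit connect and read timeouts is below. `TimedFetch`, `prepare`, `open`, and the UTF-8 charset are assumptions for illustration (the original uses the platform default charset).

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: opens a URL with explicit timeouts instead of URL.openStream().
class TimedFetch {

    // Creates the connection and sets timeouts; nothing is sent until it is used.
    static URLConnection prepare(String path, int timeoutMillis) throws IOException {
        URLConnection conn = URI.create(path).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis); // max wait to establish the connection
        conn.setReadTimeout(timeoutMillis);    // max wait for any single read to return data
        return conn;
    }

    // Stand-in for new InputStreamReader(u.openStream()), assuming UTF-8 pages.
    static Reader open(String path, int timeoutMillis) throws IOException {
        return new InputStreamReader(
                prepare(path, timeoutMillis).getInputStream(), StandardCharsets.UTF_8);
    }
}
```

In `fillList`, something like `r = TimedFetch.open(path, 30_000);` could replace the three lines that build the reader; a stalled read then fails with `java.net.SocketTimeoutException` instead of hanging. The read timeout applies per read, not to the whole download, so it complements the 5-minute limit rather than replacing it.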