Sunday, 30 September 2018

Web crawler in Flutter

For context, my goal is to build a Flutter app that logs in to my school's website and sends me a notification when monitoring of a given subject is available. The Flutter part is not what I'm asking about and should be fairly easy, but I have absolutely no experience with crawling the web in any language. Here is what I noticed about the web page:

  1. When you open it, the browser sends Google Analytics cookies (it seems you can still log in without sending those) and PHPSESSID. If you send no PHPSESSID, the response returns one via Set-Cookie, and that one is fully functional.
  2. After submitting the form, your registration number ("matrícula" on the website) and password ("senha" on the website) are sent as form data in a POST to the same URL, but with a "?" appended. PHPSESSID is also sent as a cookie.
  3. This POST seems to do nothing and returns status 200. Then another identical POST is sent, which returns status 302, and after that I'm able to use the website.
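
For reference, the form data from step 2 can be sketched as below. A form-encoded body is a single line of `name=value` pairs joined by `&`, with no spaces or newlines. `encodeForm` is a hypothetical helper written for illustration, not part of the http package; the field names are the ones seen in the browser's network tool and the values are placeholders.

```dart
/// Encodes form fields as an application/x-www-form-urlencoded body.
/// Hypothetical helper for illustration only.
String encodeForm(Map<String, String> fields) {
  return fields.entries
      .map((e) => '${Uri.encodeQueryComponent(e.key)}='
          '${Uri.encodeQueryComponent(e.value)}')
      .join('&');
}

void main() {
  // Placeholder credentials; empty fields are still sent as `name=`.
  final body = encodeForm({
    'matricula': 'MY_REGISTRATION',
    'senha': 'MY_PASSWORD',
    'matricula_rec': '',
    'email_rec': '',
    'email_mat': '',
    'cpf': '',
  });
  print(body);
}
```

Building the body from a map keeps stray whitespace out of the request and escapes any special characters in the password.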

The session is probably tracked by PHPSESSID, since nothing else is sent to the server once I'm logged in.
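
A minimal sketch of pulling that cookie out of a Set-Cookie header value, assuming the header looks like `PHPSESSID=...; path=/` and may contain other cookies as well. `extractSessionCookie` is a hypothetical helper, and the sketch is written for null-safe Dart.

```dart
/// Returns the `PHPSESSID=value` pair from a Set-Cookie header value,
/// or null when the header is missing or carries no PHPSESSID cookie.
/// Hypothetical helper for illustration only.
String? extractSessionCookie(String? setCookie) {
  if (setCookie == null) return null;
  // Stop at ';' (attribute separator), ',' (joined headers) or whitespace.
  final match = RegExp(r'\bPHPSESSID=[^;,\s]+').firstMatch(setCookie);
  return match?.group(0);
}

void main() {
  // Example header value; the session id here is made up.
  final cookie = extractSessionCookie('PHPSESSID=abc123; path=/');
  print(cookie);
}
```

The returned pair can be sent back as the value of the `Cookie` request header on the following requests.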

I also reproduced these steps with Firefox's network tool (in the Web Developer menu) without the website's GUI (sending the first GET and both POSTs manually) and could access the page, but when I try to do the same in Dart with the same headers and form data, I always get status 200.
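
One way to check whether the form data really is the same is to compare the byte length of each body with the Content-Length the browser sent. HTTP clients generally compute this header from the body, so a hand-written value only matches if the body is byte-for-byte identical. The sketch below uses a hypothetical helper, `contentLengthOf`, and placeholder credentials.

```dart
import 'dart:convert';

/// Number of bytes the body occupies when sent as UTF-8.
/// Hypothetical helper for illustration only.
int contentLengthOf(String body) => utf8.encode(body).length;

void main() {
  const oneLine = 'matricula=MY_REGISTRATION&senha=MY_PASSWORD';

  // A triple-quoted literal keeps its indentation and line breaks,
  // so it is a different (longer) body than the one-line form.
  const multiLine = '''
    matricula=MY_REGISTRATION&
    senha=MY_PASSWORD
  ''';

  print(contentLengthOf(oneLine));
  print(contentLengthOf(multiLine));
}
```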

My code:

import 'package:http/http.dart' as http;

// Same headers that Firefox sends

final Map<String, String> getHeaders = {
  'Host': 'grupofibonacci.com.br',
  'Connection': 'keep-alive',
  'Pragma': 'no-cache',
  // I probably should not set this User-Agent, but it doesn't work without it either
  'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Encoding': 'gzip, deflate',
  'Accept-Language': 'en-GB,en;q=0.5',
  'Cache-Control': 'no-cache',
  'Referer': 'http://grupofi.com.br/',
  'Upgrade-Insecure-Requests': '1'
};

final Map<String, String> postHeaders = {
  'Host': 'grupofibonacci.com.br',
  'Connection': 'keep-alive',
  'Content-Length': '78',
  'Pragma': 'no-cache',
  'Upgrade-Insecure-Requests': '1',
  'Content-Type': 'application/x-www-form-urlencoded',
  // I probably should not set this User-Agent, but it doesn't work without it either
  'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0',
  'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Encoding': 'gzip, deflate',
  'Accept-Language': 'en-GB,en;q=0.5',
  'Referer': 'http://grupofibonacci.com.br/area_aluno.php?',
  'Cache-Control': 'no-cache'
};

void main() {
  final client = new http.Client();

  client.get(
    'http://grupofibonacci.com.br/area_aluno.php',
    headers: getHeaders
  ).then((getResponse) {

    // Add the PHPSESSID returned by the server to the POST's Cookie header
    final String phpSessionID = getResponse.headers['set-cookie'].split(';')[0];
    postHeaders.addAll({ 'Cookie': phpSessionID });

    client.post(
      'http://grupofibonacci.com.br/area_aluno.php?',
      headers: postHeaders,
      body: '''
        matricula=MY_REGISTRATION&
        senha=MY_PASSWORD&
        matricula_rec=&
        email_rec=&
        email_mat=&
        cpf=
      '''
    ).then((postResponse) {

      // The first POST does nothing and returns 200, as in the browser

      client.post(
        'http://grupofibonacci.com.br/area_aluno.php?',
        headers: postHeaders,
        body: 'matricula=0808018&senha=burr1t0Fr1t0&matricula_rec=&email_rec=&email_mat=&cpf='
      ).then((secondPostResponse) {

        // The second POST also returns 200 (not the expected 302) and does nothing

      });

    });

  });
}

This is as specific as I can be. What am I doing wrong?



