Friday, 29 January 2021

How to get IAB categories for 1M websites

I am writing a publication about website privacy. It is fairly standard in such work [e.g., 1, 2] to summarize the categories of the websites in the study, either to show that the study is not biased or, conversely, to show that some privacy violations are skewed towards certain categories.

I found that the categories are usually IAB categories, and there is a plethora of APIs (BrandFetch, ClearBit, WebShrinker) that resolve a website to its IAB category. However, these APIs seem to be meant for SEO analysts, with pricing well above $1,000 for categorizing the Alexa/Tranco top 1M websites. They offer far more detail than I need (logo, Facebook link, and similar data for marketing analysis), while I just want an offline list of IAB categories; at most, business size and country would also be useful.

Past work was not helpful: [1] refers to the McAfee SmartFilter Internet Database (discontinued), and [2] to SimilarWeb, another online service that does not seem to scale. Alexa's lists by category were also discontinued in September 2020.

I feel like I am just not searching for the right keywords. Maybe there is a plain-text/CSV file with all of this for the top 1M websites that would solve all my issues. In parallel, I am contacting SimilarWeb and WebShrinker to ask whether they would give me academic access in exchange for an acknowledgement in the publication. Do you know of a source for such a list of the top 1M websites with their IAB categories?
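To make it concrete what I would do with such a file, here is a minimal sketch. It assumes a Tranco-style list (rows of `rank,domain`, no header) and a hypothetical second file with rows of `domain,category`. That second file, its layout, and the sample rows below are assumptions for illustration only; I have not found such a file.

```python
import csv
import io
from collections import Counter


def load_tranco(fh):
    """Read a Tranco-style list: one 'rank,domain' row per line, no header."""
    return [row[1].strip().lower() for row in csv.reader(fh) if len(row) >= 2]


def load_categories(fh):
    """Read the hypothetical 'domain,category' file into a dict."""
    return {row[0].strip().lower(): row[1].strip()
            for row in csv.reader(fh) if len(row) >= 2}


def category_counts(domains, categories, missing="Uncategorized"):
    """Count how many of the studied domains fall into each category."""
    return Counter(categories.get(d, missing) for d in domains)


# Tiny in-memory stand-ins for the two files (made-up rows, not real data).
tranco_sample = io.StringIO("1,example.com\n2,example.org\n3,example.net\n")
category_sample = io.StringIO(
    "example.com,Technology & Computing\nexample.org,News\n"
)

domains = load_tranco(tranco_sample)
counts = category_counts(domains, load_categories(category_sample))
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

With real files, the two `io.StringIO` objects would be replaced by `open(...)` calls. Domains without a category are counted under a separate label, so gaps in coverage stay visible in the summary.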

References

[1] Urban, Tobias, et al. "Beyond the front page: Measuring third party dynamics in the field." Proceedings of The Web Conference 2020. 2020. https://dl.acm.org/doi/pdf/10.1145/3366423.3380203

[2] Trevisan, Martino, et al. "4 years of EU cookie law: Results and lessons learned." Proceedings on Privacy Enhancing Technologies 2019.2 (2019): 126-145. https://content.sciendo.com/downloadpdf/journals/popets/2019/2/article-p126.xml



