Adrien Barbaresi
4 papers · 2013–2021 · across 3 top CS/AI conferences
Achievements
🐝 Cross-Pollinator (15) 🏃 Academic Marathon (8) 🧭 Keyword Pioneer 🌍 Conference Polyglot (3) 🌈 Renaissance Researcher (5)
🌉 Interdisciplinary Bridge 🐺 Lone Wolf (4) 🏆 Keyword Champion (2)
Conferences
ACL (2)
COLING (1)
IJCNLP (1)
Keywords
web scraping (2)
metadata extraction (2)
corpus construction (2)
text extraction (2)
content extraction (2)
linear classifier (1)
web crawling (1)
data preprocessing (1)
feature vector (1)
data extraction (1)
ridge classifier (1)
web corpus construction (1)
open source tool (1)
benchmark evaluation (1)
text discovery (1)
natural language processing (1)
text mining (1)
information extraction (1)
language identification (1)
Papers
Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction
ACL 2021
Trafilatura: A Web Scraping Library and Command-Line Tool for Text Discovery and Extraction
IJCNLP 2021
Computationally efficient discrimination between language varieties with large feature vectors and regularized classifiers
COLING 2018
Crawling microblogging services to gather language-classified URLs. Workflow and case study
ACL 2013