Alex Ng - Project Sharkalyze
8/2024 - 8/2024
Rust web scraper that generates feature vectors for ML
Introduction
Sharkalyze is a PolyFinTech100 API Hackathon 2024 entry by my team, SHARKECH.
Sharkalyze TrustZone is a scam prevention solution that uses AI and web scraping on links and QR codes to calculate risk and detect fraud.
What I did
Working solo, I built a highly parallelized web scraper in Rust, designed for speed and efficiency and capable of processing up to 200,000 URLs.
- Parallel Processing: It handles 20 URLs at a time on Tokio, with a semaphore capping how many are in flight.
- Concurrency Management: Each URL is not only processed but also crawled for new links, triggering up to 500 concurrent HTTP GET requests per batch, capped by a second semaphore.
- Feature Extraction: Every URL is analyzed to generate a 65-parameter vector, which feeds into our AI model to assess scam risk.
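The two-level limit described above can be sketched roughly as follows. This is not the hackathon code: to stay dependency-free it swaps Tokio tasks and `tokio::sync::Semaphore` for standard-library threads and a small hand-rolled semaphore, and the names `Semaphore`, `Permit`, `crawl`, and the `fetch` callback are all hypothetical. One limit bounds how many URLs are in progress, the other bounds how many link requests run at once across all of them.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore (stand-in for Tokio's async semaphore).
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// Holding a `Permit` means one slot is taken; dropping it frees the slot.
pub struct Permit<'a> {
    sem: &'a Semaphore,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks the caller until a slot is free, then takes it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.available.wait(free).unwrap();
        }
        *free -= 1;
        Permit { sem: self }
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.available.notify_one();
    }
}

/// Processes each page (a URL plus the links discovered on it).
/// At most `url_limit` pages and `request_limit` link requests run at once.
pub fn crawl<F>(pages: &[(String, Vec<String>)], url_limit: usize, request_limit: usize, fetch: F)
where
    F: Fn(&str) + Sync,
{
    let url_slots = Semaphore::new(url_limit);
    let request_slots = Semaphore::new(request_limit);
    let (url_slots, request_slots, fetch) = (&url_slots, &request_slots, &fetch);

    thread::scope(|outer| {
        for (url, links) in pages {
            // Acquire before spawning so no more than `url_limit` workers exist.
            let url_permit = url_slots.acquire();
            outer.spawn(move || {
                let _url_permit = url_permit; // released when this worker ends
                fetch(url.as_str());
                thread::scope(|inner| {
                    for link in links {
                        // Shared across all pages: the global request cap.
                        let request_permit = request_slots.acquire();
                        inner.spawn(move || {
                            let _request_permit = request_permit;
                            fetch(link.as_str());
                        });
                    }
                });
            });
        }
    });
}
```

With the figures from the bullets this would be called as `crawl(&pages, 20, 500, fetch)`, where `fetch` performs the HTTP GET. A worker holding a URL slot only ever waits on request slots, and a request never waits on anything else, so the two limits cannot deadlock each other.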
This let TrustZone process large volumes of URLs quickly while keeping concurrency bounded.
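To give a feel for the feature-extraction step, here is a minimal sketch that turns a URL string into a small numeric vector. The six features below are illustrative guesses at the kind of lexical signals such a model might use; they are not the actual 65 parameters, and `url_features` and `FEATURE_COUNT` are hypothetical names.

```rust
/// Illustrative size only; the real vector described above has 65 entries.
pub const FEATURE_COUNT: usize = 6;

/// Computes a few simple lexical features from the URL text alone.
pub fn url_features(url: &str) -> [f32; FEATURE_COUNT] {
    // Drop the scheme ("https://") if present.
    let without_scheme = url.split_once("://").map_or(url, |(_, rest)| rest);
    // The host is everything up to the first path, query, or fragment marker.
    let host = without_scheme
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");

    let flag = |b: bool| if b { 1.0 } else { 0.0 };

    [
        url.len() as f32,                                         // 0: total length in bytes
        url.chars().filter(|c| c.is_ascii_digit()).count() as f32, // 1: digit count
        host.matches('.').count() as f32,                         // 2: dots in host
        host.matches('-').count() as f32,                         // 3: hyphens in host
        flag(url.starts_with("https://")),                        // 4: uses HTTPS
        flag(url.contains('@')),                                  // 5: contains '@'
    ]
}
```

A fixed-size array keeps every URL's vector the same shape, which is what a downstream model expects as input.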