Alex Ng - Project: Sharkalyze

8/2024 - 8/2024

Rust web scraper that generates feature vectors for ML

Introduction

Sharkalyze is my team SHARKECH's entry for the PolyFinTech100 API Hackathon 2024.

Sharkalyze TrustZone is a scam-prevention solution that uses AI and web scraping of links and QR codes to calculate risk and detect fraud.

What I did

Working solo, I built a highly parallelized web scraper in Rust, designed for speed and efficiency and capable of processing up to 200,000 URLs.

  • Parallel Processing: The scraper handles 20 URLs at a time, using Tokio and a semaphore to cap how many are in flight.
  • Concurrency Management: Processing a URL also discovers new links, which can trigger up to 500 concurrent HTTP GET requests per batch; a second semaphore caps these.
  • Feature Extraction: Every URL is analyzed to generate a 65-parameter vector, which feeds into our AI model to assess scam risk.
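
The two concurrency limits above can be sketched roughly as follows. This is an illustration under stated assumptions, not the hackathon code: to stay runnable without external crates it swaps Tokio tasks for scoped `std` threads and uses a small hand-rolled `Semaphore` in place of `tokio::sync::Semaphore`. The functions `discover_links` and `fetch` are hypothetical stubs, and `Gauge`, `Report`, and `scrape` are names invented for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore; a std-only stand-in for `tokio::sync::Semaphore`.
struct Semaphore {
    free: Mutex<usize>,
    freed: Condvar,
}

/// Held while a slot is in use; dropping it returns the slot.
struct Permit<'a>(&'a Semaphore);

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { free: Mutex::new(permits), freed: Condvar::new() }
    }

    /// Blocks until a slot is free, then takes it.
    fn acquire(&self) -> Permit<'_> {
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            free = self.freed.wait(free).unwrap();
        }
        *free -= 1;
        Permit(self)
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.0.free.lock().unwrap() += 1;
        self.0.freed.notify_one();
    }
}

/// Counts work in flight and remembers the highest value seen.
#[derive(Default)]
struct Gauge {
    current: AtomicUsize,
    peak: AtomicUsize,
}

impl Gauge {
    fn enter(&self) {
        let now = self.current.fetch_add(1, Ordering::SeqCst) + 1;
        self.peak.fetch_max(now, Ordering::SeqCst);
    }
    fn leave(&self) {
        self.current.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical stub: pretend every page links to three others.
fn discover_links(url: &str) -> Vec<String> {
    (0..3).map(|i| format!("{url}/link{i}")).collect()
}

/// Hypothetical stub for an HTTP GET; returns a fake body length.
fn fetch(url: &str) -> usize {
    url.len()
}

struct Report {
    pages: usize,
    requests: usize,
    peak_pages: usize,
    peak_requests: usize,
}

fn scrape(urls: &[String], page_limit: usize, request_limit: usize) -> Report {
    let page_sem = Semaphore::new(page_limit);
    let request_sem = Semaphore::new(request_limit);
    let page_gauge = Gauge::default();
    let request_gauge = Gauge::default();
    let requests = AtomicUsize::new(0);
    // Shared references that the `move` closures below can copy.
    let (request_sem, page_gauge, request_gauge, requests) =
        (&request_sem, &page_gauge, &request_gauge, &requests);

    thread::scope(|outer| {
        for url in urls {
            // Wait for a page slot before starting another page worker.
            let page_permit = page_sem.acquire();
            outer.spawn(move || {
                let _page_permit = page_permit; // released when the worker ends
                page_gauge.enter();
                thread::scope(|inner| {
                    for link in discover_links(url) {
                        // The second semaphore caps requests across all pages.
                        let request_permit = request_sem.acquire();
                        inner.spawn(move || {
                            let _request_permit = request_permit;
                            request_gauge.enter();
                            let _body_len = fetch(&link);
                            requests.fetch_add(1, Ordering::SeqCst);
                            request_gauge.leave();
                        });
                    }
                });
                page_gauge.leave();
            });
        }
    });

    Report {
        pages: urls.len(),
        requests: requests.load(Ordering::SeqCst),
        peak_pages: page_gauge.peak.load(Ordering::SeqCst),
        peak_requests: request_gauge.peak.load(Ordering::SeqCst),
    }
}

fn main() {
    let urls: Vec<String> = (0..100).map(|i| format!("https://example.com/{i}")).collect();
    let report = scrape(&urls, 20, 500);
    println!(
        "pages: {}, requests: {}, peak pages: {}, peak requests: {}",
        report.pages, report.requests, report.peak_pages, report.peak_requests
    );
}
```

Page workers only ever wait on the request semaphore, and request workers wait on nothing, so the two limits cannot deadlock each other. In an async version the same shape would apply, with each permit awaited rather than blocked on.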

This let TrustZone process large volumes of URLs quickly while keeping the number of in-flight pages and requests bounded.
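
To give a feel for the feature-extraction step, here is a small sketch of turning a URL into a fixed-length numeric vector. The 65 real parameters are not listed on this page, so the six features below (and the names `FEATURES` and `url_features`) are hypothetical examples chosen only to show the shape of the idea.

```rust
/// Number of features in this illustration (the real vector has 65).
const FEATURES: usize = 6;

/// Maps a URL string to a fixed-length feature vector.
/// All six features are hypothetical examples, not the project's actual parameters.
fn url_features(url: &str) -> [f32; FEATURES] {
    // Host = text between "://" and the next '/', or everything if there is no scheme.
    let rest = url.split_once("://").map_or(url, |(_, rest)| rest);
    let host = rest.split('/').next().unwrap_or("");
    [
        url.chars().count() as f32,                               // total length
        host.matches('.').count() as f32,                         // dots in the host
        url.chars().filter(|c| c.is_ascii_digit()).count() as f32, // digit count
        url.matches('-').count() as f32,                          // hyphen count
        if url.starts_with("https://") { 1.0 } else { 0.0 },      // uses HTTPS
        if host.contains('@') { 1.0 } else { 0.0 },               // '@' in the host part
    ]
}

fn main() {
    let vector = url_features("https://secure-login.example.com/verify?id=42");
    println!("{vector:?}");
}
```

A fixed-size array keeps every URL's vector the same length, which is what a downstream model expects as input.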