gemini-grc

A Gemini crawler.

URLs to visit, as well as data from visited URLs, are stored as "snapshots" in the database.

Done

  • Concurrent downloading with workers
  • Concurrent connection limit per host
  • URL Blacklist
  • Save image/* and text/* files
  • Configuration via environment variables
  • Storing snapshots in PostgreSQL
  • Proper response header & body UTF-8 and format validation
  • Follow robots.txt, see gemini://geminiprotocol.net/docs/companion/robots.gmi
  • Handle redirects (3X status codes)
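
As a rough illustration of the header validation and redirect handling listed above, here is a minimal sketch in Go. It is not taken from the gemini-grc source: the `Header` type and the `parseHeader` and `IsRedirect` names are hypothetical, and the checks assume the response header format from the Gemini specification (a two-digit status, a space, a META string, and a closing CRLF).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// Header is a parsed Gemini response header.
// Hypothetical type for illustration; not part of gemini-grc.
type Header struct {
	Status int    // two-digit status code, e.g. 20 or 31
	Meta   string // MIME type for 2X responses, target URL for 3X
}

// IsRedirect reports whether the status is in the 3X range.
func (h Header) IsRedirect() bool { return h.Status/10 == 3 }

// parseHeader validates and parses a raw header line of the form
// "<STATUS><SPACE><META>\r\n".
func parseHeader(line string) (Header, error) {
	if !strings.HasSuffix(line, "\r\n") {
		return Header{}, errors.New("header must end with CRLF")
	}
	line = strings.TrimSuffix(line, "\r\n")
	if !utf8.ValidString(line) {
		return Header{}, errors.New("header is not valid UTF-8")
	}
	if len(line) < 2 {
		return Header{}, errors.New("header too short")
	}
	// Assumption: the first digit must be 1-6, the second any digit.
	if line[0] < '1' || line[0] > '6' || line[1] < '0' || line[1] > '9' {
		return Header{}, errors.New("status must be two digits in 10-69")
	}
	status := int(line[0]-'0')*10 + int(line[1]-'0')
	meta := ""
	if len(line) > 2 {
		if line[2] != ' ' {
			return Header{}, errors.New("status must be followed by a space")
		}
		meta = line[3:]
	}
	if status/10 == 3 && meta == "" {
		return Header{}, errors.New("redirect without a target URL")
	}
	return Header{Status: status, Meta: meta}, nil
}

func main() {
	for _, raw := range []string{
		"20 text/gemini; charset=utf-8\r\n",
		"31 gemini://example.org/new\r\n",
		"99 bogus\r\n",
	} {
		h, err := parseHeader(raw)
		if err != nil {
			fmt.Printf("%q: invalid: %v\n", raw, err)
			continue
		}
		fmt.Printf("%q: status=%d meta=%q redirect=%v\n", raw, h.Status, h.Meta, h.IsRedirect())
	}
}
```

A crawler built this way would presumably queue `Meta` as a new URL when `IsRedirect()` is true, and cap the number of hops it follows to avoid redirect loops.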

TODO

  • Better URL normalization
  • Provide a TLS cert for sites that require it, like Astrobotany
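
One possible direction for the URL normalization item, sketched in Go under stated assumptions. The `normalize` function is hypothetical and not part of gemini-grc. It only lowercases the host, drops the default Gemini port 1965, turns an empty path into `/`, and strips the fragment, so that trivially different spellings of a URL map to the same snapshot key. It deliberately leaves dot segments and percent-encoding alone.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// normalize returns a canonical form of a gemini:// URL.
// Hypothetical helper for illustration; not part of gemini-grc.
func normalize(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	// url.Parse already lowercases the scheme.
	if u.Scheme != "gemini" {
		return "", fmt.Errorf("not a gemini URL: %q", raw)
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return "", fmt.Errorf("missing host: %q", raw)
	}
	port := u.Port()
	if port == "1965" { // default Gemini port carries no information
		port = ""
	}
	switch {
	case port != "":
		u.Host = net.JoinHostPort(host, port)
	case strings.Contains(host, ":"): // bare IPv6 literal needs brackets
		u.Host = "[" + host + "]"
	default:
		u.Host = host
	}
	if u.Path == "" {
		u.Path = "/"
	}
	// Fragments are client-side only, so they never identify a resource.
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"GEMINI://Example.ORG:1965",
		"gemini://example.org/docs/#top",
		"https://example.org/",
	} {
		n, err := normalize(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(raw, "->", n)
	}
}
```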

TODO for later
