antanst/gemini-grc

gemini-grc/README.md

antanst 90f6ecd024 Add README.md and Makefile.

2024-12-27 12:11:35 +02:00


gemini-grc

A Gemini crawler.

Both the URLs queued for visiting and the data retrieved from visited URLs are stored as "snapshots" in the database. This makes the crawler easy to extend into a "wayback machine" for Gemini.
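
To make the snapshot idea concrete, here is a minimal sketch in Python. It is an illustration only and does not reflect the project's actual schema: the `Snapshot` class, its field names, and the `latest_as_of` helper are all hypothetical. It assumes that an unvisited URL is a snapshot with no fetch time, and that a "wayback" lookup returns the newest visited snapshot at or before a given moment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Snapshot:
    """Hypothetical snapshot record; field names are assumptions."""
    url: str
    fetched_at: Optional[datetime] = None  # None until the URL is visited
    status: Optional[int] = None           # Gemini status code, e.g. 20
    mime_type: Optional[str] = None
    body: Optional[bytes] = None

    @property
    def pending(self) -> bool:
        # A snapshot without a fetch time is still waiting to be crawled.
        return self.fetched_at is None


def latest_as_of(snapshots: Iterable[Snapshot], url: str,
                 when: datetime) -> Optional[Snapshot]:
    """Return the newest visited snapshot of `url` fetched at or before `when`."""
    candidates = [
        s for s in snapshots
        if s.url == url and not s.pending and s.fetched_at <= when
    ]
    return max(candidates, key=lambda s: s.fetched_at, default=None)
```

Keeping pending and visited URLs in the same record type means the crawl queue and the archive can share one table, which is one plausible reading of the description above.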

Done

Concurrent downloading with workers
Concurrent connection limit per host
URL Blacklist
Save image/* and text/* files
Configuration via environment variables
Storing snapshots in PostgreSQL
Validation of response headers and bodies (UTF-8 encoding and format)
Follow robots.txt, see gemini://geminiprotocol.net/docs/companion/robots.gmi
Handle redirects (3x status codes)
Better URL normalization
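
As an illustration of the first two items in the list above, the following Python sketch combines a pool of workers with a per-host connection cap. It is not the project's implementation: `HostLimiter`, `crawl`, and the `fetch` callback are hypothetical names, and the approach shown (one semaphore per host, created lazily) is just one common way to enforce such a limit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


class HostLimiter:
    """Caps the number of simultaneous connections to any single host."""

    def __init__(self, max_per_host: int):
        self._max = max_per_host
        self._lock = threading.Lock()
        self._slots = {}

    def slot(self, host: str) -> threading.Semaphore:
        # Create the per-host semaphore lazily, under a lock, so two workers
        # never end up with separate semaphores for the same host.
        with self._lock:
            if host not in self._slots:
                self._slots[host] = threading.Semaphore(self._max)
            return self._slots[host]


def crawl(urls, fetch, workers=8, max_per_host=2):
    """Fetch every URL with `workers` threads, at most `max_per_host` per host.

    `fetch` is a caller-supplied function taking a URL and returning a result.
    """
    limiter = HostLimiter(max_per_host)

    def visit(url):
        host = urlsplit(url).hostname or ""
        # Block until this host has a free connection slot.
        with limiter.slot(host):
            return url, fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(visit, urls))
```

One trade-off of this sketch is that a worker waiting on a busy host still occupies a thread. A crawler with many URLs on few hosts might prefer per-host queues instead.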

TODO

Add snapshot hash and support snapshot history
Add web interface
Provide a TLS cert for sites that require it, like Astrobotany
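
The snapshot hash and history item is not implemented yet, so the following Python sketch only shows one possible approach. It assumes that content is hashed with SHA-256 and that a new history entry is stored only when the hash differs from the most recent one. The `record` function and the in-memory `history` dictionary are hypothetical stand-ins for whatever the database layer would provide.

```python
import hashlib


def content_hash(body: bytes) -> str:
    """Return the SHA-256 hex digest of a response body."""
    return hashlib.sha256(body).hexdigest()


def record(history: dict, url: str, body: bytes, fetched_at: str) -> bool:
    """Append a new version for `url` if its content changed.

    Returns True when a version was stored, False when the body is
    identical to the most recent stored version.
    """
    digest = content_hash(body)
    versions = history.setdefault(url, [])
    if versions and versions[-1]["hash"] == digest:
        return False  # unchanged since the last crawl, nothing to store
    versions.append({"hash": digest, "fetched_at": fetched_at, "body": body})
    return True
```

Comparing only against the latest version keeps the history a faithful timeline: a page that changes and later reverts is stored again rather than being deduplicated against an older entry.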

TODO with lower priority

Gopher
Scroll gemini://auragem.letz.dev/devlog/20240316.gmi
Spartan
Nex
SuperTXT https://supertxt.net/00-intro.html
