From 43f22425580d4a484672e5e320430a0e950b452b Mon Sep 17 00:00:00 2001
From: antanst
Date: Mon, 9 Dec 2024 19:54:15 +0200
Subject: [PATCH] Update README

---
 README.md | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 9b17907..2464823 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@

 A Gemini crawler.

+URLs to visit as well as data from visited URLs are stored into "snapshots" in the database.
+
 ## Done
 - [x] Concurrent downloading with workers
 - [x] Concurrent connection limit per host
@@ -10,22 +12,16 @@ A Gemini crawler.
 - [x] Configuration via environment variables
 - [x] Storing snapshots in PostgreSQL
 - [x] Proper response header & body UTF-8 and format validation
-- [x] Follow robots.txt
+- [x] Follow robots.txt, see gemini://geminiprotocol.net/docs/companion/robots.gmi
+- [x] Handle redirects (3X status codes)

 ## TODO
-- [ ] Take into account gemini://geminiprotocol.net/docs/companion/robots.gmi
-- [ ] Proper handling of all response codes
-  - [ ] Handle 3X redirects properly
-- [ ] Handle URLs that need presentation of a TLS cert, like astrobotany
-  + [ ] Probably have a common "grc" cert for all?
-- [ ] Proper input and response validations:
-  + [ ] When making a request, the URI MUST NOT exceed 1024 bytes
-- [ ] Subscriptions to gemini pages? gemini://geminiprotocol.net/docs/companion/
+- [ ] Better URL normalization
+- [ ] Provide a TLS cert for sites that require it, like Astrobotany

 ## TODO for later
-- [ ] Add other protocols
-  + [ ] Gopher
-  + [ ] Scroll gemini://auragem.letz.dev/devlog/20240316.gmi
-  + [ ] Spartan
-  + [ ] Nex
-  + [ ] SuperTXT https://supertxt.net/00-intro.html
+- [ ] Gopher
+- [ ] Scroll gemini://auragem.letz.dev/devlog/20240316.gmi
+- [ ] Spartan
+- [ ] Nex
+- [ ] SuperTXT https://supertxt.net/00-intro.html