Enhance crawler with seed list and SQL utilities

Add a seedList module for URL initialization and comprehensive SQL utilities for database analysis, and update the project configuration.
antanst
2025-06-16 12:29:33 +03:00
parent 5e6dabf1e7
commit 8588414b14
37 changed files with 742 additions and 682 deletions

22
test.txt Normal file

@@ -0,0 +1,22 @@
# Test redirect full url:
gemini://gemini.circumlunar.space
# Test blacklist:
gemi.dev
# Test robots disallow:
gemini://tlgs.one/search?aa
# Test TLS cert required:
gemini://astrobotany.mozz.us/app/plant
// 31 redirect
gemini://gemini.circumlunar.space
// body with null byte
gemini://kennedy.gemi.dev/archive/cached?url=gemini://spam.works/mirrors/textfiles/fun/consult.how&t=638427244900000000&raw=False
// has invalid url
gemini://tlgs.one/known-hosts
// Needs SNI TLS info (our bug)
gemini://hanzbrix.pollux.casa/gemlog/20241002.gmi
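
Note on reading a file like test.txt: the entries above mix `#` and `//` comment lines with full `gemini://` URLs and one bare host (`gemi.dev`). The sketch below shows one way such a list could be turned into seed URLs. It is not the seedList module from this commit; the function name `parse_seed_lines` and the rule that a bare host gets a `gemini://` prefix are assumptions made for illustration.

```python
from typing import Iterable, List


def parse_seed_lines(lines: Iterable[str]) -> List[str]:
    """Return seed URLs, skipping blank lines and '#' or '//' comment lines."""
    seeds: List[str] = []
    for raw in lines:
        line = raw.strip()
        # Both comment styles appear in test.txt, so accept either one.
        if not line or line.startswith("#") or line.startswith("//"):
            continue
        # Assumption: an entry without a scheme (e.g. "gemi.dev") is a Gemini host.
        if "://" not in line:
            line = "gemini://" + line
        seeds.append(line)
    return seeds


if __name__ == "__main__":
    sample = [
        "# Test blacklist:",
        "gemi.dev",
        "// 31 redirect",
        "gemini://gemini.circumlunar.space",
    ]
    for url in parse_seed_lines(sample):
        print(url)
```

Duplicates are kept on purpose here, since test.txt lists `gemini://gemini.circumlunar.space` twice to exercise two different cases.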