Commit Graph

40 Commits

Author SHA1 Message Date
antanst
330b596497 Enhance crawler with seed list and SQL utilities
Add seedList module for URL initialization, comprehensive SQL utilities for database analysis, and update project configuration.
2025-06-16 12:29:33 +03:00
51f94c90b2 Update documentation and project configuration
- Add architecture documentation for versioned snapshots
- Update Makefile with improved build commands
- Update dependency versions in go.mod
- Add project notes and development guidelines
- Improve README with new features and instructions
2025-05-22 13:26:11 +03:00
bfaa857fae Update and refactor core functionality
- Update common package utilities
- Refactor network code for better error handling
- Remove deprecated files and functionality
- Enhance blacklist and filtering capabilities
- Improve snapshot handling and processing
2025-05-22 12:47:01 +03:00
5cc82f2c75 Modernize host pool management
- Add context-aware host pool operations
- Implement rate limiting for host connections
- Improve concurrency handling with mutexes
- Add host connection tracking
2025-05-22 12:46:42 +03:00
eca54b2f68 Implement context-aware database operations
- Add context support to database operations
- Implement versioned snapshots for URL history
- Update database queries to support URL timestamps
- Improve transaction handling with context
- Add utility functions for snapshot history
2025-05-22 12:46:36 +03:00
7d27e5a123 Add whitelist functionality
- Implement whitelist package for filtering URLs
- Support pattern matching for allowed URLs
- Add URL validation against whitelist patterns
- Include test cases for whitelist functionality
2025-05-22 12:46:28 +03:00
8a9ca0b2e7 Add robots.txt parsing and matching functionality
- Create separate robotsMatch package for robots.txt handling
- Implement robots.txt parsing with support for different directives
- Add support for both Allow and Disallow patterns
- Include robots.txt matching with efficient pattern matching
- Add test cases for robots matching
2025-05-22 12:46:21 +03:00
5940a117fd Add context-aware network operations
- Implement context-aware versions of network operations
- Add request cancellation support throughout network code
- Use structured logging with context metadata
- Support timeout management with contexts
- Improve error handling with detailed logging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:45:58 +03:00
d1c326f868 Improve error handling with xerrors package
- Replace custom error handling with xerrors package
- Enhance error descriptions for better debugging
- Add text utilities for string processing
- Update error tests to use standard errors package
- Add String() method to GeminiError

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:45:46 +03:00
a55f820f62 Implement structured logging with slog
- Replace zerolog with Go's standard slog package
- Add ColorHandler for terminal color output
- Add context-aware logging system
- Format attributes on the same line as log messages
- Use green color for INFO level logs
- Set up context value extraction helpers

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-05-22 12:44:08 +03:00
ad224a328e Change errors to use xerrors package. 2025-05-12 20:37:58 +03:00
a823f5abc3 Fix Makefile. 2025-03-10 16:54:06 +02:00
658c5f5471 Fix linter warnings in gemini/network.go
Remove redundant nil checks before len() operations as len() for nil slices is defined as zero in Go.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-03-10 11:34:29 +02:00
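
Note on 658c5f5471: the reasoning in that commit message (len() of a nil slice is zero) can be illustrated with a minimal sketch. The helper name `hasPayload` and the sample data are hypothetical and are not taken from gemini/network.go; they only show why a nil check in front of a length check is redundant.

```go
package main

import "fmt"

// hasPayload is a hypothetical helper used only for illustration.
// The Go spec defines len() of a nil slice as 0, so the length check
// alone covers both the nil case and the empty case.
func hasPayload(data []byte) bool {
	// Redundant form that linters flag: data != nil && len(data) > 0
	return len(data) > 0
}

func main() {
	var nilSlice []byte // nil slice, never allocated
	empty := []byte{}   // non-nil slice with zero elements

	fmt.Println(len(nilSlice), hasPayload(nilSlice))
	fmt.Println(len(empty), hasPayload(empty))
	fmt.Println(hasPayload([]byte("20 text/gemini\r\n")))
}
```

Both the nil and the empty slice report a length of zero, so the simplified check behaves the same as the version with the explicit nil test.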
efaedcc6b2 Improvements in error handling & descriptions 2025-02-27 09:20:22 +02:00
9dc008cb0f Use go_errors library everywhere. 2025-02-26 13:31:46 +02:00
c82b436d32 Update license and readme. 2025-02-26 10:39:51 +02:00
4f47521401 update gitignore 2025-02-26 10:37:20 +02:00
96a39ec3b6 Improve main error handling 2025-02-26 10:37:09 +02:00
54474d45cd Use Go race detector 2025-02-26 10:36:51 +02:00
d306c44f3d Tidy go mod 2025-02-26 10:36:41 +02:00
79e3175467 Add gemget script that downloads Gemini pages 2025-02-26 10:35:54 +02:00
d89dd72fe9 Add Gopherspace crawling! 2025-02-26 10:35:28 +02:00
29877cb2da Simplify host pool 2025-02-26 10:35:11 +02:00
4bceb75695 Reorganize code for more granular imports 2025-02-26 10:34:46 +02:00
a9983f3531 Reorganize errors 2025-02-26 10:32:38 +02:00
5cf720103f Improve blacklist to use regex matching 2025-02-26 10:32:01 +02:00
b6dd77e57e Add regex matching function to util 2025-01-16 22:37:39 +02:00
973a4f3a2d Add tidy & update Makefile targets 2025-01-16 22:37:39 +02:00
b30b7274ec Simplify duplicate code 2025-01-16 22:37:39 +02:00
63adf73ef9 Proper package in tests 2025-01-16 10:04:02 +02:00
b3387ce7ad Add DB scan error 2025-01-16 10:04:02 +02:00
9ade26b6e8 Simplify IP pool and convert it to host pool 2025-01-16 10:04:02 +02:00
4a345a1763 Break up Gemtext link parsing code and improve tests. 2025-01-16 10:04:02 +02:00
64f98bb37c Add mode that prints multiple worker status in console 2025-01-16 10:04:02 +02:00
ccb8f6838e Update DB init instructions & README 2025-01-04 15:39:21 +02:00
4e6fad873b Break up common functions and small refactor. 2025-01-04 15:31:26 +02:00
b78fe00221 Add license. 2024-12-27 12:13:05 +02:00
90f6ecd024 Add README.md and Makefile. 2024-12-27 12:11:35 +02:00
b52df073e9 Add first version of gemini-grc. 2024-12-27 12:09:55 +02:00
93822b239e Initial commit. 2024-12-26 21:34:54 +02:00