tclrobot.git
Adam Dickmeiss [Wed, 1 Nov 2006 10:40:03 +0000 (10:40 +0000)]
Added.

Adam Dickmeiss [Wed, 10 Dec 2003 09:58:22 +0000 (09:58 +0000)]
Fixed unset

Adam Dickmeiss [Thu, 18 Sep 2003 07:41:25 +0000 (07:41 +0000)]
Nonest on anchors

Marc Cromme [Fri, 22 Aug 2003 14:23:36 +0000 (14:23 +0000)]
rules changed to obey version 4 compatibility (tag: tclrobot.0.2.0)

Marc Cromme [Fri, 22 Aug 2003 13:22:53 +0000 (13:22 +0000)]
version number 0.2 added in changelog

Marc Cromme [Wed, 20 Aug 2003 12:27:24 +0000 (12:27 +0000)]
excess newlines removed

Marc Cromme [Fri, 15 Aug 2003 14:05:21 +0000 (14:05 +0000)]
still one bug: robot not built in debian/tclrobot, but in debian/tmp

Marc Cromme [Fri, 15 Aug 2003 13:17:01 +0000 (13:17 +0000)]
first working debian package now

Marc Cromme [Fri, 15 Aug 2003 12:49:03 +0000 (12:49 +0000)]
initial debian package made

Marc Cromme [Thu, 14 Aug 2003 10:18:24 +0000 (10:18 +0000)]
no specific tkl stuff present any more

Marc Cromme [Thu, 14 Aug 2003 08:19:15 +0000 (08:19 +0000)]
diverse rules added

Marc Cromme [Thu, 14 Aug 2003 08:17:05 +0000 (08:17 +0000)]
init script for package tkl-web-harvester added

Marc Cromme [Thu, 14 Aug 2003 08:02:10 +0000 (08:02 +0000)]
tcl web harvesting script for tkl project added

Adam Dickmeiss [Wed, 11 Jun 2003 10:29:41 +0000 (10:29 +0000)]
xmlwf output

Adam Dickmeiss [Wed, 11 Jun 2003 10:11:39 +0000 (10:11 +0000)]
XML headers with character encoding as specified by HTTP server

Adam Dickmeiss [Wed, 11 Jun 2003 09:40:22 +0000 (09:40 +0000)]
Use suffix _.tkl

Adam Dickmeiss [Wed, 11 Jun 2003 08:49:09 +0000 (08:49 +0000)]
robotSeq(t) moved to control(task,seq)

Adam Dickmeiss [Tue, 10 Jun 2003 13:16:16 +0000 (13:16 +0000)]
Fix default rule match

Adam Dickmeiss [Tue, 10 Jun 2003 12:46:04 +0000 (12:46 +0000)]
Fix unlink

Adam Dickmeiss [Tue, 10 Jun 2003 12:29:48 +0000 (12:29 +0000)]
Deny is default

Adam Dickmeiss [Tue, 10 Jun 2003 12:12:35 +0000 (12:12 +0000)]
Ignore non-task files

Adam Dickmeiss [Tue, 10 Jun 2003 12:08:17 +0000 (12:08 +0000)]
Fixes for tasks with full paths

Adam Dickmeiss [Tue, 10 Jun 2003 11:55:57 +0000 (11:55 +0000)]
README updates

Adam Dickmeiss [Tue, 10 Jun 2003 11:55:18 +0000 (11:55 +0000)]
Usage

Adam Dickmeiss [Tue, 10 Jun 2003 11:43:52 +0000 (11:43 +0000)]
Tasks. TKL integration

Adam Dickmeiss [Mon, 13 Jan 2003 13:59:07 +0000 (13:59 +0000)]
Fix check for content-type (tag: ZMBOT.0.1)

Adam Dickmeiss [Fri, 20 Sep 2002 09:45:02 +0000 (09:45 +0000)]
Look for Tcl on Debian systems

Adam Dickmeiss [Tue, 18 Jun 2002 19:57:53 +0000 (19:57 +0000)]
unset meta attributes (so they are reset for next meta)

Adam Dickmeiss [Mon, 25 Mar 2002 16:13:21 +0000 (16:13 +0000)]
Remove code that skips ?'s in URL

Adam Dickmeiss [Mon, 25 Mar 2002 16:11:08 +0000 (16:11 +0000)]
*** empty log message ***

Adam Dickmeiss [Thu, 28 Feb 2002 14:04:11 +0000 (14:04 +0000)]
Fix unvisited status

Adam Dickmeiss [Sun, 17 Feb 2002 09:29:18 +0000 (09:29 +0000)]
Robot honours robots meta tag

Adam Dickmeiss [Wed, 14 Nov 2001 09:15:23 +0000 (09:15 +0000)]
File status written with counts of areas: unvisited, bad, visited.
Tag area src=.. used for relative links.

Adam Dickmeiss [Tue, 13 Nov 2001 11:17:26 +0000 (11:17 +0000)]
MIME check when reading HTTP header (not when reading content).
File robots.txt always read - even when text/plain is denied.

Adam Dickmeiss [Fri, 9 Nov 2001 13:26:50 +0000 (13:26 +0000)]
Robot follows <frame src=...>.

Adam Dickmeiss [Thu, 8 Nov 2001 14:22:21 +0000 (14:22 +0000)]
Added tests script.

Adam Dickmeiss [Thu, 8 Nov 2001 13:49:06 +0000 (13:49 +0000)]
Fixed bug regarding relative URLs.

Adam Dickmeiss [Thu, 8 Nov 2001 10:23:02 +0000 (10:23 +0000)]
Fixed bug in skipSpace (didn't check for null-byte).

Adam Dickmeiss [Wed, 7 Nov 2001 11:50:07 +0000 (11:50 +0000)]
Use simpler regular expression to avoid Tcl regsub error (Tcl8.0.4-5).

Adam Dickmeiss [Wed, 7 Nov 2001 11:30:52 +0000 (11:30 +0000)]
Glob-expressions may be expressed as a list in rules (multi-OR).

Adam Dickmeiss [Wed, 31 Oct 2001 08:51:49 +0000 (08:51 +0000)]
Robot saves metadata with unique names in directory "flat" (if it exists).

Adam Dickmeiss [Tue, 30 Oct 2001 08:29:54 +0000 (08:29 +0000)]
Pattern may be negated in rules (! as first character does that)

Adam Dickmeiss [Fri, 26 Oct 2001 13:26:11 +0000 (13:26 +0000)]
Implemented Allow/deny rules. Better Tcl autoconfig.

Adam Dickmeiss [Fri, 29 Jun 2001 22:25:55 +0000 (22:25 +0000)]
Yet another fix regarding relative links.

Adam Dickmeiss [Fri, 29 Jun 2001 21:47:31 +0000 (21:47 +0000)]
Added option to specify Accept-Language.

Adam Dickmeiss [Thu, 7 Jun 2001 08:17:00 +0000 (08:17 +0000)]
Fixes for robots.txt handling (bug introduced by previous commit).

Adam Dickmeiss [Thu, 7 Jun 2001 08:10:10 +0000 (08:10 +0000)]
Bug fix for relative links.

Adam Dickmeiss [Wed, 6 Jun 2001 09:37:18 +0000 (09:37 +0000)]
Added some character entities for mapping.

Adam Dickmeiss [Wed, 6 Jun 2001 07:10:31 +0000 (07:10 +0000)]
Added README. Ignore case in keywords in robots.txt.

Adam Dickmeiss [Tue, 5 Jun 2001 08:44:50 +0000 (08:44 +0000)]
maxDistance set to 50 default.

Adam Dickmeiss [Tue, 5 Jun 2001 07:46:00 +0000 (07:46 +0000)]
Remove characters after semicolon in header contents.

Adam Dickmeiss [Tue, 27 Feb 2001 10:45:44 +0000 (10:45 +0000)]
Minor changes.

Adam Dickmeiss [Mon, 26 Feb 2001 22:51:51 +0000 (22:51 +0000)]
Added config for zebra/zmbol.

Adam Dickmeiss [Tue, 23 Jan 2001 14:28:41 +0000 (14:28 +0000)]
Minor fix for anchor references.

Adam Dickmeiss [Tue, 23 Jan 2001 12:05:06 +0000 (12:05 +0000)]
Removed YAZ dependency.

Adam Dickmeiss [Tue, 23 Jan 2001 11:26:43 +0000 (11:26 +0000)]
Added options for the robot.

Adam Dickmeiss [Tue, 23 Jan 2001 09:20:32 +0000 (09:20 +0000)]
Multiple http connections. Bug fixes.

Adam Dickmeiss [Mon, 11 Dec 2000 17:11:03 +0000 (17:11 +0000)]
Fixed problem with links having .. for root directory of web server.
Thank you FrontPage.

Adam Dickmeiss [Sun, 10 Dec 2000 22:27:48 +0000 (22:27 +0000)]
Implemented robots.txt rules.

Adam Dickmeiss [Fri, 8 Dec 2000 22:46:53 +0000 (22:46 +0000)]
File robots.txt now read for each domain.
Pages are now fetched in a round-robin fashion.

Adam Dickmeiss [Fri, 8 Dec 2000 08:55:35 +0000 (08:55 +0000)]
DCdot no longer relies on htmlSwitch.

Adam Dickmeiss [Thu, 7 Dec 2000 20:16:11 +0000 (20:16 +0000)]
Added -nonest for htmlSwitch statement. Robot puts reference to
bad URLs in bad area.

Adam Dickmeiss [Mon, 27 Dec 1999 11:49:30 +0000 (11:49 +0000)]
Major speed improvement.

Adam Dickmeiss [Thu, 4 Feb 1999 21:32:00 +0000 (21:32 +0000)]
Updated configure script.

Per M. Hansen [Thu, 4 Feb 1999 20:37:25 +0000 (20:37 +0000)]
Changed tags for the output.

Adam Dickmeiss [Thu, 15 Oct 1998 13:27:19 +0000 (13:27 +0000)]
Minor changes.

Adam Dickmeiss [Thu, 15 Oct 1998 12:31:25 +0000 (12:31 +0000)]
Added configure script.

Adam Dickmeiss [Thu, 15 Oct 1998 12:30:59 +0000 (12:30 +0000)]
Bug fixes. Robot saves body of text without tags and JavaScript sections.

Adam Dickmeiss [Tue, 6 Aug 1996 14:04:22 +0000 (14:04 +0000)]
Initial revision