Perl Scripts
This section contains some Perl scripts that I've found useful. Maybe you will too. As for the code itself, while I've been programming since 1981, I've only been using Perl for a short time, so the code may not be at the guru level. But each script has been tested over time as I've used it on a working box, so it should be pretty bug-free in the circumstances I describe in the text below. As I've said elsewhere on this site, while I've tested the material you'll find here and found it to work, I make no guarantees as to the fitness of any of the contents of this website for any purpose whatsoever. If it works for you, great; if it doesn't, then please let me know.
Checking if a directory has any files
The problem: I use Leafnode as a local caching news proxy (see the tutorial) and want to check if there are any outgoing posts in /var/spool/news/out.going. If there are, I copy them to my database of previous posts, then do a 'fetchnews -P' to post them before doing a 'fetchnews' to get new posts. This way I get to see my own posts in the next download, not the one after that, and I get a backup copy of my articles before they're shot off into cyberspace.
The solution: This script takes a single command-line parameter: the directory to check. Like most Unix commands, it exits with code 0 if it succeeds (i.e. the directory has files) and with code 1 if it fails (there are no files in the named directory). Note that it exits with 0 if it finds any kind of file that is not a directory, e.g. pipes or block-special devices.
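The check described above can be sketched in a few lines of Perl. This is a minimal illustration, not the downloadable script itself; the name dir_has_files is my own invention for the example, and treating an unreadable directory as empty is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return true if $dir contains at least one entry that is not a directory.
# Pipes, sockets and device files all count as "files" here.
sub dir_has_files {
    my ($dir) = @_;
    # Assumption: a missing or unreadable directory is treated as empty.
    opendir(my $dh, $dir) or return 0;
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        if (!-d "$dir/$entry") {
            closedir $dh;
            return 1;
        }
    }
    closedir $dh;
    return 0;
}

# Exit 0 when files were found and 1 otherwise, as Unix tools do.
exit(dir_has_files($ARGV[0]) ? 0 : 1) if @ARGV;
```

Saved under a name of your choosing, a script like this can guard the posting step from the shell with the usual '&&' idiom, so that 'fetchnews -P' only runs when /var/spool/news/out.going has something in it.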
Download the script
Using ROT13 encoding
The problem: Some newsreaders don't have the ability to ROT13-encode text on the fly.
The solution: This script is a filter; that is, it takes its input from standard input and writes to standard output. Each character read in that is in the range a-z or A-Z is encoded such that characters in the range a-m are recoded into the range n-z and vice versa. Upper-case characters are recoded in a similar way. Here are a couple of examples of its use:
rot13.pl <inputfile >outputfile # ROT13 from one file to another
rot13.pl >outputfile # read text typed at the terminal until Ctrl+D is pressed
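For the curious, the encoding step fits in a single Perl transliteration. The sketch below is one way to do it and may differ from the downloadable script; the sub names rot13 and rot13_filter are made up for the example, and the demonstration reads from an in-memory string rather than standard input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ROT13 one string: a-m <-> n-z and A-M <-> N-Z; other characters pass through.
sub rot13 {
    my ($text) = @_;
    (my $encoded = $text) =~ tr/A-Za-z/N-ZA-Mn-za-m/;
    return $encoded;
}

# Copy every line from the input handle to the output handle, encoded.
sub rot13_filter {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} rot13($line);
    }
}

# Demonstration on an in-memory string.
my $sample = "Hello, World!\n";
open(my $demo, '<', \$sample) or die "Cannot open in-memory handle: $!";
rot13_filter($demo, \*STDOUT);    # prints "Uryyb, Jbeyq!"
```

Passing \*STDIN instead of $demo gives the filter behaviour described above. Because ROT13 is its own inverse, running the output through the same code again restores the original text.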
Download the script
Reversing the contents of a file
The problem: This is just an example for newcomers to Perl. It shows how to read lines from standard input, filter them and send them to standard output.
The solution: This script is a filter; that is, it takes its input from standard input and writes to standard output. Each line read in is reversed and added to an initially-empty string. Once the entire file is read in, the string is dumped to STDOUT. Since it reads everything on STDIN before doing any output, it's not suitable for use with large files unless you have enough memory. Here are a couple of examples of its use:
sdrawkcab.pl <inputfile >outputfile # read from one file to another
sdrawkcab.pl >outputfile # read text typed at the terminal until Ctrl+D is pressed
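Here is a rough sketch of that read-everything-then-print approach. It is not the downloadable script: the sub name reverse_contents is invented for the example, and I'm assuming that "added" means each reversed line is put at the front of the string, so that the whole file comes out backwards. Append instead if you only want each line reversed in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read every line from the handle, reverse its characters and build up the
# result in memory; nothing is output until the input is exhausted.
sub reverse_contents {
    my ($in) = @_;
    my $result = '';
    while (my $line = <$in>) {
        chomp(my $text = $line);
        # Assumption: prepend, so that the last line read ends up first.
        $result = scalar(reverse $text) . "\n" . $result;
    }
    return $result;
}

# Demonstration on an in-memory string.
my $sample = "abc\ndef\n";
open(my $demo, '<', \$sample) or die "Cannot open in-memory handle: $!";
print reverse_contents($demo);    # prints "fed" then "cba"
```

Passing \*STDIN instead of $demo turns this into the filter described above.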
Download the script
Running a program then shutting down the system
The problem: Someone asked me in the alt.os.linux.mandrake newsgroup:
> Is there a way to shutdown the computer with a command when the download has finished?
He wanted to set his FTP program to download some files overnight, then shut down his machine when the download was over. The program itself (Downloader for X) has the ability to close down at the completion of the file transfer. All that was needed was a way to shut the machine down cleanly when the program closed.
The solution: I put together a script which could be called as 'runshut prog [opts] [args]' where prog is the name of the program to run (Downloader for X in the case of the Original Poster), [opts] are options to the program, and [args] are optional arguments to the program. For example: runshut emacs -nw ~/.bashrc. The script can be used to run any command-line program, or any X-based app as long as it's run by the owner of the display. You can also use it to run bash scripts, so you can do some complex jobs and the system will only shut down when the entire bash script has finished. The script comes with fairly complete documentation in a .zip file.
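The core idea is simply "wait for one command, then run another". The sketch below shows it under some assumptions of mine: the sub name run_then is invented, the follow-up command defaults to 'shutdown -h now' (which normally needs root privileges), and the real runshut script may well do things differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run @$program and wait for it to finish, then run @$after.
# Returns the exit status of the first program (-1 if it could not be started).
sub run_then {
    my ($program, $after) = @_;
    # Assumption: a clean halt via the standard shutdown command.
    $after ||= [ 'shutdown', '-h', 'now' ];
    system(@$program);    # blocks until the program exits
    my $status = $? == -1 ? -1 : $? >> 8;
    system(@$after);
    return $status;
}

# e.g. runshut emacs -nw ~/.bashrc
run_then([@ARGV]) if @ARGV;
```

Passing the command as a list to system() avoids a trip through the shell, so options and file names containing spaces reach the program unchanged.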
Download the script
Logging your Internet online time
The problem: I wanted a sure-fire way of logging my Internet online time without having to rely on Kppp's logging function since I might not always want to use Kppp. For example, it's sometimes convenient to write a script that dials my ISP, connects to the FTP upload server, and uploads a batch of files before logging off and disconnecting. This would bypass Kppp's logging function.
The solution: I put together a couple of scripts. One would be called whenever I went online, no matter whether I used Kppp or any other method to dial, even if it were a chat script. The other would be called whenever I went offline. The Linux ppp daemon calls the ip-up bash script (in /etc/ppp) when it connects and calls the ip-down script when it disconnects. Each of these bash scripts calls another script with the name ip-up.local or ip-down.local. I just call my two Perl scripts from these files. The first of my scripts notes the time I went online and saves it to a temporary file. The second script notes the time I came offline and calculates the total time I spent online. It writes the results to a logfile called /var/log/inetlog. There are two other scripts: timeonline tells you the total time logged in inetlog, and timeonlinetoday tells you the total time logged for a given day. Of course, you can always just look at inetlog to see how long you were on. The scripts are intended for use by anyone who dials up on a modem. The .zip file you can download contains all four scripts together with full details on how to install and use them.
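The two halves of the logging can be sketched as a pair of Perl subs, one for ip-up.local and one for ip-down.local. Treat this as an illustration only: the sub names, the stamp-file argument and the layout of the log line are all assumptions of mine and not necessarily what the scripts in the .zip file use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format a number of seconds as H:MM:SS.
sub format_duration {
    my ($secs) = @_;
    return sprintf('%d:%02d:%02d',
        int($secs / 3600), int(($secs % 3600) / 60), $secs % 60);
}

# For ip-up.local: remember when the link came up.
sub note_online {
    my ($stamp_file, $now) = @_;
    open(my $fh, '>', $stamp_file) or die "Cannot write $stamp_file: $!";
    print {$fh} "$now\n";
    close $fh;
}

# For ip-down.local: append one line to the log and return the number of
# seconds spent online. The log line layout here is hypothetical.
sub note_offline {
    my ($stamp_file, $log_file, $now) = @_;
    open(my $in, '<', $stamp_file) or die "Cannot read $stamp_file: $!";
    chomp(my $start = <$in>);
    close $in;
    my $elapsed = $now - $start;
    open(my $log, '>>', $log_file) or die "Cannot append to $log_file: $!";
    printf {$log} "%s online %s\n",
        strftime('%Y-%m-%d %H:%M:%S', localtime($start)),
        format_duration($elapsed);
    close $log;
    unlink $stamp_file;
    return $elapsed;
}
```

In use, the first sub would be called with the current time() when the link comes up, and the second with /var/log/inetlog as the log file when it goes down. Totalling the time online is then a matter of summing the durations in the log.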
Download the scripts
Removing duplicate entries from your $PATH
The problem: If you do 'echo $PATH' in a console, you'll probably see that it contains some duplicates. There are many scripts that run during the boot and login process that set your path. Because it's not guaranteed which scripts will be run depending on your settings, the scripts tend to add the same set of directories over and over again.
The solution: You could try and trace through the boot and login process yourself, making a note of which scripts get run when, and which scripts set which parts of $PATH, and try and clean up the mess yourself. Or you can set a specific $PATH as the last line of your ~/.bashrc; for example:
export PATH=/home/garry/bin:/usr/lib/java:/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin:/usr/local/sbin
The problem with this is that if any of the system scripts get changed, your $PATH won't be updated to reflect this. It will still be set to whatever you specifically set it to in ~/.bashrc. But there's no need for all this rigmarole. The 'uniqpath' script can be called from your ~/.bashrc script and it will strip out all duplicate directories from $PATH. Download the script, unzip it and put it somewhere on your $PATH; I suggest you put it in /usr/local/bin. You can then call it as the last line in your ~/.bashrc:
#!/bin/bash
... rest of ~/.bashrc ...
export PATH=`/usr/local/bin/uniqpath`
[Note the use of the backquotes in this command.] This way, it doesn't matter which scripts have added whatever to your $PATH: every new shell gets the shortest version of it, since ~/.bashrc is run whenever you start up a new non-login shell, and it's called from ~/.bash_profile anyway, so you get a clean $PATH on login shells, too.
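The de-duplication itself needs only a hash of directories already seen. The following is a minimal sketch of the technique and not necessarily how uniqpath does it; the sub name uniq_path is invented for the example, and keeping the first occurrence of each directory is my assumption, chosen so that the search order doesn't change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remove repeated directories from a colon-separated path, keeping the
# first occurrence of each so the search order is unchanged.
sub uniq_path {
    my ($path) = @_;
    my %seen;
    return join ':', grep { !$seen{$_}++ } split /:/, $path;
}

# Print the cleaned-up path so the shell can capture it with backquotes.
print uniq_path($ENV{PATH} // ''), "\n";
```

The result goes to standard output because that is what the backquoted call in ~/.bashrc captures.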
Download the script
Newsgroup statistics
The problem: I wanted to produce a meaningful set of statistics on articles posted to the alt.os.linux.mandrake newsgroup. I checked around for existing scripts but the most likely-looking one I found didn't quite give me what I was looking for. So I wrote my own.
The solution: The Perl script assumes that the newsgroup articles have been downloaded using Leafnode into the /var/spool/news/alt/os/linux/mandrake directory and it's easy to change this - just edit the first two variables declared. It reads in all articles in the directory and collects statistics from all articles whose file timestamps were modified within the previous 7-day period. It ranks the top 20 posters by number of posts and by size of articles. It ranks the top and bottom 20 responders (people replying to posts) by the percentage of original text in the posts. It ranks the top 20 threads by number of articles and by size of articles. It lists the top 10 cross-posted groups, the top 10 user agents by poster and by number of posts, and it lists the top 10 time zones from which posts were made. While collating these lists it also summarizes totals and averages for number of articles, threads and posters, along with a few other sundry statistics.
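To give a flavour of how one of these rankings can be built, here is a small sketch of a "top posters by number of posts" list. It is not taken from the statistics script: the sub names poster_of and top_posters are invented, it works on article text already held in memory, and it ignores the 7-day timestamp window and every other statistic mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the From: header from the text of one article. Headers end at the
# first blank line, so a "From:" in the body is ignored.
sub poster_of {
    my ($article) = @_;
    my ($headers) = split /\n\n/, $article, 2;
    return undef unless defined $headers;
    return $headers =~ /^From:\s*(.+)$/mi ? $1 : undef;
}

# Count posts per poster and return the top $n as [name, count] pairs,
# most prolific first; ties are broken alphabetically.
sub top_posters {
    my ($n, @articles) = @_;
    my %posts;
    for my $article (@articles) {
        my $who = poster_of($article);
        $posts{$who}++ if defined $who;
    }
    my @ranked = sort { $posts{$b} <=> $posts{$a} or $a cmp $b } keys %posts;
    splice @ranked, $n if @ranked > $n;
    return map { [ $_, $posts{$_} ] } @ranked;
}
```

The other "top N" lists follow the same pattern: pick a key from each article (thread, user agent, time zone), count it in a hash, then sort the keys by count.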
Some regulars (and not-so regulars) in the alt.os.linux.mandrake newsgroup have suggested some additions they'd like to see. For example, top-posted articles as a percentage of total articles; the top 20 cross-posters; the top 20 news posting hosts; a count of certain "sensitive" words in the Subject header, such as "newbie", "kde", "burner" - words that come up time and time again. There are a couple of changes I'd like to make myself. For example, the Top 20 User Agents list is compiled by counting one UA per poster, so if someone changes UA during the 7-day period, only one UA is taken into account. I'd like to fix this. I'd also like to take into account unusual quote styles. For example, there's at least one poster who likes to quote
] like this rather than
> like this!
I'll get round to fixing some or all of these one day. But if you download and use the script, and if you make some of these changes or changes of your own, I'd be grateful if you'd e-mail me with the results so the whole community can benefit. And on the subject of the community, I'd like to thank Chris van Ophuijsen (who posts in aolm as "HoboSong") for his additions to the script, and Robert Marshall for regularly posting the statistics when I can't.
Note: This article was written back in about 2005 and so is quite old in Internet time. The script has been updated and had some bugs fixed in the intervening years, and I no longer have the time to maintain the original. You can download it from the link below if you want to see how it's done, but I would suggest that if you seriously want to generate statistics for a Usenet group, you should subscribe to the alt.os.linux.mandriva newsgroup (or do a search on Google Groups) and find out who is the latest person posting stats to the group, then ask them for a copy of the latest version of the script. I'm sure they'll oblige as the script was originally in the public domain. Tell them Garry said it was OK, and feel free to refer them to this page if necessary.
Site design © Garry Knight 1998-2007