This is a new version of a Ruby program with almost the same name. I wrote this version because (a) I have begun writing Python programs, (b) I have found that Python is much faster than Ruby, and (c) I wanted to add some features and update the old program.
The idea of this program is to combine the features of some classic Unix command-line utilities in order to search a filesystem for files and/or directories by name, search file content by search string, and even edit file content selectively across a file system.
I can use grep or egrep to search files, but grep isn't designed to scan directory trees. I can use find to scan directory trees, but find isn't designed to search file content. And neither of these utilities can search and replace file content using regular expressions. So there's an unmet need.
SearchReplaceGlobalPy (hereafter SRG, for the sake of brevity) is a Python program that has a number of file management options, from the most benign to the most risky. It's fast, efficient, and for certain tasks, you may begin to wonder how you got along without it.
SRG is free, open-source software licensed under the GPL. It is packaged in a zip file containing three components (a Python script, a Glade configuration file and a help file) that you can place anywhere convenient. SRG requires Python 2.7 and gtk2.
To install SRG, unpack the zip archive and place the contents in any convenient directory. To make SRG accessible in a desktop environment, create a shortcut to the executable Python file:
- Click here to download the distribution zip file.
- Click here to view the Python source online.
- Linux: Copy this template for your desktop shortcut, and edit the path entry to suit your needs:
    [Desktop Entry]
    Encoding=UTF-8
    Name=SearchReplace
    Comment=Python-based global file content search/replace
    Exec=(path/to/python/script)/search_replace_global.py
    Terminal=false
    Type=Application
    Icon=system-search
- Windows: The simplest approach is to right-click the Python script and select "Create Shortcut", then drag the new shortcut onto your desktop.
First and foremost, remember that, if egregiously misused, this program can erase all the files on your system. It's very powerful and useful, but it requires a certain amount of responsibility and caution.
There is actually only one option that could cause harm — the "Replace" option. If you used every feature of this program except "Replace," you would be quite safe. But if you plan to replace file content as well as search, please be sure to have backups, because a poorly written regular expression along with a full-system scan could do a lot of damage.
Also, some knowledge of regular expressions is very desirable, because virtually all entries to SRG are regular expressions. Here are some regular expression links:
- Regular Expression (Wikipedia)
- Regular Expression Reference
- Regular Expression Tutorial
- Regular Expression laboratory — experiment with regular expressions
Here is a typical step-by-step procedure for using SRG:
Phase I — Searching for files (quite safe)
- In the entry window titled "Search Path", enter a search path or click "Browse" and choose a path using the directory dialog.
- In the entry window titled "File Filter", enter a specifier for the files you want to scan. Some example entries:
- "\.txt$" would select all files ending in ".txt".
- "\.(html?|php|asp)$" would select all files ending in .html, .htm, .php or .asp.
- "/webdocs/.*\.css$" would select only files from the "webdocs" directory ending in .css.
- The above entries will allow you to locate particular files and/or directory contents.
- Now press the "Scan" button to scan the specified search path for the desired files.
- SRG will search for and list any files that meet your criteria.
- To see a list of the files SRG found, click the "Found" tab at the bottom of the program dialog.
- To hand-edit any of the listed files, simply click its name and an editor will be launched with the desired file on display.
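The file-location phase can be sketched in a few lines of Python. This is a minimal sketch rather than SRG's actual code: the function name `find_files` is hypothetical, it assumes the "File Filter" entry is applied to each file's full path (as the "/webdocs/" example suggests), and it is written for Python 3 even though SRG itself targets Python 2.7.

```python
import os
import re

def find_files(search_path, file_filter):
    """Return sorted paths under search_path whose full path matches file_filter."""
    pattern = re.compile(file_filter)
    found = []
    for dirpath, _dirnames, filenames in os.walk(search_path):
        for name in filenames:
            # Normalize to forward slashes so a filter such as
            # "/webdocs/.*\.css$" behaves the same on Windows.
            path = os.path.join(dirpath, name).replace(os.sep, "/")
            # search() rather than match(): the filter may hit anywhere in the path
            if pattern.search(path):
                found.append(path)
    return sorted(found)
```

For example, `find_files("/var/www", r"\.(html?|php|asp)$")` would list every file under that (assumed) directory ending in .html, .htm, .php or .asp.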
Phase II — Searching within files (also quite safe)
Now that we have learned how to locate files, we can examine their contents using yet another regular expression. Here is the procedure:
- Continuing from the steps in the prior section, enter a regular expression into the entry pane titled "Search For". For early practice, just type in some ordinary text you expect to see in your files.
- Press the "Search" button to search the contents of files for agreement with the entered search criteria. In this phase, SRG reads and examines each located file for correspondence with the search criteria, so this kind of search can take much longer than simply locating files by name.
- As before, SRG will list the files that contained the search expression, and you can examine the list and edit individual files as explained above.
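The content-search phase amounts to reading each located file and testing it against the "Search For" expression. The sketch below is an illustration under assumptions, not SRG's implementation: `search_files` is a hypothetical name, files are assumed to be UTF-8 text, and the `reverse` parameter imitates the "Reverse" option (list only the files that do not contain the expression).

```python
import re

def search_files(paths, search_for, reverse=False):
    """Return the paths whose content matches search_for (or does not, if reverse)."""
    pattern = re.compile(search_for)
    matched = []
    for path in paths:
        try:
            # errors="replace" keeps the scan going on files that are not valid UTF-8
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole scan
        # Keep the file when "found" and "reverse" disagree
        if bool(pattern.search(text)) != reverse:
            matched.append(path)
    return matched
```

Because every file is read in full, this step is the slow one, which matches the remark above about content searches taking longer than name searches.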
Phase III — Searching and replacing file contents (not safe)
For this phase, some precautions are in order. Never search and replace using your only file copies; always create backups. And while learning how to use SRG, it's even a good idea to make a copy of some files that you expect to trash while learning the ropes.
- Continuing from the steps in the prior section, enter an expression into the entry pane titled "Replace With." For each file that meets the prior criteria, the "Search For" text will be replaced with this entry.
- Set some options for both search and replace:
- "Global" means replace all cases found within a file. If disabled, only the first case is replaced.
- "Case" means be selective about uppercase and lowercase. If selected, "The" and "the" are treated as separate entities in your search.
- "DotAll" defines how the regular expression metacharacter '.' is interpreted. If "DotAll" is enabled, '.' matches all characters. If disabled, '.' won't match linefeeds. To treat file content just as though it is a long, single string, enable "DotAll."
- "Multiline" changes how the line beginning and ending tokens "^" and "$" are interpreted. If Multiline is enabled, "^" and "$" mark the beginning and end of individual lines in the file, delimited by linefeeds. If disabled, these tokens mark the beginning and end of the entire file.
- "Reverse" inverts the search engine's logic and only lists files that don't have the specified search expression. This option can't be used with search and replace, because it's hard to imagine how to replace an expression you haven't found.
- "Update" gives a changed file the present date and time. When you need to retain a changed file's original date and time, disable "Update." With "Update" disabled, the file's time can still be advanced by a small amount so that automated backup methods detect the change. This amount is called "epsilon"; it is set near the beginning of the SearchReplaceGlobalPy class definition in the primary program file, and its default is 1 second.
- Again, while learning how to search and replace, it's a good idea to set up a directory of practice files that you expect to trash while figuring it all out.
- When you have things set up as you wish, press the "Replace" button. You will be presented with a warning dialog. Click "Yes".
- The search and replace action will commence. For each changed file, SRG will create a backup copy of the original with the same name plus an appended '~'.
- To undo the file changes, press "Undo" immediately after a single search and replace action; this restores the original file contents.
- Remember: if you perform more than one search and replace action without undoing the prior action, you lose the original file contents — "Undo" is only possible immediately after a single search and replace action.
- After a search and replace action, a list of the changed files appears under the "Changed" tab. As with the results in the "Found" tab described earlier, these files can be clicked and edited.
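The options described above correspond closely to features of Python's standard `re` module. The sketch below shows one plausible mapping; it is an assumption about how such options could be wired up, not a copy of SRG's code, and the names `build_flags` and `replace_text` are hypothetical.

```python
import re

def build_flags(case=True, dotall=False, multiline=False):
    """Translate the Case, DotAll and Multiline options into re flags."""
    flags = 0
    if not case:
        flags |= re.IGNORECASE   # "Case" off: "The" and "the" are the same
    if dotall:
        flags |= re.DOTALL       # '.' also matches linefeeds
    if multiline:
        flags |= re.MULTILINE    # '^' and '$' apply to each line
    return flags

def replace_text(text, search_for, replace_with, global_replace=True, **options):
    """Apply one search and replace to a string, honoring the options."""
    pattern = re.compile(search_for, build_flags(**options))
    # In re.sub, count=0 means "replace every match"; count=1 only the first
    count = 0 if global_replace else 1
    return pattern.sub(replace_with, text, count=count)
```

With these definitions, `replace_text("The the", "the", "X")` leaves the capitalized word alone, while passing `case=False` replaces both words.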
SRG has a number of benign uses — it's not always being used to edit great numbers of files on the fly. I find myself using it to analyze my site's content — count how many times a certain term is used — or to locate files that have a given typographical error. In the latter case, I might search for a typo a Website visitor has reported, then click the filename in the "Found" list to make a correction "by hand", to avoid the risks of automated replacement.
One can also use SRG to interactively apply particular regular expressions to a test file's content, as part of the process of designing regular expressions. This activity involves selecting one file instead of many, then typing in regular expressions to test their effect on that file. Once a regular expression has proven itself in tests, it can be applied with confidence to more files. Readers also have the option of visiting my Regular Expression Laboratory for fast, interactive experiments with regular expressions.
Having a good understanding of regular expressions greatly increases the power of a tool like SRG. With practice, one can locate content using rather sophisticated search expressions, and edit content in similarly sophisticated ways. Here are some examples:
- Task: wrap bare URLs in HTML tags for use in a Web page:
- "Search" expression: \b(https?|ftp|telnet|gopher)(://)(\S+)\b
- "Replace" expression: <a href="\1\2\3">\3</a>
- Options: enable "Global" and disable "Case" (case-insensitive search)
- Task: Locate plausible e-mail addresses and wrap them in "mailto:" tags:
- "Search" expression: \b([\w_-]+@[\w_-]+\.[\w_-]{2,4})\b
- "Replace" expression: <a href="mailto:\1">\1</a>
- Options: enable "Global" and disable "Case" (case-insensitive search)
- This is a greatly simplified e-mail address filter. There are some extraordinarily complex e-mail filter expressions online.
- Task: convert a certain textual convention for "italic" into HTML tags:
- Some social and blogging Websites accept *this* (note the asterisks) as a syntax to suggest italic text style. Let's detect this and turn it into the equivalent HTML.
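- "Search" Expression: \B\*(\w+?)\*\B
- Note the use of \B rather than \b: an asterisk is not a word character, so \b cannot match between a space and an asterisk, and an expression written with \b would miss *this* in ordinary text.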
- "Replace" Expression: <i>\1</i>
- Options: set Global replacement
- The asterisks are escaped (\*) because an asterisk is normally a regular expression metacharacter; escaping it causes it to be interpreted as an ordinary character.
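The three tasks above can be tried outside SRG with Python's `re.sub`, which uses the same \1, \2, \3 backreference syntax in replacement strings. This is only a test harness with made-up sample text; the variable names are hypothetical. For the italic task the sketch anchors the expression with \B, because \b cannot match between a space and an asterisk.

```python
import re

# Task 1: wrap bare URLs in HTML tags (Global on, Case off)
url_text = "See http://example.com/page for details."
url_html = re.sub(r'\b(https?|ftp|telnet|gopher)(://)(\S+)\b',
                  r'<a href="\1\2\3">\3</a>',
                  url_text, flags=re.IGNORECASE)

# Task 2: wrap plausible e-mail addresses in "mailto:" tags (Global on, Case off)
mail_text = "Write to user@example.com today."
mail_html = re.sub(r'\b([\w_-]+@[\w_-]+\.[\w_-]{2,4})\b',
                   r'<a href="mailto:\1">\1</a>',
                   mail_text, flags=re.IGNORECASE)

# Task 3: convert *word* into italic tags (Global on)
italic_text = "Some sites accept *this* as italic."
italic_html = re.sub(r'\B\*(\w+?)\*\B', r'<i>\1</i>', italic_text)

print(url_html)
print(mail_html)
print(italic_html)
```

Testing an expression this way on a single string before running it over many files is the same precaution recommended earlier for SRG itself.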
Some readers may wonder whether a GUI-based application like SRG is not a bit of overkill — isn't it sometimes easier to write the shortest possible script to perform certain tasks? Well, yes, I agree, and because I offer the SRG source under the GPL, I encourage readers to examine its algorithms and extract the essence for dedicated, short, search and replace scripts.
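As a starting point for such a dedicated script, here is a minimal sketch of a single-file search and replace that imitates the behaviors described earlier: a '~' backup, an undo step, and the "epsilon" time bump when the file time is not updated. It is not extracted from the SRG source; the names `replace_in_file`, `undo` and `EPSILON` are hypothetical, files are assumed to be UTF-8 text, and the code is written for Python 3.

```python
import os
import re

EPSILON = 1  # seconds; stands in for SRG's "epsilon" setting (assumed semantics)

def replace_in_file(path, search_for, replace_with, flags=0, update=True):
    """Replace every match in one file; return True if the file was changed."""
    # newline="" preserves the file's original line endings
    with open(path, encoding="utf-8", newline="") as f:
        original = f.read()
    changed = re.sub(search_for, replace_with, original, flags=flags)
    if changed == original:
        return False                      # nothing to do, so no backup either
    stat = os.stat(path)
    with open(path + "~", "w", encoding="utf-8", newline="") as f:
        f.write(original)                 # backup: same name plus '~'
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(changed)
    if not update:
        # Keep the old time, bumped slightly so backup tools notice the change
        os.utime(path, (stat.st_atime, stat.st_mtime + EPSILON))
    return True

def undo(path):
    """Restore a file from its '~' backup; return True if a backup existed."""
    backup = path + "~"
    if not os.path.exists(backup):
        return False
    os.replace(backup, path)              # overwrites the changed file
    return True
```

As with SRG, a second replacement on the same file overwrites the '~' backup, so `undo` can only reverse the most recent change.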
(reverse chronological order)
- 10/18/2011 Version 3.2. Fixed a bug that locked the application while a target editor window was open.
- 10/03/2011 Version 3.1. Added the ability to bump file times by a small amount when "Update" is disabled. This allows backup programs to detect and back up changed files without significantly changing a file's time.
- 10/01/2011 Version 3.0. Initial public release.