A beautifier for Bash shell scripts written in Python
P. Lutus
Copyright © 2011, P. Lutus
This is the second Bash script beautifier I have written. The first was written in Ruby and has become pretty well-known. Since then I have moved to Python for those tasks where it's appropriate, and eventually I decided to rewrite the Bash beautifier and clean up some annoying inconsistencies in the process.
Beautifying Bash scripts is not trivial. Bash scripts aren't like C or Java programs — they have a lot of ambiguous syntax, and (shudder) keywords can be used as variables. Years ago, while testing the first version of this program, I encountered this example:
    done=3;echo done;done

Same name, but three distinct meanings (sigh). The Bash interpreter can sort out this perversity, but I decided not to try to recreate the Bash interpreter just to beautify a script. This means there will be some border cases this Python program won't be able to process. But in tests with many large Linux system Bash scripts, its error-free score was roughly 99%.
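To make the difficulty concrete, here is a minimal sketch that runs the outdent pattern from the program listing against that one line. The name `OUTDENT` and the standalone framing are introduced only for illustration; the real program strips quotes, escapes and comments from each line before counting keywords.

```python
import re

# Outdent pattern copied from the program listing: a block-closing keyword
# preceded by whitespace, a semicolon or the start of the line, and followed
# by a delimiter or the end of the line.
OUTDENT = re.compile(r'(\s|\A|;)(esac|fi|done|elif)(;|\)|\||\Z|\s)')

line = 'done=3;echo done;done'
matches = OUTDENT.findall(line)

# Each match is a (prefix, keyword, suffix) tuple.
for prefix, keyword, suffix in matches:
    print(repr(prefix), keyword, repr(suffix))
```

The pattern reports a single match, and it is the argument to `echo`, not the closing keyword. The assignment is skipped because `=` is not an accepted suffix, and the final `done` is missed because the semicolon in front of it was already consumed as the suffix of the previous match. A purely textual approach can land on the right count for the wrong reason, which is why some border cases are unavoidable.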
BeautifyBash has three modes of operation:
- If presented with a list of file names:

      beautify_bash.py file1.sh file2.sh file3.sh

  then for each file whose content changes, it will create a backup (i.e. file1.sh~) and overwrite the original file with a beautified replacement.

- If given '-' as a command-line argument, it will use stdin as its source and stdout as its sink:

      beautify_bash.py - < infile.sh > outfile.sh

- If called as a module, it will behave itself and not execute its main() function:

      #!/usr/bin/env python
      # -*- coding: utf-8 -*-

      from beautify_bash import BeautifyBash

      [ ... ]

      result,error = BeautifyBash().beautify_string(source)

BeautifyBash handles Bash here-docs very carefully (and there are probably some border cases it doesn't handle). The basic idea is that the originator knew what format he wanted in the here-doc, and a beautifier shouldn't try to outguess him. So BeautifyBash does all it can to pass along the here-doc content unchanged:
    if true
    then
      echo "Before here-doc"
      # Insert 2 lines in file, then save.
      #--------Begin here document-----------#
      vi $TARGETFILE <<x23LimitStringx23
    i
    This is line 1 of the example file.
    This is line 2 of the example file.
    ^[
    ZZ
    x23LimitStringx23
      #----------End here document-----------#
      echo "After here-doc"
    fi

As written, BeautifyBash can beautify large numbers of Bash scripts when called from ... well, among other things, a Bash script:
    #!/bin/sh

    for path in `find /path -name '*.sh'`
    do
      beautify_bash.py $path
    done

As well as the more obvious example:
    $ beautify_bash.py *.sh

CAUTION: Because BeautifyBash overwrites all the files submitted to it, this could have disastrous consequences if the files include some of the increasingly common Bash scripts that have appended binary content (a regime where BeautifyBash's behavior is undefined). So please back up your files, and don't treat BeautifyBash as though it is a harmless utility. That's only true most of the time.
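One way to guard against the appended-binary case is to screen files before handing them over. The sketch below is not part of BeautifyBash; `looks_binary` and `write_temp` are hypothetical helpers, and treating a NUL byte as the sign of binary content is an assumption that will not catch every payload.

```python
import os
import tempfile

def looks_binary(path, chunk_size=65536):
    """Return True if the file contains a NUL byte anywhere.

    Assumption: shell source text never contains NUL bytes, while an
    appended binary payload usually does. The whole file is scanned
    because the payload normally sits at the end.
    """
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            if b'\0' in chunk:
                return True
    return False

def write_temp(data):
    """Write bytes to a temporary .sh file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp(suffix='.sh')
    with os.fdopen(fd, 'wb') as f:
        f.write(data)
    return path

# Demo: a plain script and a script with a binary tail.
plain = write_temp(b'#!/bin/sh\necho hello\n')
mixed = write_temp(b'#!/bin/sh\necho hello\nexit 0\n\x00\x01\x02payload')
for path in (plain, mixed):
    verdict = 'skip' if looks_binary(path) else 'safe to beautify'
    print(os.path.basename(path), verdict)
    os.remove(path)
```

A wrapper script could call a check like this for each path and pass only the files that come back clean.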
Licensing, Source
BeautifyBash is released under the GNU General Public License.
Here is the plain-text source file without line numbers.
Revision History
- Version 1.0 04/14/2011. Initial Public Release.
Program Listing
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    #**************************************************************************
    #   Copyright (C) 2011, Paul Lutus                                        *
    #                                                                         *
    #   This program is free software; you can redistribute it and/or modify  *
    #   it under the terms of the GNU General Public License as published by  *
    #   the Free Software Foundation; either version 2 of the License, or     *
    #   (at your option) any later version.                                   *
    #                                                                         *
    #   This program is distributed in the hope that it will be useful,       *
    #   but WITHOUT ANY WARRANTY; without even the implied warranty of        *
    #   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the          *
    #   GNU General Public License for more details.                          *
    #                                                                         *
    #   You should have received a copy of the GNU General Public License     *
    #   along with this program; if not, write to the                         *
    #   Free Software Foundation, Inc.,                                       *
    #   59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.              *
    #**************************************************************************

    import re, sys

    PVERSION = '1.0'

    class BeautifyBash:

      def __init__(self):
        self.tab_str = ' '
        self.tab_size = 2

      def read_file(self,fp):
        with open(fp) as f:
          return f.read()

      def write_file(self,fp,data):
        with open(fp,'w') as f:
          f.write(data)

      def beautify_string(self,data,path = ''):
        tab = 0
        case_stack = []
        in_here_doc = False
        defer_ext_quote = False
        in_ext_quote = False
        ext_quote_string = ''
        here_string = ''
        output = []
        line = 1
        for record in re.split('\n',data):
          record = record.rstrip()
          stripped_record = record.strip()

          # collapse multiple quotes between ' ... '
          test_record = re.sub(r'\'.*?\'','',stripped_record)
          # collapse multiple quotes between " ... "
          test_record = re.sub(r'".*?"','',test_record)
          # collapse multiple quotes between ` ... `
          test_record = re.sub(r'`.*?`','',test_record)
          # collapse multiple quotes between \` ... ' (weird case)
          test_record = re.sub(r'\\`.*?\'','',test_record)
          # strip out any escaped single characters
          test_record = re.sub(r'\\.','',test_record)
          # remove '#' comments
          test_record = re.sub(r'(\A|\s)(#.*)','',test_record,1)
          if(not in_here_doc):
            if(re.search('<<-?',test_record)):
              here_string = re.sub('.*<<-?\s*[\'|"]?([_|\w]+)[\'|"]?.*','\\1',stripped_record,1)
              in_here_doc = (len(here_string) > 0)
          if(in_here_doc): # pass on with no changes
            output.append(record)
            # now test for here-doc termination string
            if(re.search(here_string,test_record) and not re.search('<<',test_record)):
              in_here_doc = False
          else: # not in here doc
            if(in_ext_quote):
              if(re.search(ext_quote_string,test_record)):
                # provide line after quotes
                test_record = re.sub('.*%s(.*)' % ext_quote_string,'\\1',test_record,1)
                in_ext_quote = False
            else: # not in ext quote
              if(re.search(r'(\A|\s)(\'|")',test_record)):
                # apply only after this line has been processed
                defer_ext_quote = True
                ext_quote_string = re.sub('.*([\'"]).*','\\1',test_record,1)
                # provide line before quote
                test_record = re.sub('(.*)%s.*' % ext_quote_string,'\\1',test_record,1)
            if(in_ext_quote):
              # pass on unchanged
              output.append(record)
            else: # not in ext quote
              inc = len(re.findall('(\s|\A|;)(case|then|do)(;|\Z|\s)',test_record))
              inc += len(re.findall('(\{|\(|\[)',test_record))
              outc = len(re.findall('(\s|\A|;)(esac|fi|done|elif)(;|\)|\||\Z|\s)',test_record))
              outc += len(re.findall('(\}|\)|\])',test_record))
              if(re.search(r'\besac\b',test_record)):
                if(len(case_stack) == 0):
                  sys.stderr.write(
                    'File %s: error: "esac" before "case" in line %d.\n' % (path,line)
                  )
                else:
                  outc += case_stack.pop()
              # special handling for bad syntax within case ... esac
              if(len(case_stack) > 0):
                if(re.search('\A[^(]*\)',test_record)):
                  # avoid overcount
                  outc -= 2
                  case_stack[-1] += 1
                if(re.search(';;',test_record)):
                  outc += 1
                  case_stack[-1] -= 1
              # an ad-hoc solution for the "else" keyword
              else_case = (0,-1)[re.search('^(else)',test_record) != None]
              net = inc - outc
              tab += min(net,0)
              extab = tab + else_case
              extab = max(0,extab)
              output.append((self.tab_str * self.tab_size * extab) + stripped_record)
              tab += max(net,0)
            if(defer_ext_quote):
              in_ext_quote = True
              defer_ext_quote = False
            if(re.search(r'\bcase\b',test_record)):
              case_stack.append(0)
          line += 1
        error = (tab != 0)
        if(error):
          sys.stderr.write('File %s: error: indent/outdent mismatch: %d.\n' % (path,tab))
        return '\n'.join(output), error

      def beautify_file(self,path):
        error = False
        if(path == '-'):
          data = sys.stdin.read()
          result,error = self.beautify_string(data,'(stdin)')
          sys.stdout.write(result)
        else: # named file
          data = self.read_file(path)
          result,error = self.beautify_string(data,path)
          if(data != result):
            # make a backup copy
            self.write_file(path + '~',data)
            self.write_file(path,result)
        return error

      def main(self):
        error = False
        sys.argv.pop(0)
        if(len(sys.argv) < 1):
          sys.stderr.write('usage: shell script filenames or \"-\" for stdin.\n')
        else:
          for path in sys.argv:
            error |= self.beautify_file(path)
        sys.exit((0,1)[error])

    # if not called as a module
    if(__name__ == '__main__'):
      BeautifyBash().main()
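The here-doc pass-through in the listing can be tried in isolation. What follows is a simplified sketch, not the listing's exact logic: `HEREDOC_START`, `heredoc_delimiter` and `protected_lines` are hypothetical names, the pattern is a trimmed version of the one in `beautify_string`, and the end of a here-doc is detected by an exact match on the stripped line rather than by a regex search. It also skips the quote and comment stripping the real program performs first, so a `<<` inside a quoted string would fool it.

```python
import re

# Trimmed version of the here-doc test in the listing: "<<" or "<<-",
# optional whitespace, an optionally quoted limit string made of word characters.
HEREDOC_START = re.compile(r"<<-?\s*['\"]?(\w+)['\"]?")

def heredoc_delimiter(line):
    """Return the here-doc limit string named on this line, or None."""
    match = HEREDOC_START.search(line)
    return match.group(1) if match else None

def protected_lines(lines):
    """Return one flag per line: True if the line should pass through unchanged.

    As in the listing, the line that opens the here-doc and the line that
    closes it are both passed through along with the body.
    """
    flags = []
    limit = None
    for line in lines:
        if limit is None:
            limit = heredoc_delimiter(line)
            flags.append(limit is not None)
        else:
            flags.append(True)
            if line.strip() == limit:
                limit = None
    return flags

script = [
    'echo "Before here-doc"',
    'vi $TARGETFILE <<x23LimitStringx23',
    'i',
    'This is line 1 of the example file.',
    '^[',
    'ZZ',
    'x23LimitStringx23',
    'echo "After here-doc"',
]
for line, keep in zip(script, protected_lines(script)):
    print('keep  ' if keep else 'indent', line)
```

Only the two `echo` lines are marked for re-indentation; everything from the `vi` line through the closing limit string is left exactly as written.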