Java Expanding and Compressing Tabs
Problem
You need to convert space characters to tab characters in a file, or vice versa. You
might want to replace spaces with tabs to save space on disk, or go the other way to
deal with a device or program that can’t handle tabs.
Solution
Use my Tabs class together with the EnTab and DeTab classes, which delegate their tab-stop decisions to it.
Explained
The EnTab class is listed below, complete with a sample main program. The program
works a line at a time. For each character on the line, if the character is a space,
we see if we can coalesce it with previous spaces to output a single tab character.
This program depends on the Tabs class, which we’ll come to shortly. The Tabs class
is used to decide which column positions represent tab stops and which do not. The
code also has several Debug printouts.
EnTab.java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.PrintWriter;

/**
 * EnTab: replace blanks by tabs and blanks. Transmuted from Kernighan and
 * Plauger's Software Tools book into C. Transmuted again, years later, into
 * Java. Totally rewritten to be line-at-a-time instead of char-at-a-time.
 *
 * Debug, used below, is the author's debugging helper class; it is not
 * listed here.
 *
 * @author Ian F. Darwin, http://www.darwinsys.com/
 * @version $Id: ch03,v 1.3 2004/05/04 18:03:14 ian Exp $
 */
public class EnTab {
    /** The Tabs (tab logic handler) */
    protected Tabs tabs;

    /**
     * Delegate tab spacing information to tabs.
     *
     * @return the number of columns between tab stops
     */
    public int getTabSpacing() {
        return tabs.getTabSpacing();
    }

    /**
     * Main program: just create an EnTab object, and pass the standard input
     * or the named file(s) through it.
     */
    public static void main(String[] argv) throws IOException {
        EnTab et = new EnTab(8);
        if (argv.length == 0) { // do standard input
            et.entab(new BufferedReader(new InputStreamReader(System.in)),
                    System.out);
        } else {
            for (String fileName : argv) { // do each file
                try (BufferedReader is =
                        new BufferedReader(new FileReader(fileName))) {
                    et.entab(is, System.out);
                }
            }
        }
    }

    /**
     * Constructor: just save the tab values.
     *
     * @param n The number of spaces each tab is to replace.
     */
    public EnTab(int n) {
        tabs = new Tabs(n);
    }

    public EnTab() {
        tabs = new Tabs();
    }

    /**
     * entab: process one file, replacing blanks with tabs.
     *
     * @param is A BufferedReader opened to the file to be read.
     * @param out a PrintWriter to send the output to.
     */
    public void entab(BufferedReader is, PrintWriter out) throws IOException {
        String line;
        // main loop: process entire file one line at a time.
        while ((line = is.readLine()) != null) {
            out.println(entabLine(line));
        }
        // Flush, or buffered output may never reach the underlying stream.
        out.flush();
    }

    /**
     * entab: process one file, replacing blanks with tabs.
     *
     * @param is A BufferedReader opened to the file to be read.
     * @param out A PrintStream to write the output to.
     */
    public void entab(BufferedReader is, PrintStream out) throws IOException {
        entab(is, new PrintWriter(out));
    }

    /**
     * entabLine: process one line, replacing blanks with tabs.
     *
     * @param line the string to be processed
     * @return the line with runs of blanks replaced by tabs where possible
     */
    public String entabLine(String line) {
        int n = line.length();
        StringBuilder sb = new StringBuilder();
        int consumedSpaces = 0; // spaces read but not yet written

        for (int inCol = 0; inCol < n; inCol++) {
            char ch = line.charAt(inCol);
            // If we get a space, consume it, don't output it.
            // If this takes us to a tab stop, output a tab character,
            // which stands in for this space and any consumed before it.
            if (ch == ' ') {
                Debug.println("space", "Got space at " + inCol);
                if (!tabs.isTabStop(inCol)) {
                    consumedSpaces++;
                } else {
                    Debug.println("tab", "Got a Tab Stop " + inCol);
                    sb.append('\t');
                    consumedSpaces = 0;
                }
                continue;
            }

            // We're at a non-space; put any "leftover" spaces back out,
            // since we consumed them above without reaching a tab stop.
            for (; consumedSpaces > 0; consumedSpaces--) {
                Debug.println("pad", "Padding space at " + inCol);
                sb.append(' ');
            }

            // Now we have a plain character to output.
            sb.append(ch);
        }
        // If line ended with trailing (or only!) spaces, preserve them.
        for (int i = 0; i < consumedSpaces; i++) {
            Debug.println("trail", "Padding space at end # " + i);
            sb.append(' ');
        }
        return sb.toString();
    }
}
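To see the coalescing rule in isolation, here is a minimal standalone sketch. It is not the EnTab class itself: the class name EntabSketch and the show( ) helper are made up for illustration, the tab stops are computed arithmetically instead of through a Tabs object, and the input line is assumed to contain no tab characters.

```java
/** Hypothetical standalone sketch of the entab rule; not the EnTab class. */
class EntabSketch {

    /**
     * Replace each run of spaces that ends on a tab-stop boundary with a tab.
     * Assumes the line contains no tab characters and tabWidth is positive.
     */
    static String entabLine(String line, int tabWidth) {
        StringBuilder sb = new StringBuilder();
        int pending = 0; // spaces read but not yet written
        for (int col = 0; col < line.length(); col++) {
            char ch = line.charAt(col);
            if (ch == ' ') {
                if ((col + 1) % tabWidth == 0) {
                    sb.append('\t'); // this space ends a tab field
                    pending = 0;     // the tab covers the pending spaces too
                } else {
                    pending++;
                }
                continue;
            }
            // Non-space: write back spaces that never reached a tab stop.
            for (; pending > 0; pending--) {
                sb.append(' ');
            }
            sb.append(ch);
        }
        // Preserve trailing spaces.
        for (; pending > 0; pending--) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Make tab characters visible when printing. */
    static String show(String s) {
        return s.replace("\t", "\\t");
    }

    public static void main(String[] args) {
        System.out.println(show(entabLine("        x", 8)));   // prints \tx
        System.out.println(show(entabLine("a b", 8)));         // prints a b
        System.out.println(show(entabLine("abc     def", 8))); // prints abc\tdef
    }
}
```

Only a run of spaces that reaches column 7, 15, 23, and so on is replaced; a run that stops short of a tab stop is written back unchanged.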
As the comments state, this code was patterned after a program in Kernighan and
Plauger’s classic work, Software Tools. While their version was in a language called
RatFor (Rational Fortran), my version has since been through several translations.
Their version actually worked one character at a time, and for a long time I tried to
preserve this overall structure. For this edition of the book, I finally rewrote it to be a
line-at-a-time program.
The program that goes in the opposite direction, taking tabs out rather than putting
them in, is the DeTab class; only the core methods are
shown.
DeTab.java

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class DeTab {
    Tabs ts; // initialized in constructor

    public static void main(String[] argv) throws IOException {
        DeTab dt = new DeTab(8);
        dt.detab(new BufferedReader(new InputStreamReader(System.in)),
                new PrintWriter(System.out));
    }

    /** Constructor, inferred from the new DeTab(8) call in main.
     * @param n - the number of columns between tab stops
     */
    public DeTab(int n) {
        ts = new Tabs(n);
    }

    /** detab one file (replace tabs with spaces)
     * @param is - the file to be processed
     * @param out - the updated file
     */
    public void detab(BufferedReader is, PrintWriter out) throws IOException {
        String line;
        while ((line = is.readLine()) != null) {
            out.println(detabLine(line));
        }
        // Flush, or buffered output may never reach the underlying stream.
        out.flush();
    }

    /** detab one line (replace tabs with spaces)
     * @param line - the line to be processed
     * @return the updated line
     */
    public String detabLine(String line) {
        char c;
        int col = 0;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            // Either ordinary character or tab.
            if ((c = line.charAt(i)) != '\t') {
                sb.append(c); // Ordinary
                ++col;
                continue;
            }
            // Tab, expand it, must put >=1 space. Test the column just
            // filled, then advance, so a tab at column 0 yields tabSpace
            // spaces and matches what EnTab collapses.
            do {
                sb.append(' ');
            } while (!ts.isTabStop(col++));
        }
        return sb.toString();
    }
}
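The expansion step can also be written without a tab-stop table, since the number of spaces a tab stands for depends only on the current column. The following is a sketch under that assumption, not the DeTab class itself; the names DetabSketch and show( ) are hypothetical, and evenly spaced tab stops are assumed.

```java
/** Hypothetical standalone sketch of tab expansion; not the DeTab class. */
class DetabSketch {

    /**
     * Expand each tab to the spaces needed to reach the next multiple of
     * tabWidth. Assumes evenly spaced tab stops and a positive tabWidth.
     */
    static String detabLine(String line, int tabWidth) {
        StringBuilder sb = new StringBuilder();
        int col = 0; // number of characters written so far
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c != '\t') {
                sb.append(c);
                col++;
                continue;
            }
            // Between 1 and tabWidth spaces, never zero.
            int fill = tabWidth - (col % tabWidth);
            for (int k = 0; k < fill; k++) {
                sb.append(' ');
            }
            col += fill;
        }
        return sb.toString();
    }

    /** Make spaces visible when printing. */
    static String show(String s) {
        return s.replace(' ', '.');
    }

    public static void main(String[] args) {
        System.out.println(show(detabLine("\tx", 8)));      // prints ........x
        System.out.println(show(detabLine("abc\tdef", 8))); // prints abc.....def
        System.out.println(show(detabLine("a\tb", 4)));     // prints a...b
    }
}
```

A table-driven design such as Tabs could later support unevenly spaced stops; the arithmetic form trades that flexibility for brevity.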
The Tabs class provides two public methods, getTabSpacing( ) and isTabStop( ); a
private settabs( ) method computes the tab stops. Example 3-8
lists the source for the Tabs class.
Tabs.java

public class Tabs {
    /** tabs every so often */
    public final static int DEFTABSPACE = 8;
    /** the current tab stop setting. */
    protected int tabSpace = DEFTABSPACE;
    /** The longest line that we worry about tabs for. */
    public final static int MAXLINE = 250;
    /** the current tab stops */
    protected boolean[] tabstops;

    /** Construct a Tabs object with a given tab stop setting */
    public Tabs(int n) {
        if (n <= 0) {
            n = 1;
        }
        tabstops = new boolean[MAXLINE];
        tabSpace = n;
        settabs();
    }

    /** Construct a Tabs object with the default tab stop setting */
    public Tabs() {
        this(DEFTABSPACE);
    }

    /** settabs - set initial tab stops */
    private void settabs() {
        for (int i = 0; i < tabstops.length; i++) {
            // Zero-based column i is a stop when it is the last column
            // of a tab field: 7, 15, 23, ... for a spacing of 8.
            tabstops[i] = ((i + 1) % tabSpace) == 0;
            Debug.println("tabs", "Tabs[" + i + "]=" + tabstops[i]);
        }
    }

    /**
     * @return Returns the tabSpace.
     */
    public int getTabSpacing() {
        return tabSpace;
    }

    /** isTabStop - returns true if given column is a tab stop.
     * If the current input line is longer than the table, the table is
     * grown and recomputed; no exception is thrown.
     * @param col - the current column number
     */
    public boolean isTabStop(int col) {
        if (col >= tabstops.length) {
            // Double until col fits; a single doubling might not be enough.
            int newLength = tabstops.length;
            while (newLength <= col) {
                newLength *= 2;
            }
            tabstops = new boolean[newLength];
            settabs();
        }
        return tabstops[col];
    }
}
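To make the column convention concrete, the sketch below lists which zero-based columns count as tab stops under the same (i+1) % tabSpace rule that settabs( ) uses. It is only an illustration: the class name TabStopSketch and the stopsBelow( ) helper are hypothetical, and the rule is evaluated on demand, so no array or MAXLINE limit is involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the tab-stop rule; not the Tabs class. */
class TabStopSketch {

    /** Same rule as settabs(): the last column of each tab field is a stop. */
    static boolean isTabStop(int col, int tabSpace) {
        return (col + 1) % tabSpace == 0;
    }

    /** Collect every tab-stop column in the range 0 (inclusive) to limit (exclusive). */
    static List<Integer> stopsBelow(int limit, int tabSpace) {
        List<Integer> stops = new ArrayList<>();
        for (int col = 0; col < limit; col++) {
            if (isTabStop(col, tabSpace)) {
                stops.add(col);
            }
        }
        return stops;
    }

    public static void main(String[] args) {
        System.out.println(stopsBelow(32, 8));  // prints [7, 15, 23, 31]
        System.out.println(stopsBelow(13, 4));  // prints [3, 7, 11]
        System.out.println(isTabStop(1007, 8)); // prints true
    }
}
```

Because the rule is pure arithmetic, columns far beyond MAXLINE work without growing any table; the array in Tabs mainly leaves room for stops that are not evenly spaced.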