

A Framework for Reducing Ink-Bleed in Old Documents

Yi Huang† Michael S. Brown‡ Dong Xu†

†Nanyang Technological University; School of Computer Engineering; Republic of Singapore
‡National University of Singapore; School of Computing; Republic of Singapore ∗

Abstract

We describe a novel application framework to reduce the effects of ink-bleed in old documents. This task is treated as a classification problem where training-data is used to compute per-pixel likelihoods for use in a dual-layer Markov Random Field (MRF) that simultaneously labels image pixels of the front and back of a document as either foreground, background, or ink-bleed, while maintaining the integrity of foreground strokes. Our approach obtains better results than previous work without the need for assumptions about ink-bleed intensities or extensive parameter tuning. Our overall framework is detailed, including front and back image alignment, training-data collection, and the MRF formulation with associated likelihoods and intra- and inter-layer cost computations.

1. Introduction

Ink-bleed is a serious problem commonly found in aging handwritten documents. Ink-bleed occurs when ink written on one side of a page penetrates the paper to become visible on the opposite side. The severity and characteristics of ink-bleed are related to a variety of factors, including the ink's chemical makeup, the paper's physical and chemical construction, the amount of ink applied and the paper's thickness (both spatially varying), the document's age, and the amount of humidity in the environment housing the documents. Figure 1-(a) shows examples of ink-bleed exhibiting various levels of severity and intensity characteristics from four different documents. The documents are from the same archival collection dating from 1820-1850.

The obvious drawback of ink-bleed is the reduction in the document's legibility. The motivation of our work is to provide a practical approach to reduce ink-bleed interference in imaged documents in order to improve legibility, as shown in Figure 1-(b). We also strive for a solution that is applicable in a real-world setting that considers the wide range of ink-bleed diversity as well as practical concerns, such as the abilities of the end users.

∗Email: [email protected]; {hu0005yi|dongxu}@ntu.edu.sg

Figure 1. (a) Closeup of image regions from different handwritten documents, circa 1820-1850, suffering ink-bleed. (b) Our goal is to retain the original foreground strokes to improve legibility.

[Contribution] This paper describes a novel application framework to reduce ink-bleed from images of the front and back of a document. The problem is treated as one of classification where image pixels are labeled as either foreground, ink-bleed, or background. This pixel labeling is aided by a dual-layer MRF with a smoothness cost designed to reduce noise while maintaining foreground strokes in regions where foreground and ink-bleed overlap. Our approach operates on a wide range of ink-bleed and does not require assumptions about the ink-bleed intensity or extensive parameter tuning. All necessary components needed for this application are presented, including front and back image alignment, collection of training-data, and the dual-layer MRF setup with associated likelihoods and intra- and inter-layer cost computations.

The remainder of this paper is organized as follows: section 2 discusses related work; section 3 provides an overview of our framework; section 4 details the data-cost computations and dual-layered MRF formulation; section 5 presents results including comparisons against other approaches; section 6 provides a short discussion and summary.


2. Related Work

There is surprisingly little previous work targeting complex ink-bleed. This is likely due to the difficulty in obtaining access to older handwritten materials that are housed under tight regulation. Much of the previous work that does exist focuses on relatively simple ink-bleed that can be removed using variations of local or global thresholding (e.g. [1, 6, 10]).

Drira et al. [5] presented a recent approach targeting complex ink-bleed that uses Principal Component Analysis (PCA) to first reduce the dimensionality of an RGB input image. Pixels are then classified as foreground and background by iteratively clustering the PCA data into two groups via an adaptive threshold. Tonazzini et al. [14] targeted complex ink-bleed using blind signal separation via Independent Component Analysis, which linearly decomposes an RGB image into three signals assumed to be foreground, background, and ink-bleed. Wolf [16] recently extended this idea using non-linear blind separation via an MRF framework that could separate the foreground and ink-bleed from either RGB or grayscale images.

Thresholding and source-separation approaches produce good results when the ink-bleed and foreground have clearly distinguishable graylevel intensities or RGB signatures. Both techniques, however, suffer when the ink-bleed and foreground have similar intensities, as shown in some of the examples in Figure 1-(a). Thresholding techniques make a further assumption that the ink-bleed intensity is always lighter than the foreground, an often invalid assumption. In addition, these techniques use only a single image, which provides limited information.

One obvious strategy to obtain more information is to use images from both the front and back side of a document. Sharma [9] demonstrated a successful two-image 'show-through' reduction approach for use in xerox imaging. Show-through, however, assumes global bleeding between the front and back images, whereas ink-bleed typically varies spatially, making it more difficult to model. The most significant two-image approaches targeting ink-bleed are the wavelet-based approaches introduced by Tan et al. [12] and Wang et al. [15]. These techniques first globally align the front and back images, from which an initial classification of the foreground and ink-bleed strokes is made using the magnitude of the image difference. Iterative filtering of the wavelet coefficients is used to dampen ink-bleed while sharpening foreground pixels. While this technique produces good results, six parameters must be tuned per example, including thresholds for the difference-image, dampening and sharpening coefficients, the number of wavelet scale-levels, and the number of iterations.

Our approach differs from previous work in several distinct ways. First, while previous work performs well for examples that meet their assumptions, our approach operates on a larger range of inputs without explicit thresholding or extensive parameter tuning. Our approach also simultaneously corrects both the front and back images, where other two-image approaches ([15, 12, 9]) process each side individually. Lastly, we present a complete framework, including overlooked components such as the need for local alignment of the front and back images, as well as an easy way to collect training-data.

3. Framework Overview

Our overall framework is discussed in this section, including front and back image alignment, the feature used for classification, and training-data collection. Details of the labeling MRF are given in section 4. A brief description of our application's usage is presented first.

3.1. Application Usage

This work is done in partnership with the National Archives of Singapore, which houses hundreds of volumes of governmental ledgers, circa 1820-1850, that suffer from ink-bleed. Many of these ledgers have been imaged to grayscale microfilm while others are imaged upon request. Researchers of these ledgers are most often lawyers and legal aides who still rely on these documents in legal disputes. Only digital images of the original materials are made available to users. Our application serves as a post-processing tool to help make the documents more legible. While most users are computer-literate, they have little to no background in computer vision or image processing.

3.2. Step-by-step Procedure

Our framework starts with two high-resolution images (∼2K×3K) of the two sides of a page. Images obtained from microfilm are grayscale, while others are RGB scans of the original material. The pages in these volumes are typically bound and as a result are not completely pressed flat when imaged. This non-planar imaging, compounded with the small 3D surface variations that are typical of these older documents, makes it impossible to align the front and back images with a single global transform. While 'flattening' techniques can remove these 3D surface variations (e.g. [4, 8, 13]), such approaches require additional 3D scanning equipment not available in mainstream imaging setups and cannot be used on existing microfilmed documents. As a result, a local alignment procedure in addition to global alignment is needed.

Considering our input images, our overall procedure is as follows: 1) image alignment with local refinement; 2) training-data collection via minimal user assistance; 3) pixel labeling using the dual-layer MRF framework; 4) output image generation.


Figure 2. Local displacements computed between the front and back images. A zoomed inset shows front regions (black crosses) matched to their corresponding locations in the back image (white crosses). Overlapped image regions with and without local alignment are shown with 50% opacity. 'Ghosting', visible from misalignment, is removed with the local alignment procedure.

3.3. Image Alignment with Local Refinement

While the input images are already coarsely aligned, they still require an initial global alignment. To do this, the back image is mirrored, and the location of the maximum correlation score between the front and mirrored back image over a [−20, 20] pixel range in the horizontal and vertical directions is taken to be the global displacement. For our examples, a translation alignment suffices; however, a full affine alignment could be easily incorporated.

Local displacements are computed by dividing the front image into local windows (60×60 for our examples). Correlation is performed between each front-image window and its corresponding back location over a [−10, 10] pixel range in the horizontal and vertical directions. The location of the maximum correlation score for each window is taken to be the local displacement. If the maximum score is below a minimum threshold, it is assumed that there is no local change. Using the local displacements, thin-plate-spline (TPS) interpolation [2] is used to warp the back page to align with the front. Figure 2 shows this local-alignment procedure. The front and back image regions are shown overlapped with 50% opacity. Ghosting from misalignment is visible when local alignment is not performed; this is removed after TPS warping.
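To make the local refinement concrete, below is a minimal sketch of the window-based displacement search described above, assuming globally aligned grayscale images stored as floating-point NumPy arrays. The function name, the patch normalization, and the min_score fallback threshold are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def local_displacements(front, back, win=60, search=10, min_score=0.5):
    """Estimate a per-window (dy, dx) shift of `back` relative to `front`.

    Returns window centers and displacements, which can serve as control
    points for a thin-plate-spline warp.  `min_score` is a hypothetical
    threshold below which a window is assumed to have no local change.
    """
    H, W = front.shape
    centers, shifts = [], []
    for y in range(0, H - win + 1, win):
        for x in range(0, W - win + 1, win):
            f = front[y:y + win, x:x + win]
            f = (f - f.mean()) / (f.std() + 1e-8)
            best, best_dyx = -np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + win > H or xx + win > W:
                        continue
                    b = back[yy:yy + win, xx:xx + win]
                    b = (b - b.mean()) / (b.std() + 1e-8)
                    score = (f * b).mean()        # normalized correlation
                    if score > best:
                        best, best_dyx = score, (dy, dx)
            if best < min_score:                  # assume no local change
                best_dyx = (0, 0)
            centers.append((y + win // 2, x + win // 2))
            shifts.append(best_dyx)
    return np.array(centers), np.array(shifts)
```

The returned centers and shifts can then be fed to any thin-plate-spline implementation (for example, SciPy's RBFInterpolator with a thin-plate-spline kernel) to warp the back image, in the spirit of the TPS interpolation [2] used in the paper.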

3.4. Ratio Feature

A good feature is crucial for classification. Given the aligned images, a ratio feature is defined as ρp = Cp / Cp′, where Cp and Cp′ are the intensities of front image pixel p and corresponding back image pixel p′, respectively. The back image feature, ρp′, is the reciprocal of the front feature.

This ratio feature saliently captures the difference in the various front-back configurations, including situations where the ink-bleed intensity is darker or lighter than the opposite page's foreground pixels. Difficulties can arise when the ink-bleed and foreground have similar intensities, resulting in a ratio close to the value 1, a value that can also occur in features where both front and back pixels are background or foreground. Inter-layer costs in our dual-layer MRF will be used to disambiguate this situation.

While this is a simple feature, it proved to be the best discriminant over all other available information, including features derived from pixel intensity (including RGB when available), pixel differences, and combinations of these that also included the ratio.
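As a small illustration, the ratio feature can be computed in a few lines; this sketch assumes aligned grayscale images held as NumPy arrays, and the eps guard against division by zero is our addition rather than part of the paper.

```python
import numpy as np

def ratio_features(front, back_aligned, eps=1e-6):
    """Ratio feature rho_p = C_p / C_p' for the front image and its
    reciprocal for the back image (eps avoids division by zero)."""
    C_front = front.astype(np.float64)
    C_back = back_aligned.astype(np.float64)
    rho_front = C_front / (C_back + eps)
    rho_back = C_back / (C_front + eps)   # reciprocal feature for the back
    return rho_front, rho_back
```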

3.5. Training Data Collection

Due to the ink-bleed diversity, training-data needs to be obtained for each image pair. This requires user assistance, which is kept to a minimum using a two-step procedure. The user first draws simple color-coded strokes on the front and back images, labeling a few examples of foreground, background, and ink-bleed.

The initial user-labeled training samples are too sparse for practical use and unbalanced in terms of the number of labels, with generally many more background examples provided than foreground and ink-bleed. To enlarge the training-sets, a K-NN classifier based on the ratio feature's distance from the sparse user-labeled data is used to label the entire image, where K is the square root of the size of the user-labeled data. Pixel-wise confidence scores are computed as discussed in Eq (2) in Section 4.1. The 10% most confident pixels for each class are selected to define the enlarged training-sets. Figure 3 shows this training-data collection procedure.

Figure 3. Initial training-data is provided via minimal user markup in the form of color strokes or points drawn over foreground (red), ink-bleed (green), and background (blue) examples in the front and back images (markup is enlarged for clarity). The training-data is enlarged using highly confident pixels labeled via the initial user markup.
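A sketch of this enlargement step is given below, assuming a single scalar ratio feature per pixel and scikit-learn's K-NN classifier; the vote fraction from predict_proba is used as a stand-in for the Eq (2) confidence score computed in the paper, and the function and variable names are ours.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def enlarge_training_set(rho, seed_coords, seed_labels, keep_frac=0.10):
    """Grow sparse user markup into a dense training set.

    rho:         per-pixel ratio feature image, shape (H, W)
    seed_coords: (N, 2) array of user-labeled pixel coordinates (row, col)
    seed_labels: length-N integer array (e.g. 0=FG, 1=INK, 2=BG)
    K is the square root of the number of user labels, as in the paper;
    the K-NN vote fraction approximates the Eq (2) confidence score.
    """
    feats = rho[seed_coords[:, 0], seed_coords[:, 1]].reshape(-1, 1)
    k = max(1, int(np.sqrt(len(seed_labels))))
    knn = KNeighborsClassifier(n_neighbors=k).fit(feats, seed_labels)

    all_feats = rho.reshape(-1, 1)
    proba = knn.predict_proba(all_feats)           # (H*W, num_classes)
    pred = knn.classes_[proba.argmax(axis=1)]
    conf = proba.max(axis=1)

    enlarged = {}
    for c in np.unique(seed_labels):
        idx = np.flatnonzero(pred == c)            # pixels predicted as class c
        keep = idx[np.argsort(conf[idx])[::-1][: max(1, int(keep_frac * len(idx)))]]
        enlarged[c] = np.column_stack(np.unravel_index(keep, rho.shape))
    return enlarged                                # class -> (M, 2) coordinates
```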


Figure 4. Our MRF network with associated nodes and edges: intra-layer edges (V1) connect neighboring pixels within the front and back images; inter-layer edges (V2) connect corresponding pixels p and p′ across the two layers.

4. MRF Formulation

After training-data collection, each image pixel is labeled as one of three classes: Foreground, Ink-bleed, and Background, denoted as {F, I, B}. This task is formulated as a discrete labeling MRF where each pixel p is assigned a label lp, where lp ∈ {F, I, B} (see [7] for details of MRF formulations). The optimal label assignment is found by minimizing the following energy:

E = Ed + λEs, (1)

where Ed represents the data-cost energy associated with the likelihoods of assigning a label lp to each pixel and Es is a smoothness energy based on the MRF's prior cost for assigning neighboring pixels different label values. The scalar weight λ is set to one in our work. While this energy function is standard for all MRFs, the associated likelihood (data cost) and the prior (smoothing cost) are unique to each problem. Details of Ed are given in section 4.1.

Our smoothness term Es is composed of intra-layer edge costs, V1(lp, lq), which give the cost of assigning neighboring pixels the labels lp and lq, and inter-layer edge costs (between layers), V2(lp, lp′), which give the cost of assigning a label combination to pixel p and its corresponding pixel p′ on the opposite layer. Intra-layer edges are set for both the front and back images, so we also have edges V1(lp′, lq′), as shown in Figure 4. Intra-layer edge costs are designed to encourage consistent labels based on feature and color similarity. The inter-layer edge costs are designed to avoid invalid label configurations and aid in resolving regions with overlapping ink. This dual-layer combination proves significantly more effective at maintaining foreground strokes than using the intra-layer costs alone. Details of Es are given in section 4.2.
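To make the dual-layer structure concrete, the sketch below enumerates the two kinds of edges for a pair of H×W images; the node numbering convention (front-layer pixels first, then back-layer pixels) is our own and is not prescribed by the paper.

```python
import numpy as np

def dual_layer_edges(H, W):
    """Enumerate the edges of the dual-layer MRF.

    Front-layer pixel (y, x) gets node id y*W + x; its aligned back-layer
    pixel gets id H*W + y*W + x.  Intra-layer edges connect 4-neighbors
    within each layer (costs V1); inter-layer edges connect each front
    pixel to its corresponding back pixel (costs V2).
    """
    ids_front = np.arange(H * W).reshape(H, W)
    ids_back = ids_front + H * W

    intra = []
    for ids in (ids_front, ids_back):
        intra += list(zip(ids[:, :-1].ravel(), ids[:, 1:].ravel()))  # horizontal
        intra += list(zip(ids[:-1, :].ravel(), ids[1:, :].ravel()))  # vertical
    inter = list(zip(ids_front.ravel(), ids_back.ravel()))
    return intra, inter
```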

4.1. Data Cost Energy Ed

The data-cost Ed is defined for both the front and back images; only the front is described here as an example. The 1-dimensional ratio feature is normalized to be zero-centered with a standard deviation of one before the following procedure. For speedup, the dense training-data is clustered by K-means, with the cluster centers of each class represented as {ρF_i} (i = 1..L), {ρI_j} (j = 1..M), and {ρB_k} (k = 1..N). While choosing the optimal number of cluster centers is an open problem, we set L = M = N to 10% of the size of the smallest training-set.

For each pixel p, we compute the Euclidean distances (L2-norm) between ρp and all of the L + M + N cluster centers and then select the top-K closest centers, where K is set as √(L + M + N). The top-K centers are denoted as {ρm} (m = 1..K) and are further divided into three index sets πF, πI, and πB according to their labels. The distance between ρp and the m-th cluster center ρm is computed as dpm = ||ρp − ρm||. We also denote d̄p² as the mean squared distance to the top-K centers. The similarity of pixel p to each class is defined as:

    SF = Σ_{m ∈ πF} exp(−dpm² / d̄p²)
    SI = Σ_{m ∈ πI} exp(−dpm² / d̄p²)        (2)
    SB = Σ_{m ∈ πB} exp(−dpm² / d̄p²).

The data-cost term, Ed, for each label is defined as:

    Ed(lp = F) = (SI + SB) / (2(SF + SI + SB))
    Ed(lp = I) = (SF + SB) / (2(SF + SI + SB))        (3)
    Ed(lp = B) = (SF + SI) / (2(SF + SI + SB)).

Eq (3) results in Ed ranging between zero and one, with Ed(lp = F) + Ed(lp = I) + Ed(lp = B) = 1.
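The per-pixel data-cost computation of Eqs (2)-(3) reduces to a few lines. The sketch below assumes the per-class K-means cluster centers have already been computed on the normalized ratio features; the small epsilon guarding the division is our addition.

```python
import numpy as np

def data_cost(rho_p, centers):
    """Data cost Ed(lp) for one pixel, following Eqs (2)-(3).

    rho_p:   normalized ratio feature of pixel p (scalar)
    centers: dict {'F': array, 'I': array, 'B': array} holding the K-means
             cluster centers of each class's normalized ratio features.
    Returns a dict of costs over {F, I, B} that sums to one.
    """
    labels = np.concatenate([np.full(len(centers[c]), c) for c in 'FIB'])
    mus = np.concatenate([np.asarray(centers[c], dtype=float) for c in 'FIB'])

    d = np.abs(rho_p - mus)                 # 1-D Euclidean distances d_pm
    K = max(1, int(np.sqrt(len(mus))))      # K = sqrt(L + M + N)
    top = np.argsort(d)[:K]                 # indices of the top-K centers
    d_top = d[top]
    mean_sq = np.mean(d_top ** 2) + 1e-12   # mean squared distance (+ eps, ours)

    S = {c: np.sum(np.exp(-d_top[labels[top] == c] ** 2 / mean_sq)) for c in 'FIB'}
    total = S['F'] + S['I'] + S['B']
    return {'F': (S['I'] + S['B']) / (2 * total),
            'I': (S['F'] + S['B']) / (2 * total),
            'B': (S['F'] + S['I']) / (2 * total)}
```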

4.2. Smoothness Term Es

As previously stated, the prior term Es is computed as edge costs within a layer and between layers, giving rise to:

    Es = Σ_{(p,q) ∈ N} V1(lp, lq) + Σ_{(p,p′) ∈ M} V2(lp, lp′),        (4)

where (p, q) ∈ N are the within-layer edges and (p, p′) ∈ M are the between-layer edges. These two terms are weighted equally.

4.2.1 Intra-Layer Edge Costs

Intra-layer costs are based on the intensity difference or ratio difference between two intra-layer neighbors p and q. We define dρpq = ||ρp − ρq|| as the distance between the ratio features of p and q. Similarly, we define dcpq = ||Cp − Cq|| as the distance between the pixel intensities of p and q. We normalize dρpq and dcpq to range between zero and one. To impose a smoothness constraint within each layer while preserving the edges between different classes, the intra-layer cost is expressed as:

    V1(lp, lq) = 1 / (1 + (ξpq)²),        (5)

where ξpq is defined in the following table:

    lp \ lq       Foreground   Ink-Bleed   Background
    Foreground    ∞            dρpq        dcpq
    Ink-Bleed     dρpq         ∞           dρpq
    Background    dcpq         dρpq        ∞

It is worthwhile to note that we use dcpq to define the intra-layer cost for the Foreground-Background configuration because we observe that the intensities of foreground and background pixels differ more than their ratio features do. In the other configurations, we use the default ratio feature. Moreover, if two neighbors have the same label, a zero cost is used to enforce the smoothness constraint (the three ∞ entries in the diagonal cells result in a zero cost).
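A direct transcription of Eq (5) and the table above is given below; the single-character label names and the use of an infinite ξ to obtain zero cost are our notational choices.

```python
import numpy as np

def intra_layer_cost(lp, lq, d_rho, d_c):
    """Intra-layer smoothness cost V1(lp, lq) of Eq (5).

    lp, lq: labels in {'F', 'I', 'B'}
    d_rho:  normalized ratio-feature distance between neighbors p and q
    d_c:    normalized intensity distance between neighbors p and q
    Identical labels use xi = infinity, which makes the cost zero.
    """
    if lp == lq:
        xi = np.inf
    elif {lp, lq} == {'F', 'B'}:
        xi = d_c      # foreground/background separate better by intensity
    else:
        xi = d_rho    # all other label pairs use the ratio feature
    return 1.0 / (1.0 + xi ** 2)
```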

4.2.2 Inter-Layer Edge Costs

Inter-layer costs, V2(lp, lp′), are defined as:

    lp \ lp′      Foreground   Ink-Bleed   Background
    Foreground    0            0           0
    Ink-Bleed     0            ∞           ∞
    Background    0            ∞           2ω

In the above table, we have a conditional constraint for the Background-Background configuration. We set ω to 1 if (Cp < C1avg and Cp′ < C2avg), and 0 otherwise, where C1avg and C2avg are the average intensities of the foreground pixels in the front and back images, respectively. Background pixels are usually the brightest pixels in the whole document; we therefore assume that a front-back pair in which both pixels have lower intensity (i.e., darker) is unlikely to be a Background-Background configuration, and a small penalty of 2 is applied. We use three infinity penalties in the above table because these three cases can never happen: if a pixel is labeled as ink-bleed on one side, the corresponding pixel on the opposite side can only be foreground. All other configurations are possible and are therefore assigned a zero cost.
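The inter-layer table can likewise be written directly as a small function; the argument names for the foreground intensity averages (C1avg and C2avg in the text) are ours.

```python
import numpy as np

def inter_layer_cost(lp, lp_back, c_p, c_p_back, c1_avg, c2_avg):
    """Inter-layer smoothness cost V2(lp, lp') between front pixel p and
    its aligned back pixel p'.

    c1_avg / c2_avg: average intensities of the foreground training pixels
    in the front and back images, respectively.
    """
    if lp == 'F' or lp_back == 'F':
        return 0.0                    # any pairing with foreground is allowed
    if (lp, lp_back) in [('I', 'I'), ('I', 'B'), ('B', 'I')]:
        return np.inf                 # ink-bleed implies foreground on the other side
    # Background-Background: penalize only if both pixels look too dark
    omega = 1.0 if (c_p < c1_avg and c_p_back < c2_avg) else 0.0
    return 2.0 * omega
```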

4.3. Minimizing the Objective Function

We use the graph-cuts approach described by Boykov and Kolmogorov [3] to minimize our global objective function stated in Eq (1). The Middlebury MRF code provided by [11] is modified to incorporate our dual-layer configuration. In all of our experiments, the optimization converges within 5-6 iterations.

5. Results

We compared our approach with the single-image adaptive thresholding approach [5]¹, the front-and-back image wavelet-based approach [15], and a single-layer MRF based on our data-cost and intra-layer cost formulations. Markup varies per example, but generally consists of 5-15 strokes or points drawn on both the front and back images. Processed images are roughly 2K×3K in resolution. Pixels labeled as foreground are shown with the input image's intensity; all other pixels are set to the mean intensity of the background training-labels.

Figure 5 shows sub-regions from four examples that represent a reasonably diverse range of ink-bleed. Shown are the front and back input and our results, as well as a comparison of the other approaches on the front image only; these are combined into a single image partitioned as follows: (top) single-image adaptive thresholding, (middle) single-layer MRF, and (bottom) two-image wavelet approach. Figure 6 shows a full-page example with comparisons of selected regions shown at the bottom. For all examples our approach provides subjectively the best results. The wavelet approach [15] produces comparable results in some cases but requires six parameters to be tuned per example.

Quantitative results were obtained by counting the number of errors observed in the output images. Errors are considered to be any foreground word not detected correctly or any background/ink-bleed region detected as foreground. Since ground-truth is not available, our quantitative results are still subjective, as errors are decided by a human observer. However, in a best effort for fairness, we found that our method had a precision accuracy of 85.96%, compared to 63.70% for [5], 71.94% for [15], and 75.00% for the single-layer MRF. Precision is defined as (W − WF) / (W + WB), where W is the total number of foreground words, WF is the number of incorrectly classified foreground words, and WB is the number of stroke-size background or ink-bleed regions that were classified as foreground. Precision was computed for 10 full-page images with a total of 4896 words.

6. Discussion and Summary

Our results demonstrate the effectiveness of our approach for reducing ink-bleed. As with all supervised learning techniques, training-data is needed. Our two-step procedure for labeling data requires minimal user markup. While markup differs from input to input, only a dozen or so quickly drawn strokes are typically used per image. It is arguable that user-assisted markup is similar to setting parameters or thresholds, but we note that it is much easier for the domain user to specify image examples in lieu of tuning algorithmic parameters. Furthermore, in a long-term working scenario it should be possible to use previously collected training-data on new pages with similar ink-bleed.

¹The PCA step of this algorithm is omitted for grayscale input images.


Figure 5. [Examples I-II are from microfilm; Examples III-IV are from RGB scans.] (a-b) Front and back with markup. The back markup is mirrored for clarity. (c-d) Output obtained using our dual-layered MRF approach. (e) Comparisons with adaptive thresholding [5], single-layer MRF, and wavelet [15] in top-middle-bottom format. Red lines separate the different results. Note that sub-regions are chosen to show some markup; however, the entire markup is not shown.


The documents targeted in our work have the same color ink throughout a page; however, our framework can be used on other types of documents exhibiting different background and foreground characteristics (e.g. multi-colored ink). Furthermore, different features for different document types, and even different classifiers (e.g. SVM), could be used in the likelihood computation with the same overall framework and dual-layer MRF setup remaining intact. Many old documents exhibit problems in addition to ink-bleed, e.g. the water-stains in Figure 5. Our classification approach is inherently more robust to these types of problems.

Lastly, we note that additional semantic information, such as stroke direction or character characteristics, was not used in our MRF. This was done purposely to avoid tailoring our solution to the examples at hand. The incorporation of higher-level semantics into this framework undoubtedly deserves further consideration.

In summary, we have presented a novel framework for reducing ink-bleed. This framework provides a practical approach to ink-bleed removal that can target a wide range of examples and is suitable for use in a real-world setting.

Acknowledgements

We gratefully acknowledge the support of our colleagues from the National Archives of Singapore. This work was supported by A*STAR SERC Grant No. 0521010104.

References

[1] J. Bescos. Image processing algorithms for readability enhancement of old manuscripts. Electronic Imaging, 1:392–397, 1989.

[2] F. Bookstein. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans. PAMI, 11(6):567–585, June 1989.

[3] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. PAMI, 26(9):1124–1137, 2004.

[4] M. Brown and W. B. Seales. Image restoration of arbitrarily warped documents. IEEE Trans. PAMI, 26(10):1295–1306, 2004.

[5] F. Drira, F. L. Bourgeois, and H. Emptoz. Restoring ink bleed-through degraded document images using a recursive unsupervised classification technique. In Document Analysis Systems (DAS'06), pages 38–49, 2006.

[6] F. C. M. et al. Toward on-line, worldwide access to Vatican Library materials. IBM Journal of Research and Development, 40(2):139–162, March 1996.

[7] S. Li. Markov Random Field Modeling in Image Analysis (2nd Edition). Springer-Verlag, 2001.

[8] M. Pilu. Undoing paper curl distortion using applicable surfaces. In CVPR'01.

[9] G. Sharma. Show-through cancellation in scans of duplex printed documents. IEEE Trans. on Image Processing, 10(5):736–754, 2001.

[10] Z. Shi and V. Govindaraju. Historical document image enhancement using background light intensity normalization. In ICPR'04.


Figure 6. A full-page example (RGB scan); only the front is shown (top) with the complete user markup. Comparisons with other techniques are shown for three selected regions. Our dual-layer MRF approach produces the best results.

[11] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov random fields. In ECCV'06.

[12] C. L. Tan, R. Cao, and P. Shen. Restoration of archival documents using a wavelet technique. IEEE Trans. on PAMI, 24(10):1399–1404, Oct 2002.

[13] C. L. Tan, Z. Li, Z. Zhang, and T. Xia. Restoring warped document images through 3D shape modeling. IEEE Trans. PAMI, 28(2):195–208, 2006.

[14] A. Tonazzini, L. Bedini, and E. Salerno. Independent component analysis for document restoration. International Journal on Document Analysis and Recognition, 7:17–27, 2004.

[15] Q. Wang, T. Xia, L. Li, and C. Tan. Document image enhancement using directional wavelet. In CVPR'03.

[16] C. Wolf. Document ink bleed-through removal with two hidden Markov random fields and a single observation field. Technical Report RR-LIRIS-2006-019, 2006/2007.
