SureInterview PREPARATION ON ALGORITHM http://www.sureinterview.com Mar 6, 2011 The latest version can be found at http://code.google.com/p/sureinterview/downloads

Sure interview algorithm-1103



Application

Design a data structure and algorithm for an interactive spell checker that suggests correct candidates.

See how to write a spelling corrector, and how to improve the performance of generating potential candidates.

Given millions of books, how do you find duplicates? (Book titles may contain errors.)

In which case is O(n^2) better than O(n lg n)?

Consider the following aspects:

1. saving space
2. more functions
3. simpler code


Dynamic programming

Longest Common Subsequence

Find the longest common subsequence of two sequences of items.

For example, the longest common subsequence of the following two sequences is ACAAA.

    A b C d A A A
    A C e A f A A

See longest common subsequence (@algorithmist) for a general description, and Dynamic programming and sequence alignment (@IBM) for a detailed explanation and examples.

As a quick recap, an m*n matrix A records the alignment results computed so far. A(i,j) is calculated from the three adjacent results computed before it, A(i-1,j-1), A(i-1,j) and A(i,j-1), with the corresponding score or penalty applied.

The longest common subsequence (or alignment) algorithm has two steps.

1. Find the length or score of the longest common subsequence. That is, calculate from A(0,0) to A(m-1,n-1).
2. Trace back from A(m-1,n-1) to A(0,0) to find the exact alignment.
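The two steps above can be sketched as follows. This is a minimal illustration, not the original author's code: the class and method names are made up, and the table carries one extra row and column for the empty prefix (so it is (m+1)*(n+1) rather than m*n).

```java
class LcsSketch {
    /** Returns one longest common subsequence of a and b. */
    static String lcs(String a, String b) {
        int m = a.length(), n = b.length();
        // Step 1: len[i][j] = LCS length of the prefixes a[0..i) and b[0..j).
        int[][] len = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i - 1][j], len[i][j - 1]);
                }
            }
        }
        // Step 2: trace back from the last cell to recover the subsequence.
        StringBuilder sb = new StringBuilder();
        int i = m, j = n;
        while (i > 0 && j > 0) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                sb.append(a.charAt(i - 1));
                i--;
                j--;
            } else if (len[i - 1][j] >= len[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("AbCdAAA", "ACeAfAA"));
    }
}
```

Both time and space are O(m*n). If only the length is needed, two rows of the table are enough.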

Given a staircase of n levels, you can step up one or two levels at a time.

1. How many ways are there to get to the 10th stair?
2. Print all the possible paths.
3. What if only steps of {1,3,5} levels are allowed?

Knapsack problem

Given some items, pack the knapsack to get the maximum total value. Each item has a weight and a value. The total weight that we can carry is no more than some fixed number W.

This slide [1] illustrates the idea of dynamic programming using the knapsack problem as an example.

A similar idea applies to more general combination problems. See how the combinations are calculated progressively in Yang_Hui's_Triangle.
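As a rough sketch of the dynamic programming idea, the code below solves the 0/1 variant (each item used at most once) with non-negative integer weights. Both restrictions are assumptions, since the problem statement above does not fix them, and the names are illustrative.

```java
class KnapsackSketch {
    /** Maximum total value with total weight <= capacity, each item used at most once. */
    static int maxValue(int[] weight, int[] value, int capacity) {
        // best[w] = best value achievable with total weight <= w using the items seen so far.
        int[] best = new int[capacity + 1];
        for (int k = 0; k < weight.length; k++) {
            // Go downwards so that item k is not counted twice.
            for (int w = capacity; w >= weight[k]; w--) {
                best[w] = Math.max(best[w], best[w - weight[k]] + value[k]);
            }
        }
        return best[capacity];
    }

    public static void main(String[] args) {
        System.out.println(maxValue(new int[] {1, 3, 4, 5}, new int[] {1, 4, 5, 7}, 7));
    }
}
```

The one-dimensional table is the space-reduced form of the usual items-by-weight table, in the same spirit as rolling the rows of Yang Hui's triangle.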

References

1. ↑ Dynamic Programming


How many different binary trees are there with n nodes?

Trees with different topologies are counted as different. For example, the following two trees are treated as different.

    Tree 1:
      o
     /
    o

    Tree 2:
    o
     \
      o

sub-matrix sum

Given a matrix of integers, how do you calculate the sum of a sub-matrix? A sub-matrix is represented by x, y, h, w, where x and y give the position of the upper-left corner, h is the height, and w is the width.

    int[][] matrix;
    int sum(int x, int y, int h, int w) {...}

If this function is called frequently on the same matrix, how can it be optimised?
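One common way to optimise repeated queries is a table of prefix sums, sketched below. The sketch assumes x is the column index and y the row index of the upper-left corner, which the question does not specify; the class name is illustrative.

```java
class SubMatrixSum {
    // prefix[r][c] = sum of all matrix cells in rows [0, r) and columns [0, c).
    private final long[][] prefix;

    SubMatrixSum(int[][] matrix) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        prefix = new long[rows + 1][cols + 1];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                prefix[r + 1][c + 1] = matrix[r][c]
                        + prefix[r][c + 1] + prefix[r + 1][c] - prefix[r][c];
            }
        }
    }

    /** Sum of the h-by-w block whose upper-left corner is at column x, row y. */
    long sum(int x, int y, int h, int w) {
        return prefix[y + h][x + w] - prefix[y][x + w]
                - prefix[y + h][x] + prefix[y][x];
    }

    public static void main(String[] args) {
        SubMatrixSum s = new SubMatrixSum(new int[][] {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}});
        System.out.println(s.sum(1, 1, 2, 2));
    }
}
```

Building the table costs O(rows*cols) once; each query afterwards is O(1).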


viewable blocks

You are given N blocks of heights 1..N. In how many ways can you arrange these blocks in a row such that when viewed from the left you see only L blocks (the rest are hidden by taller blocks) and when viewed from the right you see only R blocks?

For example, given N=3, L=2, R=1 there is only one arrangement, {2, 1, 3}, while for N=3, L=2, R=2 there are two arrangements, {1, 3, 2} and {2, 3, 1}.

General Idea

Reduce the size of the problem

By examining the test cases, we see that the size of the problem can be reduced by fixing the shortest block.

Suppose the number of arrangements is F(N, L, R). If the shortest block is in the leftmost position, it is visible from the left, so there are F(N-1, L-1, R) arrangements. Similarly, the rightmost position gives F(N-1, L, R-1) arrangements. In any other position the shortest block is hidden from both sides: taking it out leaves F(N-1, L, R) arrangements, and there are N-2 positions to put it back. So, we have

F(N, L, R) = F(N-1, L-1, R) + F(N-1, L, R-1) + (N-2)*F(N-1, L, R)

Divide the problem

The tallest block divides the blocks into two parts. The left and right parts can be calculated independently in a similar fashion.

Now, consider a simplified problem in which we only look at the blocks from the left. Using the same logic as in the two-sided version, we have:

G(N, L) = G(N-1, L-1) + (N-1)*G(N-1, L)

Back to the original problem. If the tallest block is in the leftmost position, it is the only block visible from the left (L = 1), and the remaining N-1 blocks must show R-1 blocks from the right:

F(N, 1, R) = G(N-1, R-1)

If it is in position i ( 2 <= i <= N-1 ), the number of arrangements with the tallest block in that position is:

C(N-1, i-1) * (G(i-1, L-1) * G(N-i, R-1))

C(N, K) is the number of ways to choose K out of N elements. (Note that C(N, K) = C(N-1, K-1) + C(N-1, K).)

Finally, F(N, L, R) is obtained by summing over all positions of the tallest block.

Calculating F through G is better in that G has one fewer parameter than F, which saves both time and space. F can be calculated on the fly.
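The "divide the problem" approach can be sketched as below: build the one-sided table G and Pascal's triangle, then sum over the position of the tallest block. The names are illustrative, and the sketch uses long arithmetic, so it overflows for larger N.

```java
class ViewableBlocks {
    /** Number of arrangements of blocks 1..n showing `left` blocks from the left and `right` from the right. */
    static long count(int n, int left, int right) {
        if (left < 1 || right < 1 || left + right > n + 1) return 0;
        // g[m][k]: arrangements of m distinct blocks with exactly k visible from one side.
        long[][] g = new long[n + 1][n + 1];
        g[0][0] = 1;
        for (int m = 1; m <= n; m++) {
            for (int k = 1; k <= m; k++) {
                g[m][k] = g[m - 1][k - 1] + (m - 1) * g[m - 1][k];
            }
        }
        // c[m][k]: binomial coefficient C(m, k), built with Pascal's rule.
        long[][] c = new long[n][n];
        for (int m = 0; m < n; m++) {
            c[m][0] = 1;
            for (int k = 1; k <= m; k++) {
                c[m][k] = c[m - 1][k - 1] + c[m - 1][k];
            }
        }
        // Put the tallest block at position i; i-1 blocks go left, n-i go right.
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += c[n - 1][i - 1] * g[i - 1][left - 1] * g[n - i][right - 1];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(3, 2, 1) + " " + count(3, 2, 2));
    }
}
```

With g[0][0] = 1 the same sum also covers the tallest block sitting at either end, so no special cases are needed.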

Fibonacci number

The Fibonacci number[1] is defined as F(n) = F(n-1) + F(n-2), with seed values F(0) = 0 and F(1) = 1.

1. Write code to calculate F(n) recursively.
2. How can the recursive solution be optimized?

References

1. ↑ wiki

The recursive Version

Translating the recursive description of an algorithm into code is a fundamental skill.

    /**
     * Fibonacci number. Recursive version.
     *
     * @param n
     * @return Fn
     */
    long Fn_recursive(int n) {
        // base
        if (n <= 0)
            return 0;
        if (n == 1)
            return 1L;

        // induction
        return Fn_recursive(n - 1) + Fn_recursive(n - 2);
    }

Dynamic Programming Version

Dynamic Programming[1] is essentially a technique that optimizes the calculation of a recursive problem by caching the solutions of smaller subproblems. Wikipedia uses the Fibonacci number as an example to explain how DP works[2].

Unless the subproblems do not overlap, caching the results of the subproblems speeds up the solution by avoiding duplicated calculation. There are two ways to do this, top down and bottom up. The top down version needs to cache the result together with all parameters that affect the result. The bottom up version starts from the smallest problem and gradually approaches the final solution.

Not every subproblem is necessarily needed to calculate the bigger problem. So, the top down version might be more efficient in that it calculates the subproblems on demand. But its time complexity is not easy to analyze. Try the bottom up version instead when possible.

This is the bottom up version of the Fibonacci number calculation.

    /**
     * Fibonacci number -- DP version.
     *
     * @param n
     * @return Fn
     */
    long Fn_non_recursive1(int n) {
        if (n <= 0)
            return 0;
        if (n == 1)
            return 1L;

        long f[] = new long[n + 1];
        f[0] = 0;
        f[1] = 1;
        for (int i = 2; i <= n; i++) {
            f[i] = f[i - 1] + f[i - 2];
        }
        return f[n];
    }

The DP algorithm is usually very space demanding. So, we can go back, identify which items are no longer used, and reclaim the space. For example, in this case only the last two items are needed for further calculation, so we can easily rewrite the DP to use O(1) space.

    /**
     * Fibonacci number. DP version with space reduced.
     *
     * @param n
     * @return Fn
     */
    long Fn_non_recursive2(int n) {
        if (n <= 0)
            return 0;
        if (n == 1)
            return 1L;

        long fn2 = 0;
        long fn1 = 1;
        long fn = fn1 + fn2;
        for (int i = 2; i <= n; i++) {
            fn = fn1 + fn2;
            fn2 = fn1;
            fn1 = fn;
        }
        return fn;
    }


The top-down version fills the array recursively.

Other Versions

There are also other methods to calculate the Fibonacci number, but they are not much more than tricks that are specific to the Fibonacci number. For example, the closed form is[3]: F(n) = (phi^n - psi^n) / sqrt(5), where phi = (1 + sqrt(5)) / 2 and psi = (1 - sqrt(5)) / 2. It can be calculated in O(lg n) time.

Reverse print the sequence

Because f(n)=f(n-1)+f(n-2), we have f(n-2)=f(n)-f(n-1). We just need two numbers to trace back to f(0). In the process, no new variables are needed.

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/test/solution/dp/FibonacciNumber.java

Reference

1. ↑ Dynamic programming wiki
2. ↑ Fibonacci number and DP
3. ↑ the closed form of the Fibonacci number

The Maximal Rectangle Problem

Given: A two-dimensional array b (M rows, N columns) of Boolean values ("0" and "1").

Required: Find the largest (most elements) rectangular subarray containing all ones.


nuts in an oasis

A pile of nuts is in an oasis, across a desert from a town. The pile contains 'N' kg of nuts, and the town is 'D' kilometers away from the pile. The goal of this problem is to write a program that will compute 'X', the maximum amount of nuts that can be transported to the town.

The nuts are transported by a horse drawn cart that is initially next to the pile of nuts. The cart can carry at most 'C' kilograms of nuts at any one time. The horse uses the nuts that it is carrying as fuel. It consumes 'F' kilograms of nuts per kilometer traveled regardless of how much weight it is carrying in the cart. The horse can load and unload the cart without using up any nuts.

Your program should have a function that takes as input 4 real numbers D,N,F,C and returns one real number: 'X'

Suppose the pile of nuts lets the cart go back and forth R rounds fully loaded. We have

C-D*F + R*(C-2*D*F) <= N

R has to be an integer number. So,

R = floor((N - (C-D*F))/(C-2*D*F))

Let f(R)=C-D*F + R*(C-2*D*F)

Then X = max(f(R), f(R+1))



Swap and order

calculate the number of inversions.

Each user ranks N songs in order of preference. Given a preference list, find the user with the closest preferences. Measure "closest" according to the number of inversions. Devise an N log N algorithm for the problem.

transpose a two line matrix with O(n) time and O(1) space.

Reverse the bits in a byte

http://graphics.stanford.edu/~seander/bithacks.html#ReverseParallel


Sort and Search

Iterative Mergesort (not recursive)

Implement mergesort iteratively (not recursively) and analyze the time complexity.

General idea

The merge sort has two parts, as shown in the pseudocode.

    function merge_sort(m)
        if length(m) <= 1
            return m
        var list left, right, result
        var integer middle = length(m) / 2
        for each x in m up to middle
            add x to left
        for each x in m after middle
            add x to right
        left = merge_sort(left)
        right = merge_sort(right)
        result = merge(left, right)
        return result

The recursive part, merge_sort(m), bookmarks the segments to merge. The function merge(left, right) merges two sorted segments into a larger one.

The merge_sort part can be done iteratively in a bottom up fashion. For example, for a list of 9 elements,

    0 1 2 3 4 5 6 7 8      < xxx0 <-> xxx1
    \ / \ / \ / \ / /
     0   2   4   6 8       < xx00 <-> xx10 = 0000<->0010, 0100<->0110, etc. with step 2.
      \ /     \ / /
       0       4 8         < x000 <-> x100
        \     / /
           0   8           < 0000 <-> 1000
            \ /
             0             done

So, it is just a matter of counting the indexes with the corresponding steps.
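The index counting can be sketched as below: the segment width doubles on every pass, and each pass merges neighbouring segments. This is only an illustration with made-up names; it assumes the array length is far enough below Integer.MAX_VALUE that doubling the width does not overflow.

```java
import java.util.Arrays;

class IterativeMergeSort {
    /** Sorts a in ascending order, bottom up, without recursion. */
    static void sort(int[] a) {
        int n = a.length;
        int[] buf = new int[n];
        // width is the length of the already sorted segments: 1, 2, 4, ...
        for (int width = 1; width < n; width *= 2) {
            for (int lo = 0; lo < n; lo += 2 * width) {
                int mid = Math.min(lo + width, n);
                int hi = Math.min(lo + 2 * width, n);
                merge(a, buf, lo, mid, hi);
            }
        }
    }

    /** Merges the sorted ranges a[lo, mid) and a[mid, hi) through buf. */
    private static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        int i = lo, j = mid, k = lo;
        while (i < mid && j < hi) {
            buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
        }
        while (i < mid) buf[k++] = a[i++];
        while (j < hi) buf[k++] = a[j++];
        System.arraycopy(buf, lo, a, lo, hi - lo);
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 8, 1, 9, 3, 7, 4, 6};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

There are about lg n passes and each pass touches every element once, so the time is O(n lg n) with O(n) extra space.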

compare different sort methods

when to use merge sort and when to use quick sort


Find a number in a rotated sorted array.

Given a sorted array that has been rotated, for example, {4, 5, 0, 1, 2, 3}.

Find an element in the array and analyze the complexity.

General Idea

Two observations on the rotated array:

1. There is a breakpoint in the rotated array, which is also the single point that makes the array not fully sorted. By appending the first sorted subarray {4,5} to the end, this search problem is transformed into a normal binary search.

2. In the normal binary search, for a given mid index, we know in constant time whether the value being searched for falls on the left or on the right. This is still true for the rotated sorted array.

Each of these observations leads to a different solution.

1. Find the breakpoint

For a fully sorted array, if i < j, we have a[i] <= a[j]. Because the array is rotated, we will have a[i] >= a[j] instead if there is a breakpoint between i and j. For example, in the array {4,5,0,1}, we have a[0]>a[3]. So, what we need is to find the breakpoint in O(logn) time.

The following code finds the breakpoint in O(logn) time on average. The complete source code can be found here.

    // binary search for the break point
    //
    // the size of the final subarray is 2. So, when this loop terminates,
    // rotArr[lo] > rotArr[hi] and lo + 1 = hi;
    // Of course, this loop can be modified to terminate when the final sub array is empty.
    //
    while (lo + 1 < hi) {
        int mid = (lo + hi) / 2;
        if (rotArr[lo] > rotArr[mid]) {
            // The lower part has the break point.
            hi = mid;
        } else if (rotArr[lo] < rotArr[mid]) {
            // The higher part has the break point.
            lo = mid;
        } else {
            /*
             * when rotArr[lo] == rotArr[mid] == rotArr[hi], we cannot tell
             * which part has the break point, try each element instead.
             */
            if (rotArr[lo] > rotArr[lo + 1]) {
                start.setValue(lo + 1); // check if lo is the break point.
                end.setValue(lo);
                return;
            }
            lo++;
        }
    }

Note the special case of {1,0,1,1,1,1}, which drags the performance down to O(n) in the worst case.

Given the position of the breakpoint, it is trivial to do a binary search to find the key. The only point worth noting is to extend the array to the right, so that the normal binary search can kick in. The code below illustrates how to use binary search in the rotated array given the breakpoint.

    int find(Integer[] rotArr, Integer key, Integer lo, Integer hi) {
        // if the array has a break point, map the hi index to the right.
        // for example, if the rotated array is {3,4,1,2}, imagine the array is
        // extended as {3,4,1,2, 3,4,1,2}
        int length = rotArr.length;
        if (hi < lo) {
            hi += length;
        }

        // binary search.
        while (hi >= lo) { // the final sub array is empty.
            int mid = (lo + hi) / 2;
            Integer midVal = rotArr[mid % length];

            // found the key (compare values, not Integer references)
            if (midVal.equals(key))
                return mid % length;

            if (midVal > key) {
                // lower the high boundary.
                hi = mid - 1;
            } else {
                // raise the low boundary.
                lo = mid + 1;
            }
        }

        // when return, the subarray rotArr[ lo...hi ] is empty.
        return -1;
    }

2. Direct binary search

Suppose there is a rotated sorted array {4, 5, 0, 1, 2, 3, 4}. We want to find the key 0.

    4 5 0 1 2 3 4
    ^     ^     ^
    lo   mid    hi

The first mid value picked is 1. Clearly, all values in [4, +inf) and (-inf, 1] should go to the left part, and values in [1, 4] should go to the right part. So, the next step is to search for 0 in the subarray {4, 5, 0}. And so on.

The direct binary search is based on this observation. The implementation has been discussed several times.[1] [2][3] The complete code can be found here.

Note the special case of {1,0,1,1,1,1}.

Important

This problem is a good example of binary search. Make sure the code looks clean and bug free.
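The direct binary search described above can be sketched as follows. This is one common formulation rather than the linked implementation: at each step it decides which half is sorted and whether the key lies inside it, and it shrinks both ends by one when duplicates make that decision impossible. The names are illustrative.

```java
class RotatedSearch {
    /** Returns an index of key in the rotated sorted array a, or -1 if it is absent. */
    static int find(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] == key) return mid;
            if (a[lo] == a[mid] && a[mid] == a[hi]) {
                // e.g. {1,0,1,1,1,1}: cannot tell which half holds the breakpoint.
                // Neither end equals the key here, so drop both.
                lo++;
                hi--;
            } else if (a[lo] <= a[mid]) {
                // The left half a[lo..mid] is sorted.
                if (a[lo] <= key && key < a[mid]) hi = mid - 1;
                else lo = mid + 1;
            } else {
                // The right half a[mid..hi] is sorted.
                if (a[mid] < key && key <= a[hi]) lo = mid + 1;
                else hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find(new int[] {4, 5, 0, 1, 2, 3, 4}, 0));
    }
}
```

Without duplicates the search is O(log n); the shrinking step is what degrades it to O(n) on inputs such as {1,0,1,1,1,1}.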

References

1. ↑ http://www.ihas1337code.com/2010/04/searching-element-in-rotated-array.html
2. ↑ http://talk.interviewstreet.com/questions/32/Search-in-rotated-sorted-array
3. ↑ http://stackoverflow.com/questions/1878769/searching-a-number-in-a-rotated-sorted-array

Given an array, find the minimum number of elements to delete so that the remaining array is sorted.

General Idea

This problem is equivalent to finding the longest increasing subsequence.

We can

1. find the longest increasing subsequence
2. delete the elements that are not in the subsequence.
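A minimal sketch of the counting part, assuming "sorted" means strictly increasing (use an upper-bound search instead if equal neighbours are allowed). It uses the O(n log n) "smallest tail" technique for the longest increasing subsequence, which is a choice made here and not stated in the text above; the names are illustrative.

```java
import java.util.Arrays;

class MinDeletions {
    /** Minimum number of deletions that leaves a strictly increasing array. */
    static int minDeletionsToSort(int[] a) {
        // tails[k] = smallest possible last value of an increasing subsequence of length k+1.
        int[] tails = new int[a.length];
        int size = 0;
        for (int x : a) {
            int pos = Arrays.binarySearch(tails, 0, size, x);
            if (pos < 0) pos = -(pos + 1); // insertion point when x is not present
            tails[pos] = x;
            if (pos == size) size++; // x extends the longest subsequence found so far
        }
        // size is the length of the longest increasing subsequence.
        return a.length - size;
    }

    public static void main(String[] args) {
        System.out.println(minDeletionsToSort(new int[] {1, 3, 2, 4}));
    }
}
```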

Find the median in a large unsorted array, each number is between 0 and 255.

Scan once and collect the frequency of each number (as when computing the mode).

Find the median based on the frequencies.
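The two steps can be sketched as below. For an even number of elements this sketch returns the lower of the two middle values, which is an assumption, since the problem does not say how to break the tie; the names are illustrative.

```java
class ByteMedian {
    /** Lower median of values that are all in the range 0..255. */
    static int lowerMedian(int[] data) {
        // Step 1: one scan to count how often each value occurs.
        long[] freq = new long[256];
        for (int v : data) freq[v]++;
        // Step 2: walk the counts until the median rank (1-based) is reached.
        long target = (data.length - 1) / 2 + 1;
        long seen = 0;
        for (int v = 0; v < 256; v++) {
            seen += freq[v];
            if (seen >= target) return v;
        }
        throw new IllegalArgumentException("empty input");
    }

    public static void main(String[] args) {
        System.out.println(lowerMedian(new int[] {3, 1, 2}));
    }
}
```

The scan is O(n) time with a fixed table of 256 counters, regardless of how large the array is.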

petrol bunks in circle.

There are N petrol bunks arranged in a circle. Each bunk is separated from the next by a certain distance. You choose some mode of travel which needs 1 litre of petrol to cover 1 km. You can't draw an unlimited amount of petrol from a bunk, as each bunk has only a limited amount. But you know that the sum of the litres of petrol in all the bunks is equal to the distance to be covered.

That is, let P1, P2, ... Pn be n bunks arranged circularly. d1 is the distance between p1 and p2, d2 is the distance between p2 and p3, and dn is the distance between pn and p1. Now find the bunk from which the travel can be started such that your mode of travel never runs out of fuel.

General idea

We want to find the starting bunk from which the petrol never runs out during the travel. Because the sum of all fuel equals the amount needed for the travel, such a starting position always exists.

Consider the following example with 5 bunks, in which we choose p3 to start from. At p5, we run out of gas, which means starting from p3 won't work. So, we borrow some fuel from p2, and then p1. Then p5 can reach p1 and close the circle. So, we know p1 is the starting position.

    #  1 2 3 4 5       <- number
    p  3 0 2 0 0       <- amount of petrol
    d  1 1 1 1 1       <- distance to travel
           ^           starting position

    p  3 0 2 0 0 : 2   amount of petrol
    d  1 1 1 1 1 : 1   distance to travel
           ^ ^

    p  3 0 2 0 0 : 2
    d  1 1 1 1 1 : 2
           ^   ^

    p  3 0 2 0 0 : 2
    d  1 1 1 1 1 : 3   <- run out of gas. need to borrow more from #2.
           ^ - ^

    p  3 0 2 0 0 : 2
    d  1 1 1 1 1 : 4
         ^ - - ^

    p  3 0 2 0 0 : 5
    d  1 1 1 1 1 : 5   <- get enough fuel and get back to starting position. done.
       ^ - - - ^

Implementation

Check the Java implementation below:

    int getStartPos(int[][] travleInfo) {
        int startPos = 0;
        int ttlLegs = travleInfo.length;
        int ttlPetrol = 0; // accumulated fuel.
        int distanceToTravle = 0; // total distance.
        int curPos = startPos;

        // or loop for ttlLegs times: for (int i = 0; i < ttlLegs; i++)
        do {
            if (ttlPetrol >= distanceToTravle) {
                // can move to another bunk
                ttlPetrol += travleInfo[curPos][0];
                distanceToTravle += travleInfo[curPos][1];
                curPos = (curPos + 1) % ttlLegs;
            } else {
                // cannot move any more. need to borrow some fuel by moving
                // the starting point backwards.
                startPos = (startPos + ttlLegs - 1) % ttlLegs;
                ttlPetrol += travleInfo[startPos][0];
                distanceToTravle += travleInfo[startPos][1];
            }
        } while (curPos != startPos);
        return startPos;
    }

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/list/BunksInCircle.java#19

An alternative solution

General Idea

Follow the example above. If a station does not have enough fuel to move on, the starting point is certainly not among the stations passed so far. So we can start over again from the next station.

Implementation

    int getStartPos2(int[][] travleInfo) {
        int startPos = 0;
        int ttlLegs = travleInfo.length;
        int ttlPetrol = 0;
        int distanceToTravle = 0;

        for (int curPos = 0; curPos < ttlLegs; curPos++) {
            // take the petrol at this bunk and travel to the next one.
            ttlPetrol += travleInfo[curPos][0];
            distanceToTravle += travleInfo[curPos][1];
            if (ttlPetrol < distanceToTravle) {
                // ran out of fuel: the stations passed do not contain the
                // starting point. restart from the next station.
                startPos = curPos + 1;
                ttlPetrol = 0;
                distanceToTravle = 0;
            }
        }
        return startPos;
    }

Merge k sorted arrays

Given k sorted arrays, merge them into one sorted array.

1. Analyze the time and space complexity.
2. Optimise for only two arrays.

Merge k sorted arrays into one larger sorted array

This problem can be solved by the merge step of the standard merge sort algorithm. We can maintain a small data structure, preferably a min-heap, that holds the first element of each of the k data streams. Each time, we extract the smallest element from the min-heap. Because the streams are also sorted, this min value is also the smallest among all current streams. In this way, we merge the sorted streams into one larger stream.

In the implementation, each input stream needs a marker at its end, which is treated as larger than any meaningful number. When this marker appears at the top of the min-heap, we know all elements have been merged into the output stream.
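As a rough in-memory sketch of the heap-based merge, the code below works on plain int arrays instead of streams. Instead of an end marker it simply stops refilling the heap from an exhausted array, which is a substitution made here for brevity; the names are illustrative.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class MergeKSorted {
    /** Merges k ascending arrays into one ascending array. */
    static int[] merge(int[][] arrays) {
        int total = 0;
        for (int[] a : arrays) total += a.length;
        // Each heap entry is {value, index of the source array, position in that array}.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        for (int k = 0; k < arrays.length; k++) {
            if (arrays[k].length > 0) heap.add(new int[] {arrays[k][0], k, 0});
        }
        int[] out = new int[total];
        int m = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll(); // smallest head among all arrays
            out[m++] = top[0];
            int k = top[1], next = top[2] + 1;
            if (next < arrays[k].length) heap.add(new int[] {arrays[k][next], k, next});
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 4, 7}, {2, 5, 8}, {3, 6, 9}};
        System.out.println(Arrays.toString(merge(input)));
    }
}
```

With n elements in total, the heap never holds more than k entries, so the time is O(n log k) and the extra space is O(k) besides the output.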

Some noteworthy points

1. The sorted arrays are usually files on disk, which can be generally represented by an interface as follows.

    public interface AStream {
        Integer curData(); // the current data in the stream.
        Integer readNext(); // advance one value
    }

2. Marker for the end of the stream.

In Java, this marker can be implemented through the Comparable interface. Or simply append Integer.MAX_VALUE at the end of each stream.

    /**
     * Marker of the end, which is defined as larger than any number.
     */
    final Integer SUPER_LARGE = null;

    public int compareTo(QElem qElem) {
        Integer curData = stream.curData();
        Integer qElmData = qElem.stream.curData();

        if (qElmData == SUPER_LARGE && curData == SUPER_LARGE)
            return 0;
        if (curData == SUPER_LARGE)
            return 1;
        if (qElmData == SUPER_LARGE)
            return -1;
        return curData.compareTo(qElmData);
    }

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/sort/MergeNSorted.java#102

Merge 2 sorted arrays into one sorted array

The following code is quite self-explanatory.

    Integer[] merge2(Integer[] data1, Integer[] data2) {
        // 1. no need to merge when one queue is empty
        if (data1 == null)
            return data2;
        if (data2 == null)
            return data1;

        // 2. merge
        int p1 = 0, p2 = 0, m = 0;
        Integer[] mrgData = new Integer[data1.length + data2.length];
        while (p1 < data1.length && p2 < data2.length) {
            if (data1[p1] < data2[p2]) {
                mrgData[m++] = data1[p1++];
            } else {
                mrgData[m++] = data2[p2++];
            }
        }

        // 3. handle the remaining data still in the queue.
        while (p1 < data1.length) {
            mrgData[m++] = data1[p1++];
        }
        while (p2 < data2.length) {
            mrgData[m++] = data2[p2++];
        }
        return mrgData;
    }

Merge 2 sorted arrays into one sorted array in place

There are algorithms that merge an array like [1,3,5,7,2,4,6,8] into [1,2,3,4,5,6,7,8] with O(1) space and O(n) time.

This problem sounds simple, but the solution is not trivial at all. StackOverflow has some discussion.

A problem that starts out simple will often be expanded to be more complex, so:

1. Check with the interviewer and understand what he wants.
2. Go through some examples to get some clues. Do not put code on the whiteboard until you have the code in your mind.
3. Give a simple answer that is easy to explain and implement.

Code


Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/sort/MergeNSorted.java#142

25th fastest car in 49 cars

There are 49 race cars, and no two have the same speed. You are given 7 tracks of equal length to find the 25th fastest car. At least how many races are needed? (There is no time recorder.)

(or 25 horses)

General idea

See the median of medians algorithm. We can divide and conquer the problem using the median of medians as the pivot.

Solution 1

Round one

1. (7 races) Divide the cars into 7 groups and get the order within each group.
2. (1 race) Take the 7 medians and get the order. Find the median of medians (denoted as o). In the following example, it is 34.
3. (3 races) Find the rank of the median of medians. Take 6 cars at a time from the lower-left corner (25 ~ 33) and the upper-right corner (13 ~ 21) and race them against o (34). After 3 races, we know the rank of this median of medians within the whole set. The best case is that o is the global median (25th fastest). The worst case is that o is the 16th or 34th fastest.

This example shows one possible worst case.

     1  2  3  4 13 14 15   <- group 1
     5  6  7  8 16 17 18
     9 10 11 12 19 20 21   ...
    22 23 24 34 35 36 37
    25 26 27 38 39 40 41   <- group 5
    28 29 30 42 43 44 45   <- group 6
    31 32 33 46 47 48 49   <- group 7

Round two

We want to find the rank of the other medians in a binary search fashion.

1. (3 races) Pick the median less than 34, which is 12. Race it against the lower-left and upper-right corner cars. After 3 races, we know its rank is 12.

Now, the gap between those two medians is at most 21, as shown in this example.

Round three

Rearrange the 21 cars (>12 and <34) as follows.

    13 14 15   <- group 1
    16 17 18
    19 20 21   ...
    22 23 24
    25 26 27   <- group 5
    28 29 30   <- group 6
    31 32 33   <- group 7

Each row is still sorted.

1. (1 race) Find the median of medians again, which is 23.
2. (1 race) Find its rank. After this step, we know the car in the previous step is ranked 23 for sure.
3. (1 race) Similar to a binary search, check the rank of another median, 29.
4. (1 race) Sort all cars between 23 ~ 29 (exclusive). The 25th fastest car is found.

So at most 18 races are needed to get the 25th fastest car.

! An incorrect solution !

This question is not as easy as it seems.

One common error is to exclude elements that are not globally larger or smaller. For example, if we pick 7 elements out of 49 and sort them, we obviously cannot exclude any of these 7 elements, because an excluded element might be the median of the 49 elements. Similarly, the following solution is not correct, in that the median can be within the s or l region.

It claims that 7 + 1 + 1 + 1 = 10 races are needed to get the 25th fastest car.

Steps

1. Divide the cars into 7 groups and get the order within each group.

2. Find the 7 medians and get the order of the medians.

Now we have:

    s s s s + + +
    s s s s + + +
    s s s s + + +
    s s s o l l l
    - - - l l l l
    - - - l l l l
    - - - l l l l

We know s < {+, -, o} < l. We can safely exclude those s and l. The 25th car must still remain in +, -, or o. (This step is wrong!)

Now we have:

            . . .
            . . .
            . . .
          o
    . . .
    . . .
    . . .

Note that each line is still ordered.

3. Pick the medians of each line, race once again, and exclude 2 * 6 = 12 cars.

4. Race once again among the remaining 7 cars. Pick the median, which is the 25th fastest car.

Find the intersection of two sets represented by sorted arrays.

1. How do you find the common elements in two sorted arrays?
2. What if the sizes of the arrays are quite different?

General Idea

Follow the same steps as in merging two arrays, and output the elements that appear at the head of both arrays.

If the sizes are quite different

case 1) If the larger array can be accessed randomly

For each element in the smaller array, search for it in the larger one.

case 2) If the larger array is too large to save on one computer

Split the larger array and distribute it to multiple computers. Also split the smaller array according to the lower and upper boundaries of the sub-array on each computer. Query the intersections and combine the results.
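For the basic case with two arrays of similar size, the merge-like scan from the general idea can be sketched as follows. Each common value is reported once (set semantics), which is an assumption about the expected output; the names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class SortedIntersection {
    /** Common values of two ascending arrays, each reported once. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++; // the head of a is too small to match anything left in b
            } else if (a[i] > b[j]) {
                j++;
            } else {
                // Same value at both heads: emit it unless it was just emitted.
                if (out.isEmpty() || out.get(out.size() - 1) != a[i]) out.add(a[i]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[] {1, 2, 2, 4, 6}, new int[] {2, 2, 3, 4, 7}));
    }
}
```

The scan is O(m + n). When one array is much smaller, searching each of its elements in the larger array costs O(m log n) instead, as in case 1 above.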

Young tableau

Given two sorted positive integer arrays A[n] and B[n] (W.L.O.G., let's say they are sorted in decreasing order), we define a set S = {(a, b) | a in A and b in B}. Obviously there are n^2 elements in S. The value of such a pair is defined as Val(a,b) = a + b. Now we want to get the n pairs from S with the largest values. The tricky part is that we need an O(n) algorithm.

Young tableau.

An m*n matrix of integers has all rows and columns sorted in ascending order. Find the most efficient way to print out all numbers in ascending order.

Young tableaus. CLRS 6-3.

find the k-th largest number in two sorted lists

binary search for a range.

Given a sorted array of float numbers, find the start and end positions of a range.

For example,

    input
    array : {0, 0.1, 0.1, 0.2, 0.3, 0.3, 0.4}
    range : 0.1 <= x <= 0.3

    output:
    1 5

General idea

The key point here is to search using the previous result.

First, we find roughly where the data is, by a binary search for any element in the range. During the search, the search region is narrowed down to [posStart,posEnd]. The element in the middle further divides this region into [posStart,mid] and [mid,posEnd].

Then, we search these two separate regions for the real start and end positions.

Be careful not to create an infinite loop in the binary search.

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/search/BinarySearch.java#30

    public void findRange(double[] data, double rangeStart, double rangeEnd,
            Mutable<Integer> pStart, Mutable<Integer> pEnd) {
        pStart.setValue(-1);
        pEnd.setValue(-1);
        if (data == null || data.length == 0)
            return;

        int posStart = 0, posEnd = data.length - 1;

        // find where the data roughly is.
        int inRange = 0;
        while (posStart <= posEnd) {
            inRange = (posStart + posEnd) / 2;
            if (data[inRange] < rangeStart) {
                posStart = inRange + 1;
            } else if (data[inRange] > rangeEnd) {
                posEnd = inRange - 1;
            } else {
                // found: rangeStart <= data[inRange] <= rangeEnd;
                break;
            }
        }

        // not found
        if (posStart > posEnd)
            return;

        // Now, data[inRange] is in the range of data.
        // We need to find the index that points to rangeStart.
        int pEnd2 = inRange;
        while (posStart <= pEnd2) {
            int n = (posStart + pEnd2) / 2;
            if (data[n] < rangeStart) {
                posStart = n + 1;
            } else {
                pEnd2 = n - 1;
            }
            // note: there is no break when rangeStart was found.
        }

        // and find the end position in [inRange,posEnd]
        int pStart2 = inRange;
        while (pStart2 <= posEnd) {
            int n = (pStart2 + posEnd) / 2;
            if (data[n] > rangeEnd) {
                posEnd = n - 1;
            } else {
                pStart2 = n + 1;
            }
            // note: there is no break;
        }

        if (posStart <= posEnd) {
            pStart.setValue(posStart);
            pEnd.setValue(posEnd);
        }
    }

Just a reminder how it works

The binary search is to reduce the search range by divide and conquer. For example, we want to find 5 in a sorted array {0, 1,2, 3, 4, 5, 6, 7, 8, 9}. Using (0+9)/2=4 as the mid value, the target value must be on the right side. so, the search rangebecomes {4, .. 9}.

The code:

Note that the terminate condition can also be 'only one element left'. The code is like:

Sort a partially sorted array.

1. An array is preprocessed so that A[i] < A[i+N]. Sort this array.

2. An array is partially sorted so that A[i] < A[j], when i < j - N. Sort this array.

1. It is shell sort half way done. So, continue the shell sort until the step length (N) is 1.

2. Use a min-heap of size N as a buffer. After the stream passing through this buffer, the data will be sorted.

That is,

... sorted ... [min-heap of size N] ... partially sorted ...

Axis Aligned Rectangles

Describe an algorithm that takes an unsorted array of axis-aligned rectangles and returns any pair of rectangles that overlaps,if there is such a pair.

Axis-aligned means that all the rectangle sides are either parallel or perpendicular to the x and y axis. You can assume thateach rectangle object has two variables in it: the x/y coordinates of the upper-left corner and the bottom-right corner.

1. Hacking_a_Google_Interview_Practice_Questions_Person_A1. http://courses.csail.mit.edu/iap/interview/materials.php

2. Interval tree

123456789

1011

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9} ^ ^ ^ lo mid hi 0, 1, 2, 3, 4, {5, 6, 7, 8, 9} ^ ^ ^ lo mid hi 0, 1, 2, 3, 4, 5,{} 6, 7, 8, 9 ^ ^ hi lo // when the search terminates, the search range is empty

123456789

10

while(lo <= hi){ in mid = (lo + hi) / 2; // or mid = lo + (hi-lo)/2, to avoid overflow. if(arr[mid] == result) return mid; //return when there is a match. if(arr[mid] < lo){ lo = mid + 1; }else{ hi = mid - 1; }}

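while (lo < hi) {
    int mid = (lo + hi) / 2; // or mid = lo + (hi - lo) / 2, to avoid overflow.
    if (arr[mid] == result)
        return mid; // return when there is a match.
    if (arr[mid] < result) {
        lo = mid + 1; // mid itself is excluded, so the range always shrinks.
    } else {
        hi = mid;
    }
}
// at most one element (lo == hi) is left; compare arr[lo] with result separately.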


Manipulate data in a stream

Find a number that shows up more than half of the time

Given a stream of integers in which, at a given time, one number has appeared more than half of the time, how do you find this number?
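One well-known approach to this question is the Boyer-Moore majority vote, which keeps a single candidate and a counter. The sketch below is only one possible answer, not necessarily the one the handout intends; the class and method names are hypothetical, and the result is meaningful only if a majority number really exists.

```java
class MajorityVote {
    // Returns the number that appears more than half of the time,
    // assuming such a number exists in 'stream'.
    static int majority(int[] stream) {
        int candidate = 0;
        int count = 0;
        for (int x : stream) {
            if (count == 0) {
                // No surviving candidate: adopt the current number.
                candidate = x;
                count = 1;
            } else if (x == candidate) {
                count++;
            } else {
                // A different number cancels one vote of the candidate.
                count--;
            }
        }
        return candidate;
    }
}
```

It uses O(1) space and a single pass, which fits the streaming setting.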


Find the most frequently visited pages (URLs)

Find the k most frequently visited (clicked) pages from a big log file in which each line contains a time stamp, a session ID and a page ID. The file is too big to fit into memory.

1. Find the k most frequently visited pages within a month.
2. Find the k most frequently visited pages within a couple of months. Each month should be reported separately.
3. Find the k most frequently visited pages matching some pattern. For example, the pattern a>b>a stands for a user who visits page a, page b, then page a again. The lines are not strictly sorted by time.

Find the k-th largest number in a list

1. The list is sorted.
2. The list is unsorted.

Space is limited.

sorted

If the list is sorted (in ascending order), the problem reduces to finding the k-th number from the end, which can be solved with a queue of size k. After the last element is scanned, return the oldest element in the queue.

unsorted

Maintain a min-heap of size k. During the scan, if the current number is larger than the top, replace the top of the min-heap with this number. After the scan, the top of the heap is the k-th largest number of the list. The time complexity is O(n log(k)).
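The unsorted case can be sketched with `java.util.PriorityQueue`, which is a min-heap by default. The class and method names below are hypothetical, and the sketch assumes 1 <= k <= data.length.

```java
import java.util.PriorityQueue;

class KthLargest {
    // Returns the k-th largest value of 'data' (k = 1 is the maximum).
    static int kthLargest(int[] data, int k) {
        if (k < 1 || k > data.length) {
            throw new IllegalArgumentException("k must be in [1, data.length]");
        }
        PriorityQueue<Integer> heap = new PriorityQueue<>(k);
        for (int x : data) {
            if (heap.size() < k) {
                heap.add(x);           // still filling the first k slots
            } else if (x > heap.peek()) {
                heap.poll();           // drop the smallest of the k kept so far
                heap.add(x);
            }
        }
        // The heap holds the k largest values; its top is the k-th largest.
        return heap.peek();
    }
}
```

For example, `KthLargest.kthLargest(new int[]{5, 1, 9, 3, 7}, 2)` keeps {7, 9} in the heap and returns 7.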

In a linked list, find the n-th node from the end of the list.

You can scan the list only once.

Pick an object from a stream with equal chance

There is an object stream of unknown length, and limited space that can hold only one object. Scan the stream once. At the end of the stream, the space should hold a random object from the stream, each object with equal probability.
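A common answer is reservoir sampling with a reservoir of size one: keep the i-th object with probability 1/i. The sketch below is an illustration under that assumption, not the handout's own solution; `ReservoirOne` and `pick` are hypothetical names.

```java
import java.util.Iterator;
import java.util.Random;

class ReservoirOne {
    // Returns one element of the stream chosen uniformly at random,
    // or null if the stream is empty. Only one element is held at a time.
    static <T> T pick(Iterator<T> stream, Random rnd) {
        T chosen = null;
        int seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            // Replace the held element with probability 1/seen.
            if (rnd.nextInt(seen) == 0) {
                chosen = item;
            }
        }
        return chosen;
    }
}
```

The i-th object survives to the end with probability (1/i) * (i/(i+1)) * ... * ((n-1)/n) = 1/n, so every object is equally likely.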


Find a subarray with maximum sum in a given array.

variants:

1. circular array.
2. size <= n


Find the k most frequently visited pages matching some pattern

Find the k most frequently visited (clicked) pages from a big log file in which each line contains a time stamp, a session ID and a page ID. The file is too big to fit into memory.

Find the k most frequently visited pages matching some pattern. For example, the pattern a>b>a stands for a user who visits page a, page b, then page a again.


String manipulation

Implement atoi: convert a string into an integer

Implement the following C/C++ function:

Write your test cases.

Find the longest palindrome in a given string

URL match

1. Match an input URL to the ones in the list.

For example, given a list:

And the input "/test/test/", print the URL "/root/test/test/t.html".

If this routine is frequently used, how can its performance be improved?

How to detect and remove near-duplicate files among a large number of files.

How do you detect and remove near-duplicate files among a large number of files? For example, web pages that differ only in the advertisement part.

Ransom Notes

A ransom note is a note in which each word is cut and pasted from a magazine. [1]

Given a paragraph and a sentence, check if the words of the sentence are all in the paragraph. Each word in the paragraph can be used only once.

references

1. Hacking a Google Interview Handout

String manipulation is very tricky. Before putting down any code, make sure

1. you understand the question well, and
2. you discuss some test cases with the interviewer.

Analysis

int atoi ( const char * str );

http://www.example.com
example.org/test
test/test2
test2/test3
/root/test/test.html
/root/test/test/t.html

The ransom note problem checks whether one set contains another set. We can use a BST or a hash table/map to reduce the time complexity of the lookups.

Because it is a string manipulation problem, the tricky part actually comes from how to collect all the words in the paragraph rather than how to use the HashMap/BST.

For example, given "AA BB CC", the words are delimited not only by the blanks, but also implicitly by the beginning and end of the string. One way to solve this problem is to pad the string with blanks so that the words are uniformly delimited by blanks.

Another way is to extend the "isAlpha" function and treat all out-of-boundary characters as non-alphabetical.

The next step is to extract words from the notes. The words can be identified by the transitions non-alphabetical --> alphabetical, which is the starting point of a word, and alphabetical --> non-alphabetical, which is the end of a word.

So, we have the code:

One last thing: since the note tends to be smaller than the paragraph, the words in the ransom note are saved in a HashMap for better space/time efficiency.

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/test/solution/string/WordsSentence.java#24

reference

"AA BB CC" --> " AA BB CC "

boolean isAlpha(int i) {
    // Note: the code takes advantage of this definition.
    if (i < 0 || i >= sbuf.length)
        // all chars out of boundary are considered to be non alphabetical.
        return false;
    int c = sbuf[i];
    if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z')
        return true;
    return false;
}

boolean checkRansomNote(String paragraph, String notes) {
    if (paragraph == null)
        return false;
    if (notes == null)
        return true;

    // get the needed words for ransom.
    Map<String, Integer> wordCol = getWords(notes);

    sbuf = paragraph.toCharArray();
    int convStart = -1, convEnd = -1;
    for (int i = 0; i <= sbuf.length; i++) {
        if (!isAlpha(i - 1) && isAlpha(i)) {
            // find ...@A...
            convStart = i;
        } else if (!isAlpha(i) && isAlpha(i - 1)) {
            // find ...A@...
            convEnd = i - 1;
            String wd = String.valueOf(sbuf, convStart, convEnd - convStart + 1);

            // check the current word.
            if (!wordCol.containsKey(wd)) {
                continue;
            }

            // the current word is useful, update the collection
            int count = wordCol.get(wd) - 1;
            if (count == 0) {
                wordCol.remove(wd);
                if (wordCol.isEmpty())
                    return true;
            } else {
                wordCol.put(wd, count);
            }
        }
    }

    // NOTE: make sure all words are handled, especially the first word
    // and the last word.
    return false;
}

min cover window

Given some chars and a string. Find the shortest substring that contains all given chars.

For example, given {'a', 'b', 'c'} and string "aabbcc", the shortest substring should be "abbc".

The key to this problem is to maintain a substring [startPos .. endPos] that contains all keywords.

The following pseudocode sketches the process of moving endPos and startPos alternately to find the min cover window.

In this Java implementation, keyCount bookmarks the number of occurrences of each keyword in the substring para[startPos..endPos].
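The sliding-window idea can also be written as a small self-contained method. This is a sketch under assumptions, not the code from the linked repository: the class name `MinCoverWindow`, the method `minWindow` and its return convention (an empty string when no window exists) are chosen here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MinCoverWindow {
    // Returns the shortest substring of 'text' that contains every char in
    // 'keys', or "" if there is no such substring.
    static String minWindow(String text, char[] keys) {
        if (keys.length == 0) {
            return "";
        }
        // keyCount: occurrences of each key inside text[start..end].
        Map<Character, Integer> keyCount = new HashMap<>();
        for (char k : keys) {
            keyCount.put(k, 0);
        }
        int missing = keyCount.size(); // distinct keys not yet in the window
        int bestStart = -1, bestLen = Integer.MAX_VALUE;
        for (int start = 0, end = 0; end < text.length(); end++) {
            // Extend the window to the right.
            Integer cnt = keyCount.get(text.charAt(end));
            if (cnt != null) {
                keyCount.put(text.charAt(end), cnt + 1);
                if (cnt == 0) {
                    missing--;
                }
            }
            // Shrink from the left while the window still covers all keys.
            while (missing == 0 && start <= end) {
                if (end - start + 1 < bestLen) {
                    bestLen = end - start + 1;
                    bestStart = start;
                }
                Integer c = keyCount.get(text.charAt(start));
                if (c != null) {
                    keyCount.put(text.charAt(start), c - 1);
                    if (c == 1) {
                        missing++; // the last copy of this key leaves the window
                    }
                }
                start++;
            }
        }
        return bestStart < 0 ? "" : text.substring(bestStart, bestStart + bestLen);
    }
}
```

With the example above, `MinCoverWindow.minWindow("aabbcc", new char[]{'a', 'b', 'c'})` returns "abbc". Each position enters and leaves the window at most once, so the scan is linear in the length of the string.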

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/string/MinCoverWindow.java

Design a data structure for storing a dictionary

cases:

1. The dictionary size is small.
2. The size is too large to fit in memory.
3. The size is too large to fit in one computer.
4. Optimize to improve the performance. What is the bottleneck of your system?

do {
    endPos = move endPos to right so that all keywords are found;
    startPos = move startPos to right so that the substring still has all keywords but cannot be shorter;
    // [startPos],[endPos] are keywords show up only once in the substring.
    move startPos right to skip this left-most keyword for a new window;
} while (more words to scan);

int numMissingKey = keys.length, minLen = para.length + 1;
for (int startPos = 0, endPos = 0; endPos < para.length; endPos++) {
    // move endPos to include all keywords
    if (keyCount.containsKey(para[endPos])) {
        int cnt = keyCount.get(para[endPos]);
        keyCount.put(para[endPos], cnt + 1);
        if (cnt == 0) {
            numMissingKey--; // find one missing keyword.
        }
    }

    if (numMissingKey > 0)
        continue;

    // move startPos to find a min cover window, which has all keywords but
    // cannot be shorter
    for (; numMissingKey == 0 && startPos <= endPos; startPos++) {
        if (!keyCount.containsKey(para[startPos])) {
            continue;
        }

        int cnt = keyCount.get(para[startPos]);
        keyCount.put(para[startPos], cnt - 1);
        if (cnt > 1)
            continue;

        // this keyword is the only one in the substring. This
        // keyword will be missing by moving startPos.
        // so, [startPos..endPos] is a candidate for min cover.
        numMissingKey++;
        if (endPos - startPos < minLen) {
            minLen = endPos - startPos;
            start.setValue(startPos);
            end.setValue(endPos);
        }
    }
}

implement readline using read

Implement readline using the API read. The signature is defined as:

int read(char* buffer, int size); // returns the number of chars read into buffer

For example, given the input stream "abcd\nefgh",

read(buffer,3) returns 3 and the next char is d.
read(buffer,7) returns 7 and the next char is g.
readline(buffer,3) returns 3 and the next char is d.
readline(buffer,7) returns 4 and the next char is e.

reverse a long list of words

There is a very long list of words delimited by blanks. Output the words in reversed order.

split string to words

A dictionary has n words.

Given a string, find how many ways to split the string into words so that all words are in the dictionary.

Code

Code can be found at: http://code.google.com/p/sureinterview/source/browse/src/solution/dp/SplitStringToWords.java#67

public int splitWords(String longString) {
    wdArr = longString.toCharArray();

    // cache the result. the top-down DP comes into play
    cache = new int[wdArr.length];
    Arrays.fill(cache, -1); // initialize the cache.

    // for words being split out
    q = new LinkedList<String>();

    return splitWords_rec(0);
}

/**
 * split string into words starting from 'pos'
 *
 * @param pos
 * @return
 */
int splitWords_rec(int pos) {
    if (pos >= wdArr.length)
        return 0;

    // if it is already cached, don't bother calculate it again.
    if (cache[pos] >= 0)
        return cache[pos];

    int splits = 0;
    for (int len = 1; len <= dict.getMaxLen() && len + pos <= wdArr.length; len++) {
        // check if current string starts with a word in the dictionary
        String wd = String.valueOf(wdArr, pos, len);
        if (!dict.hasWord(wd)) {
            continue;
        }

        // if this word ends the whole string, we have found one split
        if (pos + len == wdArr.length) {
            q.add(wd);
            // System.out.println(StringUtils.join(q, ","));
            q.remove(q.size() - 1);
            splits++;
            continue;
        }

        // go ahead and split the string to the end.
        q.add(wd);
        splits += splitWords_rec(pos + len);
        q.remove(q.size() - 1);
    }

    cache[pos] = splits;

Remove all duplicated chars in a string

Implement an algorithm

For instance, change "abbcccdda" to "abcda" and return 4 (the number of characters deleted).

Count the number of breakpoints, where a char is different from the previous one.

a|bb|cc|dd|a|
 1  2  3  4 5

Pay attention to the last char (or just add 1 after counting the internal breakpoints).
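The breakpoint idea can be sketched in Java as below. The sketch assumes that "duplicated" means consecutive repeats, as in the example, and that the compacted result is left in the front of the same array; the class name is hypothetical, while the method signature follows the one given in this handout.

```java
class RemoveDuplicate {
    // Compacts runs of equal chars in place, e.g. "abbcccdda" -> "abcda",
    // and returns the number of chars deleted. The result occupies
    // s[0 .. s.length - deleted - 1]; the tail of the array is left as is.
    static int removeDuplicate(char[] s) {
        if (s == null || s.length == 0) {
            return 0;
        }
        int kept = 1; // s[0] always stays
        for (int i = 1; i < s.length; i++) {
            // A breakpoint: this char differs from the last one kept.
            if (s[i] != s[kept - 1]) {
                s[kept++] = s[i];
            }
        }
        return s.length - kept;
    }
}
```

For "abbcccdda" the method keeps five chars ("abcda") and returns 4.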

log file processing

Given a large time-stamped log file, how do you find the log entries within a time range?

Find whether one string is a subset of another string

Find whether one string is a subset of another string (the chars need not be contiguous, but the order should match).

General Idea

By examining the following test cases, we can see that the question above is just a special case of string matching.

The solution is to consume one char of the pattern each time it matches a char of the target. If any char of the pattern is left unmatched, the pattern is not a subset of the target.
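The consume-on-match idea can be sketched as a single scan over the target. The class and method names below are hypothetical.

```java
class SubsequenceCheck {
    // Returns true if the chars of 'pattern' appear in 'target' in the same
    // order, not necessarily contiguously.
    static boolean isSubsequence(String pattern, String target) {
        int p = 0; // index of the next pattern char to match
        for (int t = 0; t < target.length() && p < pattern.length(); t++) {
            if (pattern.charAt(p) == target.charAt(t)) {
                p++; // consume one char of the pattern
            }
        }
        // Every pattern char was consumed: the pattern is a subset.
        return p == pattern.length();
    }
}
```

With the target "abbcddef", the patterns "abc" and "abbf" match, while "adc" does not, because no c follows the first d.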

Given a string, check if the string consists of cycles of some pattern. O(nlgn)

For example, "abcabcabc" is "abc" repeated three times, or "(abc){3}".

How to remove duplicate URLs in a search engine crawler

find phone numbers in files

A large file contains phone numbers. Each line has at most one phone number. How do you process the file and return the total number of phone numbers?

Check the commands grep, sed, and wc.

    return splits;
}

int removeDuplicate(char[] s)

a b b c d d e f   //<- target
a b   c           //<- pattern = a b c

a b b c d d e f
a       d ~ c     //<- pattern = a d c

a b b c d d e f
a b b         f   //<- pattern = a b b f


Implement the command "cd" or "dir"; simplify a directory path.

Implement the command line "cd", or "dir".

Given a path, output the equivalent path with "." and ".." removed.

1. ".." is for the parent directory
2. "." is for the current directory

For example, given a path like dir1/dir2/../dir3/./dir4, the result is dir1/dir3/dir4.
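One way to sketch the path simplification is to treat the path as a stack of directory names. The class and method names are hypothetical, and the handling of a ".." that climbs above the start of a relative path (it is kept) or above the root of an absolute path (it is dropped) is an assumption, since the question does not specify it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PathSimplifier {
    // Returns the equivalent path with "." and resolvable ".." removed.
    static String simplify(String path) {
        boolean absolute = path.startsWith("/");
        Deque<String> parts = new ArrayDeque<>();
        for (String part : path.split("/")) {
            if (part.isEmpty() || part.equals(".")) {
                continue; // "." and empty segments change nothing
            }
            if (part.equals("..")) {
                if (!parts.isEmpty() && !parts.peekLast().equals("..")) {
                    parts.removeLast();   // step back to the parent directory
                } else if (!absolute) {
                    parts.addLast("..");  // cannot resolve: keep it (assumption)
                }
            } else {
                parts.addLast(part);
            }
        }
        String joined = String.join("/", parts);
        return absolute ? "/" + joined : joined;
    }
}
```

For the example above, `PathSimplifier.simplify("dir1/dir2/../dir3/./dir4")` returns "dir1/dir3/dir4".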


Design a data structure for wildcard match.

1. * only, once. e.g., a*b, *a, b*, or *
2. The pattern is not restricted, including ? and *
3. The target is zillions of URLs stored on multiple computers.

Given a large text file, find all the anagrams.

Implement putlong, itoa, atoi

Implement putlong, itoa, and atoi.

test cases:

1. 0
2. positive/negative number
3. overflow

JSON prettier

Format JSON data with proper indentation and new lines.

For example, given

{"id":"id-123","woe_id":[123,456,789],"attribute":{"title":"a","desc":"b"}}

output:

{
  "id":"id-123",
  "woe_id":[123,456,789],
  "attribute":{
    "title":"a",
    "desc":"b"
  }
}

General Idea

Scan the stream, take action on corresponding char:

current char   action
{              print char; indent+=2; insert("\n"); insertIndent();
,              print char; insert("\n"); insertIndent();
}              indent-=2; insert("\n"); insertIndent(); print char;
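The action table can be sketched in Java as follows. Two details go beyond the table and are assumptions made here so that the sample output is reproduced: commas break the line only when the innermost open container is an object (so arrays stay on one line), and characters inside string literals are copied untouched. The names `JsonPrettier` and `prettify` are hypothetical, the input is assumed to be compact (no existing whitespace), and empty objects are not special-cased.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class JsonPrettier {
    static String prettify(String json) {
        StringBuilder out = new StringBuilder();
        Deque<Character> open = new ArrayDeque<>(); // stack of '{' and '['
        int indent = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < json.length()) {
                    out.append(json.charAt(++i)); // copy the escaped char as is
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"': inString = true; out.append(c); break;
                case '{': open.push(c); out.append(c); indent += 2; newLine(out, indent); break;
                case '}': open.poll(); indent -= 2; newLine(out, indent); out.append(c); break;
                case '[': open.push(c); out.append(c); break;
                case ']': open.poll(); out.append(c); break;
                case ',':
                    out.append(c);
                    Character top = open.peek();
                    if (top != null && top == '{') {
                        newLine(out, indent); // break only between object members
                    }
                    break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Appends a line break followed by 'indent' spaces.
    private static void newLine(StringBuilder out, int indent) {
        out.append('\n');
        for (int k = 0; k < indent; k++) {
            out.append(' ');
        }
    }
}
```

Applied to the sample input above, `prettify` produces the sample output with two spaces per indentation level.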


Find links/urls from one html page

Find links/URLs in an HTML page using C++. How do you store those links?

Given an arbitrarily long string, design an algorithm to find the longest repetitive substring.

For example, the longest repetitive substring of "abcabcabc" is "abcabc".