Housing Location Optimizer

By Monica Metro and Joren McSorley

Abstract

Suppose you are relocating to a new city for work. You already know where you will be working and what price you can afford but are unfamiliar with the new city. You know how important being close to work and local amenities are to you. These algorithms will show you the closest living location within your budget, and consider your importance of proximity to work and amenities.

Given an individual’s minimum and maximum acceptable rent price, work location, and importance of amenities; find the living location that minimizes the distance between an individual’s home, place of work, and desirable amenities. The user will select the importance of distance to work and amenities. Amenities are based on the individuals work location.

Formal Explanation

Given constraints:

·  H - the set of apartments / houses with comparable mortgages (which will be referred to as simply apartments from here on out)

·  N - the total number of apartments,

·  Weights for importance of constraints to the user

o  Such that integers 1 ≤ weight ≤ 10

o  1 is for least important and 10 is for most important

o  repeats are allowed

·  The user’s work location

·  Work distance (wd) - a maximum inclusive distance from work

·  A hard exclusive price range for the apartments

The global optimum of the solution set S is the minimum of the computed solutions f(a) for a unique apartment ‘a’:

f(a) = q w(a) + r p(a) + t z(a)

where w(a), p(a), and z(a) are the solution functions that output a value ≥ 1 determined by the ranking of the apartment ‘a’ in that local solution. w(a) corresponds to distance from the work location, p(a) to price of the apartment, and z(a) to the distance from the apartment to amenities. q, r, and z are their user given weights of importance.

Why start at 1 instead of 0:

For all three solution functions, the lowest assigned ranking value is 1. The lowest possible given weight value is also 1. This is because if we allowed the values to equal 0, errors would result. If the weights could equal 0, it wouldn’t matter what the ranking of the work, price, or amenity function it corresponded to as 0 times any integer = 0. For this same reason, the rankings must not be allowed to equal 0 or the weight for that function would not matter. For example: low importance = weight of 0, high importance = weight of 10, low ranking = N-1, high ranking = 0 à low importance * low rank = 0 and high importance * high rank = 0.

Brute Force

The brute force algorithm is the most straightforward way to find the global optimum. To find the apartments with the minimum solution values, all of the apartments are exhaustively searched. A barebones pseudocode representation is as follows:

·  Work function, w(a) for N apartments

·  Price function, p(a) for Nr apartments

·  Amenity function, z(a) for n apartments

·  Solve for final solution value of each apartment

·  Sort apartments from least to greatest solution value to find optimums

Work Function - w(a):

In all three algorithms, the work function was the first function the data was sent through and the first value to be calculated in order to determine the final solution. The first step of this function is to calculate the Pythagorean distance between the work location and each apartment. Pythagorean distance = (x1- x2)2+ (y1- y2)2 (Weisstein). If the distance is ≤ the given maximum allowed distance (wd), then the apartment/house is kept. Otherwise, it is filtered out. Then, the value of the distance from the apartment to the work location is saved. After all apartments have been filtered, they are sorted from shortest to longest distance and are assigned an integer value that is equal to k + 1, where 0 ≤ k ≤ N-1 and k is the position of the apartment in the sorted array such that the lowest assigned value will be 1 and the greatest will equal Nr (N-reduced) .

Both calculating the Pythagorean distance for each apartment and saving the distance to the apartment object is bounded by O(N). Sorting the apartments based on this distance value is expected to be O(Nlog(N)); expected because Java’s built in java function was used. Again, N is the total number of apartments in the initial apartment set H. The big-O of the work function then is O(Nlog(N)) as the bound is superior to O(N), even though there are multiple O(N)’s.

Price Function – p(a):

Similar to the work function, the price function operated the same in all three algorithms and is called after the work function. This way, after the work function filters out impossible solutions (apartments that are too far away), the price function can take the reduce Nr array of apartments and possibly filter it out some more. This is done by checking whether the apartments are in the given price range. The apartments that are in range are kept and sorted again this time from least to greatest price and assigned an integer value k + 1, where 0 ≤ k ≤ Nr-1 and k is the position of the apartment in the sorted array such that the lowest assigned value will be 1 and the greatest will equal n.

For the same reasons that the work function’s big –o is O(Nlog(N)), the price function is bounded by O(Nr log(Nr)) where Nr is the number of apartments after N apartments are filtered out by the work function. The difference is, instead of calculating Pythagorean distance, an if statement is used to check whether the apartment is in the desired price range.

Amenity Function – z(a):

The algorithms differ at the amenity function. This was purposely designed to improve accuracy results. Since the output values produced by the solution functions are determined by the ranking of the apartments, any algorithm that does not incorporate every apartment at the same time will deviate from the brute force solution. Keeping the work and price functions the same in every algorithm and thus incorporating every possible solution apartment was determined to be worth the extra time in return for better results. The amenity function was chosen to differ because it does not natively include a filtering out ability like the other two solution functions do and also possesses the greatest big-o time. The backtracking and simulated annealing algorithms were designed with the hope of bringing the time for this function down while maintain accuracy.

Given n number of apartments, where n is the final filtered out number of apartments in the brute force application, the Pythagorean distance was calculated from each apartment to each amenity. For each amenity group (grocery stores, gyms, movie theatres…etc.) the minimum distance from an amenity in that group to the apartment was used to calculate a total amenity value or ‘A’. A = e*d1 + f*d2 + g*d3 … Where e, f, and g are the respective weights for that amenity group and d1, d2, d3...are the arrays of the groups containing the amenities and their locations. An apartment does not receive a higher ranking for having multiple amenities from the same group in close proximity because we could not determine a competent way to represent this mathematically. To reduce complexity, we used grocery stores, gyms, and movie theatres. This could easily be broken down further into companies and so on to increase the accuracy of finding apartments the user would be interested in. A is saved to each apartment and the apartments are sorted once again from least to greatest and assigned an integer value k + 1, where 0 ≤ k ≤ n-1 and k is the position of the apartment in the sorted array such that the lowest assigned value will be 1 and the greatest will equal n.

The brute force implementation of the amenity function requires looping over n number of apartments and for each apartment, the amenity groups must be looped over to calculate the minimum distance from each group to the apartment. Technically, the big-o for this would be O(n x g) where g is the number of items in the largest group. The largest group for our testing purposes was always the grocery store group, therefore it would be O(n x number of grocery stores). However, it is safe to assume that the number of items in the largest group will never surpass N – the total number of apartments in set H. If a user wanted to check over a very confined area this event could be possible, but then it would unnecessary to use this algorithm to determine which apartments to choose as the number of apartments would already be extremely low. Also, since n = N being filtered out twice, it is possible that N = n if no filtering is done through the work and price functions. Calculating the solution set for each apartment takes O(n) time and re-sorting the solutions to find the global optimums is bounded by O(nlog(n)). O(N2) > O(n) and (nlog(n)) therefore, the worst case big-O for brute force is realistically O(N2).

Accuracy Analysis of Brute Force:

Below are two identical data samples. One was run through a non-weighted brute force such that everything was weighted the same. The other was weighted with work being the most important thing (10) and everything else was set at the least important value (1).

Since all the apartments in the non-weighted graph were weighted at the same importance, the solution set contains the minimized values of w(a) + p(a) + z(a) as displayed in the table of results. The top 5 results make sense as they all either well rounded with values being in the middle of the pack for each function, or well ranked in one function and not so well ranked in another thus balancing out.

Ranking / ID / Work Value / Price Value / Amenity Value / Solution
1 / 5 / 2 / 14 / 3 / 19
2 / 19 / 5 / 5 / 11 / 21
3 / 7 / 12 / 9 / 1 / 22
4 / 9 / 7 / 15 / 2 / 24
5 / 6 / 4 / 8 / 13 / 25

When weighting work as high importance and everything else at a low importance, the global optimum was the apartment that was the closest to work, as intended, which is why “Rank 1” seems to be missing from the graph. The other optimal solutions in the solution set were also ranked in order of closest distance from work as expected, regardless of how they fared with price or amenity ranking.

Ranking / ID / Work Value / Price Value / Amenity Value / Solution
1 / 1 / 1 / 13 / 12 / 35
2 / 5 / 2 / 14 / 3 / 37
3 / 14 / 3 / 18 / 8 / 56
4 / 6 / 4 / 8 / 13 / 61
5 / 19 / 5 / 5 / 11 / 66

Backtracking

The backtracking algorithm finds the solution set by computing a partial solution and ‘backtracking’ and abandoning a particular apartment if its partial solution is determined to be impossible as a solution. In this case, all solutions generated by the brute force algorithm are indeed possible, but most of them are unnecessary. An impossible solution then is defined here as a solution that is not optimal. When search for an apartment or house, assuming one isn’t searching for a long period of time, most people only look at a few before making a decision. This is why the backtracking algorithm only looks at the 20 most optimal solutions. This is slightly excessive, but helpful when comparing all three algorithms. During testing, all three seemed to be almost always identical with the several most optimal solutions, therefore 20 was adopted in order to see where the results diverged. The solutions are defined as being partial because they are calculated without every single one of the possible apartments as non-optimal ones are discarded. Also because apartments can be discarded before every amenity group has been accounted for. For example, if f(a) = (weight x work) + (weight x price) + (weight x partial amenity value) for apartment a is already greater than the 20th solution, there is no need to continue checking the distance between the other amenity groups as the solution value will only get larger as the amenity rank gets higher as it gets worse. A barebones pseudocode representation is as follows:

·  Work function, w(a) for N apartments

·  Price function, p(a) for Nr apartments

·  Amenity function, z(a) for n apartments

·  Calculate partial solutions for first 20 apartments

·  For each apartment while apartment count > 20

o  For each amenity group, calculate a partial z(a) (amenity ranking) and use it to find f(a)

§  If f(a) > f(20th) , abandon apartment a

o  If apartment was never abandoned, compare partial solution ( f(a) ranking ) to the 20th

§  If it’s better, replace the 20th apartment and re-sort solution set

§  Else, abandon

Big-o for the work and price functions is the same for all three algorithms as the functions don’t change. The amenity function in the backtracking algorithm is a bit better than O(N2) for two reasons. One, because the solution set is limited to only 20 apartments. Even though the backtracking algorithm has to sort more often in order to calculate the partial amenity solutions 3 times in the worst case for each apartment, since only 20 apartments is being sorted a lot of time is saved. For larger values of N, sorting 20log(20) repeatedly is still significantly better than sorting Nlog(N) twice – once while determining the z(a) values and again when sorting the solution set to find the global optimum. Secondly, the backtracking algorithm may only have to calculate the minimum distance for the first amenity group before it is discarded as an impossible solution, reducing 3n2 time for that single apartment in the brute force implementation to n2. This is a small reduction but it can add up over time. The backtracking algorithm is still bounded by O(N2) worst case time as finding the minimum distance for the largest amenity group is N in the worst case for the same reasons.