Thursday, March 27, 2014

Week Eleven - Sorting and Efficiency

Efficiency has not really been a huge concern for me this term, mainly because the data we are dealing with right now is relatively small. In the real world, however, this is definitely not the case: data structures are often huge, and if a program is not efficient enough, it can take far longer to run and produce a result. Therefore, in my opinion, optimizing run-time should always be a goal for a good programmer. This week we discussed several sorting algorithms, some more efficient than others, and I will talk about them below:

Selection Sort
Selection sort is very easy to code and understand. In selection sort you have an index 'i' which splits the list into two parts: all the indices smaller than 'i' are the sorted part, and the indices equal to or greater than 'i' are the unsorted part. You first make 'i' equal to index 0 (which means the sorted part of the list is empty and the unsorted part is the whole list), then move 'i' toward the end of the list. Each time you move 'i' up by one, you find the smallest item in the unsorted part and swap it with the item at 'i' (this can be done using tuple unpacking). There is no distinct best, average, or worst case because no matter how the list is arranged it performs the same number of operations with respect to the length of the list. The run-time is always in O(n^2), where n is the number of items in the list (when i is 0 you make n-1 comparisons, then n-2, n-3, n-4 and so on, so when you sum up all the comparisons it comes out to O(n^2)).
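The steps above can be sketched in a few lines of Python (a minimal version of my own, not the exact lecture code):

```python
def selection_sort(lst):
    """Sort lst in place by repeatedly swapping the smallest unsorted
    item into position i."""
    for i in range(len(lst)):
        # find the index of the smallest item in the unsorted part lst[i:]
        smallest = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[smallest]:
                smallest = j
        # tuple unpacking swaps the two items in one line
        lst[i], lst[smallest] = lst[smallest], lst[i]
    return lst
```

The two nested loops make the O(n^2) comparison count easy to see directly from the code.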

Insertion Sort
Insertion sort is similar to selection sort in some ways. It has an index 'i' and an unsorted and a sorted part of the list, but in insertion sort you "insert" the item at index i into the correct place in the sorted part. (To do this you might have to shift some of the items in the sorted part in order to insert it.) Unlike selection sort, insertion sort actually has a best case and a worst case. The best case is when the list is already sorted, so no shifts are required. The worst case is when the list is in exact reverse order; then you have to shift on every insert. In the worst case the run-time is also in O(n^2); similar logic to selection sort applies here.
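A minimal sketch of the shifting idea (again my own version, not the lecture code):

```python
def insertion_sort(lst):
    """Sort lst in place by inserting each item into the sorted part lst[:i]."""
    for i in range(1, len(lst)):
        value = lst[i]
        j = i
        # shift items in the sorted part right until value fits
        while j > 0 and lst[j - 1] > value:
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = value
    return lst
```

On an already-sorted list the while loop never runs, which is exactly the best case described above.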

Quick Sort
Quick sort is a new algorithm we learned in this course. I personally really liked this sort because the algorithm is straightforward and the code is extremely easy to write. This sort shows how powerful recursion can be (as does merge sort below). In quick sort you pick a pivot, then compare all the other elements to the pivot: all the elements smaller than the pivot go in one list and all the elements greater than or equal to the pivot go in another list. Then you call quick sort on those two lists separately. The worst case occurs when all the items end up on one side of the pivot every time (i.e. only one non-empty sub-list is created); in that case the run-time is in O(n^2). This problem can largely be avoided by using a random number generator to pick the pivot each time, which makes the expected run-time O(n log(n)), where n accounts for the comparisons.
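A short sketch of this with a random pivot (I partition into three lists here, splitting out the items equal to the pivot, so duplicates cannot cause infinite recursion; this is a slight variation on the two-list description above):

```python
import random

def quick_sort(lst):
    """Return a sorted copy of lst using quick sort with a random pivot."""
    if len(lst) < 2:
        return lst  # base case: empty or single-item list is already sorted
    pivot = random.choice(lst)  # random pivot avoids the sorted-input worst case
    smaller = [x for x in lst if x < pivot]
    equal = [x for x in lst if x == pivot]
    larger = [x for x in lst if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)
```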

Merge Sort
Merge sort also uses recursion. The idea is that you keep cutting the list in half, sort the left side and the right side, then merge them back together (kind of like binary search in reverse). There is not much difference between the worst and average case of merge sort: similar to quick sort, the run-time is O(n log(n)). One thing to note is that the worst-case run-time of merge sort matches the best/average run-time of quick sort.
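The split-then-merge idea can be sketched like this (my own minimal version):

```python
def merge_sort(lst):
    """Return a sorted copy of lst using merge sort."""
    if len(lst) < 2:
        return lst  # base case: nothing to split
    mid = len(lst) // 2
    left = merge_sort(lst[:mid])
    right = merge_sort(lst[mid:])
    # merge the two sorted halves by repeatedly taking the smaller front item
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # one of the halves may still have leftover items
    return merged + left[i:] + right[j:]
```

The merge step is O(n) and there are O(log n) levels of halving, which is where the O(n log(n)) comes from.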

Saturday, March 22, 2014

Week Ten - Assignment 2

I have spent a relatively long period of time on assignment two; part two especially was giving me a lot of trouble. The first, design part was quite easy and straightforward: you can write it in literally one day if you understand the features of regex expressions well. However, I did not like my design. I created one class for leaves, one class for nodes with one child, and one class for nodes with two children. This works, but I don't think the relationships between the different regex expressions were characterized well by this approach. The instructor, on the other hand, created several inheritance relationships to link the classes. I think that approach is much clearer than what I did.

However, part two was the challenging part of this assignment, and I did it twice. The first time, I did not look at the hint on Piazza. I split the regex expression on brackets to produce a nested list, and used the nested list to check the validity of the expression. This approach worked, but only on valid strings: an exception can be raised if you try to check something like ")2.(1|1))", since the order of the brackets is incorrect and the method that converts the string to a list goes out of bounds. Moreover, the runtime of all_regex_permutations was extremely slow as the length of the string got larger. It actually resulted in a MemoryError when the length got greater than 6; I think there were just too many recursive calls for the computer to handle, which eventually used up all the memory. So I decided to try a different approach (it was really hard to give the nested list approach up, because I spent so long figuring it out).

The second time, I used the approach the instructor suggested. I have to say the hint was so useful that it kind of gave the solution away already. If I had to figure this algorithm out by myself, I think it would take a long time; I really need to improve on that type of skill, but that is what computer science is really about. I used the approach where I look at each individual bar or dot and check whether it splits the string into two valid regex expressions; if none of them do, return False. This approach did not cause any exceptions, and it was much, much quicker for all_regex_permutations. regex_match was a little harder to deal with, but in my view, if you really understand the algorithm the instructor was talking about in class, it is actually not that bad. I think the way of breaking a string into two parts to check for '*' is such a smart idea; if I had done it myself, I probably would have used a loop or something.
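The "try every bar or dot as a split point" idea looks roughly like this. This is a simplified sketch of my own: the symbol set '012e' and the exact grammar (single symbols, a trailing '*', and parenthesized binary expressions) are assumptions, not necessarily the exact assignment spec:

```python
def is_regex(s):
    """Check whether s is a valid regex in a simplified grammar:
    a single symbol, r* where r is valid, or (r1.r2) / (r1|r2)."""
    if len(s) == 1:
        return s in '012e'  # assumed symbol alphabet
    if s.endswith('*'):
        return is_regex(s[:-1])
    if s.startswith('(') and s.endswith(')'):
        inner = s[1:-1]
        # try every dot or bar as the top-level operator
        for i, ch in enumerate(inner):
            if ch in '.|':
                if is_regex(inner[:i]) and is_regex(inner[i + 1:]):
                    return True
    return False
```

Unlike the nested-list version, a malformed string like ")2.(1|1))" simply fails every case and returns False instead of raising an exception.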

Sunday, March 16, 2014

Week Nine - Binary Search Tree and Big O

This week I started to work on assignment two, part two, with my partner. Although we have worked out a solution for all_regex_permutations, it is very inefficient as the size of the string grows larger; it can take up to minutes to run the method. Improving that method will be my main goal in CSC148 this week.

Speaking of efficiency, we also talked a little about efficiency, runtime and big O in the lecture. The processing time of an algorithm can be described using big O notation. For example, if an algorithm has a linear running time with respect to the size of the input, we express this as O(n). (We normally consider a very large input size when observing the runtime, because in real-world problems input sizes are normally fairly big; that is also what the constant and breakpoint in the big O definition account for.) An example of an O(n) algorithm is linear search. There are other kinds of running time; for example, the following is an example of O(n^2):
for i in range(len(L)):
    for j in range(len(L)):
        a = i * j
Or binary search, which is an example of O(lg n). However, in some recursive functions the running time can grow exponentially or even factorially (you really want to avoid this for larger inputs; I used several recursive methods for all_regex_permutations, which is why it took so long to calculate). We looked at selection sort this week, and I am really excited to learn more efficient sorting algorithms like merge sort.
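Binary search is worth sketching too, since it shows where the O(lg n) comes from (a standard iterative version, not lecture code):

```python
def binary_search(lst, target):
    """Return an index of target in the sorted list lst, or -1 if absent.
    Each comparison halves the remaining range, so the runtime is O(lg n)."""
    low, high = 0, len(lst) - 1
    while low <= high:
        mid = (low + high) // 2
        if lst[mid] == target:
            return mid
        elif lst[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1
```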

We also talked about binary search trees this week, and we even did a lab on them to enhance our understanding. I personally found the lab pretty easy and straightforward. In some methods it is possible to use the characteristics of a binary search tree to make the method more efficient: in a binary search tree, the left subtree contains all the values smaller than the root, while the right subtree contains all the values larger than the root. Using this characteristic, many unnecessary comparisons can be avoided.
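For example, a membership check only has to follow one branch at each node (a minimal sketch; the class and names here are my own, not the lab's exact code):

```python
class BSTNode:
    """A minimal binary search tree node."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def contains(node, value):
    """Return True iff value is in the BST rooted at node."""
    if node is None:
        return False  # base case: empty subtree
    if value == node.value:
        return True
    # the BST property lets us skip one whole subtree at every step
    if value < node.value:
        return contains(node.left, value)
    return contains(node.right, value)
```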

Saturday, March 8, 2014

Week Eight - LinkedList

We have been talking about linked lists in the past two weeks of class. Two ways of implementing a linked list were discussed, and they both have a recursive structure. In the first way, a linked list contains a head (called item in the lab) and a rest. The head is an object (a value), and it is "linked" to the rest, which is either another linked list or None. The "rest" part makes this class a recursive structure, and we need to write some of the functions using recursion. I personally find this way of implementing it a bit confusing; it took me some time to fully understand the class and write functions for it. (Recursion is required in some functions, which makes it harder and less straightforward to implement.) However, this data structure does have some advantages: in the lab we did last week, we re-implemented the Queue class using this structure, and some functions in Queue implemented this way turned out to be more efficient than the implementation using the list object.
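The item-and-rest structure looks roughly like this (a sketch with approximate names, not the exact lab code); note how a function on it naturally mirrors the recursive shape of the class:

```python
class LinkedList:
    """Recursive linked list: an item plus a rest that is either
    another LinkedList or None (the empty list)."""
    def __init__(self, item, rest=None):
        self.item = item
        self.rest = rest

def length(lst):
    """Return the number of items in lst."""
    if lst is None:
        return 0  # base case: the empty list
    # recursive case: one item plus the length of the rest
    return 1 + length(lst.rest)
```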

The second way of implementing a linked list is to use nodes: each node has a value and a reference to another node or None. This implementation uses a wrapper class. I am still a bit confused about the structure of a linked list built this way; in my understanding, the class is implemented so that it uses the class LListNode to store all the information. One thing I really like about this implementation is that it keeps track of the size of the list. So instead of writing a method and calling it to compute the size, the size is automatically recorded, which saves a lot of work compared to computing it every time.
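A sketch of the wrapper-class idea (the attribute names and the prepend method are my assumptions, not the lab's exact interface):

```python
class LListNode:
    """A node holding one value and a reference to the next node or None."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.nxt = nxt

class LList:
    """Wrapper class: keeps the front node and a size counter that is
    updated on every mutation, so reading the size costs O(1)."""
    def __init__(self):
        self.front = None
        self.size = 0

    def prepend(self, value):
        """Add value to the front of the list."""
        self.front = LListNode(value, self.front)
        self.size += 1
```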

I found the lab this week easier than last week's, probably because I am more familiar with linked lists now. When I finished this week's lab I was able to go back and rework last week's.

Wednesday, February 26, 2014

Week Seven - Recursion

Recursion is a very powerful programming technique: in a recursive function, the function calls itself in its own definition (the recursive call). Since the function has to stop and return at some point (otherwise, if the function just keeps calling itself, a RuntimeError will occur saying maximum recursion depth exceeded), the point where the function no longer calls itself is known as the base case. In this sense, recursion is very similar to induction in mathematics.
Recursion often works well when a big problem can be broken into smaller versions of the same problem. For example, in A1 the way to move cheeses using 3 stools is a recursive function, because as long as n is greater than one you apply the same moving strategy: first move n-1 cheeses to the intermediate stool, move the remaining cheese to the destination, then move the n-1 cheeses from the intermediate to the destination. You keep calling this function until you get to the case n = 1, where there is only one cheese, and you simply move that cheese from the source to the destination. Recursion is also extremely useful when you want to operate on a nested list or any other recursive data structure (one consisting of smaller instances of itself), e.g. trees and linked lists. In class we did an example that returns the sum of a nested list; the same task could possibly be achieved without recursion, but the code would be much more complicated to write. Recursive structures can also help efficiency: for example, in the week six lab we implemented the class Queue using the recursive LinkedList structure. When the Queue is implemented correctly, as described in the handout, running testqueue.py takes the same time no matter how many elements are in the Queue. However, when we implemented the class using a list, the run time grows as more elements are added.
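The three-stool strategy above can be sketched like this (a sketch of the idea, with my own names, returning the list of moves rather than performing them; not the exact A1 code):

```python
def move_cheeses(n, source, intermediate, destination):
    """Return the list of (from, to) moves to shift n cheeses
    from source to destination using one intermediate stool."""
    if n == 1:
        return [(source, destination)]  # base case: move the single cheese
    # move n-1 aside, move the biggest cheese, move the n-1 back on top
    return (move_cheeses(n - 1, source, destination, intermediate)
            + [(source, destination)]
            + move_cheeses(n - 1, intermediate, source, destination))
```

Note the move count is 2^n - 1, which is exactly the exponential growth mentioned in week nine.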
I also want to say that although recursion is a powerful method, it should be avoided in some cases. Consider the following two functions:
def sum_up_to1(n):
    """
    return the sum from 0 up to n using for loop
    """
    sum = 0
    for i in range(n + 1):
        sum += i
    return sum
        
def sum_up_to2(n):
    """
    return the sum from 0 up to n using recursion
    """    
    if n == 0:
        return 0
    return n + sum_up_to2(n-1)
I used a timer to test the efficiency of these two functions. To add up to 500, the first function took 0.0s while the second one took 0.000999927520751953s. One really interesting thing is that when I ran sum_up_to2(1000) it raised a RuntimeError saying the maximum recursion depth was exceeded, while sum_up_to1 still took 0 seconds. I think it is because you need to call the function too many times to get to the base case, and the computer just runs out of stack space. So, if there is no nested structure and a loop can easily be applied, you might not want to use recursion, as it can take much longer, as shown in the example above.

Thursday, February 13, 2014

Week Six - Trees

I believe the most important thing besides the lectures this week was assignment 1, so I want to spend some time talking about that first. At first, the assignment looked very complicated: three pages of assignment sheets, five files, hundreds of lines of code, and the code in the GUI files was not making any sense (I later discovered that it is totally unnecessary to understand the GUI files to do the assignment). However, as I started to work through the assignment step by step with my partner, it was surprisingly not that bad. The first three steps were almost review from CSC108 (i.e. choose a data structure, write methods for a class). The fourth step required some knowledge of exceptions, but in my view exercises 2a and 2b really prepared us to complete this step. Step 5 is the hardest step in assignment 1; the method of moving three cheeses we discussed in class and the hints in the assignment sheet really helped us a lot. My partner and I first thought i was simply n//2 (it worked fine with n < 7). However, when we finished implementing the whole method in Tour and manually calculated minimum steps for greater values of n, the theory fell apart. So we decided to try all values of i from 1 to n-1 (using a for loop) and compare the number of steps associated with each i. The i with the minimum steps among all i for a particular n is the "i" we want. Since the algorithm involved a lot of recursion, it took quite long to run as n became bigger, but the method worked fine (although it is probably not the most efficient way of implementing it).
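The try-every-i idea can be sketched as a recurrence (a sketch of our brute-force approach, assuming the three-stool count for k cheeses is 2^k - 1; not the actual assignment code):

```python
def min_moves(n):
    """Minimum number of moves for n cheeses on four stools:
    move i cheeses aside with all four stools, move the remaining
    n-i cheeses with three stools (2**(n-i) - 1 moves), then move
    the i cheeses back. Try every split i and keep the minimum."""
    if n == 1:
        return 1  # base case: a single cheese takes one move
    return min(2 * min_moves(i) + 2 ** (n - i) - 1 for i in range(1, n))
```

Because each call re-tries every i recursively, the runtime blows up for larger n, which matches what we observed.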


This week in class we implemented a class called Tree, and we did it through recursion. I think we are now moving from tracing and understanding recursion to actually writing recursive code ourselves. Since everything breaks down eventually to a base case, I found it much more helpful to think about the base cases first. I think I really got a better understanding of recursion after assignment 1, so this week's lecture material was fine for me to understand. However, I do want to include two Python built-in functions here; I think they can be really helpful with list comprehensions:
filter(function, iterable):
Construct an iterator from those elements of iterable for which function returns true.
any(iterable):
Return True if any element of the iterable is true; if the iterable is empty, return False.


Finally, in the lab this week there was one part where you have to convert a list comprehension to a for loop. I think it would be a better exercise if it had been the other way around, because we are much more familiar with writing a for loop than a list comprehension.

Thursday, February 6, 2014

Week Five - Scopes and Namespaces

This week's first lecture was about how to solve the cheese problem using only three stools. I am really happy that the solution was discussed in class, because it would have been much harder if I had to trace and understand that recursive code myself. After the lecture on Monday, I finally fully understood the four-stool algorithm described in the assignment sheet.
The second lecture was about scopes and namespaces. I really enjoyed the topic, and we did some really interesting examples in class. One thing I found very special about Python is that everything is a reference to a memory address somewhere in the computer. Surprisingly, it is also possible for two modules to have the same function name, but if both modules are imported and you want to use the function, you must prefix it with the module name. In the lecture we also talked about how Python searches for names: it first looks in the innermost scope (e.g. if you define a function inside a function, Python will first look there), then in the enclosing scope, then in the current module or __main__, and finally in the built-ins. If the name can't be found in the built-ins, an error is raised. A name can also be declared nonlocal or global; if you don't put anything in front of the name, it is local (the default). Finally, the example we discussed in class was quite tricky; I would probably have done most of it wrong if I had traced it myself. Namespaces and scopes are a new topic, and I think I need to spend more time on them to grasp the concept.
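A small example of that lookup order and the nonlocal keyword (my own example, not the one from class):

```python
x = 'global'

def outer():
    x = 'enclosing'
    def inner():
        nonlocal x      # rebind the name in outer's scope, not a new local
        x = 'changed'
    inner()
    return x

# lookup order: local -> enclosing -> module (__main__) -> built-ins
print(outer())  # 'changed': inner modified outer's x through nonlocal
print(x)        # 'global': the module-level x was never touched
```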
The lab this week was about recursion. The handout first asked us to trace two recursive functions, which I found pretty helpful, because I knew how to trace them in my head but was unsure of the format for writing it down on paper. The programming part of the lab went surprisingly well this week: my partner and I figured both methods out, and we even tried some extra problems. I think I am improving at writing recursive functions, but I still need more practice to master recursion.