Friday, February 14, 2014

Week 5

This week we really dove into the similarities and differences between generators, lists, dictionaries, and sets. This definitely helped as we developed our Netflix project, which I will get to in a second. The most interesting feature I found was that generators are exhaustible. They produce their values lazily, one at a time, and once you have iterated through them there is nothing left: a second pass gives you nothing back. So generators are a data structure you want to be very careful with if you need to reuse their contents. Lists are your typical array, but as my partner and I found during our project, they are relatively slow when you need to look things up in them. Checking whether an item is in a list means scanning it element by element (O(N)), while a dictionary lookup is O(1) on average. Finally, I remember from my Java classes that the map is one of the fastest data structures out there, since it uses either a hash table (as in HashMap) or a binary search tree (as in TreeMap) internally, giving an average of O(1) and O(log N) respectively for most basic operations. So as soon as I realized that dictionaries in Python are the equivalent of hash-table maps in Java, I knew they had to be among the fastest data structures, and our testing confirmed it.
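Here is a minimal Python sketch of the two behaviors described above: a generator is exhausted after one pass while a list can be reread, and a membership test has to scan a list but hashes straight into a dict. The function name `squares` and the size `n` are illustrative choices, and the exact timings will vary from machine to machine.

```python
import timeit

def squares(n):
    """Generator: yields values lazily, one at a time."""
    for i in range(n):
        yield i * i

gen = squares(5)
first_pass = list(gen)   # consumes every value the generator produces
second_pass = list(gen)  # the generator is already exhausted, so this is empty

lst = [i * i for i in range(5)]
list_first = list(lst)
list_second = list(lst)  # a list can be iterated as many times as you like

# Membership test: `in` on a list scans element by element (O(n));
# `in` on a dict hashes the key (O(1) on average).
n = 100_000
as_list = list(range(n))
as_dict = {i: True for i in range(n)}
target = n - 1  # worst case for the list: the very last element

list_time = timeit.timeit(lambda: target in as_list, number=100)
dict_time = timeit.timeit(lambda: target in as_dict, number=100)

if __name__ == "__main__":
    print(first_pass, second_pass)
    print(list_first, list_second)
    print(f"list lookup: {list_time:.4f}s, dict lookup: {dict_time:.6f}s")
```

The same O(1) average lookup applies to sets, which are also hash-based in Python.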

Our Netflix project had two basic requirements: running in under a minute, and achieving a root mean square error (RMSE) of less than one. My partner and I kept running into runtimes of over a minute. So, after much debugging, we came to two conclusions. First, we needed to use dictionaries for everything, since lists were definitely too slow for our lookups. Second, we were reading in too many cache files: about five or six. We kept taking them out as long as our RMSE stayed below one, and, as it turns out, we ended up using only two caches. Now our program runs in less than one minute with an RMSE of less than one. It has definitely been one of the most interesting and relevant projects I have worked on.
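The two requirements can be illustrated with a small sketch, assuming a simple predictor built from cached averages. RMSE is the square root of the mean of the squared differences between predicted and actual ratings. The cache names `movie_avg` and `user_offset`, the fallback rating of 3.7, and all the numbers below are hypothetical, not our actual project's caches or data.

```python
import math

def rmse(predictions, actuals):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(squared / len(predictions))

# Hypothetical caches; in a real project these would be loaded from cache files.
movie_avg = {101: 3.8, 102: 2.9}  # movie id -> average rating (made-up values)
user_offset = {7: 0.3, 8: -0.5}   # user id -> how far the user rates above/below average (made-up)

def predict(movie_id, user_id, default=3.7):
    """Predict a rating with two O(1) dict lookups; `default` (hypothetical) covers unknown movies."""
    base = movie_avg.get(movie_id, default)
    return base + user_offset.get(user_id, 0.0)

# Made-up probe data: (movie id, user id, actual rating)
probe = [(101, 7, 4), (102, 8, 3), (102, 7, 3)]
preds = [predict(m, u) for m, u, _ in probe]
actuals = [r for _, _, r in probe]

if __name__ == "__main__":
    print(f"RMSE: {rmse(preds, actuals):.3f}")
```

Each prediction costs only two constant-time dictionary lookups, which is why keeping the caches in dicts, and reading in only the cache files that are actually needed, keeps the runtime down.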
