List vs. Array vs. Dictionnary

_CPC_Naity · December 18, 2015, 7:19pm

Hello,

I am sorry if the question sounds stupid but I am kind of learning programming on my own using c#.

Q1: I am wondering what are the typical use cases for array, list and dictionnary?

I used a bit of each so far, but I am not very confident when it comes to chose one over the other. My undertstanding so far is that array are “stupid” storage places of “infinite dimentions” (2^64-ish), Lists are intelligent 1D storage places and Dictionnary intelligent “1,5D” storage where I can link a key with a value.
Q2: is that right?

Q3: what is the advantage of dictionary over lists?
Until today I thought dictionary is used when I want to pair 2 pieces of data, with the drawback that it is easy to find a value based on a key but not to find back a key based on the value. I used it for example in the “MIME Type” challenge, where it worked well, and in the Scrabble challenge where it worked, I think, less well
(I created a dictionnary with all the words and their associated points value, and ended up with this construct which might go against the idea of a dictionary:

foreach(KeyValuePair<string, int> entry in dictWordValues) 
{
    if(IsWordPlayable (entry.Key, LETTERS) & entry.Value > bestValue)
    { ... }}

Today, while solving “Skynet: the Virus” (Ambush: check \o/), I decided to use lists for the first time. I made a list of links. I imagined the list to be just some pimped up arrays, and then I discovered I was (partially) wrong.

if (linkList.Exists(agentLink)) {...}

This did not compile. I took me a little to understand what is this Predicate the MSDN help wants me to give as datatype. And now it seems both very convenient and very “powerful”. Even more powerful then dictionary since I can access and filter the object of my list based on all public properties and method I define for them. I am not limited to 2. For instance, with that new knowledge in mind, I would approach the problem differently (I used 2 lists of strings, one for the links and the other for the removed links. Instead I could define a link class with both nodes as datas, as well as the string representation of the link, a boolean telling weither the link has been severed and any other informations (# of neighbours link, if it connects to a gateway, if it could be used to “trap” the agent and shorten the puzzle… etc) and just access them from the list).
Q4: Is this best practice?
Q5: Should I manipulate an object stored in a list (in the Skynet challenge, set a “link cut” boolean to true when cuting a link)? If yes, what is the best way to do it?

Thanks in advance, and sorry for the long “narrative” post. It is easier for me to reflect my experience “in the first person” to define the frame of my questions

Cheers

chr-m · December 18, 2015, 8:36pm

Not at all a stupid question. Choosing the right datastructure is very important.

Lets start at the beginning. When you declare a variable you are actually just giving a name to a location in memory. Everytime you use the variable, the cpu looks at the same memory location and uses the value that’s stored there. This lookup takes practically no time.

An array is a continuous block of memory. That’s why you have to know the type and the number of elements when you declare it. The lookup is nearly the same, because the cpu knows the location of the first element, the size of the elements and the number of the element you want to access.

Arrays are good when your algorithm has to acces the elements out of order (aka. random access). They are bad when you don’t know the number of elements beforehand, because you can’t just make the block of memory larger. The memory behind the array might already be allocated for something else. You would have to allocate a new, larger block and copy the values from the smaller array.

A list is not a continuous block of memory. The elements are scattered over the memory. Every element has two parts, the value of the element and a link to the next element. To get to a specific element the cpu has to access all elements before it and follow the links.

That means lists are bad for random access and good when you can’t predict the number of elements, or when the collection has to grow/shrink repeatedly.

What’s better when you need random acces and can’t predict the number of elements depends on your algorithm, but today you have so much memory that most of the time you can just make a fat array with enough room at the end.

Dictionaries (aka. hash-tables, associative arrays) are not so easy to explain in two sentences. Important is that you want to access the elements with a key and not a number. If that’s how you need to access the elements hash-tables can make your code very clean looking, easy to understand and fast.

If you want to choose the right datastructure think about the following.

Do you know the number of elements beforehand? If not, do you know the maximum number of elements and can you make an array big enough to hold them all?

Does the collection need to grow/shrink? When yes, how often?

Do you access the elements out of order, or always from the first element to the last?

_CPC_Naity · December 18, 2015, 8:52pm

Thanks for clearing up the part about arrays. It raises up a few questions but I’ll wait a little before posting them, to prevent the thread to turn too messy

chr-m · December 18, 2015, 10:02pm

The MIME type puzzle is the perfect example for good use of a hash-table. Just try to solve that puzzle without a hash-table. It will probably be a lot slower and definitely a lot uglier.

If I remember correctly, you don’t need any collection in the Scrabble puzzle. You only need to iterate over the elements once, so you don’t need to save them.

I believe the thing you find so convenient about lists is not specific to lists. I don’t know much about C#, but List is probably a subclass of Collection and implements some kind of Compareable interface like all the other Collections. You can do the same thing with the other collections.

_CPC_Naity · December 19, 2015, 4:55am

Thanks chrm,

As I mentioned yesterday, this rises a few new questions. I am going to keep mumeroting them as not all from the first post are answered

Regarding Arrays, you said

They are bad when you don’t know the number of elements beforehand, because you can’t just make the block of memory larger.

Q6: : Is this really a bad feature?
I suppose this could make the behavior of a program very predictable: if I aim to deploy on different platforms, I suppose I could take advantage of this to define a maximum number of element I want in my array. This way I can predict my array max size and react accordingly if it gets bigger (I suppose it is nicer to catch an exception on an array reaching its limit size than having a program crash for an Out Of Memory or a Memory Access Violöation. I could even react to the exception (shut down program, replace last element, build a ring buffer … \o/ )

Regarding list, if I understood correctly, the short version is that lists are a list of element which are not continuous in memory. This means browsing them needs more computation but adding new elements will never lead to reallocate big memory blocs.
Regarding dictionnaries, I could compare them to lists of Values with an index (Key) which isn’t necessary an integer. They should be used when browsing a list is always done with the same key (now I understand what you said the MIME Puzzle is a perfect example for Disctionary use. I chould define a DataType class with properties such as “extension”, “MIME”, “recommended opening program”, “?multimedia”…etc Use a list and browse it like

DataTypeObject = myDatatypeList.Find( DT where DT.extension == fileExtension);

but this isn’t considered as best practice. This is a case where one should take advantage fo dictionaries.
Q7: Did I summarized your posts correctly?

FYI: Regarding the Scrabble puzzle, I “needed” one for the approach I chose (might approach it differently now). I chose to nake a dictionnary<char, int> with the value of each letter and another dictionary<string, int> with all the availables words. The idea was to calculate the value of each words beforehand and, if a word is playable, compare its value with the max playable value so far

CvxFous · December 19, 2015, 1:38pm

@Naity what you’re describing right now is a common problem in choosing the appropriate data type.

Array are bad for unknown size because you have to either allocate a large size to be sure that it contain all your datas, or reallocate every time you need more space.

Allocating a lot of space is not a good practice because it’s not scalable, is may work on a specific case or for now, but with the motto “never trust user input”, you can have a user that’ll purposely try to break your limit, so then you’ll have two choices, either make a bigger array, or reallocate the array.

Reallocating the array is tremendously bad, if you have an array of 50 elements and you reallocate, what you do is copy the 50 elements, allocate 50+x elements in another array, and push back all elements into the new array.

Lists are useful when adding a lot of informations because each node is separated from each other, which means reallocate is just creating a new node and linking it to the previous end. That’s why list are often called linked list. You may look on how to do it in basic C to see how it’s convenient for this use case.

Bad use case for list is when you need to access it really often, because you sometimes need to go through the whole list to access one element.

That’s where the map (or dictionary) is useful. It’s a kind of associative array which is useful for accessing in limited time. I dunno how it is costly in creation though, might be the same problem as arrays.

There are other useful data types, like trie, which are an improvement of linked list.

chr-m · December 19, 2015, 1:57pm

If you don’t know the number of elements beforehand and want to use an array regardless, because of the fast lookup, you have several problems.

You have to make an assumption about the maximum number of elements. That assumption can be wrong and someone might not be able to run your program at all. To handle that you can add code to shrink/grow the array on demand. But that just adds ugly boilerplate code that has nothing to do with the algorithm. Also you still have to guess by how much your array has to grow each time. If that assumption is off, your algorithm either needs a lot of time to copy the array around, or consumes a lot of memory for nothing.

Your summary about lists and dictionaries is good.

Your algorithm in the Scrabble puzzle seems to be: make a dictionary with each word/points pair and then look up each pair at most once. Instead you could just compute the points for a word when you need them.

edit: Well, I didn’t see the post from @CvxFous, but it seems we agree.

Aries · December 19, 2015, 5:09pm

When writing a program, you should always make the distinction between the ADTs you need (in term of properties and operations) and the implementations you pick for them. Your algorithm should not rely nor be dependant on the seconds. Choosing the right implementation could significantly enhance how your algo performance, but not so much as devising a better algorithm.

Of course, changing an ADT implementation eventually comes down to changing the underlying algorithm, but it implies reasoning at a lower level and by only focusing on it, you will end with a fast slow program, that is a program which is fast at doing slow things.

In practise, for all the puzzles I’ve resolved in CG, I have only been using naive ADT implementations. Per instance, for the MIME puzzle, I’ve conceptually used a map (or dictionary), but my implementation was just a linked list of key-value pairs and I didn’t bother using faster implementation relying on tree or hash map.

Thus, as a beginner, you should spend time playing with ADT implementations and learn their advantages and weakness, but to the point of missing that, in the end, it is how they are used which truly matters.

_CPC_Naity · December 19, 2015, 7:02pm

Thanks all for the answers and comments.

Florent, thanks for showing me the terminology for Abstract Data Types.

One last question, from the first ones, regarding lists: Is it a good idea to modify an element from a list, an what is the best way to do so?

Cheers

chr-m · December 19, 2015, 7:43pm

It depends. In Lisp the answer would be: OMG NO!

If you work with lists and functions that consume lists to produce new lists you will probably end up with lists that share structure (have elements pointing to the same memory locations), and changing an element might affect not only one list.

The best way to work with lists is to only insert/drop elements from the front, which works instantly!

Think about what happens when you add a new element to the front of a list. The Cpu creates a new pair somewhere in memory. The value part of the pair will be whatever you set it to. The link will point to the beginning of the old list. That’s blazingly fast compared to reallocating an array that can’t hold another element, but the two lists will share structure.

_CPC_Naity · December 19, 2015, 8:09pm

Why does changing an element necessarly imply to change my memory organisation? I supposed that as long as I change the value inside a “non-dynamic” (not sure about the terminology) data structure everything would be fine memory-wise.

Example: from the Skynet Virus challenge, I was asking myself if / how to create a Link class with a boolean Cut property. Seting this property to true would not - so far I know - change the addresses where my datas are stored. I just replace a byte 0x00 with 0x01 (… right? ^^).

To get back to the original question… I started the Winamax challenge. I defined a Card class containing the color as a char, a numeric representation of the card strength and an override to the ToString(); method.
As both players hand size will vary over time, I should not work with a fixed size array and I do not need to index it in a “non numerical” way thus I should use a List… Am I correct?
I can think of a different implementation that could work with every datatype. I am just trying to put the theory discussed in the previous 10 posts into praxis. ^^

chr-m · December 19, 2015, 8:29pm

You can change the elements, as long as you make sure that you don’t have multiple lists pointing to the same elements.

A list is the right idea, but a Queue is an even better fit for a deck of cards. https://en.wikipedia.org/wiki/Queue_(abstract_data_type)

Aries · December 19, 2015, 8:47pm

For the sake of clarity and sanity of mind of our gentle readers: in C#, which seems to be the language used by Naity, a List implements the IList interface (which support indexing) using an array internally. Thus, it is basically what C++ and everybody else call a Vector (including Java, even if Vector has been deprecated in favor of ArrayList most of the time… and C# also has an ArrayList but which has been deprecated for other reasons). Moreover, C# also offers a LinkedList which is not implementing IList because it does not support indexing. From this point of view, I prefer the Java LinkedList which fully implements indexing with “bad” performance. SNAFU.

chr-m · December 19, 2015, 11:55pm

Yes, I was thinking about a more basic list, like pointers in C or cons cells in Lisp. Sorry if that was confusing.

_CPC_Naity · December 20, 2015, 8:12pm

@Florent : Thanks for your additional inputs. This was very instructive.

@chrm : Wait… Oo. I hate myself. Having a FIFO for a deck of card is so obvious that… well… I suppose I did not think about it, In the other hand, I don’t know why, but I was not thinking of queus as an ADT but as a thread-safe transfer mechanism for asynchron processes. I did not think I could use it within one single thread. But when you mentioned it, it made so much sense. Thanks a lot

lingmaaki · October 7, 2016, 6:38am

Hope this will help you to know more about…C# List

Ling