# Introduction to Dictionaries

Dictionaries are similar to lists, except that we access their contents with a key rather than with an index. A key can be one of several kinds of objects: a string, a number, or even a tuple (which we will talk about in a moment).

Dictionaries are written within "curly braces" -- `{}` -- and each key is separated from its value by a colon.

The following creates a new dictionary, and then shows how to add or edit entries.

In [1]:
basketball_wins = {'Purdue': 5,
                   'IU': 2,
                   'Northwestern': 0}

# To add a new entry
basketball_wins['Michigan'] = 5

# The same syntax updates an existing entry
basketball_wins['Purdue'] = 6

print(basketball_wins)

{'Purdue': 6, 'IU': 2, 'Northwestern': 0, 'Michigan': 5}


In [2]:
basketball_wins.items()

dict_items([('Purdue', 6), ('IU', 2), ('Northwestern', 0), ('Michigan', 5)])
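As a small sketch of how `items()` is commonly used, the loop below unpacks each key/value pair as it goes. The dictionary is re-created here only so the snippet runs on its own, and the `teams` and `total_wins` names are just for illustration.

```python
basketball_wins = {'Purdue': 6, 'IU': 2, 'Northwestern': 0, 'Michigan': 5}

# items() yields (key, value) pairs, which we can unpack in the for loop
for team, wins in basketball_wins.items():
    print(f"{team} has {wins} wins")

# keys() and values() give just one side of each pair
teams = list(basketball_wins.keys())
total_wins = sum(basketball_wins.values())
print(teams, total_wins)
```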

You access the data in a dictionary by its key. Unlike a list, you can't access an item in a dictionary by an index number (because the index number could also be a key!).

In [None]:
basketball_wins['Purdue']

In [None]:
# But you get a KeyError if the key doesn't exist

basketball_wins['Wisconsin']

Dictionary keys can be any "immutable" (strictly speaking, hashable) object in Python, but they are most often strings or numbers.

While each key must be unique and cannot be changed, the value stored under it can be updated. The following code takes in a string and counts how often each character (other than spaces and new lines) appears.
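To make the restriction on keys concrete, here is a minimal sketch that tries a few kinds of keys. The `lookup` dictionary and the `list_key_worked` flag are made up for this example.

```python
lookup = {}
lookup['name'] = 'string key'
lookup[42] = 'integer key'
lookup[(1, 2)] = 'tuple key'

# A list is mutable, so Python refuses to use it as a key
try:
    lookup[[1, 2]] = 'list key'
    list_key_worked = True
except TypeError as error:
    list_key_worked = False
    print(f"TypeError: {error}")

print(lookup)
```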

In [None]:
string = """
I have been one acquainted with the night.
I have walked out in rain—and back in rain.
I have outwalked the furthest city light.

I have looked down the saddest city lane.
I have passed by the watchman on his beat
And dropped my eyes, unwilling to explain.

I have stood still and stopped the sound of feet
When far away an interrupted cry
Came over houses from another street,

But not to call me back or say good-bye;
And further still at an unearthly height,
One luminary clock against the sky

Proclaimed the time was neither wrong nor right. 
I have been one acquainted with the night.
"""
string = string.lower()
letter_dict = {}
for letter in string:
    # Don't count new lines or spaces
    if letter in ['\n',' ']:
        continue
    if letter in letter_dict: # Check if the key exists
        letter_dict[letter] = letter_dict[letter] + 1
    else: # If it doesn't exist, then create it with the value 1
        letter_dict[letter] = 1
        
print(letter_dict)


### Exercise 1

See if you can modify the code above to count how often each _word_ appears instead.

In [None]:
string = """
I have been one acquainted with the night.
I have walked out in rain—and back in rain.
I have outwalked the furthest city light.

I have looked down the saddest city lane.
I have passed by the watchman on his beat
And dropped my eyes, unwilling to explain.

I have stood still and stopped the sound of feet
When far away an interrupted cry
Came over houses from another street,

But not to call me back or say good-bye;
And further still at an unearthly height,
One luminary clock against the sky

Proclaimed the time was neither wrong nor right. 
I have been one acquainted with the night.
"""

string = string.lower()
## Your code here


## Approaches for dealing with missing dictionary keys

This pattern of either modifying an existing dictionary entry or creating a new key is very common, and there are a few approaches for handling it.

1. The first is what I did above - using an if statement to check if the entry exists, e.g.:

```python
if letter in letter_dict:
    letter_dict[letter] = letter_dict[letter] + 1
else:
    letter_dict[letter] = 1
```

2. A very similar approach is used below: instead of `if`, we use a `try...except` clause, e.g.

```python
try:
    letter_dict[letter] = letter_dict[letter] + 1
except KeyError:
    letter_dict[letter] = 1
```

3. A shorter, but slightly less readable, approach is to use the `get` method of a dictionary. In the code below, `letter_dict.get(letter, 0)` will return the value for the key `letter` if it exists, or it will return `0` if the key doesn't exist.

```python
letter_dict[letter] = letter_dict.get(letter, 0) + 1
```

4. Finally, the `collections` module has a [defaultdict](https://docs.python.org/3.7/library/collections.html#collections.defaultdict) which lets you create a dictionary with a built-in default.

```python
import collections
letter_dict = collections.defaultdict(int)
...
letter_dict[letter] = letter_dict[letter] + 1
```

For most things in Python, the language tries to have one right way to do things. In this case, I think that any of these options is fine and that they are basically equivalent. Use whichever makes the most sense to you.
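If you want to convince yourself that the four approaches really do give the same answer, a quick sketch like the following runs each one on a short made-up string (`text`) and compares the results. The `counts_*` names are just for this example.

```python
import collections

text = "mississippi"

# 1. if/else
counts_if = {}
for letter in text:
    if letter in counts_if:
        counts_if[letter] = counts_if[letter] + 1
    else:
        counts_if[letter] = 1

# 2. try/except
counts_try = {}
for letter in text:
    try:
        counts_try[letter] = counts_try[letter] + 1
    except KeyError:
        counts_try[letter] = 1

# 3. get with a default of 0
counts_get = {}
for letter in text:
    counts_get[letter] = counts_get.get(letter, 0) + 1

# 4. defaultdict, where missing keys start at int() == 0
counts_default = collections.defaultdict(int)
for letter in text:
    counts_default[letter] = counts_default[letter] + 1

print(counts_if)
# All four dictionaries hold the same keys and values
print(counts_if == counts_try == counts_get == dict(counts_default))
```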

## Tuples

Tuples are very similar to lists. They are created with parentheses -- `()` -- rather than with square brackets. 

In [None]:
my_tuple = (4,13,'hello')

Like lists, items in a tuple can be accessed by indexing.

In [None]:
my_tuple[1]

However, tuples are "immutable", meaning that they can't be changed after they are created. So, things like "append" and "pop" won't work.
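Here is a short sketch of what immutability looks like in practice. Both attempts to change the tuple fail, and the only way to "change" it is to build a new one (`longer_tuple` is a name invented for this example).

```python
my_tuple = (4, 13, 'hello')

# Tuples have no append method, so this raises an AttributeError
try:
    my_tuple.append(5)
except AttributeError as error:
    print(f"AttributeError: {error}")

# Assigning to an index raises a TypeError
try:
    my_tuple[0] = 99
except TypeError as error:
    print(f"TypeError: {error}")

# Concatenation builds a brand new tuple and leaves the original alone
longer_tuple = my_tuple + (5,)
print(my_tuple)
print(longer_tuple)
```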

This immutability is (for complicated reasons) an important attribute of dictionary keys, and tuples can be used as dictionary keys. For example, let's say you wanted to store the population of cities in the US. You might create a dictionary like this:

In [None]:
population_dict = {('Georgia', 'Atlanta'): 498000,
                   ('Illinois', 'Atlanta'): 1692,
                   ('Illinois', 'Chicago'): 2750000}
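As a rough sketch of how you might work with a dictionary like this, the code below looks up one city by its `(state, city)` key and then unpacks the keys in a loop. The dictionary is repeated so the snippet runs on its own, and `illinois_cities` is a name made up for this example.

```python
population_dict = {('Georgia', 'Atlanta'): 498000,
                   ('Illinois', 'Atlanta'): 1692,
                   ('Illinois', 'Chicago'): 2750000}

# Look up a single city by its (state, city) key
print(population_dict[('Illinois', 'Chicago')])

# The two Atlantas are different keys, so they don't overwrite each other
print(population_dict[('Georgia', 'Atlanta')], population_dict[('Illinois', 'Atlanta')])

# Each key can be unpacked into its two parts while looping
illinois_cities = []
for (state, city), population in population_dict.items():
    if state == 'Illinois':
        illinois_cities.append(city)

print(illinois_cities)
```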

## Example

The following code takes a CSV table of city populations that I grabbed from the US Census Bureau API and saved [here](https://raw.githubusercontent.com/jdfoote/Intro-to-Programming-and-Data-Science/master/resources/data/uscities.csv). The first few lines below download the file. The next bit of code converts the file into a dictionary that looks like the one above.

In [None]:
import csv
import requests
import codecs

# This downloads the file and then opens it. You could also save it to your computer, and open it in the normal way
f = requests.get('https://raw.githubusercontent.com/jdfoote/Intro-to-Programming-and-Data-Science/master/resources/data/uscities.csv')
f_csv = csv.reader(codecs.iterdecode(f.iter_lines(), 'utf-8'))
next(f_csv) # This just skips the header row, so it isn't in our data

population_dict = {}
for row in f_csv:
    # To get these column numbers, I just opened the CSV file and looked at which columns had this data
    city = row[1]
    state = row[2]
    population = int(row[0])
    if (state, city) in population_dict: # Check for the same city twice in the same state
        print(f"{(state, city)} appears twice in the data")
    else:
        population_dict[(state, city)] = population
        
# This code prints the first few items in the dictionary, to make sure it looks like it's right
print(list(population_dict.items())[:5])

It looks right, so let's press on.

By using tuples as keys, you can do things like summarize by either entry in the tuple.

In [None]:
state_populations = {}
for state_city in population_dict:
    state = state_city[0] # Extract the state from the key
    city_pop = population_dict[state_city] # Extract the population from the value
    try: # If the key exists, then add the population
        state_populations[state] = state_populations[state] + city_pop
    except KeyError: # Otherwise set the value to the population
        state_populations[state] = city_pop
    
print(state_populations)
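The same pattern works for the other entry in the key. As a minimal sketch, assuming a small hand-made dictionary (`sample_populations`) in place of the downloaded data, this totals the population of all cities that share a name, using `get` instead of `try...except`:

```python
# A small hand-made dictionary, standing in for the downloaded data
sample_populations = {('Georgia', 'Atlanta'): 498000,
                      ('Illinois', 'Atlanta'): 1692,
                      ('Illinois', 'Chicago'): 2750000}

name_populations = {}
for state_city, city_pop in sample_populations.items():
    city = state_city[1]  # The city name is the second entry in the key
    # Start from 0 if we haven't seen this city name yet
    name_populations[city] = name_populations.get(city, 0) + city_pop

print(name_populations)
```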

## Exercise 2

Reuse and modify the code above so that it prints a dictionary of the total population of cities that start with each letter of the alphabet. The output should look something like:

`{'A': 10205308, 'B': 12556367, ...}` 