# Opening, closing, reading, and writing to files

This notebook will introduce the idea of loading, reading, and writing to files in Python.

Manipulating files in Python is typically a two-step process. We first use `open` to create a "file handle", a variable that we can then use to access the file.

When we are done with the file, we can use `close` to close it.

In [None]:
f = open('./test.txt', 'w') # The 'w' string says that we are opening this file to write

In [None]:
f.write('Hello, world\n') # The \n is the newline character
f.write('How are you?\n')
f.writelines(['Line 3\n', 'Line 4\n'])

In [None]:
f.close()

In [None]:
# Now that it's closed, writing won't work any more: this raises a ValueError
f.write('test')
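
To see that failure without stopping the notebook, the error can be caught. The following is a minimal sketch: the scratch file name `closed_demo.txt` and the temporary directory are assumptions made for illustration, so that the example does not touch `./test.txt`.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not part of the notebook)
path = os.path.join(tempfile.mkdtemp(), 'closed_demo.txt')

f = open(path, 'w')
f.write('Hello, world\n')
f.close()

# The handle remembers that it has been closed
print(f.closed)

try:
    f.write('test')
except ValueError as err:
    # Python refuses I/O on a closed file and raises ValueError
    error_message = str(err)
    print(f"Write failed: {error_message}")
```

The `closed` attribute is a quick way to check the state of a handle before using it.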

The drawback of that approach is that we have to remember to close the file. With a `with` statement, the file is closed automatically when the block ends, even if an error occurs inside it. This code does the same thing as above:

In [None]:
with open('./test.txt', 'w') as f:
    f.write('Hello, world\n')
    f.write('How are you?\n')
    f.writelines(['Line 3\n', 'Line 4\n'])
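
As a rough illustration of the automatic closing, the sketch below raises an error inside a `with` block and then checks the handle afterwards. The file name `with_demo.txt`, the temporary directory, and the simulated `RuntimeError` are all assumptions made for the example.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('Hello, world\n')
        # Simulated failure partway through the block
        raise RuntimeError('something went wrong')
except RuntimeError:
    pass

# The file was closed on the way out of the block, despite the error
print(f.closed)

# The text written before the error was flushed to disk when the file closed
with open(path, 'r') as g:
    contents = g.read()
print(contents)
```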

We do a similar operation to read a file:

In [None]:
with open('./test.txt', 'r') as f: # The 'r' is to open for reading
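    for line in f: # Iterating over the file yields one line at a time
        print(line, end='') # Each line already ends in '\n', so don't add another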
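
There are a few other common ways to read from a file handle. The sketch below compares three of them; it writes its own small scratch file first (the name `read_demo.txt` is an assumption for illustration) so that it can be run on its own.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')

with open(path, 'w') as f:
    f.writelines(['Hello, world\n', 'How are you?\n'])

# read() returns the entire file as a single string
with open(path, 'r') as f:
    whole_text = f.read()

# readline() returns one line at a time, including the trailing newline
with open(path, 'r') as f:
    first_line = f.readline()

# Iterating over the handle and stripping the newline gives clean lines
with open(path, 'r') as f:
    stripped_lines = [line.rstrip('\n') for line in f]

print(repr(whole_text))
print(repr(first_line))
print(stripped_lines)
```

For large files, iterating over the handle is usually preferable to `read()` or `readlines()`, since it avoids loading the whole file into memory at once.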

## Libraries that make reading and writing easier

It is rare that we will operate on files this directly. The `csv` module helps when reading and writing CSV files, and `pandas`, which we will come to soon, makes reading and writing files simple.

In [None]:
# CSV example
import csv
import random

with open('./csv_test.csv', 'w', newline='') as fh:
    f = csv.writer(fh)
    f.writerow(['Y', 'X1'])
    for i in range(10):
        x1 = random.random()
        y = x1 + random.random()
        f.writerow([y, x1])

In [None]:
with open('./csv_test.csv', 'r', newline='') as fh: # newline='' is recommended for the csv module
    f = csv.reader(fh)
    header = next(f)
    for row in f:
        print(f"Y = {row[0]} and X1 = {row[1]}")
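
One thing to keep in mind is that `csv.reader` returns every value as a string. The sketch below shows one possible way to handle this, using `csv.DictReader` to look up columns by header name and `float` to convert the values. The file name `dict_demo.csv` and the fixed numbers are assumptions chosen so that the result is predictable.

```python
import csv
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'dict_demo.csv')

with open(path, 'w', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow(['Y', 'X1'])
    writer.writerow([1.5, 0.5])
    writer.writerow([2.25, 1.0])

# csv.reader gives back strings, even for numeric columns
with open(path, 'r', newline='') as fh:
    reader = csv.reader(fh)
    header = next(reader)
    raw_first_row = next(reader)

# DictReader uses the header row as keys; convert values explicitly
with open(path, 'r', newline='') as fh:
    rows = [{'Y': float(r['Y']), 'X1': float(r['X1'])}
            for r in csv.DictReader(fh)]

print(raw_first_row)
print(rows)
```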

In [None]:
# Pandas example: going from raw data to a plot in two lines

import pandas as pd

df = pd.read_csv('csv_test.csv')
df.plot.scatter(x='X1', y='Y')
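
Writing with `pandas` is similarly short. The sketch below round-trips a small DataFrame through a CSV file using `to_csv` and `read_csv`. The file name `pandas_demo.csv` and the example values are assumptions for illustration, and `pandas` is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'pandas_demo.csv')

df_out = pd.DataFrame({'Y': [1.5, 2.25], 'X1': [0.5, 1.0]})

# index=False leaves the row index out of the file
df_out.to_csv(path, index=False)

# Unlike csv.reader, read_csv infers numeric column types
df_in = pd.read_csv(path)
print(df_in.dtypes)
print(df_in)
```

Passing `index=False` is a common choice here: without it, the row index is written as an extra unnamed column that shows up again when the file is read back.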