Introduction to Python¶

for scientific computing

Jupyter¶

We'll use "Jupyter Notebook" to interact with Python.
Like Matlab's 'Live Editor'; Maple's and Mathematica's notebooks.
Runs in a web browser.

To get started:

`mas-jupyter.ncl.ac.uk`¶

Log in with your usual university details
Open language.ipynb

Screenshot showing the jupyter home page

You can edit the code samples from the slides live and run them as you please.

Double-click a cell to edit it.
To run a cell's contents, use Control-Enter.
You can also use Shift-Enter to run and move to the next cell.

In [1]:

x = 1 + 1
10 * x

Out[1]:

20

Today's course has two parts:

Morning: the Python language¶

Why, what, how?
Basic data types and operations
Control flow

Afternoon: Python tools for scientists¶

NumPy: working with large data grids
SciPy: common numerical functions
matplotlib: in-depth plotting library

Plus advice, links to resources, exercises, ...

What is Python?¶

Interpreted, object-oriented programming language
Works on PC, Mac and Linux
Open source: free (speech, lunch)

Why Python?¶

Neat and friendly syntax

In [2]:

print("Hello, world!")

Hello, world!

Newbie-friendly
Quick to write code and quick (enough) to run

'Batteries included': e.g. JSON data handling

In [3]:

import json, random
#Data obtained from http://www.imdb.com/interfaces
with open("data/top_250_imdb.json") as data_file:
    films = json.load(data_file)

In [4]:

random.sample(list(films.items()), 3)   #sample needs a sequence, so convert to a list

Out[4]:

[('Yôjinbô (1961)', 8.2),
 ('Batman Begins (2005)', 8.2),
 ('Das Leben der Anderen (2006)', 8.4)]

In [5]:

from statistics import mean
#This mean is just from the top 250!
mean(films.values())

Out[5]:

8.2636

In [6]:

max(films.values())

Out[6]:

9.2

In [7]:

print([name for name, score in films.items() if score == 9.2])

['The Shawshank Redemption (1994)', 'The Godfather (1972)']

More pros and cons discussed at the SciPy tutorial.

What can Python do?¶

Work with large datasets (Pandas dataframes and NumPy arrays)

In [8]:

import pandas #Data from Thomas Bland
df = pandas.read_csv("data/soliton_collision.csv", index_col=0)
df.shape

Out[8]:

(450, 1021)

In [9]:

df.head()

Out[9]:

	0	0.98	1.96	2.94	3.92	4.9	5.88	6.86	7.84	8.82	...	990.78	991.76	992.74	993.72	994.7	995.68	996.66	997.64	998.62	999.6
-22.5	1.0	0.99992	0.99991	0.99998	0.99944	0.99935	0.99995	0.99853	1.00030	1.0019	...	0.99888	1.00010	0.99949	0.99871	0.99616	0.99866	0.99587	0.99769	0.99823	1.0014
-22.4	1.0	0.99994	0.99992	1.00010	0.99947	0.99951	1.00000	0.99873	1.00030	1.0018	...	0.99885	1.00000	0.99935	0.99860	0.99643	0.99857	0.99613	0.99769	0.99813	1.0015
-22.3	1.0	0.99995	0.99993	0.99976	1.00000	0.99972	0.99986	0.99892	0.99978	1.0019	...	0.99873	0.99983	0.99903	0.99840	0.99670	0.99842	0.99643	0.99770	0.99792	1.0016
-22.2	1.0	0.99996	0.99994	0.99969	1.00010	1.00010	1.00000	0.99941	0.99972	1.0015	...	0.99851	0.99944	0.99880	0.99835	0.99725	0.99816	0.99681	0.99766	0.99771	1.0018
-22.1	1.0	0.99997	0.99995	0.99995	1.00040	1.00030	0.99980	0.99974	0.99997	1.0010	...	0.99824	0.99916	0.99843	0.99825	0.99759	0.99808	0.99702	0.99763	0.99771	1.0018

5 rows × 1021 columns

Data processing and visualisation (matplotlib and MayaVi)

In [10]:

subset = df[-7:7]

import matplotlib.pyplot as plt
plt.imshow(subset,                 #Like Matlab's pcolor()
           aspect='auto',
           extent=(0, 1000, -7, 7))

colorbar = plt.colorbar()
colorbar.ax.set_ylabel(r'Density $|\psi|^2$', labelpad=20, rotation=270)  #raw string keeps the backslash

plt.xlabel('time $t$')
plt.ylabel('position $z$')
plt.show()

General purpose programming language (e.g. Python runs websites)
Got a boring task to do? Automate it!

How do I get Python?¶

Won't always have this notebook interface!

At home: use the Anaconda distribution or vanilla Python
At uni: talk to your Computing officer.

Python 2 or 3?¶

Unless you're using someone else's code, use Python 3.
Some blogs might tell you it's not supported by big packages but that's not true any more.

Can try an IDE e.g. Spyder

Screenshot of Spyder from https://github.com/spyder-ide/spyder

Numeric types¶

Integers: indexing or counting:¶

In [11]:

1 + 2

Out[11]:

3

In [12]:

300 - 456

Out[12]:

-156

Floats: measuring continuous things.¶

In [13]:

0.1 + 0.2    #limited precision

Out[13]:

0.30000000000000004

In [14]:

0.5 - 0.3

Out[14]:

0.2

Different data types for different jobs¶

Python's numbers are friendly¶

In [15]:

-2 ** 1000            # No problems with sign or under/overflow

Out[15]:

-10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

In [16]:

type(-2 ** 1000)

Out[16]:

int

In [17]:

1 + 1.5              # Mix int and float: result is float

Out[17]:

2.5

In [18]:

type(12 + 24.0)      #Can check types explicitly

Out[18]:

float

Golden rule: if one part of an expression is a float, the entire expression will be a float¶

Other operations¶

In [19]:

23 - 7.0

Out[19]:

16.0

In [20]:

2 * 4

Out[20]:

8

In [21]:

3 / 2               # division always returns a float in Python 3

Out[21]:

1.5

In [22]:

3 // 2              # double-slashes force integer division

Out[22]:

1

In [23]:

2 ** 3.0

Out[23]:

8.0

In [24]:

2 ^ 6               #Bitwise XOR (not a power!) -- not very useful for scientists

Out[24]:

4

Even more operations¶

In [25]:

(1 + 2) * (3 + 4)   #Brackets work as normal

Out[25]:

21

In [26]:

3 - 2*4             #Order of operations (BODMAS) as normal

Out[26]:

-5

In [27]:

27 % 5              #Modulo (remainder) operation

Out[27]:

2

In [28]:

abs(-2)             #Modulus (absolute value) function

Out[28]:

2

Advice for working with floats¶

Floats accumulate rounding errors
Testing equality is tricky (should use math.isclose)

In [29]:

x = 0.1 + 0.2
y = 0.15 + 0.15
print("%.20f\n%.20f" % (x, y))
from math import isclose
isclose(x, y)

0.30000000000000004441
0.29999999999999998890

Out[29]:

True

See Floating point guide or What every computer scientist should know about floating-point arithmetic
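As a rough illustration of both points, the sketch below adds 0.1 ten times and compares the result with 1.0. The variable names `total` and `tiny` and the tolerance `1e-12` are arbitrary choices made for this example.

```python
from math import isclose

# Add 0.1 ten times: each addition contributes a tiny rounding error.
total = 0.0
for _ in range(10):
    total = total + 0.1

print(total == 1.0)          # False: exact comparison fails
print(isclose(total, 1.0))   # True: equal within a relative tolerance

# isclose uses a relative tolerance by default, so comparisons
# against zero need an absolute tolerance instead.
tiny = 0.1 + 0.2 - 0.3
print(isclose(tiny, 0.0))                 # False
print(isclose(tiny, 0.0, abs_tol=1e-12))  # True
```

A sensible absolute tolerance depends on the size of the numbers in your problem, so treat the value above as a placeholder.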

`complex` type¶

Python uses j for the imaginary unit $i$.
Has to have a number before it, to distinguish from a variable called j.

In [30]:

1j * 1j

Out[30]:

(-1+0j)

In [31]:

z = 2 - 4j
z + z.conjugate()  # Twice the real part

Out[31]:

(4+0j)

Use cmath functions when working with complex numbers.

In [32]:

import cmath
cmath.sin(0.1 + 2j)

Out[32]:

(0.37559284993485376+3.6087412126897433j)

In [33]:

abs(cmath.exp(2j))

Out[33]:

1.0

Exercises¶

What are the types and values of the following expressions? Try to work it out by hand; then check in the notebook.

23 + 2 * 17 - 9
23 + 2 * (17 - 9.0)
5 * 6 / 7
5 * 6 // 7
5 * 6.0 // 7

2.0 ** (3 + 7 % 3) // 2
2 ** (3 + 7 % 3) / 2
4 ** 0.5
-4 ** 0.5
(1 + 1/1000) ** 1000

int: 48
float: 39.0
float: 30/7 == 4.28571...6
int: 30 // 7 == 4
float: 30.0 // 7 == 4.0

float: 8.0
float: 8.0
float: 2.0
float: -2.0
float: 2.71692... $\approx e$

Control flow: variables¶

Variables are names which refer to values.

In [34]:

x = 10
2 * x + 4

Out[34]:

24

In [35]:

#Prefer descriptive names over shorthand
import math
planck = 6.63e-34   #Planck's constant in J s
red_planck = planck / (2 * math.pi)
red_planck

Out[35]:

1.05519727...e-34

In [36]:

name = 'Dr. John Smith' #not just numbers: more data types later
len(name)

Out[36]:

14

In [37]:

thing1 = 3.142   #numbers okay in variable names
thing2 = 1.618

In [38]:

3rdthing = 2.718 #except at the start

  File "<ipython-input-38-e4d50dee3627>", line 1
    3rdthing = 2.718 #except at the start
           ^
SyntaxError: invalid syntax

Some keywords are forbidden.

In [41]:

del = 'boy'

  File "<ipython-input-41-6e337587edb8>", line 1
    del = 'boy'
        ^
SyntaxError: invalid syntax

To compare variables and/or values, use two equals signs ==. More on this later.

In [39]:

t = 2

In [40]:

t + t = 4

  File "<ipython-input-40-c6ff51bde1a1>", line 1
    t + t = 4
             ^
SyntaxError: can't assign to operator

In [42]:

t + t == 4

Out[42]:

True

Quick quiz: what happens here?¶

In [43]:

x = 1
y = x
x = x * 5

What's $y$ equal to: $1$ or $5$?

In [44]:

y

Out[44]:

1

When we say y = x, we mean

Make y refer to whatever x refers to

and not

Make y refer to x

If in doubt: try experimenting!

Control flow: functions¶

Packages and the standard library have many useful functions
Still useful to write your own: reuse code, break program into smaller problems

In [45]:

def discriminant(a, b, c):
    print("a =", a, "b =", b, "c =", c)
    return b ** 2 - 4 * a * c

def keyword (define)
function name (same rules as variables)
argument list
colon to mark indentation
statements: indented with four spaces
return expression

In [46]:

discriminant(2, 3, 4)       #Give arguments values by position...

a = 2 b = 3 c = 4

Out[46]:

-23

In [47]:

discriminant(b=3, c=4, a=2) #...or explicitly by name

a = 2 b = 3 c = 4

Out[47]:

-23

Python will complain if you don't give a function the right arguments.

In [48]:

discriminant()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-48-dc883d99b76f> in <module>()
----> 1 discriminant()

TypeError: discriminant() missing 3 required positional arguments: 'a', 'b', and 'c'

In [49]:

discriminant(0, 0)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-05674ee3aefb> in <module>()
----> 1 discriminant(0, 0)

TypeError: discriminant() missing 1 required positional argument: 'c'

In [50]:

discriminant(a=1, a=2, a=3)

  File "<ipython-input-50-cf03c5c67bda>", line 1
    discriminant(a=1, a=2, a=3)
                     ^
SyntaxError: keyword argument repeated

Arguments can be made optional by giving them default values.

In [51]:

def greet(greeting='Hello', name='stranger'):
    print(greeting, 'to you,', name)

In [52]:

greet()

Hello to you, stranger

In [53]:

greet('David')

David to you, stranger

In [54]:

greet(name='David')

Hello to you, David

Can return more than one value at once:

In [55]:

def consecutive_squares(n):
    return n**2, (n + 1)**2

In [56]:

consecutive_squares(5)

Out[56]:

(25, 36)

The function returns a tuple (more on these later). Can unpack to get at the individual values

In [57]:

a, b = consecutive_squares(10)
a

Out[57]:

100

In [58]:

b

Out[58]:

121

Variable scope: context matters¶

In [59]:

a = 3
def double(a):
    a = 2 * a
    return a

In [60]:

double(6)

Out[60]:

12

Function arguments and variables defined in a function are local to the function body.

If there's a name conflict, stuff outside is unaffected.

In [61]:

a

Out[61]:

3

See the Python tutorial for more tips, tricks and examples---including functions that take a variable number of arguments.
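To see the scoping rule in a slightly different setting, here is a small sketch. The names `counter`, `increment_local` and `read_global` are invented for this example.

```python
counter = 0  # a module-level ("global") variable

def increment_local():
    # Assigning inside the function creates a new local name;
    # the outer counter is untouched.
    counter = 100
    return counter

def read_global():
    # A name that is only read (never assigned) inside the function
    # is looked up outside it.
    return counter + 1

print(increment_local())  # 100
print(counter)            # 0
print(read_global())      # 1
```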

Cheeky challenge¶

Write a function that implements the quadratic formula.

Arguments: three numbers $a$, $b$, and $c$
Return both solutions to $ax^2 + bx + c = 0$
Return the smaller one first

Reminder: the quadratic formula is $$x = \frac {-b \pm \sqrt{b^2 - 4ac}} {2a}$$

Use math.sqrt for computing square roots. Don't forget to import!

Let's do a few tests.

$(x-4)(x+2) = x^2 - 2x - 8$ has roots $x=4, x=-2$.
$2(x-10)^2 = 2x^2 -40x + 200$ has a repeated root $x=10$.

print( quadratic_roots(1, -2, -8) )
#assert statements will error if the condition is False.
assert quadratic_roots(1, -2, -8) == (-2, 4)
assert quadratic_roots(2, -40, 200) == (10, 10)
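One possible solution is sketched below. It is only one way to do it, and it assumes that $a \neq 0$ and that the roots are real; raising a `ValueError` for a negative discriminant is a choice made for this sketch, not something the challenge asks for.

```python
import math

def quadratic_roots(a, b, c):
    """Return both real roots of a*x**2 + b*x + c = 0, smaller first."""
    disc = b ** 2 - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots: the discriminant is negative")
    root = math.sqrt(disc)
    x1 = (-b - root) / (2 * a)
    x2 = (-b + root) / (2 * a)
    # If a is negative then x1 > x2, so sort explicitly.
    return min(x1, x2), max(x1, x2)

print(quadratic_roots(1, -3, 2))   # roots of (x - 1)(x - 2): (1.0, 2.0)
```

The roots come back as floats, but comparisons such as `(1.0, 2.0) == (1, 2)` are still `True` because Python compares numbers by value.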

Control flow: loops¶

Basic looping has two important parts:

for variable in ...:
range function

In [62]:

for i in range(5):
    print("Hello!")

Hello!
Hello!
Hello!
Hello!
Hello!

loop body indented with four spaces (like functions)
colon to denote indentation

Python's indexing convention¶

Something of length $N$ uses indices from $0$ to $N-1$ inclusive.

In [63]:

for i in range(5):
    print("Here's a number:")
    print(i)

Here's a number:
0
Here's a number:
1
Here's a number:
2
Here's a number:
3
Here's a number:
4

unlike Matlab, Fortran or R (where indexing starts from 1).
like C, C++, Java, Javascript

EWD831 discusses different indexing systems
Wikipedia compares across languages.

Controlling integer ranges¶

The most general form of the range function is

range(start, stop, step)

where step defaults to 1 (and start defaults to 0) when omitted.

In [64]:

for i in range(5, 10):
    print(i)

5
6
7
8
9

In [65]:

for i in range(10, 20, 2):
    print(i)

10
12
14
16
18

With a positive step, Python expects start < stop; otherwise the range is empty and the loop body never runs.

In [66]:

for thing in range(50, 40): #can use any loop variable
    print(thing)

If you want a descending loop you need a negative step.

In [67]:

for thing in range(50, 40, -3):
    print(thing)

50
47
44
41
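A quick way to check what a range contains is to convert it to a list. The sketch below does this for a few argument combinations; the variable names are made up for illustration.

```python
# range objects are lazy, so wrap them in list() to see their contents.
counting = list(range(3))            # start defaults to 0
window = list(range(2, 5))           # stop is excluded
strided = list(range(0, 10, 3))      # step of 3
empty = list(range(5, 0))            # start > stop with a positive step
countdown = list(range(5, 0, -1))    # negative step counts down

print(counting)    # [0, 1, 2]
print(window)      # [2, 3, 4]
print(strided)     # [0, 3, 6, 9]
print(empty)       # []
print(countdown)   # [5, 4, 3, 2, 1]
```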

Cheeky challenge¶

Use a loop to compute $$5^2 + 10^2 + 15^2 + 20^2 + \dotsb + 200^2$$

#Again here's a template for you
total = 0
for ... in ...:
    total = total + ...
total

#Here's the answer you should have got:
assert total == 553500

We'll see later that we can loop over all sorts of objects---not just ranges.

In [68]:

for character in "David Matthew Robertson":
    print(character, end=".")

D.a.v.i.d. .M.a.t.t.h.e.w. .R.o.b.e.r.t.s.o.n.

This makes looping a really powerful tool in Python. It enables

Shortcuts like the enumerate function
Generator expressions and generator functions
List/Set/Dictionary comprehensions
Efficient looping techniques

Just like other languages, there are while loops and break and continue statements which are a bit less intuitive.

There's too much to go over here---but there are links in the notebook if you're curious.
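For a flavour of these tools, here is a brief sketch. The example data and names (`even_squares`, `n`, `steps`) are invented, and the cap of 1000 iterations is an arbitrary safety net.

```python
# enumerate pairs each item with its index.
for index, letter in enumerate("abc"):
    print(index, letter)

# A list comprehension builds a list from a loop in one expression.
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]
print(even_squares)   # [0, 4, 16, 36, 64]

# A while loop repeats for as long as its condition is True.
# break leaves the loop early; continue skips to the next iteration.
n = 27
steps = 0
while n != 1:
    if steps >= 1000:   # safety net so the loop can't run forever
        break
    steps = steps + 1
    if n % 2 == 0:
        n = n // 2      # even: halve it
        continue
    n = 3 * n + 1       # odd: triple it and add one
print("Reached", n, "after", steps, "steps")
```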

Control flow: conditionals¶

A very important tool in the programmer's toolkit is the ability to do different things in different circumstances.

Enter the if statement:

In [69]:

i = 10
if i % 2 == 0:
    print(i, "is even")

10 is even

Colon, then four spaces before body statements

Main expression usually a boolean: True or False
Use comparisons like <, <=, ==, !=, >=, > to make booleans

In [70]:

1 < 2    #less than

Out[70]:

True

In [71]:

2 <= 0.2   #less than or equal

Out[71]:

False

In [72]:

3 == 3.0   #equal

Out[72]:

True

In [73]:

"cat" != "dog" #not equal

Out[73]:

True

In [74]:

x = 10
1 < x < 15 #Mathematical notation for "(1 < x) and (x < 15)"

Out[74]:

True

Let's take our previous if statement and put it in a loop.

Whenever we start a new block (line ending in a colon), we have to indent an extra four spaces.

In [75]:

for i in range(5):
    if i % 2 == 0:
        print(i, "is even")

0 is even
2 is even
4 is even

We can handle the False case with an else statement.

In [76]:

for i in range(5):
    if i % 2 == 0:
        print(i, "is even")
    else:
        print(i, "is odd")

0 is even
1 is odd
2 is even
3 is odd
4 is even

For finer control, use an if... elif... else... chain.

Here elif is short for "else if".

In [77]:

import datetime
now = datetime.datetime.now()
print("The time is", now, "and the hour is:", now.hour)
if 6 <= now.hour < 12:
    print("Good morning!")
elif 12 <= now.hour < 18:
    print("Good afternoon!")
elif 18 <= now.hour < 20:
    print("Good evening!")
else:
    print("Good night!")

The time is 2017-04-11 12:48:23.167619 and the hour is: 12
Good afternoon!

else is optional and always comes last.
Need to have if before any elifs.
Can have as many elifs as you like.

Cheeky challenge¶

The sign or signum function is defined by $$\operatorname{sign}(x) = \begin{cases} \phantom{-}1 & \text{if $x>0$} \\ \phantom{-}0 & \text{if $x=0$} \\ -1 & \text{if $x<0$} \end{cases}$$

Implement this as a Python function.

#And some tests:
assert sign(10) == 1
assert sign(0) == 0
assert sign(-23.4) == -1
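One possible answer, written as an if... elif... else... chain that mirrors the three cases of the definition:

```python
def sign(x):
    """Return 1, 0 or -1 according to the sign of x."""
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

print(sign(10), sign(0), sign(-23.4))   # 1 0 -1
```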

Quick mention: can perform logical operations on booleans with and, or, and not.

In [78]:

True and False

Out[78]:

False

In [79]:

True or False

Out[79]:

True

In [80]:

not False

Out[80]:

True

In [81]:

not False and False    #careful with order of operations

Out[81]:

False

In [82]:

not (False and False)

Out[82]:

True

Data types: strings¶

Any textual data: plot labels, file names, ...
Enclosed by single (') or double quotes (")
Any Unicode character okay

In [83]:

supercal = "Supercalifragilisticexpialidocious"
starwars = 'No, I am your father'  # spaces okay
greeting = "こんにちは (Konnichiwa)" # non-Latin characters okay

Use \n to stand for a newline
Use \' or \" for literal quotes
Use \\ for a literal backslash
Spaces preserved

In [84]:

print("A short 'quote'\n     a double quote char: \"\n and newlines!")

A short 'quote'
     a double quote char: "
 and newlines!

Python is pedantic when comparing¶

In [85]:

'2' == 2            #different types!

Out[85]:

False

In [86]:

type('2'), type(2)

Out[86]:

(str, int)

In [87]:

'True' == True

Out[87]:

False

In [88]:

type('True'), type(True)

Out[88]:

(str, bool)

String methods¶

A list of handy functions for working with strings. Full reference online.

In [89]:

vowels = "aeiou"
vowels.upper()

Out[89]:

'AEIOU'

In [90]:

vowels.lower() #already lowercase

Out[90]:

'aeiou'

In [91]:

vowels.capitalize()

Out[91]:

'Aeiou'

In [92]:

len(supercal)   #length function

Out[92]:

34

In [93]:

supercal.count("a")

Out[93]:

3

Silly example: a function which processes a yes/no prompt (y/n)

In [94]:

def handle_response(response):
    if response.startswith("y"):
        return "positive response"
    elif response.startswith("n"):
        return "negative response"
    else:
        return "unclear response"

In [95]:

handle_response("yes")

Out[95]:

'positive response'

In [96]:

handle_response("no way man that's unreasonable")

Out[96]:

'negative response'

What happens when we call with these arguments? Guess, then check in the notebook.

handle_response()
handle_response("")
handle_response("YES")
handle_response(" yes ")

handle_response()

TypeError: missing argument

handle_response("")

Unclear response: the empty string "" doesn't start with anything!

handle_response("YES")

Unclear response: upper/lowercase matters for comparison

In [97]:

'Y' == 'y'

Out[97]:

False

handle_response(" yes ")

Unclear response: first char is a space

Often useful to normalise strings to a sensible form, especially if they come from user input.

In [98]:

response = "    YeS   "
response = response.lower()
print( repr(response) )      # explicit representation with repr()
response = response.strip()  # remove whitespace from start and end
print( repr(response) )

'    yes   '
'yes'

Also handy: str.replace:

In [99]:

x = "The news media reported today that no news is in fact good news"
x.replace("news", "FAKE NEWS!!")

Out[99]:

'The FAKE NEWS!! media reported today that no FAKE NEWS!! is in fact good FAKE NEWS!!'

Slicing¶

Remember that indexing works from $0$ to $N - 1$:

In [100]:

supercal[0]

Out[100]:

'S'

In [101]:

supercal[5]

Out[101]:

'c'

In [102]:

supercal[0:5] #like range, slicing excludes upper limit

Out[102]:

'Super'

In [103]:

supercal[-1]  #Last char

Out[103]:

's'

In [104]:

supercal[:5] + "..." + supercal[-4:] #first five, then last 4

Out[104]:

'Super...ious'
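Slices also accept an optional third number, a step, much like range. A short sketch using a made-up example string `word`:

```python
word = "Python"

middle = word[1:4]      # characters with index 1, 2, 3
last_three = word[-3:]  # negative indices count from the end
every_other = word[::2] # step of 2
backwards = word[::-1]  # step of -1 reverses the string

print(middle)       # yth
print(last_three)   # hon
print(every_other)  # Pto
print(backwards)    # nohtyP
```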

Concatenation¶

Glue strings together with "+".
For complicated gluings, or gluings of arbitrary length, use the print function or str.join

In [105]:

name = "David"
"Good morning, " + name + "."

Out[105]:

'Good morning, David.'

Use * as shorthand for repetition.

In [106]:

'thank you ' * 10

Out[106]:

'thank you thank you thank you thank you thank you thank you thank you thank you thank you thank you '

Even more complicated string handling available:
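For instance, str.join glues together a list of any length, and formatted string literals (f-strings, which need Python 3.6 or later) substitute values into a template. The names and numbers below are invented for illustration.

```python
parts = ["alpha", "beta", "gamma"]
joined = ", ".join(parts)          # the separator goes between the items
print(joined)                      # alpha, beta, gamma

name = "David"
score = 8.2636
message = f"{name} scored {score:.2f}"   # .2f rounds to two decimal places
print(message)                     # David scored 8.26
```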

Looping over strings¶

Awkward way:

In [107]:

example = "demo"
for i in range(len(example)):
    print(example[i])

d
e
m
o

Slick way:

In [108]:

for character in "demo":
    print(character)

d
e
m
o

Cheeky Challenge¶

Write a function to count the number of vowels in a string. Assume that we're just working with the Roman alphabet---so don't worry about variants like ë, è, é, and ê.

For bonus points, try using a loop to write this function.

In [109]:

#Here's a space to write your function

#and some tests to run
assert your_function("Hello") == 2
assert your_function(" xyz HEllO") == 2
assert your_function("Hello, sailor") == 5
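One possible solution using a loop is sketched below. The name `count_vowels` is a placeholder chosen here; the tests above call it `your_function`.

```python
def count_vowels(text):
    """Count the vowels a, e, i, o, u in text, ignoring case."""
    count = 0
    for character in text.lower():   # normalise so 'E' matches 'e'
        if character in "aeiou":
            count = count + 1
    return count

print(count_vowels("Hello, sailor"))   # 5
```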

Data types: lists and tuples¶

Lists: a sequence of arbitrary Python objects

In [110]:

greek_letters = ["alpha", "beta", "gamma", "delta"]
greek_letters[1] #Index just like strings: 0 to N-1.

Out[110]:

'beta'

Lists can be modified in-place

In [111]:

greek_letters[1] = "BETA (β)"
greek_letters

Out[111]:

['alpha', 'BETA (β)', 'gamma', 'delta']

Lists can contain objects of different types

In [112]:

things = ["uno", "dos", 3, supercal, 2.718]

Unless they're modified, lists have a fixed length

In [113]:

len(things)

Out[113]:

5

Lists are objects, so lists can even contain lists!

In [114]:

names_by_parts = [ ["David", "Robertson"], ["Cetin", "Can", "Evirgen"] ]
print( names_by_parts[0] )
print( names_by_parts[0][1] )

['David', 'Robertson']
Robertson

Quick Quiz¶

What is len(names_by_parts)?

In [115]:

len(names_by_parts)

Out[115]:

2

A list doesn't know anything special about what it contains

Can't access or add new list items by accident

In [117]:

greek_letters[4]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-117-85a9bd08274d> in <module>()
----> 1 greek_letters[4]

IndexError: list index out of range

In [118]:

greek_letters[4] = 'EPSILON (ε)'

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-118-233d7087fd08> in <module>()
----> 1 greek_letters[4] = 'EPSILON (ε)'

IndexError: list assignment index out of range

Use list methods to modify the list itself, rather than its contents.
See also full list description

In [119]:

greek_letters.append("EPSILON (ε)")
greek_letters

Out[119]:

['alpha', 'BETA (β)', 'gamma', 'delta', 'EPSILON (ε)']

Other useful list methods and idioms:

In [120]:

empty_list = []
print(empty_list, len(empty_list))

[] 0

In [121]:

numbers = [5, 2, 64, 41, 27, -2, 11, 32]

In [122]:

numbers.sort()     #modifies list in place
numbers

Out[122]:

[-2, 2, 5, 11, 27, 32, 41, 64]

In [123]:

["ab", 1].sort()  # Can't compare text with numbers

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-123-c281d25ec96a> in <module>()
----> 1 ["ab", 1].sort()  # Can't compare text with numbers

TypeError: '<' not supported between instances of 'int' and 'str'

In [124]:

x = list(range(10, 20))
x

Out[124]:

[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [125]:

del x[2]  #Delete the entry with index 2 (third entry)
x

Out[125]:

[10, 11, 13, 14, 15, 16, 17, 18, 19]

In [126]:

print("POP:", x.pop(), x)

POP: 19 [10, 11, 13, 14, 15, 16, 17, 18]

In [127]:

x.reverse() #modifies in place
x

Out[127]:

[18, 17, 16, 15, 14, 13, 11, 10]

In [128]:

x.insert(4, "surprise")
x

Out[128]:

[18, 17, 16, 15, 'surprise', 14, 13, 11, 10]

NB: It's quick to extend lists at the end, but inserting or deleting near the start is slower. If your list is HUGE then this can become a problem.

See also the Python wiki or these course notes.
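If you do need fast insertion and removal at both ends, one option from the standard library is collections.deque. This is a suggestion beyond the course material; the sketch below uses made-up values.

```python
from collections import deque

queue = deque([10, 11, 12])
queue.appendleft(9)        # cheap insertion at the start
queue.append(13)           # cheap insertion at the end
first = queue.popleft()    # cheap removal from the start

print(first, list(queue))  # 9 [10, 11, 12, 13]
```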

Looping over lists is just like strings.

Warning: don't modify list structure when looping! (Modifying list values is fine)

In [129]:

colours = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
for colour in colours:
    print(colour, "has", len(colour), "letters" )

red has 3 letters
orange has 6 letters
yellow has 6 letters
green has 5 letters
blue has 4 letters
indigo has 6 letters
violet has 6 letters

In [130]:

for colour in colours:
    colours.pop()
colours

Out[130]:

['red', 'orange', 'yellow']

In [131]:

for i, colour in enumerate(colours): #avoids range(len(colours))
    colours[i] = colour.upper()
colours

Out[131]:

['RED', 'ORANGE', 'YELLOW']
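If you really do want to remove items based on a condition, two common workarounds are sketched below: build a new list, or loop over a copy. The name `short_colours` and the length cut-off are arbitrary choices for this example.

```python
colours = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

# Option 1: build a new list containing only the items to keep.
short_colours = [colour for colour in colours if len(colour) <= 5]

# Option 2: loop over a copy (colours[:]) while removing from the original.
for colour in colours[:]:
    if len(colour) > 5:
        colours.remove(colour)

print(short_colours)  # ['red', 'green', 'blue']
print(colours)        # ['red', 'green', 'blue']
```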

Cheeky challenge¶

The following lines will read a list of words from a data file. Use Python to find:

The first, middle and last word in the list
The percentage of words containing an e
- Hint: use str.find; or better the in operator
All two-letter words in the list (good for Scrabble)

In [132]:

with open('data/en-GB-words.txt', 'rt') as f:
    words = [line.strip() for line in f]
print(len(words), "words. Number 2001 is", words[2000])

99156 words. Number 2001 is Booth

In [133]:

N = len(words)
print(words[0], words[N//2], words[-1], sep=", ")

A, harks, études

In [134]:

count = 0
for word in words:
    if 'e' in word:
        count = count + 1
print(count, 100 * count / len(words))

63152 63.689539715196254

In [135]:

two_letter_words = []
for word in words:
    if len(word) == 2:
        two_letter_words.append(word)
print(two_letter_words)

['Ac', 'Ag', 'Al', 'Am', 'Ar', 'As', 'At', 'Au', 'Av', 'Ba', 'Be', 'Bi', 'Bk', 'Br', 'Ca', 'Cd', 'Cf', 'Ci', 'Cl', 'Cm', 'Co', 'Cr', 'Cs', 'Cu', 'Di', 'Dr', 'Ed', 'Er', 'Es', 'Eu', 'Fe', 'Fm', 'Fr', 'GE', 'Ga', 'Gd', 'Ge', 'He', 'Hf', 'Hg', 'Ho', 'Hz', 'In', 'Io', 'Ir', 'It', 'Jo', 'Jr', 'Kr', 'La', 'Le', 'Li', 'Ln', 'Lr', 'Lt', 'Lu', 'Mb', 'Md', 'Mg', 'Mn', 'Mo', 'Mr', 'Ms', 'Mt', 'Na', 'Nb', 'Nd', 'Ne', 'Ni', 'Np', 'OK', 'Ob', 'Os', 'Oz', 'Pa', 'Pb', 'Pd', 'Pl', 'Pm', 'Po', 'Pt', 'Pu', 'Ra', 'Rb', 'Rd', 'Re', 'Rh', 'Rn', 'Ru', 'Rx', 'Sb', 'Sc', 'Se', 'Si', 'Sm', 'Sn', 'Sq', 'Sr', 'St', 'Ta', 'Tb', 'Tc', 'Th', 'Ti', 'Tl', 'Tm', 'Ty', 'Ur', 'Va', 'Wm', 'Wu', 'Xe', 'Yb', 'Zn', 'Zr', 'ad', 'ah', 'am', 'an', 'as', 'at', 'ay', 'be', 'by', 'cs', 'dB', 'do', 'eh', 'em', 'es', 'ex', 'fa', 'go', 'gs', 'ha', 'he', 'hi', 'ho', 'id', 'if', 'in', 'is', 'it', 'kW', 'kc', 'ks', 'la', 'lo', 'ls', 'ma', 'me', 'mi', 'ms', 'mu', 'my', 'no', 'nu', 'of', 'oh', 'on', 'or', 'ow', 'ox', 'pH', 'pa', 'pi', 're', 'rs', 'sh', 'so', 'ti', 'to', 'ts', 'uh', 'um', 'up', 'us', 'vs', 'we', 'ye', 'yo']

Tuples¶

The same as a list, except can't be modified after creation.
Created with round brackets, not square
Still indexed from $0$ to $N-1$

In [136]:

coordinate = (1, 2, 3)
coordinate

Out[136]:

(1, 2, 3)

In [137]:

coordinate[0]

Out[137]:

1

In [138]:

coordinate[0] = 10

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-138-9acc16226b5e> in <module>()
----> 1 coordinate[0] = 10

TypeError: 'tuple' object does not support item assignment

In [139]:

x, y, z = coordinate      #tuple unpacking
print(x, y, z, x + y + z)

1 2 3 6

In fact, when you say return a, b from a function, what gets returned is the tuple (a, b)!

Data types: dictionaries¶

Collection of pairs key -> value (unordered in older Pythons; insertion order is kept from Python 3.7)
Keys usually strings
"Hashmap", "Associative array"

In [140]:

david = dict(
    surname = "Robertson",
    given_names = ["David", "Matthew"],
    age = 24,
    dob = "26/06/1992",
    height = 190
)
david

Out[140]:

{'age': 24,
 'dob': '26/06/1992',
 'given_names': ['David', 'Matthew'],
 'height': 190,
 'surname': 'Robertson'}

Index by key to get/set values

In [141]:

david['age'] = "very very very very very very old"
david['age']

Out[141]:

'very very very very very very old'

Three ways to loop:¶

In [142]:

for key in david:
    print(key, end=", ")

surname, given_names, age, dob, height,

In [143]:

for value in david.values():
    print(value, end=", ")

Robertson, ['David', 'Matthew'], very very very very very very old, 26/06/1992, 190,

In [144]:

for key, value in david.items():
    print(key, "->", value)

surname -> Robertson
given_names -> ['David', 'Matthew']
age -> very very very very very very old
dob -> 26/06/1992
height -> 190

Dictionaries have a length too:

In [145]:

len(david)

Out[145]:

5

Python will complain if you ask for a missing key:

In [146]:

david['weight']

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-146-01a69e534c10> in <module>()
----> 1 david['weight']

KeyError: 'weight'

Can check if a key is present with in:

In [147]:

'surname' in david

Out[147]:

True

NB: The Python community tends to prefer dict.get() or exception handling when keys might be missing.
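Both approaches are sketched below on a small made-up dictionary `person`; the fallback value "unknown" is an arbitrary choice.

```python
person = {"surname": "Robertson", "height": 190}

# dict.get returns None (or a default you supply) instead of raising.
weight = person.get("weight")
weight_or_default = person.get("weight", "unknown")
print(weight, weight_or_default)   # None unknown

# Exception handling: try the lookup and deal with the failure.
try:
    handled = person["weight"]
except KeyError:
    handled = "unknown"
print(handled)                     # unknown
```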

Cheeky challenge¶

The following data file contains the periodic table. We're going to load its 'elements' entry, which is a list; each entry of that list is a dictionary describing one element.

In [148]:

import json
with open("data/PeriodicTable.json", "rt") as f:
    table = json.load(f)['elements']   #json.load reads straight from the file

In [149]:

table[0]

Out[149]:

{'appearance': 'colorless gas',
 'atomic_mass': 1.008,
 'boil': 20.271,
 'category': 'diatomic nonmetal',
 'color': None,
 'density': 0.08988,
 'discovered_by': 'Henry Cavendish',
 'melt': 13.99,
 'molar_heat': 28.836,
 'name': 'Hydrogen',
 'named_by': 'Antoine Lavoisier',
 'number': 1,
 'period': 1,
 'phase': 'Gas',
 'shells': [1],
 'source': 'https://en.wikipedia.org/wiki/Hydrogen',
 'spectral_img': 'https://en.wikipedia.org/wiki/File:Hydrogen_Spectra.jpg',
 'summary': 'Hydrogen is a chemical element with chemical symbol H and atomic number 1. With an atomic weight of 1.00794 u, hydrogen is the lightest element on the periodic table. Its monatomic form (H) is the most abundant chemical substance in the Universe, constituting roughly 75% of all baryonic mass.',
 'symbol': 'H',
 'xpos': 1,
 'ypos': 1}

Your challenges:

Which element is densest?
Create a new dictionary mapping elements' symbols to their names. For example, if D is the dictionary, D['H'] == 'Hydrogen'.
Sorted alphabetically, what's the first and last element symbol?
Sorted alphabetically, what's the first and last element name?
How many elements' symbols have a different first letter to their name?

In [150]:

max_density = 0
max_density_name = ""
for element in table:
    if element['density'] is not None and element['density'] > max_density:
        max_density = element['density']
        max_density_name = element['name']
max_density, max_density_name

Out[150]:

(40.7, 'Hassium')

In [151]:

shorthand = {}
for element in table:
    symbol = element['symbol']
    name = element['name']
    shorthand[symbol] = name
shorthand

Out[151]:

{'Ac': 'Actinium',
 'Ag': 'Silver',
 'Al': 'Aluminium',
 'Am': 'Americium',
 'Ar': 'Argon',
 'As': 'Arsenic',
 'At': 'Astatine',
 'Au': 'Gold',
 'B': 'Boron',
 'Ba': 'Barium',
 'Be': 'Beryllium',
 'Bh': 'Bohrium',
 'Bi': 'Bismuth',
 'Bk': 'Berkelium',
 'Br': 'Bromine',
 'C': 'Carbon',
 'Ca': 'Calcium',
 'Cd': 'Cadmium',
 'Ce': 'Cerium',
 'Cf': 'Californium',
 'Cl': 'Chlorine',
 'Cm': 'Curium',
 'Cn': 'Copernicium',
 'Co': 'Cobalt',
 'Cr': 'Chromium',
 'Cs': 'Cesium',
 'Cu': 'Copper',
 'Db': 'Dubnium',
 'Ds': 'Darmstadtium',
 'Dy': 'Dysprosium',
 'Er': 'Erbium',
 'Es': 'Einsteinium',
 'Eu': 'Europium',
 'F': 'Fluorine',
 'Fe': 'Iron',
 'Fl': 'Flerovium',
 'Fm': 'Fermium',
 'Fr': 'Francium',
 'Ga': 'Gallium',
 'Gd': 'Gadolinium',
 'Ge': 'Germanium',
 'H': 'Hydrogen',
 'He': 'Helium',
 'Hf': 'Hafnium',
 'Hg': 'Mercury (element)',
 'Ho': 'Holmium',
 'Hs': 'Hassium',
 'I': 'Iodine',
 'In': 'Indium',
 'Ir': 'Iridium',
 'K': 'Potassium',
 'Kr': 'Krypton',
 'La': 'Lanthanum',
 'Li': 'Lithium',
 'Lr': 'Lawrencium',
 'Lu': 'Lutetium',
 'Lv': 'Livermorium',
 'Mc': 'Moscovium',
 'Md': 'Mendelevium',
 'Mg': 'Magnesium',
 'Mn': 'Manganese',
 'Mo': 'Molybdenum',
 'Mt': 'Meitnerium',
 'N': 'Nitrogen',
 'Na': 'Sodium',
 'Nb': 'Niobium',
 'Nd': 'Neodymium',
 'Ne': 'Neon',
 'Nh': 'Nihonium',
 'Ni': 'Nickel',
 'No': 'Nobelium',
 'Np': 'Neptunium',
 'O': 'Oxygen',
 'Og': 'Oganesson',
 'Os': 'Osmium',
 'P': 'Phosphorus',
 'Pa': 'Protactinium',
 'Pb': 'Lead',
 'Pd': 'Palladium',
 'Pm': 'Promethium',
 'Po': 'Polonium',
 'Pr': 'Praseodymium',
 'Pt': 'Platinum',
 'Pu': 'Plutonium',
 'Ra': 'Radium',
 'Rb': 'Rubidium',
 'Re': 'Rhenium',
 'Rf': 'Rutherfordium',
 'Rg': 'Roentgenium',
 'Rh': 'Rhodium',
 'Rn': 'Radon',
 'Ru': 'Ruthenium',
 'S': 'Sulfur',
 'Sb': 'Antimony',
 'Sc': 'Scandium',
 'Se': 'Selenium',
 'Sg': 'Seaborgium',
 'Si': 'Silicon',
 'Sm': 'Samarium',
 'Sn': 'Tin',
 'Sr': 'Strontium',
 'Ta': 'Tantalum',
 'Tb': 'Terbium',
 'Tc': 'Technetium',
 'Te': 'Tellurium',
 'Th': 'Thorium',
 'Ti': 'Titanium',
 'Tl': 'Thallium',
 'Tm': 'Thulium',
 'Ts': 'Tennessine',
 'U': 'Uranium',
 'V': 'Vanadium',
 'W': 'Tungsten',
 'Xe': 'Xenon',
 'Y': 'Yttrium',
 'Yb': 'Ytterbium',
 'Zn': 'Zinc',
 'Zr': 'Zirconium'}

In [152]:

symbols = list(shorthand)
symbols.sort()
symbols[0], symbols[-1]

Out[152]:

('Ac', 'Zr')

In [153]:

names = list(shorthand.values())
names.sort()
names[0], names[-1]

Out[153]:

('Actinium', 'Zirconium')

In [154]:

oddballs = {}
for symbol, name in shorthand.items():
    if name[0] != symbol[0]:
        oddballs[symbol] = name
oddballs

Out[154]:

{'Ag': 'Silver',
 'Au': 'Gold',
 'Fe': 'Iron',
 'Hg': 'Mercury (element)',
 'K': 'Potassium',
 'Na': 'Sodium',
 'Pb': 'Lead',
 'Sb': 'Antimony',
 'Sn': 'Tin',
 'W': 'Tungsten'}

After lunch:¶

Can will give a crash course in Python's scientific libraries
Some more exercises, chances to practise

That's all folks!¶

Feedback, comments, questions?¶

d.m.robertson@newcastle.ac.uk¶
