Introduction to Python

for scientific computing

Jupyter

  • We'll use "Jupyter Notebook" to interact with Python.
  • Like Matlab's 'Live Editor'; Maple's and Mathematica's notebooks.
  • Runs in a web browser.

To get started:

mas-jupyter.ncl.ac.uk

  • Login with your usual university details
  • Open language.ipynb

Screenshot showing the jupyter home page

You can edit the code samples from the slides live and run them as you please.

  • Double-click a cell to edit it.
  • To run a cell's contents, use Control-Enter.
  • You can also use Shift-Enter to run and move to the next cell.
In [1]:
x = 1 + 1
10 * x
Out[1]:
20

Today's course has two parts:

Morning: the Python language

  • Why, what, how?
  • Basic data types and operations
  • Control flow

Afternoon: Python tools for scientists

  • NumPy: working with large data grids
  • SciPy: common numerical functions
  • matplotlib: in-depth plotting library

Plus advice, links to resources, exercises, ...

What is Python?

  • Interpreted, object-oriented programming language
  • Works on PC, Mac and Linux
  • Open source: free (speech, lunch)

Why Python?

  • Neat and friendly syntax
In [2]:
print("Hello, world!")
Hello, world!
  • Newbie-friendly
  • Quick to write code and quick (enough) to run
In [3]:
import json, random
#Data obtained from http://www.imdb.com/interfaces
with open("data/top_250_imdb.json") as data_file:
    films = json.load(data_file)
In [4]:
random.sample(list(films.items()), 3)  #sample needs a sequence, so make a list first
Out[4]:
[('Yôjinbô (1961)', 8.2),
 ('Batman Begins (2005)', 8.2),
 ('Das Leben der Anderen (2006)', 8.4)]
In [5]:
from statistics import mean
#This mean is just from the top 250!
mean(films.values())
Out[5]:
8.2636
In [6]:
max(films.values())
Out[6]:
9.2
In [7]:
print([name for name, score in films.items() if score == 9.2])
['The Shawshank Redemption (1994)', 'The Godfather (1972)']

More pros and cons discussed at the SciPy tutorial.

What can Python do?

  • Work with large datasets (Pandas dataframes and NumPy arrays)
In [8]:
import pandas #Data from Thomas Bland
df = pandas.read_csv("data/soliton_collision.csv", index_col=0)
df.shape
Out[8]:
(450, 1021)
In [9]:
df.head()
Out[9]:
0 0.98 1.96 2.94 3.92 4.9 5.88 6.86 7.84 8.82 ... 990.78 991.76 992.74 993.72 994.7 995.68 996.66 997.64 998.62 999.6
-22.5 1.0 0.99992 0.99991 0.99998 0.99944 0.99935 0.99995 0.99853 1.00030 1.0019 ... 0.99888 1.00010 0.99949 0.99871 0.99616 0.99866 0.99587 0.99769 0.99823 1.0014
-22.4 1.0 0.99994 0.99992 1.00010 0.99947 0.99951 1.00000 0.99873 1.00030 1.0018 ... 0.99885 1.00000 0.99935 0.99860 0.99643 0.99857 0.99613 0.99769 0.99813 1.0015
-22.3 1.0 0.99995 0.99993 0.99976 1.00000 0.99972 0.99986 0.99892 0.99978 1.0019 ... 0.99873 0.99983 0.99903 0.99840 0.99670 0.99842 0.99643 0.99770 0.99792 1.0016
-22.2 1.0 0.99996 0.99994 0.99969 1.00010 1.00010 1.00000 0.99941 0.99972 1.0015 ... 0.99851 0.99944 0.99880 0.99835 0.99725 0.99816 0.99681 0.99766 0.99771 1.0018
-22.1 1.0 0.99997 0.99995 0.99995 1.00040 1.00030 0.99980 0.99974 0.99997 1.0010 ... 0.99824 0.99916 0.99843 0.99825 0.99759 0.99808 0.99702 0.99763 0.99771 1.0018

5 rows × 1021 columns

  • Data processing and visualisation (matplotlib and MayaVi)
In [10]:
subset = df[-7:7]

import matplotlib.pyplot as plt
plt.imshow(subset,                 #Like Matlab's pcolor()
           aspect='auto',
           extent=(0, 1000, -7, 7))

colorbar = plt.colorbar()
colorbar.ax.set_ylabel(r'Density $|\psi|^2$', labelpad=20, rotation=270)  #raw string keeps the backslash

plt.xlabel('time $t$')
plt.ylabel('position $z$')
plt.show()
  • General purpose programming language (e.g. Python runs websites)
  • Got a boring task to do? Automate it!

How do I get Python?

Won't always have this notebook interface!

Python 2 or 3?

  • Unless you're using someone else's code, use Python 3.
  • Some blogs might tell you it's not supported by big packages but that's not true any more.

Can try an IDE e.g. Spyder

Screenshot of Spyder from https://github.com/spyder-ide/spyder

Numeric types

Integers: indexing or counting:

In [11]:
1 + 2
Out[11]:
3
In [12]:
300 - 456
Out[12]:
-156

Floats: measuring continuous things.

In [13]:
0.1 + 0.2    #limited precision
Out[13]:
0.30000000000000004
In [14]:
0.5 - 0.3
Out[14]:
0.2

Different data types for different jobs

Python's numbers are friendly

In [15]:
-2 ** 1000            # No problems with sign or under/overflow
Out[15]:
-10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
In [16]:
type(-2 ** 1000)
Out[16]:
int
In [17]:
1 + 1.5              # Mix int and float: result is float
Out[17]:
2.5
In [18]:
type(12 + 24.0)      #Can check types explicitly
Out[18]:
float

Golden rule: if any operand in an arithmetic expression is a float, the result will be a float.

Other operations

In [19]:
23 - 7.0
Out[19]:
16.0
In [20]:
2 * 4
Out[20]:
8
In [21]:
3 / 2               # division always returns a float in Python 3
Out[21]:
1.5
In [22]:
3 // 2              # double-slashes force integer division
Out[22]:
1
In [23]:
2 ** 3.0
Out[23]:
8.0
In [24]:
2 ^ 6               #Bitwise XOR, not a power (use **) -- not very useful for scientists
Out[24]:
4

Even more operations

In [25]:
(1 + 2) * (3 + 4)   #Brackets work as normal
Out[25]:
21
In [26]:
3 - 2*4             #Order of operations (BODMAS) as normal
Out[26]:
-5
In [27]:
27 % 5              #Modulo (remainder) operation
Out[27]:
2
In [28]:
abs(-2)             #Modulus (absolute value) function
Out[28]:
2

Advice for working with floats

  • Floats accumulate rounding errors
  • Testing equality is tricky (use math.isclose rather than ==)
In [29]:
x = 0.1 + 0.2
y = 0.15 + 0.15
print("%.20f\n%.20f" % (x, y))
from math import isclose
isclose(x, y)
0.30000000000000004441
0.29999999999999998890
Out[29]:
True
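
A small sketch of both bullet points, using only the standard math module. The variable name total is just for illustration. Note that isclose uses a relative tolerance by default, so comparisons against zero need an explicit abs_tol.

```python
import math

# Adding 0.1 ten times: each addition can introduce a tiny rounding error.
total = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1

print(total == 1.0)               # False: the errors have accumulated
print(math.isclose(total, 1.0))   # True: equal within a relative tolerance

# A relative tolerance is no help near zero, so supply an absolute one.
print(math.isclose(1e-10, 0.0))                # False
print(math.isclose(1e-10, 0.0, abs_tol=1e-9))  # True
```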

complex type

  • Python uses j for the imaginary unit $i$.
  • Has to have a number before it, to distinguish from a variable called j.
In [30]:
1j * 1j
Out[30]:
(-1+0j)
In [31]:
z = 2 - 4j
z + z.conjugate()  # Twice the real part
Out[31]:
(4+0j)
  • Use cmath functions when working with complex numbers.
In [32]:
import cmath
cmath.sin(0.1 + 2j)
Out[32]:
(0.37559284993485376+3.6087412126897433j)
In [33]:
abs(cmath.exp(2j))     
Out[33]:
1.0
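
A few more cmath functions, as a rough sketch: polar and rect convert between Cartesian and polar form, and cmath.isclose compares complex numbers within a tolerance. The names z, r, theta and w are only illustrative.

```python
import cmath
import math

z = 1 + 1j
r, theta = cmath.polar(z)     # modulus and argument (angle in radians)
print(r, theta)               # roughly sqrt(2) and pi/4

# cmath.rect converts back from polar form; we expect z up to rounding error.
w = cmath.rect(r, theta)
print(cmath.isclose(w, z))

# Euler's identity: exp(i*pi) + 1 is zero up to rounding error.
print(abs(cmath.exp(1j * math.pi) + 1) < 1e-15)
```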

Exercises

What are the types and values of the following expressions? Try to work them out by hand, then check in the notebook.

  • 23 + 2 * 17 - 9
  • 23 + 2 * (17 - 9.0)
  • 5 * 6 / 7
  • 5 * 6 // 7
  • 5 * 6.0 // 7
  • 2.0 ** (3 + 7 % 3) // 2
  • 2 ** (3 + 7 % 3) / 2
  • 4 ** 0.5
  • -4 ** 0.5
  • (1 + 1/1000) ** 1000

Answers:

  • int: 48
  • float: 39.0
  • float: 30/7 == 4.28571...6
  • int: 30 // 7 == 4
  • float: 30.0 // 7 == 4.0
  • float: 8.0
  • float: 8.0
  • float: 2.0
  • float: -2.0
  • float: 2.71692... $\approx e$

Control flow: variables

Variables are names which refer to values.

In [34]:
x = 10
2 * x + 4
Out[34]:
24
In [35]:
#Prefer descriptive names over shorthand
import math
planck = 6.63e-34    #Planck's constant in J s, to 3 s.f.
red_planck = planck / (2 * math.pi)
print("%.3e" % red_planck)
1.055e-34
In [36]:
name = 'Dr. John Smith' #not just numbers: more data types later
len(name)
Out[36]:
14
In [37]:
thing1 = 3.142   #numbers okay in variable names
thing2 = 1.618
In [38]:
3rdthing = 2.718 #except at the start
  File "<ipython-input-38-e4d50dee3627>", line 1
    3rdthing = 2.718 #except at the start
           ^
SyntaxError: invalid syntax

Reserved words (keywords) such as del can't be used as names either.

In [41]:
del = 'boy'
  File "<ipython-input-41-6e337587edb8>", line 1
    del = 'boy'
        ^
SyntaxError: invalid syntax

To compare variables and/or values, use two equals signs ==. More on this later.

In [39]:
t = 2
In [40]:
t + t = 4
  File "<ipython-input-40-c6ff51bde1a1>", line 1
    t + t = 4
             ^
SyntaxError: can't assign to operator
In [42]:
t + t == 4
Out[42]:
True

Quick quiz: what happens here?

In [43]:
x = 1
y = x
x = x * 5

What's $y$ equal to: $1$ or $5$?

In [44]:
y
Out[44]:
1

When we say y = x, we mean

  • Make y refer to whatever x refers to

and not

  • Make y refer to x

If in doubt: try experimenting!

Control flow: functions

  • Packages and the standard library have many useful functions
  • Still useful to write your own: reuse code, break program into smaller problems
In [45]:
def discriminant(a, b, c):
    print("a =", a, "b =", b, "c =", c)
    return b ** 2 - 4 * a * c
  • def keyword (define)
  • function name (same rules as variables)
  • argument list
  • colon to mark indentation
  • statements: indented with four spaces
  • return expression
In [46]:
discriminant(2, 3, 4)       #Give arguments values by position...
a = 2 b = 3 c = 4
Out[46]:
-23
In [47]:
discriminant(b=3, c=4, a=2) #...or explicitly by name
a = 2 b = 3 c = 4
Out[47]:
-23

Python will complain if you don't give a function the right arguments.

In [48]:
discriminant()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-48-dc883d99b76f> in <module>()
----> 1 discriminant()

TypeError: discriminant() missing 3 required positional arguments: 'a', 'b', and 'c'
In [49]:
discriminant(0, 0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-05674ee3aefb> in <module>()
----> 1 discriminant(0, 0)

TypeError: discriminant() missing 1 required positional argument: 'c'
In [50]:
discriminant(a=1, a=2, a=3)
  File "<ipython-input-50-cf03c5c67bda>", line 1
    discriminant(a=1, a=2, a=3)
                     ^
SyntaxError: keyword argument repeated

Arguments can be made optional by giving them default values.

In [51]:
def greet(greeting='Hello', name='stranger'):
    print(greeting, 'to you,', name)
In [52]:
greet()
Hello to you, stranger
In [53]:
greet('David')
David to you, stranger
In [54]:
greet(name='David')
Hello to you, David
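
Positional and keyword arguments can be mixed, as long as the positional ones come first. The sketch below uses make_greeting, a hypothetical variant of greet that returns its text instead of printing it, so the results are easy to compare.

```python
def make_greeting(greeting='Hello', name='stranger'):
    # Hypothetical variant of greet(): returns the text rather than printing it.
    return greeting + ' to you, ' + name

by_position = make_greeting('Goodbye', 'David')
mixed = make_greeting('Goodbye', name='David')           # positional first, then keyword
by_keyword = make_greeting(name='David', greeting='Goodbye')  # keywords in any order

print(by_position)
print(mixed == by_position, by_keyword == by_position)
```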

Can return more than one value at once:

In [55]:
def consecutive_squares(n):
    return n**2, (n + 1)**2
In [56]:
consecutive_squares(5)
Out[56]:
(25, 36)

The function returns a tuple (more on these later). We can unpack it to get at the individual values:

In [57]:
a, b = consecutive_squares(10)
a
Out[57]:
100
In [58]:
b
Out[58]:
121
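
The built-in divmod function is a ready-made example of the same pattern: it returns a quotient and a remainder together. A quick sketch (the variable names are just for illustration):

```python
quotient, remainder = divmod(17, 5)   # same as (17 // 5, 17 % 5)
print(quotient, remainder)            # 3 2

result = divmod(17, 5)    # without unpacking we get the whole tuple
print(result)             # (3, 2)
```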

Variable scope: context matters

In [59]:
a = 3
def double(a):
    a = 2 * a
    return a
In [60]:
double(6)
Out[60]:
12

Function arguments and variables defined in a function are local to the function body.

If there's a name conflict, stuff outside is unaffected.

In [61]:
a
Out[61]:
3
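
The same rule works in the other direction: a name created inside a function doesn't exist outside it. A minimal sketch, with a made-up function circle_area; the try/except is only there to catch the error so the cell keeps running.

```python
def circle_area(radius):
    pi_approx = 3.14159          # local: exists only while the function runs
    return pi_approx * radius ** 2

area = circle_area(2)

try:
    pi_approx                    # this name was never defined out here
    leaked = True
except NameError:
    leaked = False

print(area, leaked)
```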

See the Python tutorial for more tips, tricks and examples---including functions that take a variable number of arguments.

Cheeky challenge

Write a function that implements the quadratic formula.

  • Arguments: three numbers $a$, $b$, and $c$
  • Return both solutions to $ax^2 + bx + c = 0$
  • Return the smaller one first

Reminder: the quadratic formula is $$x = \frac {-b \pm \sqrt{b^2 - 4ac}} {2a}$$

  • Use math.sqrt for computing square roots. Don't forget to import!

Let's do a few tests.

  • $(x-4)(x+2) = x^2 - 2x - 8$ has roots $x=4, x=-2$.
  • $2(x-10)^2 = 2x^2 - 40x + 200$ has a repeated root $x=10$.
print( quadratic_roots(1, -2, -8) )
#assert statements will error if the condition is False.
assert quadratic_roots(1, -2, -8) == (-2, 4)
assert quadratic_roots(2, -40, 200) == (10, 10)
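
One possible solution, as a sketch rather than the definitive answer. It assumes $a \neq 0$ and real roots (a non-negative discriminant); anything else is left unhandled here.

```python
import math

def quadratic_roots(a, b, c):
    # Assumes a != 0 and b**2 - 4*a*c >= 0, i.e. the roots are real.
    root_disc = math.sqrt(b ** 2 - 4 * a * c)
    x1 = (-b - root_disc) / (2 * a)
    x2 = (-b + root_disc) / (2 * a)
    # If a is negative then x1 > x2, so order the pair explicitly.
    return min(x1, x2), max(x1, x2)

print(quadratic_roots(1, -2, -8))   # roots of (x - 4)(x + 2) = x^2 - 2x - 8
```

The roots come back as floats, but comparisons such as (-2.0, 4.0) == (-2, 4) are still True.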

Control flow: loops

Basic looping has two important parts:

  • for variable in ...:
  • range function
In [62]:
for i in range(5):
    print("Hello!")
Hello!
Hello!
Hello!
Hello!
Hello!
  • loop body indented with four spaces (like functions)
  • colon to denote indentation

Python's indexing convention

Something of length $N$ uses indices from $0$ to $N-1$ inclusive.

In [63]:
for i in range(5):
    print("Here's a number:")
    print(i)
Here's a number:
0
Here's a number:
1
Here's a number:
2
Here's a number:
3
Here's a number:
4
  • unlike Matlab, Fortran or R (where indexing starts from 1).
  • like C, C++, Java, Javascript
  • EWD831 discusses different indexing systems
  • Wikipedia compares across languages.

Controlling integer ranges

The most general form of the range function is

range(start, stop, step)

where step defaults to 1 if it is omitted.

In [64]:
for i in range(5, 10):
    print(i)
5
6
7
8
9
In [65]:
for i in range(10, 20, 2):
    print(i)
10
12
14
16
18

With a positive step, range counts upwards, so it produces nothing when start ≥ stop.

In [66]:
for thing in range(50, 40): #can use any loop variable
    print(thing)

If you want a descending loop you need a negative step.

In [67]:
for thing in range(50, 40, -3):
    print(thing)
50
47
44
41
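
A handy way to experiment with range is to turn it into a list and look at the result. A small sketch (the variable names are illustrative):

```python
counting_up = list(range(5))              # start defaults to 0
no_step = list(range(50, 40))             # empty: can't count up from 50 to 40
counting_down = list(range(50, 40, -3))   # negative step counts down
how_many = len(range(10, 20, 2))          # ranges know their own length

print(counting_up)
print(no_step)
print(counting_down)
print(how_many)
```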

Cheeky challenge

Use a loop to compute $$5^2 + 10^2 + 15^2 + 20^2 + \dotsb + 200^2$$

#Again here's a template for you
total = 0
for ... in ...:
    total = total + ...
total
#Here's the answer you should have got:
assert total == 553500

We'll see later that we can loop over all sorts of objects---not just ranges.

In [68]:
for character in "David Matthew Robertson":
    print(character, end=".")
D.a.v.i.d. .M.a.t.t.h.e.w. .R.o.b.e.r.t.s.o.n.

This makes looping a really powerful tool in Python.

Just like other languages, there are while loops and break and continue statements which are a bit less intuitive.

There's too much to go over here---but there are links in the notebook if you're curious.
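
For the curious, here is a minimal sketch of all three. It uses an if statement, which is covered properly in the next section; the names power and total are just for illustration.

```python
# while: repeat for as long as a condition holds.
power = 1
while power <= 1000:
    power = power * 2
print(power)        # the first power of two above 1000

# continue skips to the next iteration; break leaves the loop entirely.
total = 0
for n in range(1, 100):
    if n % 2 == 0:
        continue    # ignore even numbers
    if n > 9:
        break       # stop as soon as we pass 9
    total = total + n
print(total)        # 1 + 3 + 5 + 7 + 9
```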

Control flow: conditionals

A very important tool in the programmer's toolkit is the ability to do different things in different circumstances.

Enter the if statement:

In [69]:
i = 10
if i % 2 == 0:
    print(i, "is even")
10 is even
  • Colon, then four spaces before body statements
  • Main expression usually a boolean: True or False
  • Use comparisons like <, <=, ==, !=, >=, > to make booleans
In [70]:
1 < 2    #less than
Out[70]:
True
In [71]:
2 <= 0.2   #less than or equal
Out[71]:
False
In [72]:
3 == 3.0   #equal
Out[72]:
True
In [73]:
"cat" != "dog" #not equal
Out[73]:
True
In [74]:
x = 10
1 < x < 15 #Mathematical notation for "(1 < x) and (x < 15)"
Out[74]:
True

Let's take our previous if statement and put it in a loop.

Whenever we start a new block (line ending in a colon), we have to indent an extra four spaces.

In [75]:
for i in range(5):
    if i % 2 == 0:
        print(i, "is even")
0 is even
2 is even
4 is even

We can handle the False case with an else statement.

In [76]:
for i in range(5):
    if i % 2 == 0:
        print(i, "is even")
    else:
        print(i, "is odd")
0 is even
1 is odd
2 is even
3 is odd
4 is even

For finer control, use an if... elif... else... chain.

Here elif is short for "else if".

In [77]:
import datetime
now = datetime.datetime.now()
print("The time is", now, "and the hour is:", now.hour)
if 6 <= now.hour < 12:
    print("Good morning!")
elif 12 <= now.hour < 18:
    print("Good afternoon!")
elif 18 <= now.hour < 20:
    print("Good evening!")
else:
    print("Good night!")
The time is 2017-04-11 12:48:23.167619 and the hour is: 12
Good afternoon!
  • else is optional and always comes last.
  • Need to have if before any elifs.
  • Can have as many elifs as you like.
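
Wrapping a chain like this in a function makes it easy to try different inputs. The sketch below uses a hypothetical function greeting_for_hour and spells out both ends of every range, so that the small hours (0 to 5) fall through to the else branch.

```python
def greeting_for_hour(hour):
    # hour is assumed to be an integer from 0 to 23
    if 6 <= hour < 12:
        return "Good morning!"
    elif 12 <= hour < 18:
        return "Good afternoon!"
    elif 18 <= hour < 20:
        return "Good evening!"
    else:
        return "Good night!"

for hour in (3, 9, 12, 19, 23):
    print(hour, greeting_for_hour(hour))
```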

Cheeky challenge

The sign or signum function is defined by $$\operatorname{sign}(x) = \begin{cases} \phantom{-}1 & \text{if $x>0$} \\ \phantom{-}0 & \text{if $x=0$} \\ -1 & \text{if $x<0$} \end{cases}$$

Implement this as a Python function.

#And some tests:
assert sign(10) == 1
assert sign(0) == 0
assert sign(-23.4) == -1
  • Quick mention: can perform logical operations on booleans with and, or, and not.
In [78]:
True and False
Out[78]:
False
In [79]:
True or False
Out[79]:
True
In [80]:
not False
Out[80]:
True
In [81]:
not False and False    #careful with order of operations
Out[81]:
False
In [82]:
not (False and False)
Out[82]:
True
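
Logical operators are most useful for combining comparisons. As a sketch, here is the usual leap year rule written as a single boolean expression; is_leap_year is a name made up for this example. Note that not binds tightest, then and, then or, so brackets are worth adding whenever there's doubt.

```python
def is_leap_year(year):
    # Divisible by 4, except century years, unless also divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap_year(2016), is_leap_year(1900), is_leap_year(2000))

# "and" is evaluated before "or":
print(True or False and False)     # True, read as True or (False and False)
print((True or False) and False)   # False
```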

Data types: strings

  • Any textual data: plot labels, file names, ...
  • Enclosed by single (') or double quotes (")
  • Any Unicode character okay
In [83]:
supercal = "Supercalifragilisticexpialidocious"
starwars = 'No, I am your father'  # spaces okay
greeting = "こんにちは (Konnichiwa)" # non-Latin characters okay
  • Use \n to stand for a newline
  • Use \' or \" for literal quotes
  • Use \\ for a literal backslash
  • Spaces preserved
In [84]:
print("A short 'quote'\n     a double quote char: \"\n and newlines!")
A short 'quote'
     a double quote char: "
 and newlines!

Python is pedantic when comparing

In [85]:
'2' == 2            #different types!
Out[85]:
False
In [86]:
type('2'), type(2)
Out[86]:
(str, int)
In [87]:
'True' == True
Out[87]:
False
In [88]:
type('True'), type(True)
Out[88]:
(str, bool)

String methods

A list of handy functions for working with strings. Full reference online.

In [89]:
vowels = "aeiou"
vowels.upper()
Out[89]:
'AEIOU'
In [90]:
vowels.lower() #already lowercase
Out[90]:
'aeiou'
In [91]:
vowels.capitalize()
Out[91]:
'Aeiou'
In [92]:
len(supercal)   #length function
Out[92]:
34
In [93]:
supercal.count("a")
Out[93]:
3

Silly example: a function which processes a yes/no prompt (y/n)

In [94]:
def handle_response(response):
    if response.startswith("y"):
        return "positive response"
    elif response.startswith("n"):
        return "negative response"
    else:
        return "unclear response"
In [95]:
handle_response("yes")
Out[95]:
'positive response'
In [96]:
handle_response("no way man that's unreasonable")
Out[96]:
'negative response'

What happens when we call with these arguments? Guess, then check in the notebook.

  • handle_response()
  • handle_response("")
  • handle_response("YES")
  • handle_response(" yes ")

handle_response()

  • TypeError: missing argument

handle_response("")

  • Unclear response: the empty string "" doesn't start with anything!

handle_response("YES")

  • Unclear response: upper/lowercase matters for comparison
In [97]:
'Y' == 'y'
Out[97]:
False

handle_response(" yes ")

  • Unclear response: first char is a space

Often useful to normalise strings to a sensible form, especially if they come from user input.

In [98]:
response = "    YeS   "
response = response.lower()
print( repr(response) )      # explicit representation with repr()
response = response.strip()  # remove whitespace from start and end
print( repr(response) )
'    yes   '
'yes'
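
Putting the two steps together, here is a sketch of a more forgiving version of the yes/no handler. It is given a new, hypothetical name, handle_response_normalised, to keep it apart from the original.

```python
def handle_response_normalised(response):
    cleaned = response.strip().lower()   # tidy up before testing
    if cleaned.startswith("y"):
        return "positive response"
    elif cleaned.startswith("n"):
        return "negative response"
    else:
        return "unclear response"

print(handle_response_normalised("    YeS   "))
print(handle_response_normalised("NO"))
print(handle_response_normalised(""))
```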

Also handy: str.replace:

In [99]:
x = "The news media reported today that no news is in fact good news"
x.replace("news", "FAKE NEWS!!")
Out[99]:
'The FAKE NEWS!! media reported today that no FAKE NEWS!! is in fact good FAKE NEWS!!'

Slicing

Remember that indexing works from $0$ to $N - 1$:

In [100]:
supercal[0]
Out[100]:
'S'
In [101]:
supercal[5]
Out[101]:
'c'
In [102]:
supercal[0:5] #like range, slicing excludes upper limit
Out[102]:
'Super'
In [103]:
supercal[-1]  #Last char
Out[103]:
's'
In [104]:
supercal[:5] + "..." + supercal[-4:] #first five, then last 4
Out[104]:
'Super...ious'
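
Slices also accept an optional third number, a step, just like range. A short sketch on a small string:

```python
example = "demo"
middle = example[1:3]        # characters 1 and 2
every_other = example[::2]   # every second character
backwards = example[::-1]    # a step of -1 walks backwards
past_end = example[10:]      # slicing past the end gives '', not an error

print(middle, every_other, backwards, repr(past_end))
```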

Concatenation

  • Glue strings together with "+".
  • For complicated gluings, or gluings of arbitrary length, use the print function or str.join
In [105]:
name = "David"
"Good morning, " + name + "."
Out[105]:
'Good morning, David.'
  • Use * as shorthand for repetition.
In [106]:
'thank you ' * 10
Out[106]:
'thank you thank you thank you thank you thank you thank you thank you thank you thank you thank you '
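
A sketch of str.join, mentioned above: the string you call it on goes between the pieces. It peeks ahead at lists, which are covered in the next section.

```python
parts = ["alpha", "beta", "gamma", "delta"]
joined = ", ".join(parts)
print(joined)

# join only accepts strings, so convert numbers with str() first.
numbers = [1, 2, 3]
as_text = "-".join(str(n) for n in numbers)
print(as_text)
```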

Even more complicated string handling available:

Looping over strings

Awkward way:

In [107]:
example = "demo"
for i in range(len(example)):
    print(example[i])
d
e
m
o

Slick way:

In [108]:
for character in "demo":
    print(character)
d
e
m
o

Cheeky Challenge

Write a function to count the number of vowels in a string. Assume that we're just working with the Roman alphabet---so don't worry about variants like ë, è, é, and ê.

For bonus points, try using a loop to write this function.

In [109]:
#Here's a space to write your function
#and some tests to run
assert your_function("Hello") == 2
assert your_function(" xyz HEllO") == 2
assert your_function("Hello, sailor") == 5

Data types: lists and tuples

  • Lists: a sequence of arbitrary Python objects
In [110]:
greek_letters = ["alpha", "beta", "gamma", "delta"]
greek_letters[1] #Index just like strings: 0 to N-1.
Out[110]:
'beta'
  • Lists can be modified in-place
In [111]:
greek_letters[1] = "BETA (β)"
greek_letters
Out[111]:
['alpha', 'BETA (β)', 'gamma', 'delta']
  • Lists can contain objects of different types
In [112]:
things = ["uno", "dos", 3, supercal, 2.718]
  • Unless they're modified, lists have a fixed length
In [113]:
len(things)
Out[113]:
5
  • Lists are objects, so lists can even contain lists!
In [114]:
names_by_parts = [ ["David", "Robertson"], ["Cetin", "Can", "Evirgen"] ]
print( names_by_parts[0] )
print( names_by_parts[0][1] )
['David', 'Robertson']
Robertson

Quick Quiz

What is len(names_by_parts)?

  • 2
  • 3
  • 4
  • 5
In [115]:
len(names_by_parts)
Out[115]:
2
  • A list doesn't know anything special about what it contains
  • Can't access or add new list items by accident
In [117]:
greek_letters[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-117-85a9bd08274d> in <module>()
----> 1 greek_letters[4]

IndexError: list index out of range
In [118]:
greek_letters[4] = 'EPSILON (ε)'
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-118-233d7087fd08> in <module>()
----> 1 greek_letters[4] = 'EPSILON (ε)'

IndexError: list assignment index out of range
In [119]:
greek_letters.append("EPSILON (ε)")
greek_letters
Out[119]:
['alpha', 'BETA (β)', 'gamma', 'delta', 'EPSILON (ε)']
  • Other useful list methods and idioms:
In [120]:
empty_list = []
print(empty_list, len(empty_list))
[] 0
In [121]:
numbers = [5, 2, 64, 41, 27, -2, 11, 32]
In [122]:
numbers.sort()     #modifies list in place
numbers
Out[122]:
[-2, 2, 5, 11, 27, 32, 41, 64]
In [123]:
["ab", 1].sort()  # Can't compare text with numbers
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-123-c281d25ec96a> in <module>()
----> 1 ["ab", 1].sort()  # Can't compare text with numbers

TypeError: '<' not supported between instances of 'int' and 'str'
In [124]:
x = list(range(10, 20))
x
Out[124]:
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
In [125]:
del x[2]  #Delete the entry with index 2 (third entry)
x
Out[125]:
[10, 11, 13, 14, 15, 16, 17, 18, 19]
In [126]:
print("POP:", x.pop(), x)
POP: 19 [10, 11, 13, 14, 15, 16, 17, 18]
In [127]:
x.reverse() #modifies in place
x
Out[127]:
[18, 17, 16, 15, 14, 13, 11, 10]
In [128]:
x.insert(4, "surprise")
x
Out[128]:
[18, 17, 16, 15, 'surprise', 14, 13, 11, 10]

NB: It's quick to extend lists at the end, but inserting or deleting near the start is slower. If your list is HUGE then this can become a problem.

See also the Python wiki or these course notes.
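
If you do need fast insertion and removal at both ends, one option (not covered in this course) is collections.deque from the standard library. A minimal sketch:

```python
from collections import deque

queue = deque([10, 11, 12])
queue.appendleft(9)       # cheap insertion at the start
queue.append(13)          # cheap insertion at the end
first = queue.popleft()   # cheap removal from the start

print(first, list(queue))
```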

Looping over lists is just like strings.

Warning: don't modify list structure when looping! (Modifying list values is fine)

In [129]:
colours = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
for colour in colours:
    print(colour, "has", len(colour), "letters" )
red has 3 letters
orange has 6 letters
yellow has 6 letters
green has 5 letters
blue has 4 letters
indigo has 6 letters
violet has 6 letters
In [130]:
for colour in colours:
    colours.pop()
colours
Out[130]:
['red', 'orange', 'yellow']
In [131]:
for i, colour in enumerate(colours): #avoids range(len(colours))
    colours[i] = colour.upper()
colours
Out[131]:
['RED', 'ORANGE', 'YELLOW']
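
If you really do want to drop items while looping, two safe patterns are sketched below: build a new list, or loop over a copy of the list (colours[:]) while changing the original.

```python
colours = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

# Option 1: build a new list instead of deleting from the one being looped over.
long_colours = []
for colour in colours:
    if len(colour) > 4:
        long_colours.append(colour)
print(long_colours)

# Option 2: loop over a copy, and remove from the original.
for colour in colours[:]:
    if len(colour) <= 4:
        colours.remove(colour)
print(colours)
```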

Cheeky challenge

The following lines will read a list of words from a data file. Use Python to find:

  • The first, middle and last word in the list
  • The percentage of words containing an e
    • Hint: use str.find; or better the in operator
  • All two-letter words in the list (good for Scrabble)
In [132]:
with open('data/en-GB-words.txt', 'rt') as f:
    words = [line.strip() for line in f]
print(len(words), "words. Number 2001 is", words[2000])
99156 words. Number 2001 is Booth
In [133]:
N = len(words)
print(words[0], words[N//2], words[-1], sep=", ")
A, harks, études
In [134]:
count = 0
for word in words:
    if 'e' in word:
        count = count + 1
print(count, 100 * count / len(words))
63152 63.689539715196254
In [135]:
two_letter_words = []
for word in words:
    if len(word) == 2:
        two_letter_words.append(word)
print(two_letter_words)
['Ac', 'Ag', 'Al', 'Am', 'Ar', 'As', 'At', 'Au', 'Av', 'Ba', 'Be', 'Bi', 'Bk', 'Br', 'Ca', 'Cd', 'Cf', 'Ci', 'Cl', 'Cm', 'Co', 'Cr', 'Cs', 'Cu', 'Di', 'Dr', 'Ed', 'Er', 'Es', 'Eu', 'Fe', 'Fm', 'Fr', 'GE', 'Ga', 'Gd', 'Ge', 'He', 'Hf', 'Hg', 'Ho', 'Hz', 'In', 'Io', 'Ir', 'It', 'Jo', 'Jr', 'Kr', 'La', 'Le', 'Li', 'Ln', 'Lr', 'Lt', 'Lu', 'Mb', 'Md', 'Mg', 'Mn', 'Mo', 'Mr', 'Ms', 'Mt', 'Na', 'Nb', 'Nd', 'Ne', 'Ni', 'Np', 'OK', 'Ob', 'Os', 'Oz', 'Pa', 'Pb', 'Pd', 'Pl', 'Pm', 'Po', 'Pt', 'Pu', 'Ra', 'Rb', 'Rd', 'Re', 'Rh', 'Rn', 'Ru', 'Rx', 'Sb', 'Sc', 'Se', 'Si', 'Sm', 'Sn', 'Sq', 'Sr', 'St', 'Ta', 'Tb', 'Tc', 'Th', 'Ti', 'Tl', 'Tm', 'Ty', 'Ur', 'Va', 'Wm', 'Wu', 'Xe', 'Yb', 'Zn', 'Zr', 'ad', 'ah', 'am', 'an', 'as', 'at', 'ay', 'be', 'by', 'cs', 'dB', 'do', 'eh', 'em', 'es', 'ex', 'fa', 'go', 'gs', 'ha', 'he', 'hi', 'ho', 'id', 'if', 'in', 'is', 'it', 'kW', 'kc', 'ks', 'la', 'lo', 'ls', 'ma', 'me', 'mi', 'ms', 'mu', 'my', 'no', 'nu', 'of', 'oh', 'on', 'or', 'ow', 'ox', 'pH', 'pa', 'pi', 're', 'rs', 'sh', 'so', 'ti', 'to', 'ts', 'uh', 'um', 'up', 'us', 'vs', 'we', 'ye', 'yo']

Tuples

  • The same as a list, except can't be modified after creation.
  • Created with round brackets, not square
  • Still indexed from $0$ to $N-1$
In [136]:
coordinate = (1, 2, 3)
coordinate
Out[136]:
(1, 2, 3)
In [137]:
coordinate[0]
Out[137]:
1
In [138]:
coordinate[0] = 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-138-9acc16226b5e> in <module>()
----> 1 coordinate[0] = 10

TypeError: 'tuple' object does not support item assignment
In [139]:
x, y, z = coordinate      #tuple unpacking
print(x, y, z, x + y + z)
1 2 3 6

In fact, when you say return a, b from a function, what gets returned is the tuple (a, b)!
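
Two more tuple idioms, as a quick sketch: swapping values without a temporary variable, and the trailing comma needed for a one-element tuple.

```python
a, b = 1, 2
a, b = b, a          # the right-hand side is packed into a tuple, then unpacked
print(a, b)

single = (5,)        # a one-element tuple needs a trailing comma
not_a_tuple = (5)    # this is just the integer 5 in brackets
print(type(single), type(not_a_tuple))
```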

Data types: dictionaries

  • Collection of key -> value pairs, looked up by key rather than by position (Python 3.7+ remembers insertion order)
  • Keys usually strings
  • "Hashmap", "Associative array"
In [140]:
david = dict(
    surname = "Robertson",
    given_names = ["David", "Matthew"],
    age = 24,
    dob = "26/06/1992",
    height = 190
)
david
Out[140]:
{'age': 24,
 'dob': '26/06/1992',
 'given_names': ['David', 'Matthew'],
 'height': 190,
 'surname': 'Robertson'}
  • Index by key to get/set values
In [141]:
david['age'] = "very very very very very very old"
david['age']
Out[141]:
'very very very very very very old'

Three ways to loop:

In [142]:
for key in david:
    print(key, end=", ")
surname, given_names, age, dob, height, 
In [143]:
for value in david.values():
    print(value, end=", ")
Robertson, ['David', 'Matthew'], very very very very very very old, 26/06/1992, 190, 
In [144]:
for key, value in david.items():
    print(key, "->", value)
surname -> Robertson
given_names -> ['David', 'Matthew']
age -> very very very very very very old
dob -> 26/06/1992
height -> 190
  • Dictionaries have a length too:
In [145]:
len(david)
Out[145]:
5
  • Python will complain if you ask for a missing key:
In [146]:
david['weight']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-146-01a69e534c10> in <module>()
----> 1 david['weight']

KeyError: 'weight'
  • Can check if a key is present with in:
In [147]:
'surname' in david
Out[147]:
True
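
Two related tools, sketched on a cut-down dictionary called person (a stand-in for the example above): dict.get returns a fallback instead of raising KeyError, and del removes a key.

```python
person = dict(surname="Robertson", height=190)   # cut-down stand-in for the example

print(person.get('height'))              # present: the stored value
print(person.get('weight'))              # missing: None instead of a KeyError
print(person.get('weight', 'unknown'))   # missing: the default we supplied

del person['height']                     # remove a key/value pair
print('height' in person, len(person))
```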

Cheeky challenge

The following data file contains the periodic table. We're going to load its elements into a list, and each entry of that list will be a dictionary describing one element.

In [148]:
import json
with open("data/PeriodicTable.json", "rt") as f:
    table = json.load(f)['elements']
In [149]:
table[0]
Out[149]:
{'appearance': 'colorless gas',
 'atomic_mass': 1.008,
 'boil': 20.271,
 'category': 'diatomic nonmetal',
 'color': None,
 'density': 0.08988,
 'discovered_by': 'Henry Cavendish',
 'melt': 13.99,
 'molar_heat': 28.836,
 'name': 'Hydrogen',
 'named_by': 'Antoine Lavoisier',
 'number': 1,
 'period': 1,
 'phase': 'Gas',
 'shells': [1],
 'source': 'https://en.wikipedia.org/wiki/Hydrogen',
 'spectral_img': 'https://en.wikipedia.org/wiki/File:Hydrogen_Spectra.jpg',
 'summary': 'Hydrogen is a chemical element with chemical symbol H and atomic number 1. With an atomic weight of 1.00794 u, hydrogen is the lightest element on the periodic table. Its monatomic form (H) is the most abundant chemical substance in the Universe, constituting roughly 75% of all baryonic mass.',
 'symbol': 'H',
 'xpos': 1,
 'ypos': 1}

Your challenges:

  • Which element is densest?
  • Create a new dictionary mapping elements' symbols to their names. For example, if D is the dictionary, D['H'] == 'Hydrogen'.
  • Sorted alphabetically, what's the first and last element symbol?
  • Sorted alphabetically, what's the first and last element name?
  • How many elements' symbols have a different first letter to their name?
In [150]:
max_density = 0
max_density_name = ""
for element in table:
    if element['density'] is not None and element['density'] > max_density:
        max_density = element['density']
        max_density_name = element['name']
max_density, max_density_name
Out[150]:
(40.7, 'Hassium')
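
The same search can be written with the built-in max and its key argument. The sketch below runs on a tiny hand-made list called sample, not the real data file; the None entry is a stand-in for a missing measurement, and density_of is a helper made up for this example.

```python
# A small stand-in for the real table: just enough entries to show the idea.
sample = [
    {'name': 'Hydrogen', 'density': 0.08988},
    {'name': 'Iron', 'density': 7.874},
    {'name': 'Osmium', 'density': 22.59},
    {'name': 'Oganesson', 'density': None},   # stand-in for a missing value
]

def density_of(element):
    return element['density']

# Drop entries with no density, then let max() compare by density.
known = [element for element in sample if element['density'] is not None]
densest = max(known, key=density_of)
print(densest['name'], densest['density'])
```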
In [151]:
shorthand = {}
for element in table:
    symbol = element['symbol']
    name = element['name']
    shorthand[symbol] = name
shorthand
Out[151]:
{'Ac': 'Actinium',
 'Ag': 'Silver',
 'Al': 'Aluminium',
 'Am': 'Americium',
 'Ar': 'Argon',
 'As': 'Arsenic',
 'At': 'Astatine',
 'Au': 'Gold',
 'B': 'Boron',
 'Ba': 'Barium',
 'Be': 'Beryllium',
 'Bh': 'Bohrium',
 'Bi': 'Bismuth',
 'Bk': 'Berkelium',
 'Br': 'Bromine',
 'C': 'Carbon',
 'Ca': 'Calcium',
 'Cd': 'Cadmium',
 'Ce': 'Cerium',
 'Cf': 'Californium',
 'Cl': 'Chlorine',
 'Cm': 'Curium',
 'Cn': 'Copernicium',
 'Co': 'Cobalt',
 'Cr': 'Chromium',
 'Cs': 'Cesium',
 'Cu': 'Copper',
 'Db': 'Dubnium',
 'Ds': 'Darmstadtium',
 'Dy': 'Dysprosium',
 'Er': 'Erbium',
 'Es': 'Einsteinium',
 'Eu': 'Europium',
 'F': 'Fluorine',
 'Fe': 'Iron',
 'Fl': 'Flerovium',
 'Fm': 'Fermium',
 'Fr': 'Francium',
 'Ga': 'Gallium',
 'Gd': 'Gadolinium',
 'Ge': 'Germanium',
 'H': 'Hydrogen',
 'He': 'Helium',
 'Hf': 'Hafnium',
 'Hg': 'Mercury (element)',
 'Ho': 'Holmium',
 'Hs': 'Hassium',
 'I': 'Iodine',
 'In': 'Indium',
 'Ir': 'Iridium',
 'K': 'Potassium',
 'Kr': 'Krypton',
 'La': 'Lanthanum',
 'Li': 'Lithium',
 'Lr': 'Lawrencium',
 'Lu': 'Lutetium',
 'Lv': 'Livermorium',
 'Mc': 'Moscovium',
 'Md': 'Mendelevium',
 'Mg': 'Magnesium',
 'Mn': 'Manganese',
 'Mo': 'Molybdenum',
 'Mt': 'Meitnerium',
 'N': 'Nitrogen',
 'Na': 'Sodium',
 'Nb': 'Niobium',
 'Nd': 'Neodymium',
 'Ne': 'Neon',
 'Nh': 'Nihonium',
 'Ni': 'Nickel',
 'No': 'Nobelium',
 'Np': 'Neptunium',
 'O': 'Oxygen',
 'Og': 'Oganesson',
 'Os': 'Osmium',
 'P': 'Phosphorus',
 'Pa': 'Protactinium',
 'Pb': 'Lead',
 'Pd': 'Palladium',
 'Pm': 'Promethium',
 'Po': 'Polonium',
 'Pr': 'Praseodymium',
 'Pt': 'Platinum',
 'Pu': 'Plutonium',
 'Ra': 'Radium',
 'Rb': 'Rubidium',
 'Re': 'Rhenium',
 'Rf': 'Rutherfordium',
 'Rg': 'Roentgenium',
 'Rh': 'Rhodium',
 'Rn': 'Radon',
 'Ru': 'Ruthenium',
 'S': 'Sulfur',
 'Sb': 'Antimony',
 'Sc': 'Scandium',
 'Se': 'Selenium',
 'Sg': 'Seaborgium',
 'Si': 'Silicon',
 'Sm': 'Samarium',
 'Sn': 'Tin',
 'Sr': 'Strontium',
 'Ta': 'Tantalum',
 'Tb': 'Terbium',
 'Tc': 'Technetium',
 'Te': 'Tellurium',
 'Th': 'Thorium',
 'Ti': 'Titanium',
 'Tl': 'Thallium',
 'Tm': 'Thulium',
 'Ts': 'Tennessine',
 'U': 'Uranium',
 'V': 'Vanadium',
 'W': 'Tungsten',
 'Xe': 'Xenon',
 'Y': 'Yttrium',
 'Yb': 'Ytterbium',
 'Zn': 'Zinc',
 'Zr': 'Zirconium'}
In [152]:
symbols = list(shorthand)
symbols.sort()
symbols[0], symbols[-1]
Out[152]:
('Ac', 'Zr')
In [153]:
names = list(shorthand.values())
names.sort()
names[0], names[-1]
Out[153]:
('Actinium', 'Zirconium')
In [154]:
oddballs = {}
for symbol, name in shorthand.items():
    if name[0] != symbol[0]:
        oddballs[symbol] = name
oddballs
Out[154]:
{'Ag': 'Silver',
 'Au': 'Gold',
 'Fe': 'Iron',
 'Hg': 'Mercury (element)',
 'K': 'Potassium',
 'Na': 'Sodium',
 'Pb': 'Lead',
 'Sb': 'Antimony',
 'Sn': 'Tin',
 'W': 'Tungsten'}

After lunch:

  • Can will give a crash course in Python's scientific libraries
  • Some more exercises, chances to practise

That's all folks!

Feedback, comments, questions?

d.m.robertson@newcastle.ac.uk