{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
1Department of Geography, King's College London, Strand Campus, Bush House (North East Wing), 40 Aldwych, London WC2B 4BG, United Kingdom
\n", "The proliferation of large, complex spatial data sets presents\n", " challenges to the way that regional science—and geography more\n", " widely—is researched and taught. Increasingly, it is not 'just'\n", " quantitative skills that are needed, but computational ones. However,\n", " the majority of undergraduate programmes have yet to offer much more\n", " than a one-off 'GIS programming' class since such courses are seen as\n", " challenging not only for students to take, but for staff to deliver.\n", " Using the evaluation criteria of minimal complexity, maximal\n", " flexibility, interactivity, utility, and maintainability, we show how\n", " the technical features of Jupyter notebooks—particularly when combined\n", " with the popularity of Anaconda Python and Docker—enabled us to\n", " develop and deliver a suite of three 'geocomputation' modules to\n", " Geography undergraduates, with some progressing to data science and\n", " analytics roles.
\n", "\n", "1. It should be noted that, technically, Docker containers are not virtual machines in the traditional sense.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Houston, We Have a Problem\n", "\n", "Of course, no single solution is without drawbacks and Jupyter is no\n", "exception; it's worth noting that there *are* quite specific technical,\n", "conceptual, and development issues raised by Jupyter that are difficult\n", "to circumvent without both know-how and some careful thinking about\n", "assessment and teaching. The principal technical challenge relates to\n", "user permissions on managed machines (*e.g.* in computer clusters) since\n", "Python, Jupyter, and Docker all struggle to different degrees with 'locked down' Windows\n", "systems. Indeed, Docker does not currently run at all without\n", "administrator privileges. We worked closely with university-level IT\n", "staff to install and provision Anaconda Python and Jupyter. Provision of the [YAML\n", "configuration script](https://github.com/kingsgeocomp/gsa_env/blob/master/gsa.yml) assisted with both installation and isolation of our teaching environment from their existing installation, easing institutional barriers to adoption.\n", "\n", "From a teaching standpoint, an additional issue is that [Git](https://git-scm.com/)—the\n", "dominant version control software that we use to manage and share\n", "notebook changes—sees notebooks in a way that means just re-running\n", "code registers as a local modification of the file that needs to be\n", "committed to the version control system. So although '[GitHub](https://github.com/)'\n", "provides support for the online display of Jupyter notebooks, the use of Git can lead to a large\n", "number of essentially meaningless commits. 
This can make tracking\n", "meaningful content changes over time more difficult, and it means that\n", "we've shied away from teaching students about version control on the basis that they may not perceive the value of commits that seem to record little of value.\n", "\n", "A final, and rather unexpected, disbenefit was uncovered the year after\n", "we moved from the [Spyder IDE](https://www.spyder-ide.org/) to Jupyter: weaker student understanding of\n", "execution flow. Unlike a traditional script that clearly executes from\n", "top-to-bottom (typically in its entirety), Jupyter notebooks freely\n", "intermingle code blocks and text/rich media blocks allowing—and even\n", "encouraging—the user both to jump between widely separated blocks\n", "without executing intervening code and to edit and re-run earlier\n", "blocks. This leads to difficult-to-diagnose bugs because the code\n", "*looks* like it should execute properly but doesn't, and to a weaker\n", "student understanding of system 'state' in terms of instantiated\n", "variables, loaded libraries, and available functions. We typically seek\n", "to cultivate this understanding by stressing that the _real_ test—whether directly assessed or not—of\n", "whether their code 'works' is that a notebook can be run in full\n", "(`Restart Kernel and Run All Cells`) without user intervention. \n", "\n", "We should also note that, in the absence of an Integrated Development Environment (IDE), students are also unlikely to benefit from test suites and other tools that support developer best-practice. While knowledge of such tools and practices is desirable, we nonetheless feel that these kinds of ideas and issues are best tackled when students have progressed further with their studies and are motivated to tackle more abstract challenges. 
To put it another way: \"Because learning in computer science and programming is challenged by numerous barriers, students need to be motivated about the purpose, value, and utility of concepts within course work\" ([Bowlick et al. 2017](#bfj2017))." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion: Back Here on Earth\n", "\n", "In order to understand why the practical benefits of teaching with Jupyter notebooks outweigh the technical and conceptual challenges encountered, it is worth returning to the evaluation criteria outlined near the start of this work. Table 1 summarises the pros and cons observed across the five dimensions identified by our review of the state-of-the-art nearly six years ago.\n", "\n", "##### Table 1. Evaluating Jupyter\n", "| | Pros | Cons |\n", "|-----------|------|------|\n", "| **Minimal Complexity** | Deploying a full geographic data science 'stack' requires installing one application (Docker or Anaconda Python) and running two lines of code in a Terminal/Shell to install and configure Jupyter, its dependencies, and the analytical libraries. Environment requires no configuration. | Persistent challenges with student understanding of file system interaction and paths. Some confusion around multiple Python instances manifesting as different 'kernels' in notebooks. |\n", "| **Maximal Flexibility** | Combination of Binder, Docker, and Anaconda Python allows us to install on nearly any hardware/operating system mix. Docker uses the same YAML configuration script as Anaconda Python so maintaining compatibility and consistency is straightforward. | Students cannot update Docker containers and do not gain understanding of package management or dependency conflict resolution. |\n", "| **Interactivity** | Students can view/edit/add rich media, code, and other content directly within the Jupyter notebook environment. Textual and graphical outputs from code cells in notebooks are saved between restarts of Jupyter. 
| Students do not develop a strong understanding of execution flow and system state. |\n", "| **Utility** | Growth of Jupyter has made it the 'tool of choice' for data scientists, and students are able to continue working with a fully functioning development environment. Students can edit installation and configuration scripts incrementally, as expertise grows. | Relative ease of installation may not prepare students for managing their own development and production environments. Students remain unfamiliar with IDEs and code-completion in Jupyter is not as responsive (yet?). |\n", "| **Maintainability** | Docker and Anaconda update mechanisms are straightforward. GitHub works well for distribution, previewing, and (to a lesser extent) version control. | Nature of notebooks makes it harder for instructors to track incremental changes in version control, and for students to see the value of such an approach. | \n", "\n", "From this, the principal technical recommendation is that a flexible mix of platforms should be used to deliver Jupyter-based learning. We recommend Binder to deliver foundational material using few non-core Python libraries, and now strongly recommend that students use Docker in subsequent modules. However, a critical issue is that Windows 10 Home Edition does not support Docker, and it is therefore *still* necessary to support direct installation of [Anaconda\n", "Python](https://www.anaconda.com/distribution/) and associated configuration of the 'kernel' using a YAML text file. We are also investigating the use of a [containerised JupyterHub](https://github.com/conjuring) running on our own hardware: this would allow students to work _as if_ using Binder while benefiting from the ability to save work and make full use of Python's capabilities. All of the code supporting these configurations is available as a [GitHub repository](https://github.com/kingsgeocomp/gsa_env/), as is Arribas-Bel's [resource](https://github.com/darribas/gds_env)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### And Back to the Future \n", "\n", "A failure to engage directly with computational approaches and tools\n", "poses long-term risks: while ours 'has always been a following\n", "discipline' ([Burton 1963](#bi1963)), what is new is that other disciplines have\n", "now taken an interest in cities and regions ([O'Sullivan and Manson 2015](#osd2015)).\n", "[Ruppert](#re2013) warns, \"if social scientists do not step forward, then\n", "computational social science risks becoming the exclusive domain of\\...\n", "computing scientists\\\" ([2013, p.269](#re2013)). However, there is also an\n", "enormous opportunity for students equipped with both domain knowledge\n", "and programming skills to act as 'knowledge brokers' ([Bowlick and Wright 2018](#bfj2018)). As\n", "[Mir et al. (2017, p.25)](#mdj2017) note: \"truly transformative work at the intersection of\n", "computing and...other disciplines requires...people with heterogeneous\n", "skill-sets (both computational and non-computational) who, despite their\n", "differences in training, can work collaboratively.\\\" In other words,\n", "facing the future requires both translators and explorers: individuals\n", "who understand the broader terrains across which knowledge moves and the\n", "frontiers at which new knowledge is generated.\n", "\n", "We have also come to believe that the use of Jupyter-like platforms in\n", "non-STEM disciplines may have a role to play in addressing a deeper\n", "problem: the widening participation challenge in\n", "computationally-oriented disciplines such as data science\n", "([The Royal Society 2019, p.11](#rs2019)). A particular contribution is these other disciplines' capacity to\n", "provide an applied context—and see [Bort (2015)](#bh2015) for a creative\n", "application in literary studies—for computational training that helps\n", "to motivate further study and engagement. 
It should not be the\n", "responsibility of Geography and allied fields to plug the so-called\n", "'leaky pipeline' ([Berryman 1983](#bse1983)), but they may yet create novel pathways\n", "for a more diverse cohort of students to enter computationally intensive\n", "fields. Such an outcome would not only be to the benefit of Computer\n", "Science, it would very much be to the benefit of an innovative Regional\n", "Science as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Acknowledgements\n", "\n", "This work builds on the input of many—staff and students—to the Geocomputation and Spatial Analysis pathway at King’s College London; however, I wish to particularly acknowledge the critical contributions of [Dr. James Millington](https://github.com/jamesdamillington/), [Michele Ferretti](https://github.com/miccferr), [Dr. Chen Zhong](https://github.com/daisy8738), and [Dr. Yijing Li](https://github.com/aolifodaisy). Finally, [Dr. Arribas-Bel](https://github.com/darribas/) has donated many hours of his time—directly and by example—to helping me to develop and migrate our teaching environment." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "1. Arribas-Bel D (2014) Accidental, open and everywhere: Emerging data sources for the understanding of cities. _Applied Geography_ 49: 45–53\n", "2. Arribas-Bel D (2019) A course on Geographic Data Science. _The Journal of Open Source Education_ 2\\[14\\]: 42\n", "3. Arribas-Bel D, Reades J (2018) Geography and computers: Past, present, and future. _Geography Compass_ e12403\n", "4. Barnes TJ (2013) Big data, little history. _Dialogues in Human Geography_ 3\\[3\\]: 297–302\n", "5. Berryman SE (1983) _Who will do science? Trends, and their causes in minority and female representation among holders of advanced degrees in science and mathematics. A special report_. Report, Rockefeller Foundation, New York, NY\n", "6. 
Bort H, Czarnik M, Brylow D (2015) Introducing computing concepts to non-majors: A case study in gothic novels. In: _Proceedings of the 46th ACM Technical Symposium on Computer Science Education_, 132–137. ACM\n", "7. Bowlick FJ, Goldberg DW, Bednarz SW (2017) Computer science and programming courses in geography departments in the United States. _The Professional Geographer_ 69\\[1\\]: 138–150\n", "8. Bowlick FJ, Wright DJ (2018) Digital data-centric geography: Implications for geography’s frontier. _The Professional Geographer_ 70\\[4\\]: 687–694\n", "9. Bradbeer J (1999) Barriers to interdisciplinarity: Disciplinary discourses and student learning. _Journal of Geography in Higher Education_ 23\\[3\\]: 381–396\n", "10. Britain S (1999) _A framework for pedagogical evaluation of virtual learning environments_. Report, Joint Information Systems Committee. URL: https://web.archive.org/web/20140709094115/http://www.jisc.ac.uk/media/documents/programmes/jtap/jtap-041.pdf\n", "11. Burton I (1963) The quantitative revolution and theoretical geography. _The Canadian Geographer/Le Géographe Canadien_ 7\\[4\\]: 151–162\n", "12. Chapman L (2010) Dealing with maths anxiety: How do you teach mathematics in a geography department? _Journal of Geography in Higher Education_ 34\\[2\\]: 205–213\n", "13. Cresswell T (2014) Déjà vu all over again: Spatial science, quantitative revolutions and the culture of numbers. _Dialogues in Human Geography_ 4\\[1\\]: 54–58\n", "14. Etherington T (2016) Teaching introductory GIS programming to geographers using an open source Python approach. _Journal of Geography in Higher Education_ 40\\[1\\]: 117–130\n", "15. González-Bailón S (2013) Big data and the fabric of human geography. _Dialogues in Human Geography_ 3\\[3\\]: 292–296\n", "16. Gorman SP (2013) The danger of a big data episteme and the need to evolve geographic information systems. _Dialogues in Human Geography_ 3\\[3\\]: 285–291\n", "17. 
Guzdial M (2010) Does contextualized computing education help? _ACM Inroads_ 1\\[4\\]: 4–6\n", "18. Hodgen J, McAlinden M, Tomei A (2014) _Mathematical transitions: a report on the mathematical and statistical needs of students undertaking undergraduate studies in various disciplines_. Report, The Higher Education Academy\n", "19. Johnston R, Harris R, Jones K, Manley D, Sabel C, Wang W (2014) Mutual misunderstanding and avoidance, misrepresentations and disciplinary politics: spatial science and quantitative analysis in (United Kingdom) geographical curricula. _Dialogues in Human Geography_ 4\\[1\\]: 3–25\n", "20. Kluyver T, Ragan-Kelley B, Pérez F, Granger B, Bussonnier M, Frederic J, Kelley K, Hamrick J, Grout J, Corlay S, Ivanov P, Avila D, Abdalla S, Willing C, Jupyter Development Team (2016) Jupyter notebooks&#8212;a publishing format for reproducible computational workflows. In: Loizides F, Schmidt B (eds), _Positioning and power in academic publishing: Players, agents and agendas_. IOS Press, 87–90\n", "21. Lazer D, Pentland A, Adamic L, Aral S, Barabási AL, Brewer D, Christakis N, Contractor N, Fowler J, Gutmann M, Jebara T, King G, Macy M, Roy D, Van Alstyne M (2009) Life in the Network: the coming age of Computational Social Science. _Science_ 323\\[5915\\]: 721–723\n", "22. Ley D, Braun B, Domosh M, Elliott S, Le Heron R, Peake L, Willekens F, Yeoh B (2013) _International Benchmarking Review of UK Human Geography_. Report, Economic and Social Research Council, in partnership with the Royal Geographical Society (with IBG) and the Arts and Humanities Research Council. URL: https://esrc.ukri.org/files/research/research-and-impact-evaluation/international-benchmarking-review-of-uk-human-geography/\n", "23. Lukkarinen A, Sorva J (2016) Classifying the tools of contextualized programming education and forms of media computation. In: _Proceedings of the 16th Koli Calling International Conference on Computing Education Research_, 51–60. ACM\n", "24. 
Macdonald R, Bailey C (2000) Integrating the teaching of quantitative skills across the geology curriculum in a department. _Journal of Geoscience Education_ 48\\[4\\]: 482–486\n", "25. Mir DJ, Mishra S, Ruvolo P, Pollock L, Engen S (2017) How do faculty partner while teaching interdisciplinary CS+X courses: models and experiences. _Journal of Computing Sciences in Colleges_ 32\\[6\\]: 24–33\n", "26. Muller C, Kidd C (2014) Debugging geographers: teaching programming to non-computer scientists. _Journal of Geography in Higher Education_ 38\\[2\\]: 175–192\n", "27. O’Sullivan D (2014) Don’t panic! the need for change and for curricular pluralism. _Dialogues in Human Geography_ 4\\[1\\]: 39–44\n", "28. O’Sullivan D, Manson S (2015) Do physicists have geography envy? and what can geographers learn from it? _Annals of the Association of American Geographers_ 105\\[4\\]: 704–722\n", "29. Pears A, Seidman S, Malmi L, Mannila L, Adams E, Bennedsen J, Devlin M, Paterson J (2007) A survey of literature on the teaching of introductory programming. _ACM SIGCSE Bulletin_ 39: 204–223\n", "30. Pérez F, Granger BE (2007) IPython: a System for Interactive Scientific Computing. _Computing in Science & Engineering_ 9\\[3\\]: 21–29\n", "31. Reades J, De Souza J, Hubbard P (2019) Understanding urban gentrification through machine learning. _Urban Studies_ 56\\[5\\]: 922–942\n", "32. Reades J, Ferretti M, Millington J (2019) _Code Camp: 2019_. GitHub repository, King’s College London\n", "33. Ruppert E (2013) Rethinking empirical social sciences. _Dialogues in Human Geography_ 3\\[3\\]: 268–273\n", "34. Singleton A (2014) Learning to code. _Geographical Magazine_ 77\n", "35. Singleton A, Arribas-Bel D (2019) Geographic Data Science. _Geographical Analysis_ 0\\[0\\]: 15\n", "36. Spronken-Smith R (2013) Toward securing a future for geography graduates. _Journal of Geography in Higher Education_ 37\\[3\\]: 315–326\n", "37. 
Stone B (2013) Differences Between For & While Loops (in Python). Video, YouTube. URL: https://www.youtube.com/watch?v=9AJ0uoxtdCQ\n", "38. The British Academy (2012) _Society counts_. Report, The British Academy. URL: https://www.thebritishacademy.ac.uk/sites/default/files/BA%20Position%20Statement%20-%20Society%20Counts.pdf\n", "39. The Royal Society (2019) _Dynamics of data science skills: How can all sectors benefit from data science talent?_ Report, The Royal Society. URL: https://royalsociety.org/-/media/policy/projects/dynamics-of-data-science/dynamics-of-data-science-skills-report.pdf\n", "40. Torrens P (2010) Geography and computational social science. _GeoJournal_ 75: 133–148\n", "41. Ufford M, Pacer M, Seal M, Kelley K (2018) _Beyond interactive: Notebook innovation at Netflix_. Blog post, Netflix. URL: https://medium.com/netflix-techblog/notebook-innovation-591ee3221233. \\[Last checked: 3 October 2019\\]\n", "42. Wikle TA, Fagin TD (2014) GIS course planning: A comparison of syllabi at US college and universities. _Transactions in GIS_ 18: 574–585\n", "43. Wise NA (2018) Assessing the use of geospatial technologies in higher education teaching. _European Journal of Geography_ 9\\[3\\]\n", "44. Xiao N (2016) _GIS Algorithms: Theory and Applications for Geographic Information Science & Technology_. Research Methods. SAGE" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 5 }