Julian's Blog - On the Nature of Python

JulianHysi 9 April, 2022

On the Nature of Python

Python's popularity has been steadily increasing over the past decade, making it one of the most used programming languages at the time of writing. According to TIOBE's March 2022 report, Python is not only the most popular language, but also the fastest-rising one among the top 20 on the list. Such rankings should be taken with a grain of salt, since different organizations use different metrics to measure popularity, but Python consistently makes the first few spots in them.

What factors account for Python's rise in popularity? Let's enumerate a few:

- popular in academia and among tech giants
- simple and readable syntax
- beginner-friendly, having a shallow learning curve
- free and open source
- vast number of third-party libraries for basically everything you need
- continuously maintained and updated
- large and helpful community
- versatile (you can use it to build desktop, web, CLI, or machine learning applications)
- easy portability

For the reasons mentioned above, and more, many developers are switching to Python as their primary back-end language, or simply adding it as another powerful tool to their tech stack. Some just want to play around with it and get a feel for what it's like to code in Python. For this target audience, we'll try to briefly cover some of the properties that define Python as a programming language.

The programming paradigm

Python came into existence in the early 90s, when Guido van Rossum was looking to create a tool that was a hybrid of C and bash: not as low-level as C, but more flexible and general-purpose than bash (as per Guido himself).
The OOP paradigm had already been well established and popularized by C++ and Smalltalk. As such, Python has embraced and supported OOP from the very beginning. So yes, Python is an object-oriented language, and you can exercise that paradigm pretty much just like you would in most other OOP languages.

A few things to note here, though:

- Python has no means of enforcing strong encapsulation. The language strives to give developers more freedom and responsibility. Like Guido says, we're all adults here!
- Python allows for multiple inheritance, something rather rare in most modern OOP languages today.

Keeping that in mind, writing OOP code in Python shouldn't feel that different, after you get used to the syntax.
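To make the two points above concrete, here is a minimal sketch. The class names (Account, Loggable, Serializable, SavingsAccount) are made up for illustration. It shows that underscore-prefixed attributes are private by convention only, and that the method resolution order (MRO) decides which parent wins under multiple inheritance.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance   # single underscore: "internal" by convention only
        self.__pin = "1234"       # double underscore: name-mangled, not truly private


class Loggable:
    def describe(self):
        return f"{type(self).__name__} (loggable)"


class Serializable:
    def describe(self):
        return f"{type(self).__name__} (serializable)"


class SavingsAccount(Account, Loggable, Serializable):
    """Inherits from three base classes at once."""


acct = SavingsAccount(100)

# Nothing stops outside code from touching "internal" state.
acct._balance = 1_000_000
print(acct._balance)        # 1000000
print(acct._Account__pin)   # 1234 (the mangled name is still reachable)

# The MRO decides which describe() is used: Loggable is listed before Serializable.
print(acct.describe())      # SavingsAccount (loggable)
mro_names = [cls.__name__ for cls in SavingsAccount.__mro__]
print(mro_names)
```

The order of the base classes in the class statement matters: swapping Loggable and Serializable would change which describe() gets called.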

However, Python is what is called a multi-paradigm language. What this means is, even though you can construct your application in an OOP way, you are not obliged to do so. While in languages like Java you need to have a class with a static method even for just printing "Hello World", in Python such constructs are not necessary. A single call to print will do!
Using Python in a purely procedural way like that comes in handy for one-off scripts and simple automations. You can do functional programming with Python as well. And you can mix different paradigms together in the same project.
I think this flexibility is awesome, because it allows you as a developer to adjust your approach based on the circumstances of the problem you are facing, rather than being forced to overuse one paradigm. Oftentimes we see code in strictly-OOP languages that attempts to abstract away complexity but ends up increasing it, where procedural code would have been more appropriate.
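As a rough illustration of mixing paradigms, the sketch below solves the same toy problem (summing the squares of the even numbers in a list) in a procedural, a functional and an object-oriented style. The task and the EvenSquares class are invented for this example.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Procedural: a plain loop and a running total.
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n

# Functional: compose filter() and map(), with no mutable state.
functional_total = sum(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))


# Object-oriented: wrap the data and the behaviour in a class.
class EvenSquares:
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(n * n for n in self.values if n % 2 == 0)


oop_total = EvenSquares(numbers).total()

print(total, functional_total, oop_total)   # 56 56 56
```

For a problem this small the procedural or functional version is arguably enough; the class only starts to pay off once there is more state and behaviour to group together.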

Interpreted vs Compiled

Python is generally regarded as an interpreted language, which means the code is interpreted at execution time.
But everything compiles to something, as the CPU can only execute machine code. What is meant by Python being an interpreted language is that the compilation step happens implicitly. In compiled languages, you can issue a build of your code now, and run the code a few hours later, or never. The two steps are clearly separated. In Python, both of those steps happen at once when you run your code, in an atomic way.
Python compiles to bytecode, which is similar to CPU instructions, but is executed by a virtual machine rather than directly by the hardware. You do not manually invoke the compiler though, like you'd do in Java.
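The implicit compilation step can be made visible with the standard library. The sketch below assumes CPython; it calls the built-in compile() by hand, prints the resulting bytecode with the dis module, and then lets the virtual machine execute it. The source string and the "<example>" filename are arbitrary.

```python
import dis

source = "result = 2 + 3 * x"

# compile() performs by hand the step CPython normally does implicitly.
code_obj = compile(source, "<example>", "exec")
print(type(code_obj).__name__)   # code

# Human-readable listing of the bytecode; the exact opcodes vary between versions.
dis.dis(code_obj)

# The virtual machine then executes the code object.
namespace = {"x": 4}
exec(code_obj, namespace)
print(namespace["result"])       # 14
```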

What are the implications of this? Well, compiled languages are usually faster, because the code has already been compiled; it just needs to execute. Also, the build step will catch some types of errors before they happen; in Python, apart from syntax errors (which are reported as soon as a file is compiled, before any of it executes), errors only show up at runtime, when the offending line is actually reached.
Interpreted languages though usually come with the benefit of a better and faster development/debugging experience, smaller program sizes, easier portability etc. Trade-offs... trade-offs everywhere.
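A small sketch of what "errors surface at runtime" means in practice: the function below contains a misspelled variable name, yet CPython defines it without complaint and only raises once the faulty line is actually executed. The greet function and its typo are made up for illustration.

```python
def greet(name):
    # 'nmae' is a typo, but nothing notices until this line runs.
    return "Hello, " + nmae


print("Function defined without complaint.")

try:
    greet("Ada")
except NameError as exc:
    error_message = str(exc)
    print("Failed only when called:", error_message)
```

A compiler for a statically checked language would typically reject the equivalent code at build time; in Python, a linter or a test suite has to play that role.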

One thing we must keep in mind though is that whether a language is considered interpreted or compiled is a property of the implementation rather than of the language itself.
A great many languages have just one implementation, so the language and its implementation are often spoken of interchangeably, but they're not the same thing! Python has many implementations available. In this article, when I say Python I'm talking about CPython, which is the default and most widespread implementation, and the only one I have ever used. But there are other implementations targeting .NET, Java, Python itself, etc., which behave differently.
For example, PyPy (a Python implementation written in RPython, a restricted subset of Python) adds a just-in-time compiler that translates frequently executed code into machine code while the program runs, instead of only interpreting bytecode like CPython does, and so often achieves faster performance. However, even PyPy does not produce a standalone binary that you can just execute on demand. Both steps still have to happen at once, but the underlying process is different.

Obviously there is much nuance to this discussion, but we're not going to delve into low-level details about how compilers, interpreters, assemblers, linkers and loaders work. It's an interesting world without doubt, though one which I'm not that knowledgeable about, so I'd advise you to dig further yourself if you're curious. We did outline some important characteristics of CPython and where it stands on the compiled vs interpreted matter, which is definitely good to know.

Statically typed vs dynamically typed

All programming languages include some kind of type system that formalizes which categories of objects they can work with, and how those categories are treated.
Python is a dynamically typed language, meaning the types are evaluated only at runtime. In contrast, statically typed languages check types before the program runs, and typically include (or rather demand) a type on the same line where you declare the variable. Python itself gives you no way to enforce types when you write the code. Even if you explicitly convert a value to a str, the conversion will only be evaluated (or raise an exception if not possible) at runtime, when the interpreter reaches that line.

Statically typed languages usually also do not allow you to change the type of a variable at runtime (like Python does). That means if you declare a variable as an integer, it will be an integer throughout the entire execution cycle. Strictly speaking though, this restriction is not part of the definition of being statically typed, and some languages relax it.
What you need to know about Python is that a variable can be evaluated as str on one line, as int on the line below, and so on, depending on what data you reassign to it.

a = 'lorem ipsum'  # will be evaluated as str
a = 1984  # a is an int now
a = True  # and now it's a bool

Language implementations which are interpreted are usually dynamically typed. For an implementation to be statically typed, type errors need to be reported before runtime, and a plain interpreter typically has no such phase.
Conversely, implementations which are compiled are usually, but not always, statically typed. There are exceptions: Lisp and Julia have implementations which are compiled yet dynamically typed.

Recent versions of Python include type hints: they're just annotations serving the purpose of making intent clearer, and can also be used by static code analysis tools or IDEs to warn you about possible errors. Put simply, they make the code more readable and help other developers, but the interpreter does not enforce them. As far as the interpreter is concerned, type annotations are little more than another way of writing comments: they are stored alongside the function or variable, but never checked during execution.
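A short sketch of how little the interpreter cares about hints: the hypothetical double function below is annotated as taking and returning int, yet CPython happily runs it with a string. The annotations remain available for tools to inspect.

```python
def double(value: int) -> int:
    return value * 2


# The annotation says int, but nothing checks it when the function is called.
print(double(21))     # 42
print(double("ab"))   # abab

# The hints are still stored on the function, where tools can read them.
print(double.__annotations__)
```

A static checker such as mypy would flag the second call; the interpreter itself does not.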

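Another important concept in dynamically typed languages is duck typing: an object is judged by the methods and attributes it provides rather than by its declared type ("if it walks like a duck and quacks like a duck, it's a duck").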
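Here is a minimal sketch of duck typing, with made-up Duck and Robot classes: the function never checks the type of its argument, it simply calls the method it needs, and any object that provides that method works.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def make_it_speak(thing):
    # No isinstance() check: anything with a speak() method is acceptable.
    return thing.speak()


print(make_it_speak(Duck()))    # Quack
print(make_it_speak(Robot()))   # Beep

try:
    make_it_speak(42)           # an int has no speak() method
except AttributeError as exc:
    print("Not duck-like:", exc)
```

Duck and Robot share no base class; the only contract between them and make_it_speak is the presence of speak().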

Strongly typed vs weakly typed

Python is a strongly typed language, meaning it is restrictive about how different types can interact with one another. For example, the statement below:

print('Hello' + 5)

will raise a TypeError, saying you cannot concatenate strings and integers. That's because the addition operator doesn't work on just any two types. For new types (i.e. classes) that you as a developer create, you will have to define how objects of that type handle the addition operation (by implementing the corresponding dunder method, __add__), if you want your objects to support addition.
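The sketch below shows both halves of that paragraph: the built-in TypeError, and a hypothetical Money class that opts in to addition by defining __add__. Returning NotImplemented for unsupported operands is the conventional way to let Python raise a TypeError on your behalf.

```python
try:
    'Hello' + 5
except TypeError as exc:
    print("Built-in types:", exc)


class Money:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # Only Money + Money is supported; anything else is refused.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.amount + other.amount)


total = Money(5) + Money(7)
print(total.amount)   # 12

try:
    Money(5) + 7      # mixing types is still an error
except TypeError as exc:
    print("Custom type:", exc)
```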

But in a weakly typed language, the statement above could execute just fine, usually by converting both operands to a common type. The common type could be a string, converting the number to a string as well, which loses no information.
Or it could treat strings as numbers, which is what PHP (a weakly typed language) does. If you add a string and a number in PHP, it will use the leading numeric part of the string, if there is one, and add that to the other number (newer PHP versions emit a warning or an error when the string is not fully numeric).

The main takeaway from all this is that Python is a strongly typed language. Even though the type is only evaluated at runtime, variables do ultimately have a type and that type carries importance when performing operations.

Is Python slow?

This question keeps resurfacing over and over again. The short answer is: it depends.
First of all we need to define which implementation of Python we're talking about, because that affects performance. Let's once again go with CPython, which is the most popular one.
For reasons stated in the sections above, an implementation that is interpreted and dynamically typed tends to be slower than a compiled and statically typed one. In that regard, we'd generally expect a Python program to run slower than one written in, let's say, Java.

But there's a tool and place for everything. I would not use Python to build a performance-critical application. I would not use Python if I'm trying to squeeze the last bit of performance out of my hardware, or have no budget for future vertical scaling. I would not use Python for embedded software, where the resources are quite limited. The vast majority of software development projects though do not fall under these categories.

Python is the de facto language of machine learning, which is a discipline involving some of the most resource-intensive operations on large data sets. How does it do it then?
Python has a number of scientific and data manipulation libraries that are written in C, which Python simply links to. In these cases, Python serves more as an interface or glue that calls the C code, and yields back the results. In this manner, it is possible to achieve C-like performance, with some obvious but negligible overhead of course.
Most Python developers who use these libraries don't know C at all, and they don't need to. Thanks to the great work of the Python community, we can reap the benefits of C without having to wrestle with its not-so-friendly developer experience.
Offloading to C (which CPython itself is written in) or Rust is not so cumbersome, so it's an excellent alternative for avoiding performance bottlenecks. If you are well versed in those languages yourself, that's even better, because you can glue them together exactly how you want to.
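To get a feel for the difference between looping in Python and delegating to C, the sketch below compares a hand-written loop with the built-in sum(), which is implemented in C in CPython. It is only a stand-in for what scientific libraries do on a much larger scale; the data size and repetition count are arbitrary, and the timings will vary by machine and Python version.

```python
import timeit

data = list(range(100_000))


def python_loop_sum(values):
    # Every iteration and every addition goes through the bytecode interpreter.
    total = 0
    for v in values:
        total += v
    return total


# Both approaches must agree on the result.
assert python_loop_sum(data) == sum(data)

loop_time = timeit.timeit(lambda: python_loop_sum(data), number=20)
builtin_time = timeit.timeit(lambda: sum(data), number=20)

print(f"pure-Python loop: {loop_time:.4f}s")
print(f"built-in sum():   {builtin_time:.4f}s")
```

On CPython the built-in version is usually noticeably faster, because the loop itself runs in C rather than in interpreted bytecode.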

Another thing we need to address when discussing performance, is execution time versus development time. So far we have only talked about CPU execution cycles. While that is important of course, another thing that often presents itself as even more important is development time.
Python is famous for speeding up development. The motto of Django, one of the most used Python frameworks, is "The web framework for perfectionists with deadlines". The Python ecosystem comes with a lot of things that work out of the box, and almost 400,000 third-party libraries, meaning you rarely have to reinvent the wheel.
And the language itself, having less boilerplate and an intentionally clear and simple syntax (see The Zen of Python), aids a lot in getting you up and running with your project idea in a relatively short time.
If it takes you 3 months to build something with Python, but 6 or 9 months to achieve the same thing with Go or Java, you have to ask what matters more. I would argue that for a software development business, the most expensive resource is developer time, not servers. Upgrading to faster hardware may not be even remotely as expensive as the monthly payroll.

There's no denying that some back-end languages are simply better than Python at concurrency and execution speed, by the simplest definition of it. But Python should be fast enough most of the time. You'd struggle to find examples where a whole project had to be rewritten from scratch because Python would not cut it. And even if there are some, I'm willing to bet the team did not perform proper due diligence before starting the project, to see what fits their needs and what doesn't. On the other hand, some of the most notable figures in the Python community, including Guido himself, are actively working on improving performance, with some ambitious claims on the way.

Conclusion

Python is a flexible and versatile programming language, allowing you to code in multiple paradigms and environments. It has a large community of developers and a rich ecosystem of libraries and tools, and it significantly facilitates the development process, which is why it has become the language of choice for many startups. It's not the one tool to rule them all (like everything else in software development), and due to its popularity, it can in fact risk being overused even where it is not suitable.
But overall it is a great language to add to your belt, and a safe bet for most projects, as long as you consider the trade-offs laid out in this article.
