Python Compilation/Interpretation Process

The bytecode is not actually translated to machine code, unless you are using some exotic implementation such as PyPy. Other than that, you have the description correct. The bytecode is loaded into the Python runtime and interpreted by a virtual machine, which is a piece of code that reads each instruction in the bytecode and …
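As a minimal sketch of what "bytecode" means here, the standard `dis` module can show the instructions CPython compiles a function into. The function `add` is a hypothetical example, and the exact opcode names vary between Python versions.

```python
import dis


def add(a, b):
    # Hypothetical function used only to have something to compile.
    return a + b


# The compiled bytecode is stored on the function's code object as raw bytes.
raw = add.__code__.co_code

# dis decodes those bytes into the instructions the virtual machine executes.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# Print a human-readable listing (output differs across Python versions).
dis.dis(add)
```

The listing is what the virtual machine loops over; none of it is machine code.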

Why is one class variable not defined in list comprehension but another is?

data is the source of the list comprehension; it is the one parameter that is passed to the nested scope created. Everything in the list comprehension is run in a separate scope (as a function, basically), except for the iterable used for the left-most for loop. You can see this in the bytecode: >>> …
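The scoping rule can be reproduced with a small sketch. The class names `Demo` and `Broken` and the attribute `factor` are hypothetical; only `data` comes from the excerpt above.

```python
class Demo:
    data = [1, 2, 3]
    factor = 10

    # Works: `data` is the iterable of the left-most `for`, so it is
    # evaluated in the class body and handed to the comprehension.
    copied = [x for x in data]


try:
    class Broken:
        data = [1, 2, 3]
        factor = 10

        # Fails: `factor` is looked up from inside the comprehension,
        # where class-level names are not visible.
        scaled = [x * factor for x in data]
except NameError as exc:
    error = exc
else:
    error = None

print(Demo.copied, repr(error))
```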

What algorithm does Python’s built-in sort() method use?

Sure! The code’s here, starting with function islt and proceeding for QUITE a while;-). As Chris’s comment suggests, it’s C code. You’ll also want to read this text file for a textual explanation, results, etc. If you prefer reading Java code to C code, you could look at Joshua Bloch’s implementation of timsort in …
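One property of the built-in sort that can be observed without reading the C source is stability: items with equal keys keep their original relative order, which the Python documentation guarantees. A small sketch with made-up records:

```python
# Hypothetical records: (name, count).
records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]

# Sort by count only; ties keep their original left-to-right order.
by_count = sorted(records, key=lambda record: record[1])

print(by_count)
```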

Is there anything faster than dict()?

No, there is nothing faster than a dictionary for this task, because the complexity of its indexing (getting and setting an item) and even membership checking is O(1) on average. (See the Python wiki for the complexity of the other operations: https://wiki.python.org/moin/TimeComplexity ) Once you have saved your items in a dictionary, you can have …
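A rough way to see the difference is to time a membership test against a list and a dict holding the same keys. This is only a sketch with made-up data (`ids`, `users`); absolute timings depend on the machine and Python version.

```python
import timeit

# Hypothetical data: 100,000 integer IDs mapped to names.
ids = list(range(100_000))
users = {i: f"user{i}" for i in ids}

# Average-case O(1) operations on the dict.
name = users[99_999]
present = 99_999 in users
missing = users.get(-1)

# Worst case for the list: the sought value is the last element.
list_time = timeit.timeit(lambda: 99_999 in ids, number=100)
dict_time = timeit.timeit(lambda: 99_999 in users, number=100)
print(f"list: {list_time:.4f}s  dict: {dict_time:.6f}s")
```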

How is unicode represented internally in Python?

I’m assuming you want to know about CPython, the standard implementation. Python 2 and Python 3.0-3.2 use either UCS2* or UCS4 for Unicode characters, meaning it’ll either use 2 bytes or 4 bytes for each character. Which one is picked is a compile-time option. \u2049 is then represented as either \x49\x20 or \x20\x49 or \x49\x20\x00\x00 …
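The byte layouts listed above can be reproduced with the standard codecs. This sketch does not inspect CPython's internal buffers; it only encodes the character explicitly, relying on the fact that for a character in the Basic Multilingual Plane, UTF-16 yields the same two bytes as UCS-2.

```python
ch = "\u2049"  # EXCLAMATION QUESTION MARK

two_bytes_le = ch.encode("utf-16-le")   # 2 bytes, little-endian
two_bytes_be = ch.encode("utf-16-be")   # 2 bytes, big-endian
four_bytes_le = ch.encode("utf-32-le")  # 4 bytes, little-endian

print(two_bytes_le, two_bytes_be, four_bytes_le)
```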

Identifier normalization: Why is the micro sign converted into the Greek letter mu?

There are two different characters involved here. One is the MICRO SIGN, which is the one on the keyboard, and the other is GREEK SMALL LETTER MU. To understand what’s going on, we should take a look at how Python defines identifiers in the language reference: identifier ::= xid_start xid_continue* id_start ::= <all characters in …
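The conversion can be observed directly. The sketch below assumes Python 3, where identifiers are normalized to NFKC form while parsing (per PEP 3131); the variable names are illustrative.

```python
import unicodedata

micro = "\u00b5"  # MICRO SIGN
mu = "\u03bc"     # GREEK SMALL LETTER MU

names = (unicodedata.name(micro), unicodedata.name(mu))

# NFKC normalization maps the micro sign to the Greek letter.
normalized = unicodedata.normalize("NFKC", micro)

# Assigning to an identifier written with the micro sign stores it under mu.
namespace = {}
exec(f"{micro} = 1", namespace)

print(names, normalized == mu, mu in namespace, micro in namespace)
```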

Accessing dictionary items by position in Python 3.6+ efficiently

For an OrderedDict it’s inherently O(n) because the ordering is recorded in a linked list. For the builtin dict, there’s a vector (a contiguous array) rather than a linked list, but pretty much the same thing in the end: the vector contains a few kinds of “dummies”, special internal values that mean “no key has …
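Given that positional access is O(n) either way, one common approach is to walk the dict's iterator with `itertools.islice` rather than building a full list first. The helper `item_at` is a hypothetical name, not something from the standard library.

```python
from itertools import islice


def item_at(d, index):
    """Return the (key, value) pair at `index` in insertion order.

    Still O(index): the iterator has to step past every earlier entry.
    """
    if index < 0:
        raise IndexError("negative positions are not handled in this sketch")
    try:
        return next(islice(d.items(), index, None))
    except StopIteration:
        raise IndexError("dict position out of range") from None


colors = {"red": 1, "green": 2, "blue": 3}
print(item_at(colors, 1))
```

If many positional lookups are needed, converting once with `list(d.items())` and indexing that list may be the better trade-off.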

Python string literal concatenation

Read the reference manual; it’s in there. Specifically: Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long …
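A short sketch of the rule quoted above. The variable names and the date pattern are made up for illustration.

```python
# Adjacent literals are joined at compile time, whatever quotes they use.
greeting = "hello" 'world'

# Inside parentheses the literals can span lines, and raw and regular
# literals can be mixed so fewer backslashes are needed.
pattern = (
    r"\d{4}"  # year
    "-"
    r"\d{2}"  # month
)

print(greeting, pattern)
```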