"Least Astonishment" and the Mutable Default Argument

Jul 16 2009

Anyone who has tinkered with Python long enough has been bitten (or torn to pieces) by the following issue:

def foo(a=[]):
    a.append(5)
    return a

Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):

>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()

A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and that it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)

Edit:

Baczek made an interesting example. Together with most of your comments, and Utaal's in particular, I elaborated further:

>>> def a():
...     print("a executed")
...     return []
... 
>>>            
>>> def b(x=a()):
...     x.append(5)
...     print(x)
... 
a executed
>>> b()
[5]
>>> b()
[5, 5]

To me, it seems that the design decision was about where to put the scope of parameters: inside the function, or "together" with it?

Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not when it is defined. That would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (the assignment of default parameters) at function invocation time.

The actual behavior is more consistent: everything on that line gets evaluated when that line is executed, meaning at function definition.

Answers

1658 rob Jul 18 2009 at 04:29

Actually, this is not a design flaw, and it is not because of internals or performance.
It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.

As soon as you think of it this way, it makes complete sense: a function is an object that is evaluated at its definition; default parameters are a kind of "member data", and therefore their state may change from one call to the next, exactly as in any other object.

In any case, Effbot has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better understanding of how function objects work.

278 EliCourtwright Jul 16 2009 at 01:11

Suppose you have the following code:

fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    ...

When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, it will be equal to the tuple ("apples", "bananas", "loganberries").

However, suppose that later on in the code I do something like

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")

then, if default parameters were bound at function execution rather than at function declaration, I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing, IMO, than discovering that your foo function above was mutating the list.
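
To make the point concrete, here is a minimal sketch of what Python actually does with the eat example above. The name default_after_rebinding is made up for illustration; the rest follows the snippets in this answer.

```python
fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    # The default object was captured when this def statement ran.
    return food

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")  # rebinds the global name only

some_random_function()
default_after_rebinding = eat()
print(default_after_rebinding)  # ('apples', 'bananas', 'loganberries')
```

Rebinding the global name does not touch the tuple that eat captured at definition time, which is the unsurprising outcome argued for here.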

The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose that in Java I have the following code:

StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) );  // does this work?

Now, does my map use the value of the StringBuffer key as it was when it was placed into the map, or does it store the key by reference? Either way, someone is astonished: either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).
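
The Python side of that remark can be checked directly. A small sketch, with counts, key and error_message chosen only for illustration:

```python
counts = {}
key = ["Hello World!"]  # a list is mutable, and therefore unhashable

try:
    counts[key] = 5
except TypeError as exc:
    error_message = str(exc)  # the dict refuses the mutable key
else:
    error_message = None

# An immutable tuple with the same contents is accepted as a key.
counts[tuple(key)] = 5
```

So Python sidesteps the Java dilemma for its built-in mutable types by rejecting them as keys up front.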

Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, it would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one behavior or the opposite, depending on what code they're writing.

I personally like Python's current approach: default function arguments are evaluated when the function is defined, and that object is always the default. I suppose they could special-case the empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.

244 glglgl Jul 10 2012 at 21:50

The relevant part of the documentation:

Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same "pre-computed" value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, e.g.:

def whats_on_the_telly(penguin=None):
    if penguin is None:
        penguin = []
    penguin.append("property of the zoo")
    return penguin

121 Utaal Jul 16 2009 at 06:21

I know nothing about the Python interpreter's inner workings (and I'm not an expert in compilers and interpreters either), so don't blame me if I propose anything nonsensical or impossible.

Given that Python objects are mutable, I think this should be taken into account when designing the default arguments. When you instantiate a list:

a = []

you expect to get a new list referenced by a.

Why should the a=[] in

def x(a=[]):

instantiate a new list on function definition and not on invocation? It's just like you're asking: "if the user doesn't provide the argument, then instantiate a new list and use it as if it was produced by the caller". I think this, on the other hand, is ambiguous:

def x(a=datetime.datetime.now()):

user, do you want a to default to the datetime corresponding to when you define x, or to when you execute it? In this case, as in the previous one, I would keep the same behaviour as if the default argument "assignment" were the first instruction of the function (datetime.now() called on function invocation). On the other hand, if the user wanted the definition-time mapping, he could write:

b = datetime.datetime.now()
def x(a=b):

I know, I know: that's a closure. Alternatively, Python might provide a keyword to force definition-time binding:

def x(static a=b):
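
For comparison with the proposal above, here is a minimal sketch of how the datetime example behaves under Python's actual rules, plus the usual None workaround for call-time evaluation. The function names stamp_at_definition and stamp_at_call are hypothetical, and the static keyword above is not real Python syntax.

```python
import datetime
import time

def stamp_at_definition(when=datetime.datetime.now()):
    # now() ran once, when this def statement was executed.
    return when

def stamp_at_call(when=None):
    # now() runs on every call that omits the argument.
    if when is None:
        when = datetime.datetime.now()
    return when

first = stamp_at_definition()
earlier = stamp_at_call()
time.sleep(0.01)
second = stamp_at_definition()
later = stamp_at_call()
```

Here first and second are the very same datetime object, while later is a fresh timestamp taken after earlier.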

83 LennartRegebro Jul 16 2009 at 01:54

The reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the function is defined.

Compare this:

class BananaBunch:
    bananas = []

    def addBanana(self, banana):
        self.bananas.append(banana)

This code suffers from exactly the same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, they are added for all instances of that class. The reason is exactly the same.
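
A short sketch of that comparison, using two illustrative classes (SharedBunch and SeparateBunch are names invented here, not part of the answer):

```python
class SharedBunch:
    bananas = []  # one list, created once when the class body runs

    def add_banana(self, banana):
        self.bananas.append(banana)


class SeparateBunch:
    def __init__(self):
        self.bananas = []  # a new list for every instance

    def add_banana(self, banana):
        self.bananas.append(banana)


first, second = SharedBunch(), SharedBunch()
first.add_banana("cavendish")
shared_result = second.bananas      # second sees first's banana

third, fourth = SeparateBunch(), SeparateBunch()
third.add_banana("cavendish")
separate_result = fourth.bananas    # fourth is unaffected
```

Moving the list creation into __init__ plays the same role for classes that the None idiom plays for default arguments: the expression is evaluated per use rather than once.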

It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least it would slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.

Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll understand Python much better.

That said, it should feature prominently in any good Python tutorial. Because, as you mention, everyone runs into this problem sooner or later.

69 DimitrisFasarakisHilliard Dec 09 2015 at 14:13

Why don't you introspect?

I'm really surprised that no one has performed the insightful introspection offered by Python (2 and 3 both apply) on callables.

Given a simple little function func defined as:

>>> def func(a = []):
...    a.append(5)

When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. While this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.

So, let's do some introspection, a before and after, to examine how the list gets expanded inside the function object. I'm using Python 3.x for this; for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).

Function before execution:

>>> def func(a = []):
...     a.append(5)
...     

After Python executes this definition, it will take any default parameters specified (a = [] here) and cram them into the __defaults__ attribute of the function object (relevant section: Callables):

>>> func.__defaults__
([],)

OK, so an empty list is the single entry in __defaults__, just as expected.

Function after execution:

Let's now execute this function:

>>> func()

Now, let's look at those __defaults__ again:

>>> func.__defaults__
([5],)

Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:

>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)

So, there you have it: the reason this "flaw" happens is that default arguments are part of the function object. There's nothing weird going on here; it's all just a bit surprising.

The common solution to combat this is to use None as the default and then initialize in the function body:

def func(a = None):
    # or: a = [] if a is None else a
    if a is None:
        a = []

Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.


To further verify that the list in __defaults__ is the same as the one used in the function func, you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed referring to the same list instance:

>>> def func(a = []): 
...     a.append(5)
...     return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True

All with the power of introspection!


* To verify that Python evaluates the default arguments during compilation of the function, try executing the following:

def bar(a=input('Did you just see me without calling the function?')): 
    pass  # use raw_input in Py2

as you'll notice, input() is called before the process of building the function and binding it to the name bar is completed.

59 Brian Jul 16 2009 at 17:05

I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless, simply to prevent newbie confusion. The disadvantages of doing so are:

1. Performance

def foo(arg=something_expensive_to_compute()):
    ...

If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.
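
A rough sketch of that cost, counting calls instead of measuring time. The counter call_count, the placeholder return value 42 and the names eager and lazy are all assumptions made for the illustration; lazy emulates call-time evaluation with the None idiom.

```python
call_count = 0

def something_expensive_to_compute():
    global call_count
    call_count += 1
    return 42  # placeholder for a costly result

def eager(arg=something_expensive_to_compute()):
    # Definition-time default: the expensive call already happened once.
    return arg

calls_after_definition = call_count

eager(); eager(); eager()
calls_after_eager = call_count  # unchanged by the three calls

def lazy(arg=None):
    # Emulated call-time default: recomputed on every bare call.
    if arg is None:
        arg = something_expensive_to_compute()
    return arg

lazy(); lazy(); lazy()
calls_after_lazy = call_count
```

With the current semantics the expensive function runs once; with the emulated call-time semantics it runs once per call that omits the argument.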

2. Forcing bound parameters

A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:

funcs = [ lambda i=i: i for i in range(10)]

This returns a list of functions that return 0,1,2,3... respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.

The only way to implement this otherwise would be to create a further closure with the i bound, ie:

def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]
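
To see the difference the default-argument trick makes today, here is a small side-by-side sketch (the list names late and bound are illustrative). The first list shows what plain closures do, which is also what the i=i version would do if defaults were evaluated at call time.

```python
# Plain closures look up i when called, after the loop has finished.
late = [lambda: i for i in range(10)]

# The default argument captures the value of i at lambda creation time.
bound = [lambda i=i: i for i in range(10)]

late_results = [f() for f in late]
bound_results = [f() for f in bound]
```

Every function in late returns 9, while the functions in bound return 0 through 9.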

3. Introspection

Consider the code:

def foo(a='test', b=100, c=[]):
   print a,b,c

We can get information about the arguments and defaults using the inspect module:

>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))

This information is very useful for things like document generation, metaprogramming, decorators etc.
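
The snippet above is Python 2 era code. As a sketch of the same introspection on current Python 3, inspect.signature exposes each parameter's default; the variable names signature and defaults are chosen here for illustration.

```python
import inspect

def foo(a='test', b=100, c=[]):
    return a, b, c

signature = inspect.signature(foo)

# Map each parameter name to the default object stored on the function.
defaults = {name: param.default
            for name, param in signature.parameters.items()}

print(defaults)  # {'a': 'test', 'b': 100, 'c': []}
```

The list reported for c is the actual object stored in foo.__defaults__, not a copy, which is exactly what makes this kind of introspection possible.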

Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:

_undefined = object()  # sentinel value

def foo(a=_undefined, b=_undefined, c=_undefined):
    if a is _undefined: a='test'
    if b is _undefined: b=100
    if c is _undefined: c=[]

However, we've lost the ability to introspect, and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.

55 LutzPrechelt Mar 30 2015 at 18:18

5 points in defense of Python

  1. Simplicity: The behavior is simple in the following sense: Most people fall into this trap only once, not several times.

  2. Consistency: Python always passes objects, not names. The default parameter is, obviously, part of the function heading (not the function body). It therefore ought to be evaluated at module load time (and only at module load time, unless nested), not at function call time.

  3. Usefulness: As Fredrik Lundh points out in his explanation of "Default Parameter Values in Python", the current behavior can be quite useful for advanced programming. (Use sparingly.)

  4. Sufficient documentation: In the most basic Python documentation, the tutorial, the issue is loudly announced as an "Important warning" in the first subsection of Section "More on Defining Functions". The warning even uses boldface, which is rarely applied outside of headings. RTFM: Read the fine manual.

  5. Meta-learning: Falling into the trap is actually a very helpful moment (at least if you are a reflective learner), because you will subsequently better understand the point "Consistency" above and that will teach you a great deal about Python.
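
The "unless nested" remark in the Consistency point can be sketched as follows. The names make_collector and collect are hypothetical; the idea is only that a nested def is executed again each time the enclosing function runs, so its defaults are evaluated again too.

```python
def make_collector():
    # This def statement runs on every call to make_collector,
    # so each returned function gets its own default list.
    def collect(item, bucket=[]):
        bucket.append(item)
        return bucket
    return collect

first_collector = make_collector()
second_collector = make_collector()

first_collector("a")
first_result = first_collector("b")    # shares state with the call above
second_result = second_collector("c")  # separate default list
```

Within one collector the default list is still shared across calls; across collectors it is not.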

53 ymv Jul 16 2009 at 02:15

This behavior is easily explained by:

  1. function (class etc.) declaration is executed only once, creating all default value objects
  2. everything is passed by reference

So:

def x(a=0, b=[], c=[], d=0):
    a = a + 1
    b = b + [1]
    c.append(1)
    print a, b, c

  1. a doesn't change - every assignment creates a new int object - the new object is printed
  2. b doesn't change - a new list is built from the default value and printed
  3. c changes - the operation is performed on the same object - and it is printed
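
The same three cases as a Python 3 sketch that returns values instead of printing them. The unused parameter d is dropped and c is copied on return so that each snapshot stays independent; both are small liberties taken for the illustration.

```python
def x(a=0, b=[], c=[]):
    a = a + 1      # rebinds the local name a to a new int
    b = b + [1]    # builds a new list; the default list is untouched
    c.append(1)    # mutates the default list in place
    return a, b, list(c)

first_call = x()
second_call = x()
```

After two calls, only the default for c has accumulated state: x.__defaults__ is (0, [], [1, 1]).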

35 GlennMaynard Jul 16 2009 at 03:18

What you're asking is why this:

def func(a=[], b = 2):
    pass

isn't internally equivalent to this:

def func(a=None, b = None):
    a_default = lambda: []
    b_default = lambda: 2
    def actual_func(a=None, b=None):
        if a is None: a = a_default()
        if b is None: b = b_default()
    return actual_func
func = func()

except for the case of explicitly calling func(None, None), which we'll ignore.

In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?

One answer is probably right there--it would effectively turn every function with default parameters into a closure. Even if it's all hidden away in the interpreter and not a full-blown closure, the data's got to be stored somewhere. It'd be slower and use more memory.

35 hynekcer Nov 23 2012 at 01:09

1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem suffer also from similar side effect problem on the actual parameter,"
That is against the rules of functional programming, is usually undesirable, and both problems should be fixed together.

Example:

def foo(a=[]):                 # the same problematic function
    a.append(5)
    return a

>>> somevar = [1, 2]           # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5]                      # usually expected [1, 2]

Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then to do whatever with the copy.

def foo(a=[]):
    a = a[:]     # a copy
    a.append(5)
    return a     # or everything safe by one line: "return a + [5]"

Many builtin mutable types have a copy method like some_dict.copy() or some_set.copy(), or can be copied easily like some_list[:] or list(some_list). Every object can also be copied by copy.copy(any_object), or more thoroughly by copy.deepcopy() (the latter is useful if the mutable object is composed of mutable objects). Some objects are fundamentally based on side effects, like the "file" object, and cannot be meaningfully reproduced by copying.
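
The difference between the two copy functions matters for nested containers. A minimal sketch, with the variable names original, shallow and deep chosen for illustration:

```python
import copy

original = [[1, 2], [3]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # new outer list and new inner lists

original[0].append(99)   # visible through the shallow copy
original.append([4])     # not visible through either copy
```

After these two mutations, shallow is [[1, 2, 99], [3]] while deep is still [[1, 2], [3]].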

Example problem for a similar SO question

class Test(object):            # the original problematic class
  def __init__(self, var1=[]):
    self._var1 = var1

somevar = [1, 2]               # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar                  # [1, 2, [1]] but usually expected [1, 2]
print t2._var1                 # [1, 2, [1]] but usually expected [1, 2]

Nor should it be saved in any public attribute of an instance returned by this function. (Assuming that private attributes of an instance should not be modified from outside of this class or its subclasses, by convention; i.e. _var1 is a private attribute.)

Conclusion:
Input parameter objects shouldn't be modified in place (mutated), nor should they be bound into an object returned by the function. (That is, if we prefer programming without side effects, which is strongly recommended. See the Wiki about "side effect"; the first two paragraphs are relevant in this context.)

2)
Only if the side effect on the actual parameter is required, but is unwanted on the default parameter, is the useful solution def ...(var1=None): if var1 is None: var1 = []

3) In some cases the mutable behavior of default parameters is useful.

31 Ben May 23 2011 at 11:24

This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.

>>> def foo(a):
    a.append(5)
    print a

>>> a  = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]

No default values in sight in this code, but you get exactly the same problem.

The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).

Your original foo, with a default argument, shouldn't be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.

If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.

27 Stéphane Mar 27 2015 at 06:14

This is already a busy topic, but from what I read here, the following helped me realize how it works internally:

def bar(a=[]):
     print id(a)
     a = a + [1]
     print id(a)
     return a

>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232 # Never change, this is 'class property' of the function
4484523720 # Always a new object 
[1]
>>> id(bar.func_defaults[0])
4484370232

25 JasonBaker Jul 16 2009 at 06:18

It's a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?

def print_tuple(some_tuple=(1,2,3)):
    print some_tuple

print_tuple()        #1
print_tuple((1,2,3)) #2

I'll give you a hint. Here's the disassembly (see http://docs.python.org/library/dis.html):

#1

0 LOAD_GLOBAL              0 (print_tuple)
3 CALL_FUNCTION            0
6 POP_TOP
7 LOAD_CONST               0 (None)
10 RETURN_VALUE

#2

 0 LOAD_GLOBAL              0 (print_tuple)
 3 LOAD_CONST               4 ((1, 2, 3))
 6 CALL_FUNCTION            1
 9 POP_TOP
10 LOAD_CONST               0 (None)
13 RETURN_VALUE

"I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)"

As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it's a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn't C. In C you have constants that are pretty much free. In Python you don't have this benefit.

25 AaronHall May 01 2016 at 23:20

Python: The Mutable Default Argument

Default arguments get evaluated at the time the function is compiled into a function object. However many times the function uses them, they are and remain the same object.

When they are mutable and get mutated (for example, by adding an element), they stay mutated on consecutive calls.

They stay mutated because they are the same object each time.

Equivalent code:

Since the list is bound to the function when the function object is compiled and instantiated, this:

def foo(mutable_default_argument=[]): # make a list the default argument
    """function that uses a list"""

is almost exactly equivalent to this:

_a_list = [] # create a list in the globals

def foo(mutable_default_argument=_a_list): # make it the default argument
    """function that uses a list"""

del _a_list # remove globals name binding

Demonstration

Here's a demonstration - you can verify that they are the same object each time they are referenced by

  • seeing that the list is created before the function has finished compiling to a function object,
  • observing that the id is the same each time the list is referenced,
  • observing that the list stays changed when the function that uses it is called a second time,
  • observing the order in which the output is printed from the source (which I conveniently numbered for you):

example.py

print('1. Global scope being evaluated')

def create_list():
    '''noisily create a list for usage as a kwarg'''
    l = []
    print('3. list being created and returned, id: ' + str(id(l)))
    return l

print('2. example_function about to be compiled to an object')

def example_function(default_kwarg1=create_list()):
    print('appending "a" in default default_kwarg1')
    default_kwarg1.append("a")
    print('list with id: ' + str(id(default_kwarg1)) + 
          ' - is now: ' + repr(default_kwarg1))

print('4. example_function compiled: ' + repr(example_function))


if __name__ == '__main__':
    print('5. calling example_function twice!:')
    example_function()
    example_function()

and running it with python example.py:

1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']

Does this violate the principle of "Least Astonishment"?

This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.

The usual instruction to new Python users:

But this is why the usual instruction to new users is to create their default arguments like this instead:

def example_function_2(default_kwarg=None):
    if default_kwarg is None:
        default_kwarg = []

This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.

As the tutorial section on control flow says:

If you don’t want the default to be shared between subsequent calls, you can write the function like this instead:

def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

24 Baczek Jul 16 2009 at 19:19

The shortest answer would probably be "definition is execution", therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:

def a(): return []

def b(x=a()):
    print x

Hopefully it's enough to show that not executing the default argument expressions at the execution time of the def statement isn't easy or doesn't make sense, or both.

I agree it's a gotcha when you try to use default constructors, though.

21 DmitryMinkovsky Apr 25 2012 at 02:43

This behavior is not surprising if you take the following into consideration:

  1. The behavior of read-only class attributes upon assignment attempts, and that
  2. Functions are objects (explained well in the accepted answer).

The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.

(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:

...all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).

Look back to the original example and consider the above points:

def foo(a=[]):
    a.append(5)
    return a

Here foo is an object and a is an attribute of foo (available at foo.func_defaults[0] in Python 2, or foo.__defaults__[0] in Python 3). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.

Calling foo without overriding a default uses that default's value from foo.func_defaults. In this case, foo.func_defaults[0] is used for a within the function object's code scope. Changes to a change foo.func_defaults[0], which is part of the foo object and persists between executions of the code in foo.

Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:

def foo(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:

  • When the foo function object is instantiated, foo.func_defaults[0] is set to None, an immutable object.
  • When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defaults[0] (None) is available in the local scope as L.
  • Upon L = [], the assignment cannot succeed at foo.func_defaults[0], because that attribute is read-only.
  • Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defaults[0] thus remains unchanged for future invocations of foo.
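
The observable part of this can be checked with a short Python 3 sketch, where the attribute is spelled __defaults__ (the names defaults_before, defaults_after, first and second are illustrative). It shows only that the assignment rebinds the local name L and leaves the stored default alone; it does not try to settle the "read-only" wording above.

```python
def foo(a, L=None):
    if L is None:
        L = []      # rebinds the local name L; the stored default stays None
    L.append(a)
    return L

defaults_before = foo.__defaults__
first = foo(1)
second = foo(2)
defaults_after = foo.__defaults__
```

Both snapshots of __defaults__ are (None,), and each call returns its own fresh list.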

20 hugo24 Feb 28 2013 at 18:10

A simple workaround using None

>>> def bar(b, data=None):
...     data = data or []
...     data.append(b)
...     return data
... 
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
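
One caveat worth sketching: data or [] replaces any falsy argument, including an empty list passed in on purpose, whereas an explicit None test does not. The function names bar_or and bar_is_none and the variable shared are made up for this comparison.

```python
def bar_or(b, data=None):
    data = data or []       # an empty list passed by the caller is discarded
    data.append(b)
    return data

def bar_is_none(b, data=None):
    if data is None:        # only a missing argument triggers the default
        data = []
    data.append(b)
    return data

shared = []
result_or = bar_or(3, shared)
shared_after_or = list(shared)            # snapshot: caller's list untouched

result_is_none = bar_is_none(3, shared)   # appends to the caller's list
```

Which behaviour is wanted depends on whether callers expect their own empty list to be filled in.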

19 Alexander Sep 12 2015 at 13:00

I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).

As others have extensively commented, the list parameter is bound to the function when it is defined, as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list, which may have been altered by any other calls to the function. Worse yet, two variables may be using this function's shared parameter at the same time, each oblivious to the changes made through the other.

Wrong Method (probably...):

def foo(list_arg=[5]):
    return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]  

# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()             
7

You can verify that they are one and the same object by using id:

>>> id(a)
5347866528

>>> id(b)
5347866528

Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)

The convention for achieving the desired result in Python is to provide a default value of None and to document the actual behaviour in the docstring.

This implementation ensures that each call to the function either receives the default list or else the list passed to the function.

Preferred Method:

def foo(list_arg=None):
   """
   :param list_arg:  A list of input values. 
                     If none provided, used a list with a default value of 5.
   """
   if not list_arg:
       list_arg = [5]
   return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
>>> b
[5, 7]

c = foo([10])
c.append(11)
>>> c
[10, 11]

There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.

17 Marcin Mar 21 2012 at 00:22

The solutions here are:

  1. Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
  2. Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).

The second option is nice because users of the function can pass in a callable, which may be already existing (such as a type)
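
A minimal sketch of the second option, under the assumption that the default is a zero-argument callable such as the list type itself. The function name append_to is hypothetical, and note that this broad except would also hide a TypeError raised inside the factory.

```python
from collections import deque

def append_to(item, target=list):
    try:
        target = target()   # a callable: treat it as a factory
    except TypeError:
        pass                # not callable: use the object as given
    target.append(item)
    return target

fresh_one = append_to(1)            # new list from the default factory
fresh_two = append_to(2)            # another new list
as_deque = append_to(3, deque)      # caller supplies a different factory

existing = [0]
same_object = append_to(4, existing)  # a ready-made list is used directly
```

Each bare call builds a new container, so nothing is shared between calls unless the caller passes an existing object.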

16 joedborg Jan 15 2013 at 18:02

You can get round this by replacing the object (and therefore the tie with the scope):

def foo(a=[]):
    a = list(a)
    a.append(5)
    return a

Ugly, but it works.

16 Saish Sep 12 2014 at 05:05

When we do this:

def foo(a=[]):
    ...

... we assign the argument a to an unnamed list, if the caller does not pass the value of a.

To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?

def foo(a=pavlo):
   ...

At any time, if the caller doesn't tell us what a is, we reuse pavlo.

If pavlo is mutable (modifiable) and foo ends up modifying it, we notice the effect the next time foo is called without specifying a.

So this is what you see (Remember, pavlo is initialized to []):

>>> foo()
[5]

Now, pavlo is [5].

Calling foo() again modifies pavlo again:

>>> foo()
[5, 5]

Specifying a when calling foo() ensures pavlo is not touched.

>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]

So, pavlo is still [5, 5].

>>> foo()
[5, 5, 5]
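
For what it's worth, the unnamed list does have a handle: Python keeps positional default values in the function's `__defaults__` tuple. The sketch below reuses the foo from the question and simply binds the name pavlo to that stored list so it can be watched directly.

```python
def foo(a=[]):
    a.append(5)
    return a


pavlo = foo.__defaults__[0]   # the one list object created when def was executed
print(pavlo)                  # []

foo()
foo()
print(pavlo)                  # [5, 5]

ivan = [1, 2, 3, 4]
foo(a=ivan)                   # an explicit argument leaves pavlo alone
print(pavlo)                  # [5, 5]
print(foo() is pavlo)         # True: the default IS pavlo (now [5, 5, 5])
```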

16 bgreen-litl Feb 06 2015 at 04:44

I sometimes exploit this behavior as an alternative to the following pattern:

singleton = None

def use_singleton():
    global singleton

    if singleton is None:
        singleton = _make_singleton()

    return singleton.use_me()

If singleton is only used by use_singleton, I like the following pattern as a replacement:

# _make_singleton() is called only once when the def is executed
def use_singleton(singleton=_make_singleton()):
    return singleton.use_me()

I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.

Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
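
A minimal sketch of the memoization variant, assuming a toy Fibonacci function (fib and _cache are illustrative names, not code from a real project):

```python
def fib(n, _cache={0: 0, 1: 1}):
    # _cache is created once, when the def is executed, and shared by every call
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]


print(fib(30))                      # 832040
print(len(fib.__defaults__[0]))     # 31 entries cached: fib(0) .. fib(30)
```

The leading underscore is only a convention signalling that callers are not expected to pass the second argument.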

13 ChristosHayward Jul 17 2009 at 02:17

It may be true that:

  1. Someone is using every language/library feature, and
  2. Switching the behavior here would be ill-advised, but

it is entirely consistent to hold both of the points above and still make another point:

  3. It is a confusing feature and it is unfortunate in Python.

The other answers, or at least some of them, either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.

It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano's opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,

The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If you understand the language much better by trying to trace out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other languages. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working--not because of a design fl--I mean, hidden logic puzzle--that cuts against the intuitions of programmers who are drawn to Python because it Just Works.

10 MarkRansom Oct 18 2017 at 00:38

This is not a design flaw. Anyone who trips over this is doing something wrong.

There are 3 cases I see where you might run into this problem:

  1. You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you're abusing the argument list to have function attributes, e.g. cache={}, and you wouldn't be expected to call the function with an actual argument at all.
  2. You intend to leave the argument unmodified, but you accidentally did modify it. That's a bug, fix it.
  3. You intend to modify the argument for use inside the function, but didn't expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language so it doesn't make the copy for you, you need to be explicit about it.

The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.

9 Norfeldt Jul 22 2013 at 14:35

This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)

I'm gonna give you what I see as a useful example.

def example(errors=[]):
    # statements
    # Something went wrong
    mistake = True
    if mistake:
        tryToFixIt(errors)
        # Didn't work.. let's try again
        tryToFixItAnotherway(errors)
        # This time it worked
    return errors

def tryToFixIt(err):
    err.append('Attempt to fix it')

def tryToFixItAnotherway(err):
    err.append('Attempt to fix it by another way')

def main():
    for item in range(2):
        errors = example()
    print('\n'.join(errors))

main()

prints the following

Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way

8 ytpillai May 26 2015 at 06:04

Just change the function to be:

def notastonishinganymore(a = []): 
    '''The name is just a joke :)'''
    a = a[:]
    a.append(5)
    return a

7 user2384994 Aug 22 2013 at 12:58

I think the answer to this question lies in how Python passes data to parameters (pass by value or by reference), not in mutability or in how Python handles the "def" statement.

A brief introduction. First, there are two kinds of data types in Python: simple elementary data types, like numbers, and objects. Second, when passing data to parameters, Python passes elementary data types by value, i.e., makes a local copy of the value in a local variable, but passes objects by reference, i.e., as pointers to the object.

Admitting the above two points, let's explain what happened to the Python code. It happens only because objects are passed by reference; it has nothing to do with mutable/immutable, or arguably with the fact that the "def" statement is executed only once, when the function is defined.

[] is an object, so Python passes the reference of [] to a, i.e., a is only a pointer to [], which lies in memory as an object. There is only one copy of [], with, however, many references to it. For the first foo(), the list [] is changed to [1] by the append method. But note that there is only one copy of the list object, and this object now becomes [1]. When running the second foo(), what the effbot webpage says (items is not evaluated any more) is wrong. a is evaluated to be the list object, although now the content of the object is [1]. This is the effect of passing by reference! The result of foo(3) can be easily derived in the same way.

To further validate my answer, let's take a look at two additional codes.

====== No. 2 ========

def foo(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items

foo(1)  #return [1]
foo(2)  #return [2]
foo(3)  #return [3]

[] is an object, and so is None (the former is mutable while the latter is immutable, but mutability has nothing to do with the question). None is somewhere in the space, but we know it's there and there is only one copy of None. So every time foo is invoked, items is evaluated (as opposed to what some answers say, that it is only evaluated once) to be None, or to be clear, the reference (or the address) of None. Then, inside foo, items is changed to [], i.e., it points to another object which has a different address.

====== No. 3 =======

def foo(x, items=[]):
    items.append(x)
    return items

foo(1)    # returns [1]
foo(2,[]) # returns [2]
foo(3)    # returns [1,3]

The invocation of foo(1) makes items point to a list object [] with an address, say, 11111111. The content of the list is then changed to [1] inside the foo function, but the address is not changed; it is still 11111111. Then comes foo(2,[]). Although the [] in foo(2,[]) has the same content as the default parameter [] when calling foo(1), their addresses are different! Since we provide the parameter explicitly, items has to take the address of this new [], say 2222222, and return it after making some change. Now foo(3) is executed. Since only x is provided, items has to take its default value again. What's the default value? It is set when defining the foo function: the list object located at 11111111. So items is evaluated to be the address 11111111, which holds one element, 1. The list located at 2222222 also contains one element, 2, but items no longer points to it. Consequently, appending 3 makes items [1,3].

From the above explanations, we can see that the effbot webpage recommended in the accepted answer failed to give a relevant answer to this question. What is more, I think a point in the effbot webpage is wrong. I think the code regarding the UI.Button is correct:

for i in range(10):
    def callback():
        print("clicked button", i)
    UI.Button("button %s" % i, callback)

Each button can hold a distinct callback function which will display different value of i. I can provide an example to show this:

x=[]
for i in range(10):
    def callback():
        print(i)
    x.append(callback) 

If we execute x[7]() we'll get 7 as expected, and x[9]() will give 9, another value of i.
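
A sketch for checking this, assuming the callbacks are invoked only after the loop has finished (as a button click would be). In that case the closure looks up i at call time, so every callback reports the final value of i; binding i as a default argument captures the value at definition time instead. The list names late and early are illustrative.

```python
late = []
for i in range(10):
    def callback():
        return i            # i is looked up when callback() is called
    late.append(callback)

early = []
for i in range(10):
    def callback(i=i):      # default evaluated now, once per iteration
        return i
    early.append(callback)

print(late[7](), late[9]())     # 9 9
print(early[7](), early[9]())   # 7 9
```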

6 MisterMiyagi Dec 15 2018 at 19:09

TLDR: Define-time defaults are consistent and strictly more expressive.


Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs to:

...                           # defining scope
def name(parameter=default):  # ???
    ...                       # execution scope

The def name part must evaluate in the defining scope - we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.

Since parameter is a constant name, we can "evaluate" it at the same time as def name. This also has the advantage that it produces the function with a known signature as name(parameter=...):, instead of a bare name(...):.

Now, when to evaluate default?

Consistency already says "at definition": everything else of def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.

The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing "at definition" allows expressing both cases, while choosing "at execution" can express only one:

def name(parameter=defined):  # set default at definition time
    ...

def name(parameter=None):        # delay default until execution time
    parameter = default if parameter is None else parameter
    ...
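
A concrete sketch of the two cases, assuming a counter as a stand-in for any expression whose value changes over time (_ticket, at_definition and at_execution are hypothetical names):

```python
import itertools

_ticket = itertools.count()   # hypothetical source of fresh values


def at_definition(stamp=next(_ticket)):
    # next(_ticket) ran once, when this def statement was executed
    return stamp


def at_execution(stamp=None):
    # None is the define-time default; the real default is computed per call
    stamp = next(_ticket) if stamp is None else stamp
    return stamp


print(at_definition(), at_definition())   # 0 0
print(at_execution(), at_execution())     # 1 2
```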

4 PrzemekD Jan 03 2019 at 14:38

Every other answer explains why this is actually a nice and desired behavior, or why you shouldn't be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.

We will "fix" this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.

import functools
import inspect
from copy import copy

def sanify(function):
    # store the default values once, at decoration time
    # (inspect.getargspec was removed in Python 3.11; getfullargspec replaces it)
    defaults = inspect.getfullargspec(function).defaults or ()

    @functools.wraps(function)
    def wrapper(*a, **kw):
        # construct a new argument list
        # (assumes every positional parameter of function has a default)
        new_args = []
        for i, arg in enumerate(defaults):
            # allow passing positional arguments
            if i < len(a):
                new_args.append(a[i])
            else:
                # copy the value
                new_args.append(copy(arg))
        return function(*new_args, **kw)
    return wrapper

Now let's redefine our function using this decorator:

@sanify
def foo(a=[]):
    a.append(5)
    return a

foo() # '[5]'
foo() # '[5]' -- as desired

This is particularly neat for functions that take multiple arguments. Compare:

# the 'correct' approach
def bar(a=None, b=None, c=None):
    if a is None:
        a = []
    if b is None:
        b = []
    if c is None:
        c = []
    # finally do the actual work

with

# the nasty decorator hack
@sanify
def bar(a=[], b=[], c=[]):
    # wow, works right out of the box!
    ...

It's important to note that the above solution breaks if you try to use keyword args, like so:

foo(a=[4])

The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)
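
One possible way to do that exercise, as a sketch rather than a definitive version: let inspect.signature work out which parameters the caller actually supplied, and copy the defaults only for the rest. The names sanify_kw and foo_kw are hypothetical, and copy() is shallow, so nested mutable defaults would still be shared.

```python
import functools
import inspect
from copy import copy


def sanify_kw(function):
    sig = inspect.signature(function)

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        # Map the caller's positional and keyword arguments onto parameter names.
        bound = sig.bind_partial(*args, **kwargs)
        for name, param in sig.parameters.items():
            has_default = param.default is not inspect.Parameter.empty
            if name not in bound.arguments and has_default:
                # Parameter left at its default: hand the function a fresh copy.
                bound.arguments[name] = copy(param.default)
        return function(*bound.args, **bound.kwargs)
    return wrapper


@sanify_kw
def foo_kw(a=[], b=0):
    a.append(5)
    return a, b


print(foo_kw())            # ([5], 0)
print(foo_kw())            # ([5], 0)
print(foo_kw(a=[4]))       # ([4, 5], 0)
print(foo_kw([1], b=2))    # ([1, 5], 2)
```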