Why is the gets function so dangerous that it should not be used?

Nov 08 2009

When I try to compile C code that uses the gets() function with GCC, I get this warning:

(.text+0x34): warning: the `gets' function is dangerous and should not be used.

I remember this has something to do with stack protection and security, but I'm not sure exactly why.

How can I remove this warning, and why is there such a warning about using gets()?

If gets() is so dangerous, why can't we remove it?

Answers

188 ThomasOwens Nov 08 2009 at 01:56

In order to use gets safely, you have to know exactly how many characters you will be reading, so that you can make your buffer large enough. You will only know that if you know exactly what data you will be reading.

Instead of using gets, you want to use fgets, which has the signature

char* fgets(char *string, int length, FILE * stream);

(fgets, if it reads an entire line, will leave the '\n' in the string; you'll have to deal with that.)

gets remained an official part of the language up to the 1999 ISO C standard, but it was officially removed by the 2011 standard. Most C implementations still support it, but at least gcc issues a warning for any code that uses it.

176 JonathanLeffler Nov 30 2010 at 08:51

Why gets() is dangerous

The first internet worm (the Morris Internet Worm) escaped on 1988-11-02, and it used gets() and a buffer overflow as one of its methods of propagating from system to system. The basic problem is that the function doesn't know how big the buffer is, so it continues reading until it finds a newline or encounters EOF, and may overflow the bounds of the buffer it was given.
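To make the basic problem concrete, here is a rough sketch of a function with the same shape as gets(). The name gets_like and the extra FILE * parameter are my own assumptions for illustration; this is not the source of any real C library. The point is that the loop has nothing to compare the write position against, because the caller never passes a size.

```c
#include <stdio.h>

/*
 * Illustrative only: a hypothetical function shaped like gets(), reading
 * from an arbitrary stream. NOT the code of any real C library.
 */
char *gets_like(char *buffer, FILE *fp)
{
    char *dst = buffer;
    int ch;

    /* The only stop conditions are newline and EOF. Nothing here can
     * check dst against the end of the buffer, because the caller
     * never told us how big the buffer is. */
    while ((ch = getc(fp)) != EOF && ch != '\n')
        *dst++ = (char)ch;

    if (ch == EOF && dst == buffer)
        return NULL;            /* nothing read: EOF or error */

    *dst = '\0';                /* the newline itself is not stored */
    return buffer;
}
```

It behaves only as long as every input line happens to fit; a longer line writes past the end of the caller's array.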

You should forget you ever heard that gets() existed.

The C11 standard ISO/IEC 9899:2011 eliminated gets() as a standard function, which is A Good Thing™ (it was formally marked as 'obsolescent' and 'deprecated' in ISO/IEC 9899:1999/Cor.3:2007, Technical Corrigendum 3 for C99, and then removed in C11). Sadly, it will remain in libraries for many years (meaning 'decades') for reasons of backwards compatibility. If it were up to me, the implementation of gets() would become:

char *gets(char *buffer)
{
    assert(buffer != 0);
    abort();
    return 0;
}

Given that your code will crash anyway, sooner or later, it is better to head the trouble off sooner rather than later. I'd be prepared to add an error message:

fputs("obsolete and dangerous function gets() called\n", stderr);

Modern versions of the Linux compilation system generate warnings if you link gets(), and also for some other functions that have security problems (mktemp(), …).

Alternatives to gets()

fgets()

As everyone else said, the canonical alternative to gets() is fgets() specifying stdin as the file stream.

char buffer[BUFSIZ];

while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
    /* ...process line of data... */
}

What no one else has mentioned yet is that gets() does not include the newline but fgets() does. So, you might need to use a wrapper around fgets() that deletes the newline:

char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
    if (fgets(buffer, buflen, fp) != 0)
    {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len-1] == '\n')
            buffer[len-1] = '\0';
        return buffer;
    }
    return 0;
}

Or, better:

char *fgets_wrapper(char *buffer, size_t buflen, FILE *fp)
{
    if (fgets(buffer, buflen, fp) != 0)
    {
        buffer[strcspn(buffer, "\n")] = '\0';
        return buffer;
    }
    return 0;
}

Also, as caf points out in a comment and paxdiablo shows in his answer, with fgets() you might have data left over on a line. My wrapper code leaves that data to be read next time; you can readily modify it to gobble the rest of the line of data if you prefer:

        if (len > 0 && buffer[len-1] == '\n')
            buffer[len-1] = '\0';
        else
        {
             int ch;
             while ((ch = getc(fp)) != EOF && ch != '\n')
                 ;
        }

The residual problem is how to report the three different result states: EOF or error, line read and not truncated, and partial line read but data was truncated.

This problem doesn't arise with gets() because it doesn't know where your buffer ends and merrily tramples beyond the end, wreaking havoc on your beautifully tended memory layout, often messing up the return stack (a stack overflow) if the buffer is allocated on the stack, or trampling over the control information if the buffer is dynamically allocated, or copying data over other precious global (or module) variables if the buffer is statically allocated. None of this is a good idea; they epitomize the phrase 'undefined behaviour'.
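One possible way to report the three result states of an fgets() wrapper is an enum return value. This is only a sketch: the names line_status, LINE_EOF_OR_ERROR, LINE_OK, LINE_TRUNCATED and read_line are hypothetical, and the choice to discard the excess of an over-long line is an assumption, not the only reasonable design.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names, introduced for this sketch only. */
enum line_status { LINE_EOF_OR_ERROR, LINE_OK, LINE_TRUNCATED };

enum line_status read_line(char *buffer, int buflen, FILE *fp)
{
    if (fgets(buffer, buflen, fp) == NULL)
        return LINE_EOF_OR_ERROR;

    size_t len = strcspn(buffer, "\n");
    if (buffer[len] == '\n') {
        buffer[len] = '\0';     /* complete line: strip the newline */
        return LINE_OK;
    }

    /* No newline was stored: either the line was too long for the
     * buffer, or the input ended without a final newline. Discard the
     * rest of the line and report truncation only if something was
     * actually thrown away. */
    int ch, discarded = 0;
    while ((ch = getc(fp)) != EOF && ch != '\n')
        discarded = 1;
    return discarded ? LINE_TRUNCATED : LINE_OK;
}
```

A caller can then switch on the result and decide whether a truncated line is an error or just gets processed in its shortened form.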


There is also TR 24731-1 (a Technical Report from the C Standard Committee), which provides safer alternatives to a variety of functions, including gets():

§6.5.4.1 The gets_s function

Synopsis

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);

Runtime-constraints

s shall not be a null pointer. n shall neither be equal to zero nor be greater than RSIZE_MAX. A new-line character, end-of-file, or read error shall occur within reading n-1 characters from stdin. 25)

3 If there is a runtime-constraint violation, s[0] is set to the null character, and characters are read and discarded from stdin until a new-line character is read, or end-of-file or a read error occurs.

Description

4 The gets_s function reads at most one less than the number of characters specified by n from the stream pointed to by stdin, into the array pointed to by s. No additional characters are read after a new-line character (which is discarded) or after end-of-file. The discarded new-line character does not count towards number of characters read. A null character is written immediately after the last character read into the array.

5 If end-of-file is encountered and no characters have been read into the array, or if a read error occurs during the operation, then s[0] is set to the null character, and the other elements of s take unspecified values.

Recommended practice

6 The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.

25) The gets_s function, unlike gets, makes it a runtime-constraint violation for a line of input to overflow the buffer to store it. Unlike fgets, gets_s maintains a one-to-one relationship between input lines and successful calls to gets_s. Programs that use gets expect such a relationship.

The Microsoft Visual Studio compilers implement an approximation to the TR 24731-1 standard, but there are differences between the signatures implemented by Microsoft and those in the TR.

The C11 standard, ISO/IEC 9899-2011, includes TR24731 in Annex K as an optional part of the library. Unfortunately, it is seldom implemented on Unix-like systems.


getline() — POSIX

POSIX 2008 also provides a safe alternative to gets() called getline(). It allocates space for the line dynamically, so you end up needing to free it. It removes the limitation on line length, therefore. It also returns the length of the data that was read, or -1 (and not EOF!), which means that null bytes in the input can be handled reliably. There is also a 'choose your own single-character delimiter' variation called getdelim(); this can be useful if you are dealing with the output from find -print0 where the ends of the file names are marked with an ASCII NUL '\0' character, for example.
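A minimal sketch of the getline() pattern described above, assuming a POSIX 2008 system. The helper name longest_line is hypothetical; getline(), its ssize_t return value and the -1 end marker are the POSIX interface. Note the single free() after the loop: getline() reuses and grows the same buffer on every call.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: length of the longest line in fp, not counting
 * the trailing newline. */
size_t longest_line(FILE *fp)
{
    char *line = NULL;      /* getline() allocates and grows this */
    size_t cap = 0;         /* current capacity of line */
    ssize_t len;
    size_t longest = 0;

    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;          /* getline() keeps the newline; ignore it */
        if ((size_t)len > longest)
            longest = (size_t)len;
    }

    free(line);             /* the caller owns the buffer */
    return longest;
}
```

Because the length comes from the return value rather than strlen(), embedded null bytes in a line do not confuse the count.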

23 Jack Nov 08 2009 at 02:03

Because gets doesn't do any kind of check while getting bytes from stdin and putting them somewhere. A simple example:

char array1[] = "12345";
char array2[] = "67890";

gets(array1);

Now, first of all, you are allowed to input as many characters as you want; gets won't care. Secondly, the bytes beyond the size of the array you put them in (in this case array1) will overwrite whatever they find in memory, because gets will write them. In the previous example this means that if you input "abcdefghijklmnopqrts" it may, unpredictably, also overwrite array2 or whatever else is nearby.

The function is unsafe because it assumes consistent input. NEVER USE IT!
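For contrast, here is a sketch of the same situation with fgets(). The struct two_arrays and the helper read_into_array1 are hypothetical names, made up so the two arrays sit next to each other and the input can come from any stream. Since fgets() is given the size of array1, it stops after five characters plus the terminator and leaves array2 alone.

```c
#include <stdio.h>

/* Hypothetical struct: keeps the two arrays side by side for illustration. */
struct two_arrays {
    char array1[6];
    char array2[6];
};

/* Reads one line into t->array1, writing at most sizeof t->array1 bytes
 * (including the terminating '\0'). Returns NULL on EOF or error. */
char *read_into_array1(struct two_arrays *t, FILE *fp)
{
    return fgets(t->array1, (int)sizeof t->array1, fp);
}
```

The rest of an over-long line stays in the stream and will be returned by the next read, so the caller still has to decide what to do with it.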

17 paxdiablo Nov 30 2010 at 08:56

You should not use gets since it has no way to stop a buffer overflow. If the user types in more data than can fit in your buffer, you will most likely end up with corruption or worse.

In fact, ISO have actually taken the step of removing gets from the C standard (as of C11, though it was deprecated in C99) which, given how highly they rate backward compatibility, should be an indication of how bad that function was.

The correct thing to do is to use the fgets function with the stdin file handle since you can limit the characters read from the user.

But this also has its problems such as:

  • extra characters entered by the user will be picked up the next time around.
  • there's no quick notification that the user entered too much data.

To that end, almost every C coder at some point in their career will write a more useful wrapper around fgets as well. Here's mine:

#include <stdio.h>
#include <string.h>

#define OK       0
#define NO_INPUT 1
#define TOO_LONG 2
static int getLine (char *prmpt, char *buff, size_t sz) {
    int ch, extra;

    // Get line with buffer overrun protection.
    if (prmpt != NULL) {
        printf ("%s", prmpt);
        fflush (stdout);
    }
    if (fgets (buff, sz, stdin) == NULL)
        return NO_INPUT;

    // If it was too long, there'll be no newline. In that case, we flush
    // to end of line so that excess doesn't affect the next call.
    // (The length check avoids indexing buff[-1] on an empty string.)
    size_t len = strlen(buff);
    if (len == 0 || buff[len-1] != '\n') {
        extra = 0;
        while (((ch = getchar()) != '\n') && (ch != EOF))
            extra = 1;
        return (extra == 1) ? TOO_LONG : OK;
    }

    // Otherwise remove newline and give string back to caller.
    buff[len-1] = '\0';
    return OK;
}

with some test code:

// Test program for getLine().

int main (void) {
    int rc;
    char buff[10];

    rc = getLine ("Enter string> ", buff, sizeof(buff));
    if (rc == NO_INPUT) {
        printf ("No input\n");
        return 1;
    }

    if (rc == TOO_LONG) {
        printf ("Input too long\n");
        return 1;
    }

    printf ("OK [%s]\n", buff);

    return 0;
}

It provides the same protections as fgets in that it prevents buffer overflows but it also notifies the caller as to what happened and clears out the excess characters so that they do not affect your next input operation.

Feel free to use it as you wish, I hereby release it under the "do what you damn well want to" licence :-)

14 ThiagoSilveira Nov 30 2010 at 08:28

fgets.

To read from the stdin:

char string[512];

fgets(string, sizeof(string), stdin); /* no buffer overflows here, you're safe! */

9 GerdKlima Nov 08 2009 at 01:58

You can't remove API functions without breaking the API. If you did, many applications would no longer compile or run at all.

This is the reason that one reference gives:

Reading a line that overflows the array pointed to by s results in undefined behavior. The use of fgets() is recommended.

5 pmg Nov 08 2009 at 02:21

I read recently, in a USENET post to comp.lang.c, that gets() is getting removed from the Standard. WOOHOO

You'll be happy to know that the committee just voted (unanimously, as it turns out) to remove gets() from the draft as well.

5 YuHao Oct 06 2013 at 13:15

In C11 (ISO/IEC 9899:201x), gets() has been removed. (It was deprecated in ISO/IEC 9899:1999/Cor.3:2007(E).)

In addition to fgets(), C11 introduces a new safe alternative gets_s():

C11 K.3.5.4.1 The gets_s function

#define __STDC_WANT_LIB_EXT1__ 1
#include <stdio.h>
char *gets_s(char *s, rsize_t n);

However, in the Recommended practice section, fgets() is still preferred.

The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.

5 AradhanaMohanty Aug 22 2017 at 16:19

gets() is dangerous because it is possible for the user to crash the program by typing too much into the prompt. It can't detect the end of available memory, so if you allocate an amount of memory too small for the purpose, it can cause a seg fault and crash. Sometimes it seems very unlikely that a user will type 1000 letters into a prompt meant for a person's name, but as programmers, we need to make our programs bulletproof. (it may also be a security risk if a user can crash a system program by sending too much data).

fgets() allows you to specify how many characters are taken out of the standard input buffer, so they don't overrun the variable.

3 user3717661 May 01 2016 at 08:00

The C gets function is dangerous and has been a very costly mistake. Tony Hoare singles it out for specific mention in his talk "Null References: The Billion Dollar Mistake":

http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare

The whole hour is worth watching but for his comments view from 30 minutes on with the specific gets criticism around 39 minutes.

Hopefully this whets your appetite for the whole talk, which draws attention to how we need more formal correctness proofs in languages and how language designers should be blamed for the mistakes in their languages, not the programmer. This seems to have been the whole dubious reason for designers of bad languages to push the blame to programmers in the guise of 'programmer freedom'.

2 SteveSummit Apr 01 2016 at 04:52

I would like to extend an earnest invitation to any C library maintainers out there who are still including gets in their libraries "just in case anyone is still depending on it": Please replace your implementation with the equivalent of

char *gets(char *str)
{
    strcpy(str, "Never use gets!");
    return str;
}

This will help make sure nobody is still depending on it. Thank you.