The TORCS Racing Board
Author: jisham | Created: 2016-11-17 19:36:06
Subject: dirt-2 hang?
Anyone else having issues with torcs hanging at random, but apparently somewhat repeatable times?

During testing for this race I would get 10-20 laps in, then it would hang. Recent attempts to run the race seem to get 48 laps in.

Running torcs -d yields no information. The process hangs and doesn't drop back into the debugger. Is there a kill/signal I can send to force it back into the debugger for more info?

At first I tried eliminating drivers. I didn't get down to one, but it didn't seem to make any difference.

Then I suspected audio (OpenAL) and turned off all sound, but there was no improvement.

I tried it on my laptop. This is a slower machine with a low frame rate, so I let it run unattended. After a while it also appears to have hung, but I can't tell how far in, since the screensaver kicked in and now the window won't update. I can repeat this with the screensaver disabled if there is useful info to be gained from it. It also appears that torcs is consuming 100% CPU on this machine; I didn't check this on the others.

I also tried a 64-bit Linux machine, but a separate issue led to my qualifying laps being invalidated with a 0 / "from-scratch" qualifying time. I didn't want to troubleshoot that at the same time, so I abandoned this machine.

None of these machines will run in a remote X session displayed on my main Linux machine. I think this is GL not being able to grab a visual, presumably because it needs direct hardware support/acceleration, which isn't available in a remote X session.

I'm just curious whether this is a known issue and whether there is an easy fix. If not, I might not be able to submit a result, but hopefully I can find a fix before the next round.

I haven't rebooted my main Linux box yet; I need to find a convenient time relative to other tasks, as it's my main workstation. I suspect this might not help, since other machines are showing the same behavior.

Any info/actions I can send/do to help?

thanks,

-jdi
Last Edited: 2016-11-17 19:36:06 by jisham
    Author: firechief | Created: 2016-11-18 01:55:20
    Subject: Re: dirt-2 hang?
    I remember that last year W-D had a problem with the berniw robots causing a hang on this track. Try running without that robot and see if it helps.

    I've had no problem running the race on Windows.
    Last Edited: 2016-11-18 01:55:20 by firechief
      Author: jisham | Created: 2016-11-18 13:39:05
      Subject: Re: dirt-2 hang?
      Thanks, will try that.

      I was eliminating drivers, but I think I gave up before I got to berniw.

      Never could correlate it to an event...

      Will try this and see.

      -jdi
      Last Edited: 2016-11-18 13:39:05 by jisham
        Author: wdbee | Created: 2016-11-18 13:49:42
        Subject: Re: dirt-2 hang?
        The berniw robot is known for issues like this. Have a look at the spline code; it can loop forever in the while condition:


        /* compute the y value for a given z */
        double spline(
            int dim,
            double z,
            const double *const x,
            const double *const y,
            const double *const ys
        )
        {
            int i, a, b;
            double t, a0, a1, a2, a3, h;

            a = 0; b = dim-1;
            do {
                i = (a + b) / 2;
                if (x[i] <= z) {
                    a = i;
                } else {
                    b = i;
                }
            } while ((a + 1) != b);  /* never exits once b has dropped to a */
            i = a;
            h = x[i+1] - x[i];
            t = (z-x[i]) / h;
            a0 = y[i];
            a1 = y[i+1] - a0;
            a2 = a1 - h*ys[i];
            a3 = h * ys[i+1] - a1;
            a3 -= a2;
            return a0 + (a1 + (a2 + a3*t) * (t-1.0))*t;
        }

        Test:
        a >= b is not handled here: once b drops to a, (a + 1) != b stays true forever.

        So replace the condition with

        while ((a + 1) < b);

        This helped in the past.
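        A minimal sketch of the patched function, assuming nothing beyond the code quoted above plus an assert(dim >= 2) precondition that is my addition, not part of berniw. With the old != test, a call with dim == 2 and z below x[0] pushes b down to a and spins forever; with < the bisection stops and the last interval is used for extrapolation:

```c
#include <assert.h>

/* Evaluate the spline at z. Same arithmetic as the berniw function above;
   only the loop condition changes, from (a + 1) != b to (a + 1) < b, so the
   bisection also stops if b ever drops to a. */
double spline(int dim, double z,
              const double *const x,
              const double *const y,
              const double *const ys)
{
    int i, a, b;
    double t, a0, a1, a2, a3, h;

    /* assumed precondition: at least one interval x[0]..x[1] */
    assert(dim >= 2);

    a = 0;
    b = dim - 1;
    do {
        i = (a + b) / 2;
        if (x[i] <= z) {
            a = i;
        } else {
            b = i;
        }
    } while ((a + 1) < b);  /* was: (a + 1) != b */

    i = a;
    h = x[i+1] - x[i];
    t = (z - x[i]) / h;
    a0 = y[i];
    a1 = y[i+1] - a0;
    a2 = a1 - h*ys[i];
    a3 = h*ys[i+1] - a1;
    a3 -= a2;
    return a0 + (a1 + (a2 + a3*t) * (t - 1.0)) * t;
}
```

        With straight-line inputs x = {0, 1}, y = {0, 1}, ys = {1, 1}, spline(2, -0.5, x, y, ys) returns -0.5, where the != version never returns.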

        Wolf-Dieter
        Last Edited: 2016-11-18 13:50:04 by wdbee
    Author: dummy | Created: 2016-11-18 16:21:38
    Subject: Re: dirt-2 hang?
    I'm running 64-bit Linux and always start torcs -d. And indeed, during testing torcs hung, and when I killed the torcs process, gdb's backtrace appeared. In this case it was in dandroid, and the indication was that 'dist from start' was 'not a number' (NaN). This is probably a bug in torcs, but it hasn't happened again since.

    Obviously you have to build with --enable-debug for this to work ;)
    Last Edited: 2016-11-18 16:21:38 by dummy
      Author: jisham | Created: 2016-11-18 18:23:45
      Subject: Re: dirt-2 hang?
      All good suggestions. I might not have a chance to try them until Monday, but I will try them all.

      Thanks, everyone!

      -jdi
      Last Edited: 2016-11-18 18:23:45 by jisham
        Author: wdbee | Created: 2016-11-19 11:36:59
        Subject: Re: dirt-2 hang?
        Trying to run the race, I had the same issue here, so it is not berniw this time, because I was using the fixed version.

        Running it on another computer, it worked (but maybe only because nearly all drivers were out of the race after a few laps).

        Cheers

        Wolf-Dieter
        Last Edited: 2016-11-19 11:36:59 by wdbee
          Author: dummy | Created: 2016-11-19 12:31:12
          Subject: Re: dirt-2 hang?
          Looks like we have 3 results already then. As long as it's more than last year it's good :)
          Last Edited: 2016-11-19 12:31:12 by dummy
    Author: jisham | Created: 2016-11-22 13:37:25
    Subject: Re: dirt-2 hang?
    Well, I seem to have missed the submission deadline, but that's probably a good thing given the difficulty I had in getting a reliable result.

    I rebuilt torcs with --enable-debug, but it didn't seem to add any information to the crashes. I patched berniw and was able to run the race with it. I'm not sure whether it failed without the patch (too many re-runs and not enough documentation).

    The simplest result I got to work was by excluding wdbee_robotics and wdbee_2016 teams. I was then able to run the race successfully. I have the results if anyone is interested, but I guess now they are just an historical anecdote.

    Is there anything further I can do to provide useful debug information? Even with the --enable-debug and running torcs in debug mode I was unable to isolate the source of the hang.

    -jdi
    Last Edited: 2016-11-22 13:37:25 by jisham
    Author: jisham | Created: 2016-11-23 14:26:22
    Subject: Re: dirt-2 hang?
    Bringing it back to this thread rather than the race results thread...

    I built torcs with --enable-debug and ran it with -d and a full field. When it hung, I killed the process, and here's the backtrace:

    Program received signal SIGTERM, Terminated.
    compute_det () at Convex.cpp:105
    105 det[15][2] = det[11][0] * (dp[0][0] - dp[0][2]) +
    (gdb) #0 compute_det () at Convex.cpp:105
    ---Type <return> to continue, or q <return> to quit---
    #1 0xb4734b12 in closest (v=...) at Convex.cpp:163
    #2 closest_points (a=..., b=..., a2w=..., b2w=..., pa=..., pb=...) at Convex.cpp:425
    #3 0xb47381a6 in prev_closest_points (a=..., b=..., v=..., pa=..., pb=...) at Object.cpp:155
    #4 0xb472f0ef in object_test (e=...) at C-api.cpp:327
    #5 0xb472f334 in dtTest () at C-api.cpp:359
    #6 0xb472e29d in SimCarCollideCars (s=0x817a0e8) at collide.cpp:725
    #7 0xb472684a in SimUpdate (s=0x817a0e8, deltaTime=0.002, telemetry=52) at simu.cpp:398
    #8 0xb7eb3760 in ReOneStep (deltaTimeIncrement=0.002) at raceengine.cpp:657
    #9 0xb7eb3b38 in ReUpdate () at raceengine.cpp:727
    #10 0xb7eb1455 in ReStateManage () at racestate.cpp:97
    #11 0xb7eb16e4 in reDisplay () at racegl.cpp:52
    #12 0xb7a9751c in ?? () from /usr/lib/i386-linux-gnu/libglut.so.3
    #13 0xb7a9b04f in fgEnumWindows () from /usr/lib/i386-linux-gnu/libglut.so.3
    #14 0xb7a97a5e in glutMainLoopEvent () from /usr/lib/i386-linux-gnu/libglut.so.3
    #15 0xb7a982ac in glutMainLoop () from /usr/lib/i386-linux-gnu/libglut.so.3
    #16 0x08048ffa in main (argc=7, argv=0xbffff534) at main.cpp:134
    (gdb) quit
    A debugging session is active.

    Inferior 1 [process 13541] will be killed.

    Quit anyway? (y or n) [answered Y; input not from terminal]
    racer@jisham:~/torcs$


    At the time, I had locked my screen with a screensaver and it was hung when I came back and unlocked the screen.

    So after all that, it is looking like the problem was on my end, and perhaps my GL install is screwy. This machine probably needs a reboot anyway; there just never seems to be a convenient time for it...

    -jdi

    Last Edited: 2016-11-23 14:26:22 by jisham
      Author: dummy | Created: 2016-11-23 16:53:16
      Subject: Re: dirt-2 hang?
      >So after all that, it is looking like the problem was on my end

      Not at all, there is no problem with your system!

      The problem seems to be that there are not one but two 'while' loops in Convex.cpp:closest_points(), and under certain circumstances the code never finds a way out of them.
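      One defensive pattern for loops like these is to cap the number of iterations and bail out on NaN. The sketch below is hypothetical: bounded_search(), step_fn, the example steps and MAX_ITERATIONS are made-up names, the cap of 64 is arbitrary, and this is not the SOLID/Convex.cpp code, only the shape such a guard could take:

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical cap, not a value taken from SOLID. */
#define MAX_ITERATIONS 64

/* One refinement step of an iterative distance search (hypothetical). */
typedef double (*step_fn)(double current);

/* Repeats step() until the estimate improves by no more than tol.
   Returns 0 and stores the result in *out on convergence, or -1 if the
   estimate turns NaN or the iteration cap is reached. */
int bounded_search(step_fn step, double start, double tol, double *out)
{
    assert(step != NULL && out != NULL);

    double current = start;
    for (int iter = 0; iter < MAX_ITERATIONS; ++iter) {
        double next = step(current);
        if (isnan(next)) {
            return -1;  /* NaN: the convergence test below could never pass */
        }
        if (current - next <= tol) {
            *out = next;
            return 0;   /* improvement too small to continue: converged */
        }
        current = next;
    }
    return -1;          /* gave up instead of looping forever */
}

/* Example steps for trying it out (hypothetical). */
double halve_step(double d)   { return d * 0.5; }   /* converges toward 0 */
double runaway_step(double d) { return d - 1.0; }   /* never meets tol */
double nan_step(double d)     { (void)d; return NAN; }
```

      Returning an error lets the caller skip the contact for that simulation step instead of freezing the whole race; whether skipping is physically acceptable in the TORCS collision code is a separate question.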

      Thanks for the backtrace. I guess Bernhard will have some idea how to prevent this from happening.

      Danny
      Last Edited: 2016-11-23 16:53:16 by dummy
        Author: wdbee | Created: 2016-11-23 19:13:14
        Subject: Re: dirt-2 hang?
        Well, I found the NaN bug here as well:

        Running wdbee_robotics alone at Dirt-2, pCar->_distFromStart becomes NaN, resulting in an access violation in the robot.
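        A plausible mechanism (an assumption; I haven't checked the wdbee_robotics source) is that the robot turns _distFromStart into an array index. Converting NaN or an out-of-range double to int is not well-defined in C, so the index can land far outside the table. The hypothetical helper section_index() below validates the value before the cast and returns -1 so the caller can fall back, e.g. to its last good index:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: maps a distance along the lap (0 <= d < track_length)
   to one of n_sections equal-length table entries. The value is checked
   first, because converting NaN or an out-of-range double to int is not
   well-defined in C. Returns -1 for unusable input. */
int section_index(double dist_from_start, double track_length, int n_sections)
{
    assert(track_length > 0.0 && n_sections > 0);

    if (!isfinite(dist_from_start) ||
        dist_from_start < 0.0 || dist_from_start >= track_length) {
        return -1;  /* NaN, infinity or outside the lap */
    }
    int idx = (int)(dist_from_start / track_length * n_sections);
    /* rounding can push a value just below track_length up to n_sections */
    return idx < n_sections ? idx : n_sections - 1;
}
```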

        Debugging the call stack, I see that there are a lot of NaNs in the TORCS segment: all DynGC values, speed, toStart, toRight, toMiddle, toLeft.

        The segment name is "s9", center is (x = 349.99997, y = 15.000051, z = 0).

        The robot was driving while learning, which means the track had worked fine through the earlier learning loops, and the bug appeared "randomly" in just this 3-lap session.

        I have a screenshot and will send it by private mail.

        Cheers

        Wolf-Dieter


        Edit: Just got the same NaN error again, in the same segment (s9), but the car setup is different by now (it's learning). So something seems to be wrong specifically at this segment.

        Edit2:
        The issue starts with the call to

        static void SimCarWallCollideResponse(void *clientdata, DtObjectRef obj1, DtObjectRef obj2, const DtCollData *collData)

        Here, all the points in collData are NaN whenever the issue shows up later.
        Checking all points with isnan and simply returning if any NaN is found helps.
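        A rough sketch of that check. CollDataSketch is a stand-in made up for illustration; its layout (two contact points and a normal, three doubles each) is an assumption about SOLID's DtCollData, so compare it against the real header before using anything like this. wall_collide_response_sketch() only shows where the early return would sit in SimCarWallCollideResponse:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for DtCollData; the field layout is an assumption. */
typedef struct {
    double point1[3];
    double point2[3];
    double normal[3];
} CollDataSketch;

static bool vec_has_nan(const double v[3])
{
    return isnan(v[0]) || isnan(v[1]) || isnan(v[2]);
}

/* True if any contact point or the normal contains a NaN component. */
bool coll_data_has_nan(const CollDataSketch *cd)
{
    assert(cd != NULL);
    return vec_has_nan(cd->point1) || vec_has_nan(cd->point2) ||
           vec_has_nan(cd->normal);
}

/* Hypothetical shape of the early return; returns false if the contact
   was skipped. */
bool wall_collide_response_sketch(const CollDataSketch *cd)
{
    if (coll_data_has_nan(cd)) {
        return false;  /* skip this contact instead of feeding NaN into the car state */
    }
    /* ... the normal collision response would go here ... */
    return true;
}
```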
        Last Edited: 2016-11-27 15:30:40 by wdbee
          Author: wdbee | Created: 2016-11-27 15:31:51
          Subject: Re: dirt-2 hang?
          I stopped debugging at this point.

          Cheers

          Wolf-Dieter
          Last Edited: 2016-11-27 15:31:51 by wdbee