[discuss] Out of IOMMU space
Eugene BT
eugene.devilliers at btinternet.com
Thu Sep 8 18:28:42 CEST 2005
Thanks for the reply, Zach,
With the stock SUSE 9.3 kernel the machine wouldn't boot with the memory
hole set to anything but the default, 1280MB, so that is what it has been
ever since.
I have changed it to 256MB now and also set MTRR to discrete. I don't know
how these changes will impact the situation. The minimum setting for the
aperture in the BIOS is 64MB.
Searching the error string on google revealed this interesting post on
the nvnews forums in reply to a similar error query:
"unfortunately this isn't an easy problem to fix. the kernel doesn't
provide a way to allocate memory with a physical address < 32-bits, so
any memory > 32-bits has to be remapped through the IOMMU. The IOMMU is
a limited resource (probably about 64 - 128 Megs).
When workstation apps load large data sets, our driver ends up
allocating a lot of memory to handle that data and exhausts the IOMMU.
we've worked pretty hard to make sure our failure paths work correctly,
but we often see other kernel subsystems fail their allocations as a
result. they don't always handle the failure very well. in the output
you posted, it looks like a disk driver is failing.
unfortunately, I don't know what to tell you about working around this
problem. we're investigating ways to avoid these situations in our
driver, but don't have anything yet. the best short-term solution might
be to reduce your system memory to < 4 Gigs of memory (which really sucks)"
(post date 6/3/05) If this is the same issue (graphics driver), then I
am a bit worried, since I habitually load huge displays into memory.
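To put rough numbers on the explanation quoted above, here is a small sketch. It assumes 4 KiB pages and that every remapped page uses up one page of IOMMU space; both are my assumptions, the function names are made up, and how much of the aperture the kernel really sets aside for remapping may well differ.

```python
PAGE_SIZE = 4096        # assumed page size in bytes (4 KiB)
LIMIT_32BIT = 1 << 32   # first physical address a 32-bit-only device cannot reach


def needs_remap(phys_addr, length=PAGE_SIZE):
    """True if any byte of the buffer lies at or above the 4 GiB boundary."""
    return phys_addr + length > LIMIT_32BIT


def iommu_pages(window_mb):
    """Number of page-sized mappings that fit in a window of window_mb MiB."""
    return (window_mb * 1024 * 1024) // PAGE_SIZE


# The quoted post guesses at a 64 - 128 MB window.
for mb in (64, 128):
    print(f"{mb} MB window: {iommu_pages(mb)} page mappings")
```

Under those assumptions a 64 MB window holds only 16384 mappings, which a driver handling large data sets could plausibly use up.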
I will give recompiling the kernel with leak detection a try and let you
know what happens.
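Before trusting a leak trace it is probably worth confirming that the rebuilt kernel really has the option set and was booted with the parameter. A minimal sketch of such a check follows; the helper names are made up for illustration, and it assumes the iommu= parameter takes a comma-separated list of options.

```python
def config_enabled(config_text, option="CONFIG_IOMMU_LEAK"):
    """True if the kernel config text has 'OPTION=y' on a line of its own."""
    return any(line.strip() == f"{option}=y" for line in config_text.splitlines())


def iommu_options(cmdline):
    """Collect the comma-separated values of every iommu= boot parameter."""
    opts = []
    for word in cmdline.split():
        if word.startswith("iommu="):
            opts.extend(v for v in word[len("iommu="):].split(",") if v)
    return opts


def leak_tracing_requested(config_text, cmdline):
    """True only if the option is compiled in and 'leak' was passed at boot."""
    return config_enabled(config_text) and "leak" in iommu_options(cmdline)
```

On a running system the two strings would typically come from the kernel's config file and from /proc/cmdline; where the config file lives depends on the distribution.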
Eugene
Zachary Amsden wrote:
> Eugene BT wrote:
>
>> I should probably try IOMMU=off at boot time, but I have no idea how
>> this would affect the system. Also, the intermittency and severity of
>> the crashes mean experimentation is quite costly. So my question:
>> does anyone know what might be causing this error? (hardware? nForce4
>> drivers?) Is there anything I can do to prevent the crashes without
>> reducing performance significantly?
>
>
> Sounds scary. Does not sound like a hardware problem; sounds like
> something is slowly leaking DMA-able pages in the IOMMU area, causing
> you to eventually run out of space at a predictable interval, then crash.
>
> Try compiling with debug option CONFIG_IOMMU_LEAK and boot with
> iommu=leak to get some idea of what it might be. Also, you can try
> reducing the AGP aperture size in the BIOS to something tiny; 4meg
> should cause the leak to happen faster, and now you'll have a leak
> trace showing hopefully what driver went bad. What is your aperture
> size now?
>
> I assume you searched around already and didn't find anything, but a
> full hardware list might point someone in the direction of a known
> buggy driver.
>
> Zach
>