Age | Commit message (Collapse) | Author |
|
No performance difference proven at 95% confidence with my GLSL demo (n=10).
|
|
This saves mapping the index buffer to get a bounds on the indices that
drivers just drop on the floor in the VBO case (cache win), saves a bonus
walk of the indices in the CheckArrayBounds case, and other miscellaneous
validation. On intel it's a particularly a large win (50-100% in my app)
because even though we let the indices stay in both CPU and GPU caches, we
still end up waiting for the GPU to be done with the buffer before reading
from it.
Drivers that want the min/max_index fields must now check index_bounds_valid
and use vbo_get_minmax_index before using them.
|
|
This state flag has been unused since the ffvertex_prog move to core.
|
|
|
|
This can improve debugging with INTEL_DEBUG=batch,sync by giving smaller
batchbuffers.
|
|
I keep wanting to hack this knob in as a one-time thing, so it seemed useful
to have all the time.
|
|
The i965 hardware cannot do GL_CLAMP behavior on textures; an earlier
commit forced a software fallback if strict conformance was required
(i.e. the INTEL_STRICT_CONFORMANCE environment variable was set) and
2D textures were used, but it was somewhat flawed - it could trigger
the software fallback even if 2D textures weren't enabled, as long
as one texture unit was enabled.
This fixes that, and adds software fallback for GL_CLAMP behavior with
1D and 3D textures.
It also adds support for a particular setting of the INTEL_STRICT_CONFORMANCE
environment variable, which forces software fallbacks to be taken *all*
the time. This is helpful with debugging. The value is:
export INTEL_STRICT_CONFORMANCE=2
|
|
i965 doesn't natively support GL_CLAMP; it treats it like
GL_CLAMP_TO_EDGE, which fails conformance tests.
This fix adds a clause to the check_fallbacks() test to check
whether GL_CLAMP is in use on any enabled 2D texture. If so,
and if strict conformance is required (via INTEL_STRICT_CONFORMANCE),
a software fallback is mandated.
In addition, validate textures *before* checking for fallbacks,
rather than after; otherwise, the texture state is never validated
and can't be trusted. (In particular, if texturing is enabled and
the sampler would access any level beyond level 0 of a texture, the
sampler will segfault, because texture validation sets the firstLevel
and lastLevel fields of a texture object so that the valid levels
will be mapped and accessed correctly. If texture validation doesn't
occur, only level 0 is accessed correctly, and that only because
firstLevel and lastLevel happen to be set to 0.)
|
|
When doing line stipple, the stipple count resets on each line segment,
unless the primitive is a GL_LINE_LOOP or a GL_LINE_STRIP.
The existing code correctly identifies the need for a software fallback
to handle conformant line stipple on GL_LINE_LOOP primitives, but
neglects to make the same assessment on GL_LINE_STRIP primitives.
This fixes it so they match.
|
|
|
|
|
|
This got flipped around in 7855b2aef6bd9e9c2d73260b5cd166159b2525c6.
Bug #18907. Thanks to idr for pointing me at a nicer testcase than blender.
|
|
Later primitives, even if they caused a full state validate, wouldn't check
that there was enough space in the batchbuffer, occasionally triggering the
sanity check. We also skipped the aperture space check, even if it would
mean bringing in new programs and associated state.
|
|
This was a regression in 59b2c2adbbece27ccf54e58b598ea29cb3a5aa85 that broke
blender, among other apps.
|
|
Previously, since my check_aperture API change, we would check each piece of
state against the batchbuffer individually, but not all the state against the
batchbuffer at once. In addition to not being terribly useful in assuring
success, it probably also increased CPU load by calling check_aperture many
times per primitive.
|
|
This avoids issues with dereferencing stale cliprects around intel_draw_buffer
time. Additionally, take advantage of cliprects staying constant for FBOs and
DRI2, and emit cliprects in the batchbuffer instead of having to flush batch
each time they change.
|
|
|
|
|
|
This isn't required for GEM (at least, yet), but the check_aperture code
for non-GEM results in batch getting flushed during emit. brw_state_upload
restarts state emits, but a bunch of the state emit functions were assuming
that they would be called exactly once, after prepare and before new_batch.
Bug #17179.
|
|
Makefile.template
|
|
|
|
This reverts commit 7c81124d7c4a4d1da9f48cbf7e82ab1a3a970a7a.
|
|
This reverts commit 53675e5c05c0598b7ea206d5c27dbcae786a2c03.
Conflicts:
src/mesa/drivers/dri/i965/brw_wm_surface_state.c
|
|
To do this, I had to clean up some of 965 state upload stuff. We may end
up over-emitting state in the aperture overflow case, but that should be rare,
and I'd rather have the simplification of state management.
|
|
|
|
Negative value means other errors, not aperture overflow. fix bug #15752
|
|
Makes state emission into a 2 phase, prepare sets things up and accounts
the size of all referenced buffer objects. The emit stage then actually
does the batchbuffer touching for emitting the objects.
There is an assert in dri_emit_reloc if a reloc occurs for a buffer
that hasn't been accounted yet.
|
|
GL_LINE_STRIP"
There is no information in GS to determinate when to reset line stipple count, still fallback to software
This reverts commit 5a0314b431ab147c6156c3011f4cb54161ba4b25.
|
|
|
|
This helps us avoid a bunch of mess with gl_client_arrays that we filled
with unused data and confused readers.
|
|
|
|
Otherwise, we could choose to upload into the temporary VBO that we just fired
off to the hardware. Good for a 60% OA performance improvement.
|
|
The previous change gave us only two modes, one which looped over the batch
per cliprect (3d drawing) and one that didn't (state updeast).
However, we really want 4:
- Batch doesn't care about cliprects (state updates)
- Batch needs DRAWING_RECTANGLE looping per cliprect (3d drawing)
- Batch needs to be executed just once (region fills, copies, etc.)
- Batch already includes cliprect handling, and must be flushed by unlock time
(copybuffers, clears).
All callers should now be fixed to use one of these states for any batchbuffer
emits. Thanks to Keith Whitwell for pointing out the failure.
|
|
The comment about (vbo)_exec_api.c appeared to be stale, as the VBO code seems
to only use non-named VBOs (not actual VBOs) or freshly-allocated VBO data.
This brings a 2x speedup to openarena, because we can submit nearly-full
batchbuffers instead of many 450-byte ones.
|
|
In particular, batch buffers are no longer flushed when switching from
CLIPRECTS to NO_CLIPRECTS or vice versa, and 965 just uses DRM cliprect
handling for primitives instead of trying to sneak in its own to avoid the
DRM stuff. The disadvantage is that we will re-execute state updates per
cliprect, but the advantage is that we will be able to accumulate larger
batch buffers, which were proving to be a major overhead.
|
|
|
|
965 gains fixed TTM typing of the buffer object buffers and unused PBO
functions, and 915 gains buffer size == 0 fixes from 965.
|
|
Putting the bufmgr in the screen is not thread-safe since the emit_reloc
changes. It also led to a significant performance hit from pthread usage
for the attempted thread-safety (up to 12% of a cpu spent on refcounting
protection in single-threaded 965). The motivation had been to allow
multi-context bufmgr sharing in classic mode, but it wasn't worth the cost.
|
|
This is currently believed to work but be a significant performance loss.
Performance recovery should be soon to follow.
The dri_bo_fake_disable_backing_store() call was added to allow backing store
disable like bufmgr_fake.c did, which is a significant performance win (though
it's missing the no-fence-subdata part).
This commit is a squash merge of the 965-ttm branch, which had some history
I wanted to avoid pulling due to noisiness and brokenness at many points
for git-bisecting.
|
|
|
|
This code existed to dump logs of hardware access to be replayed in simulation.
Since we have real hardware now, it's not really needed.
|
|
some 3D programs such as pymol work well.
|
|
|
|
|
|
call swsetup_Wakeup before falling back to software rendering
|
|
|
|
|
|
|
|
|
|
passing them to the kernel. This works because all drawing commands
in the 965 driver are emitted with the lock held and the batchbuffer
is always flushed prior to releasing the lock. This allows multiple
cliprects to be dealt with, without replaying entire batchbuffers and
redundantly re-emitting state.
|