Jeremy Drake

I am running into issues with UDP packets indicated to the ALE_CONNECT layer. I am trying to make out-of-band decisions on whether to allow or block them. For the most part, I can clone the NET_BUFFER_LIST, pend the operation, later complete the operation, and re-inject the NET_BUFFER_LIST using FwpsInjectTransportSendAsync0. However, for some UDP packets (I reproducibly get this with packets destined for port bootps from svchost.exe), when I call FwpsCompleteOperation0 and my callout is called again with the IS_REAUTHORIZE flag, the layerData parameter is NULL. This is OK for me, since I already have to deal with NULL layerData for TCP, but at some point between when my callout returns and when FwpsCompleteOperation0 returns, the machine bluescreens with TCPIP.sys trying to write to a NULL pointer.

I have tried removing the pend/complete for UDP traffic, only BLOCKing with ABSORB in the classifyFn and re-injecting into the transport layer, but in this case the UDP packets do not seem to get out.

Do I need to call NdisRetreatNetBufferStart before cloning the NBL on the CONNECT layer, as I figured out from an example in the DDK that I should on the RECV_ACCEPT layer?

Thanks for all of your help. This forum is a godsend when the often confusing and incomplete docs fail me.



Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Biao Wang [MSFT]

Sounds like you ran into a known bug (which will be corrected in SP1). The bug manifests under the following conditions --

1) ALE_CONNECT triggered by UDP is pended

2) pend is completed

3) during completion-triggered re-auth, a filter (not necessarily your filter) returns BLOCK.

4) Access Violation with NULL access.

However, for UDP, clone-drop-reinject w/o Pend/Complete should work just fine. In your case, did FwpsInjectTransportSendAsync0 succeed? If yes, what's the "Status" code value from the NET_BUFFER_LIST (nbl->Status) when your completion function is invoked?

You may also consider a workaround for the UDP pend/complete bug -- Instead of inspecting at ALE_CONNECT, you could register your filter at FLOW_ESTABLISHED and DATAGRAM_DATA (DD) and defer inspection there. During the FLOW_ESTABLISHED invocation, you associate a private context with the flow using FwpsFlowAssociateContext0. Once associated, your context will be indicated to DD.

In this private context structure, you initialize a field (say FlowState) to UNKNOWN. From FLOW_ESTABLISHED, you send the classify information (and the packet) up to user mode for authorization and return PERMIT.

From DD, if FlowState is UNKNOWN, you clone and queue up packets (and block the originals). Once user mode code signals its decision back down, you then either re-inject all queued clones back (and set the FlowState to AUTHORIZED), or discard them all (and set the FlowState to BLOCKED).
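The bookkeeping described above can be sketched in portable C. This is only a model under stated assumptions: FlowContext, dd_classify and flow_on_user_decision are hypothetical names, the counters stand in for a real list of cloned NET_BUFFER_LISTs, and no actual WFP call is made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this models the per-flow state only. */
typedef enum { FLOW_UNKNOWN, FLOW_AUTHORIZED, FLOW_BLOCKED } FlowState;
typedef enum { ACTION_PERMIT, ACTION_BLOCK, ACTION_QUEUE_CLONE } DdAction;

typedef struct {
    FlowState state;
    size_t    queued;      /* clones waiting for the user-mode verdict */
    size_t    reinjected;  /* clones sent back out after an allow */
    size_t    discarded;   /* clones dropped after a deny */
} FlowContext;

/* Called from the FLOW_ESTABLISHED side when the context is created. */
void flow_init(FlowContext *ctx)
{
    assert(ctx != NULL);
    ctx->state = FLOW_UNKNOWN;
    ctx->queued = ctx->reinjected = ctx->discarded = 0;
}

/* Called once per packet from the DATAGRAM_DATA side. */
DdAction dd_classify(FlowContext *ctx)
{
    assert(ctx != NULL);
    switch (ctx->state) {
    case FLOW_AUTHORIZED:
        return ACTION_PERMIT;
    case FLOW_BLOCKED:
        return ACTION_BLOCK;
    default:
        /* Verdict still pending: keep a clone, block the original. */
        ctx->queued++;
        return ACTION_QUEUE_CLONE;
    }
}

/* Called when user mode signals its decision back down. */
void flow_on_user_decision(FlowContext *ctx, int allow)
{
    assert(ctx != NULL);
    if (ctx->state != FLOW_UNKNOWN)
        return;  /* a verdict was already applied */
    if (allow) {
        ctx->reinjected += ctx->queued;
        ctx->state = FLOW_AUTHORIZED;
    } else {
        ctx->discarded += ctx->queued;
        ctx->state = FLOW_BLOCKED;
    }
    ctx->queued = 0;
}
```

In a real callout the three functions would run at different times and IRQLs, so the context would need a lock, and the callout would also have to recognise its own re-injected clones (for example with FwpsQueryPacketInjectionState0) so they are not queued again.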

Hope this helps,

Biao.W.





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Jeremy Drake

For the record, this is what I am seeing:
callout reports to userspace that svchost.exe is trying to send a UDP packet to
255.255.255.255 port bootps. I opt to allow it. I send in the ALE completion
context and net buffer list clone, but when my callout is invoked I see the layerData
parameter is NULL. After my callout returns PERMIT, but before FwpsCompleteOperation0
returns, I get this bugcheck:

Code Snippet

*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000000, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: 873bd758, address which referenced memory

Debugging Details:
------------------


OVERLAPPED_MODULE: Address regions for 'MRxVPC' and 'drmk.sys' overlap

WRITE_ADDRESS: 00000000

CURRENT_IRQL: 2

FAULTING_IP:
tcpip!WfpAleDeleteRemoteEndpoint+26
873bd758 890a mov dword ptr [edx],ecx

DEFAULT_BUCKET_ID: CODE_CORRUPTION

BUGCHECK_STR: 0xD1

PROCESS_NAME: FWService.exe

TRAP_FRAME: 91b05a04 -- (.trap 0xffffffff91b05a04)
ErrCode = 00000002
eax=8329c438 ebx=00000000 ecx=00000000 edx=00000000 esi=8329c340 edi=873edf80
eip=873bd758 esp=91b05a78 ebp=91b05a88 iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
tcpip!WfpAleDeleteRemoteEndpoint+0x26:
873bd758 890a mov dword ptr [edx],ecx ds:0023:00000000=
Resetting default scope

LAST_CONTROL_TRANSFER: from 818ad13f to 81835688

STACK_TEXT:
91b055e4 818ad13f 00000003 91b0ed74 00000000 nt!RtlpBreakWithStatusInstruction
91b05634 818adbac 00000003 00000000 873bd758 nt!KiBugCheckDebugBreak+0x1c
91b059e4 818494d4 0000000a 00000000 00000002 nt!KeBugCheck2+0x5f4
91b059e4 873bd758 0000000a 00000000 00000002 nt!KiTrap0E+0x2ac
91b05a88 873bcd74 8329c340 83a88000 834f1430 tcpip!WfpAleDeleteRemoteEndpoint+0x26
91b05ac8 873bcdc4 c0000022 82f33af0 830b93e8 tcpip!WfpAleHandleSendCompletion+0x78
91b05ae8 873bc84d 834f1430 00000000 83279880 tcpip!WfpAlepAuthorizeSendCompletion+0x2f
91b05b1c 87c891dc 834f1430 82d7c830 91b05b40 tcpip!WfpAleCompleteOperation+0x6a
91b05b38 87c7fe79 834b2da4 834e7194 00000001 fwcore!WfpAleOnUserReply+0x3c [g:\cvs\firewall\src\drivers\fwcore\sys\vista\userask.c @ 71]
91b05b60 87c7f10b 00000004 00000001 834b2da4 fwcore!CH_OnAskComplete+0xe9 [g:\cvs\firewall\src\drivers\fwcore\sys\channels.c @ 1086]
91b05bc0 87c80480 834e7188 00000017 834e7188 fwcore!CM_ioctl+0x2fb [g:\cvs\firewall\src\drivers\fwcore\sys\cmk.c @ 1318]
91b05c10 83b227bd 83279880 830b93e8 833347a0 fwcore!W32API_Dispatch+0x140 [g:\cvs\firewall\src\drivers\fwcore\sys\w32api.c @ 421]
91b05c2c 81867cc9 83279880 830b93e8 830b93e8 ndis!ndisDummyIrpHandler+0x72
91b05c44 819c808b 833347a0 830b93e8 830b9458 nt!IofCallDriver+0x63
91b05c64 819cc7ce 83279880 833347a0 00000000 nt!IopSynchronousServiceTail+0x1e0
91b05d00 81a20abe 83279880 830b93e8 00000000 nt!IopXxxControlFile+0x6b7
91b05d34 818461fa 00000240 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
91b05d34 77b60f34 00000240 00000000 00000000 nt!KiFastCallEntry+0x12a
01a3f098 77b5f850 76767c92 00000240 00000000 ntdll!KiFastSystemCallRet
01a3f09c 76767c92 00000240 00000000 00000000 ntdll!ZwDeviceIoControlFile+0xc
01a3f0fc 06e06e74 00000240 00220190 0371a400 kernel32!DeviceIoControl+0x14a
<removed userspace stack>


STACK_COMMAND: kb

CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
81837a90-81837a96 7 bytes - nt!KiIdleLoop+18
[ fa 8b 83 8c 1a 00 00:90 e9 39 29 a1 01 cc ]
81838005-8183800a 6 bytes - nt!ExfInterlockedAddUlong+5 (+0x575)
[ fa f0 0f ba 28 00:e9 d5 23 a1 01 cc ]
81838032-81838037 6 bytes - nt!ExfInterlockedInsertHeadList+6 (+0x2d)
[ fa f0 0f ba 2e 00:e9 0e 25 a1 01 cc ]
8183806a-8183806f 6 bytes - nt!ExfInterlockedInsertTailList+6 (+0x38)
[ fa f0 0f ba 2e 00:e9 23 25 a1 01 cc ]
8183809d-818380a2 6 bytes - nt!ExfInterlockedRemoveHeadList+1 (+0x33)
[ fa f0 0f ba 2a 00:e9 58 24 a1 01 cc ]
81846151-81846157 7 bytes - nt!KiFastCallEntry+81 (+0xe0b4)
[ c7 45 08 00 0d db ba:e9 62 42 a0 01 cc cc ]
81846250-81846254 5 bytes - nt!KiServiceExit (+0xff)
[ fa f6 45 72 02:e9 ff ed 88 01 ]
8184633a-8184633c 3 bytes - nt!KiSystemCallExitBranch+2 (+0xea)
[ 5a 59 9d:c8 02 04 ]
81846be8-81846bec 5 bytes - nt!Kei386EoiHelper (+0x8ae)
[ fa f6 45 72 02:e9 df e4 88 01 ]
818492dc-818492e1 6 bytes - nt!KiTrap0E+b4 (+0x26f4)
[ fb f7 45 70 00 02:90 e9 04 be 88 01 ]
81849b74-81849b7b 8 bytes - nt!KiFlushNPXState+4 (+0x898)
[ fa 64 8b 3d 1c 00 00 00:e9 b0 08 a0 01 cc cc cc ]
8189dda2-8189ddab 10 bytes - nt!KeDisableInterrupts+2 (+0x5422e)
[ 25 00 02 00 00 c1 e8 09:e9 6e c6 9a 01 eb f9 cc ]
75 errors : !nt (81837a90-8189ddab)

MODULE_NAME: memory_corruption

IMAGE_NAME: memory_corruption

FOLLOWUP_NAME: memory_corruption

DEBUG_FLR_IMAGE_TIMESTAMP: 0

MEMORY_CORRUPTOR: LARGE

FAILURE_BUCKET_ID: MEMORY_CORRUPTION_LARGE

BUCKET_ID: MEMORY_CORRUPTION_LARGE

Followup: memory_corruption
---------





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Jeremy Drake

Biao Wang [MSFT] wrote:

In your case, did FwpsInjectTransportSendAsync0 succeed?

Yes

Biao Wang [MSFT] wrote:

If yes, what's the "Status" code value from the NET_BUFFER_LIST (nbl->Status) when your completion function is invoked?

Just checked this, I am getting 0xc0000225.

Any ideas?





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Biao Wang [MSFT]

0xc0000225 is STATUS_NOT_FOUND.

Can you check the endpointHandle parameter? It should be the same value as the transportEndpointHandle member of the FWPS_INCOMING_METADATA_VALUES0 structure when the FWPS_METADATA_FIELD_TRANSPORT_ENDPOINT_HANDLE bit is set in currentMetadataValues.

Biao.W.





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Biao Wang [MSFT]

Yes, this is the bug.



Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Jeremy Drake

Code Snippet

inMetaValues->currentMetadataValues = 0x4cb8

FWPS_METADATA_FIELD_TRANSPORT_ENDPOINT_HANDLE = 0x8000

inMetaValues->transportEndpointHandle = 0

I am currently passing 0 for the endpointHandle parameter. What should I be passing if the flag is not set?
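The flag test behind those numbers can be sketched in portable C. ModelMetaValues and the MODEL_/model_ names are hypothetical stand-ins rather than the real WFP types; only the two constants come from the values quoted above. In a driver, the FWPS_IS_METADATA_FIELD_PRESENT macro from the WDK headers performs the same kind of mask comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit value as quoted in the post; the MODEL_ name is hypothetical. */
#define MODEL_FIELD_TRANSPORT_ENDPOINT_HANDLE 0x8000u

/* Hypothetical stand-in for the two metadata members discussed here. */
typedef struct {
    uint32_t currentMetadataValues;
    uint64_t transportEndpointHandle;
} ModelMetaValues;

/* Returns 1 if every bit of 'field' is set in the metadata mask. */
int model_field_present(const ModelMetaValues *m, uint32_t field)
{
    assert(m != NULL);
    return (m->currentMetadataValues & field) == field;
}

/* Copies the handle out only when the layer actually indicated it. */
int model_endpoint_handle(const ModelMetaValues *m, uint64_t *out)
{
    assert(m != NULL && out != NULL);
    if (!model_field_present(m, MODEL_FIELD_TRANSPORT_ENDPOINT_HANDLE))
        return 0;  /* field absent: the member must not be trusted */
    *out = m->transportEndpointHandle;
    return 1;
}
```

With currentMetadataValues = 0x4cb8 the 0x8000 bit is clear, so the model reports the handle as absent, which matches the zero value seen in transportEndpointHandle.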





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Jeremy Drake

I noticed in the docs that the FWPS_TRANSPORT_SEND_PARAMS0 struct expects the remoteAddress to be in network byte order, while the fields give them to me in host byte order. I byte-swapped the address, but I am still getting the same error.



Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Biao Wang [MSFT]

Yes, you need to byte-swap. You will also need to deep-copy memory to make it available outside of classifyFn -- this is true for remoteAddress and controlData (if present).
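A minimal user-mode sketch of both points, assuming IPv4. SavedSendParams, save_send_params and free_send_params are hypothetical names, and malloc/free stand in for a non-paged pool allocator such as ExAllocatePoolWithTag. The address is written most significant byte first, which is what a byte swap (RtlUlongByteSwap in a driver) achieves on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for data that must outlive the classify call. */
typedef struct {
    uint8_t *remoteAddress;    /* owned copy, network byte order */
    size_t   remoteAddressLen;
    uint8_t *controlData;      /* owned copy, NULL if none */
    size_t   controlDataLen;
} SavedSendParams;

/* Allocate a private copy of 'len' bytes; returns NULL on failure. */
static uint8_t *dup_bytes(const uint8_t *src, size_t len)
{
    uint8_t *copy = (uint8_t *)malloc(len);
    if (copy != NULL)
        memcpy(copy, src, len);
    return copy;
}

/* Release both copies and reset the holder. */
void free_send_params(SavedSendParams *p)
{
    assert(p != NULL);
    free(p->remoteAddress);
    free(p->controlData);
    memset(p, 0, sizeof *p);
}

/* Returns 0 on success, -1 if an allocation fails. */
int save_send_params(SavedSendParams *out, uint32_t hostOrderV4,
                     const uint8_t *controlData, size_t controlDataLen)
{
    uint8_t addr[4];

    assert(out != NULL);
    memset(out, 0, sizeof *out);

    /* Host order to network order: most significant byte first. */
    addr[0] = (uint8_t)(hostOrderV4 >> 24);
    addr[1] = (uint8_t)(hostOrderV4 >> 16);
    addr[2] = (uint8_t)(hostOrderV4 >> 8);
    addr[3] = (uint8_t)hostOrderV4;

    out->remoteAddress = dup_bytes(addr, sizeof addr);
    if (out->remoteAddress == NULL)
        return -1;
    out->remoteAddressLen = sizeof addr;

    if (controlData != NULL && controlDataLen > 0) {
        out->controlData = dup_bytes(controlData, controlDataLen);
        if (out->controlData == NULL) {
            free_send_params(out);
            return -1;
        }
        out->controlDataLen = controlDataLen;
    }
    return 0;
}
```

The saved copies would be freed from the injection completion function once the re-injected NET_BUFFER_LIST has been returned.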

Biao.W.





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Biao Wang [MSFT]

OK, I see this field is currently not set for ALE_CONNECT. (It will be in SP1.)

FwpsInjectTransportSendAsync0 requires a non-NULL endpointHandle. Basically this means clone-drop-reinject from ALE_CONNECT is not currently feasible. However, you shouldn't need to clone-drop-reinject, because UDP apps typically retransmit on their own, given that UDP is not a reliable protocol. So you could simply discard packets while pending for the user-mode decision.

If that's not acceptable and you want to be more friendly to the apps, you could implement my earlier suggestion of operating at DATAGRAM_DATA (DD). transportEndpointHandle is indicated to DD, so FwpsInjectTransportSendAsync0 is fully supported at that layer.





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Jeremy Drake

Biao Wang [MSFT] wrote:

Sounds like you ran into a known bug (which will be corrected in SP1).

Biao Wang [MSFT] wrote:

Ok I see this field is currently not set for ALE_CONNECT. (It will be in SP1)

I don't suppose there is any chance of seeing either of these addressed in a hotfix? I am not looking forward to having to hack around this, nor to the potential of having separate code paths for Vista Gold vs SP1 (I know I don't have to remove the kludge for SP1, but it is probably best to).

Because UDP is a connectionless, inherently unreliable protocol, ISTM that dropping packets on the floor would be MORE harmful for it than for TCP, because TCP will at least notice and retransmit automatically. With UDP, it is up to the specific protocol author to handle this, so the retransmission logic is likely to have issues or not exist at all.

Looks like I am going to be queueing data at the DD layer. Will I just have to hope the user decides what to do with the connection before the NonPagedPool is exhausted from queued data?





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Biao Wang [MSFT]

Jeremy Drake wrote:

Will I just have to hope the user decides what to do with the connection before the NonPagedPool is exhausted from queued data?

The queue limit is an issue even if you could operate at ALE_CONNECT. Unlike TCP, where pending ALE_CONNECT would block the connect() call, a UDP app can send data from the get-go. But with a reasonable limit the DD solution should be fine (not saying it is perfect ;-))
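One way to express such a limit is a per-flow byte budget that is checked before each clone is queued. The sketch below is an assumption-laden model: PendQueueBudget and its functions are hypothetical names, and the cap itself is something the driver author would have to pick.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accounting for clones queued on one pending flow. */
typedef struct {
    size_t maxBytes;   /* cap chosen by the driver author */
    size_t usedBytes;  /* bytes currently held in queued clones */
    size_t packets;    /* clones currently queued */
    size_t dropped;    /* packets refused because the cap was hit */
} PendQueueBudget;

void budget_init(PendQueueBudget *b, size_t maxBytes)
{
    assert(b != NULL);
    b->maxBytes = maxBytes;
    b->usedBytes = 0;
    b->packets = 0;
    b->dropped = 0;
}

/* Returns 1 if the packet may be cloned and queued, 0 if it must be dropped. */
int budget_try_add(PendQueueBudget *b, size_t packetBytes)
{
    assert(b != NULL);
    if (packetBytes > b->maxBytes - b->usedBytes) {
        b->dropped++;
        return 0;
    }
    b->usedBytes += packetBytes;
    b->packets++;
    return 1;
}

/* Called when a queued clone is re-injected or discarded. */
void budget_release(PendQueueBudget *b, size_t packetBytes)
{
    assert(b != NULL);
    assert(b->packets > 0 && packetBytes <= b->usedBytes);
    b->usedBytes -= packetBytes;
    b->packets--;
}
```

A packet refused by budget_try_add is simply lost, so the application sees ordinary UDP loss rather than the driver exhausting pool memory.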

Hope this helps,

Biao.W.





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Jeremy Drake

Biao Wang [MSFT] wrote:

The queue limit is an issue even if you could operate at ALE_CONNECT.

Yes, but it wouldn't be my issue, it would be your issue, since you would be doing the queuing. That's an important distinction.

Basically it sounds like I am going to be reimplementing what happens (or rather, should happen) at the ALE_CONNECT layer when someone asks it to pend a UDP connection.

Again, I don't suppose anything could be done about either bug pre-SP1? I would be amenable to shipping some sort of patcher in my installer to support Vista Gold...





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Jeremy Drake

I have been working on implementing a datagram data/flow established callout, and while I can now reinject packets, I run into various issues that make this approach basically unworkable.

  • This callout is supposed to have a very low weight, so that it is only invoked if no other rules on the CONNECT layer matched the packet. Being moved out of the connect layer, it is now called all of the time, which is both inefficient and not the proper behavior for my application.
  • The FlowDelete callback is being called while I'm waiting for a response from userspace, and packets are still queued. On flow delete, I clean up my flow context, which holds the decision for the flow and the queued packets. When the user gets around to answering me, I get back a pointer to deleted memory, and promptly crash. I could skip the cleanup on flow delete, but then I would never know when it is safe to do the cleanup. Plus, any future packets will not be indicated with the same flowId, so they will re-query user mode anyway.

I don't really see any way around these issues. This workaround is looking to be DOA at this point. The combination of these two bugs makes writing a pending filter for UDP flows sufficiently difficult to be indistinguishable from impossible. Barring some sort of miracle, it seems my only hope of being able to ask user mode what to do with a UDP flow, and then actually do it with the flow's data intact, is some sort of hotfix or patch that I could invoke from my setup program on Vista RTM.

I really seem to be up a creek here...





Re: Windows Filtering Platform (WFP) pending UDP at the ALE_CONNECT layer

Biao Wang [MSFT]

When you register your DD callout (FwpsCalloutRegister0) you can pass the FWP_CALLOUT_FLAG_CONDITIONAL_ON_FLOW bit into the FWPS_CALLOUT0->flags field. That way your callout will only be invoked for the flow that you associated from FLOW_ESTABLISHED.

Instead of using the flow context pointer as the key, you could use [transportEndpointHandle, remoteAddress, remotePort] as the key and maintain the pended structures in a hashtable; you could use the flow-id as a secondary key (and maintain your flow context in a hashtable). That way, if the flow is still alive, you can use the flow-id to retrieve your flow context; if that fails (the flow context has been removed from the hashtable) you could fall back to the slower search to find clones based on endpoint/remoteAddress/port. If you use a dual index, you should ref-count your pended structures.
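A rough model of the dual-index idea, under loud assumptions: all names are hypothetical, a small fixed array searched linearly stands in for the two hashtables, and there is no locking. Each entry holds one reference per index that can still reach it, so a flow delete only removes the flow-id path and the entry survives until the user-mode verdict is processed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PEND_MAX 16  /* hypothetical table size */

typedef struct {
    uint64_t endpointHandle;
    uint32_t remoteAddress;
    uint16_t remotePort;
} PendKey;

typedef struct {
    PendKey  key;
    uint64_t flowId;
    int      byFlow;    /* 1 while reachable through the flow-id index */
    int      byKey;     /* 1 while reachable through the endpoint key */
    int      refCount;  /* one reference per index; 0 means slot is free */
} PendEntry;

typedef struct { PendEntry slots[PEND_MAX]; } PendTable;

void pend_table_init(PendTable *t) { memset(t, 0, sizeof *t); }

static int key_equal(const PendKey *a, const PendKey *b)
{
    return a->endpointHandle == b->endpointHandle &&
           a->remoteAddress == b->remoteAddress &&
           a->remotePort == b->remotePort;
}

/* Drop one index reference; at zero the slot can be reused. */
static void pend_release(PendEntry *e)
{
    assert(e->refCount > 0);
    e->refCount--;
}

/* Insert under both indexes; returns NULL if the table is full. */
PendEntry *pend_insert(PendTable *t, const PendKey *key, uint64_t flowId)
{
    for (size_t i = 0; i < PEND_MAX; i++) {
        PendEntry *e = &t->slots[i];
        if (e->refCount == 0) {
            e->key = *key;
            e->flowId = flowId;
            e->byFlow = 1;
            e->byKey = 1;
            e->refCount = 2;
            return e;
        }
    }
    return NULL;
}

/* Flow-delete notification: only the flow-id path goes away. */
void pend_flow_deleted(PendTable *t, uint64_t flowId)
{
    for (size_t i = 0; i < PEND_MAX; i++) {
        PendEntry *e = &t->slots[i];
        if (e->refCount > 0 && e->byFlow && e->flowId == flowId) {
            e->byFlow = 0;
            pend_release(e);
        }
    }
}

/* Fast path by flow id, then the slower fallback by endpoint key. */
PendEntry *pend_lookup(PendTable *t, uint64_t flowId, const PendKey *key)
{
    for (size_t i = 0; i < PEND_MAX; i++) {
        PendEntry *e = &t->slots[i];
        if (e->refCount > 0 && e->byFlow && e->flowId == flowId)
            return e;
    }
    for (size_t i = 0; i < PEND_MAX; i++) {
        PendEntry *e = &t->slots[i];
        if (e->refCount > 0 && e->byKey && key_equal(&e->key, key))
            return e;
    }
    return NULL;
}

/* Verdict processed: remove the entry from whichever indexes remain. */
void pend_complete(PendEntry *e)
{
    if (e->byFlow) { e->byFlow = 0; pend_release(e); }
    if (e->byKey)  { e->byKey = 0;  pend_release(e); }
}
```

The point of the reference count is that neither the flow-delete path nor the verdict path frees the entry on its own; whichever one drops the last reference is the one that may release the queued clones.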

Biao.W.