More active flows

1. Abstract

Starting with version v2.13 there is a new Stateful scheduler that works better with a high number of concurrent/active flows. With an EMIX profile, 70% better performance was observed. This tutorial uses 14 DP cores and up to 8M flows, with a special config file to enlarge the number of flows, and presents the difference in performance between the old scheduler and the new one.

2. Setup details

Server: UCSC-C240-M4SX

CPU: 2 x Intel® Xeon® CPU E5-2667 v3 @ 3.20GHz

RAM: 65536 MB @ 2133 MHz

NICs: 2 x Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 01)

QSFP: Cisco QSFP-H40G-AOC1M

OS: Fedora 18

Switch: Cisco Nexus 3172 Chassis, System version: 6.0(2)U5(2)

TRex: v2.13/v2.12 using 7 cores per dual interface

3. Traffic profile

cap2/cur_flow_single.yaml
- duration : 0.1
  generator :
          distribution : "seq"
          clients_start : "16.0.0.1"
          clients_end   : "16.0.0.255"
          servers_start : "48.0.0.1"
          servers_end   : "48.0.255.255"
          clients_per_gb : 201
          min_clients    : 101
          dual_port_mask : "1.0.0.0"
  cap_info :
     - name: cap2/udp_10_pkts.pcap  (1)
       cps : 100
       ipg : 200
       rtt : 200
       w   : 1
  1. One-directional UDP flow with 10 packets of 64B

4. Config file command

/cfg/trex_08_5mflows.yaml
- port_limit: 4
  version: 2
  interfaces: ['05:00.0', '05:00.1', '84:00.0', '84:00.1']
  port_info:
      - ip: 1.1.1.1
        default_gw: 2.2.2.2
      - ip: 3.3.3.3
        default_gw: 4.4.4.4

      - ip: 4.4.4.4
        default_gw: 3.3.3.3
      - ip: 2.2.2.2
        default_gw: 1.1.1.1

  platform:
      master_thread_id: 0
      latency_thread_id: 15
      dual_if:
        - socket: 0
          threads: [1,2,3,4,5,6,7]

        - socket: 1
          threads: [8,9,10,11,12,13,14]
  memory    :
        dp_flows    : 1048576                    (1)
  1. Add a memory section to allow more flows

5. Traffic command

command
[bash]>sudo ./t-rex-64 -f cap2/cur_flow_single.yaml -m 30000 -c 7 -d 40 -l 1000 --active-flows 5000000 -p --cfg cfg/trex_08_5mflows.yaml

The number of active flows can be changed using the --active-flows CLI option. In this example it is set to 5M flows.
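To get a rough feel for what this command asks for, here is a back-of-the-envelope sketch based on the profile above and Little's law. It is only an estimate, not TRex internals; the --active-flows option then scales the active-flow count well beyond this natural value (TRex adjusts the flow timing internally to do so).

# Rough estimate of the load generated by the command above
m = 30000                     # -m multiplier from the command line
cps = 100 * m                 # flows/sec: template cps (100) scaled by -m -> 3,000,000
pkts_per_flow = 10
pps = cps * pkts_per_flow     # ~30 Mpps in total

ipg_usec = 200.0
flow_duration_sec = (pkts_per_flow - 1) * ipg_usec * 1e-6    # ~1.8 msec per flow

# Little's law: natural number of concurrently active flows
natural_active = cps * flow_duration_sec                      # ~5,400 flows
print(cps, pps, natural_active)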

6. Script to get performance per number of active flows

import argparse
import csv
import math

# CTRexClient ships with the TRex stateful automation package; the exact
# import path below may differ between TRex versions/installations.
from trex_stf_lib.trex_client import CTRexClient


def minimal_stateful_test(server, csv_file, a_active_flows):

    trex_client = CTRexClient(server)                                   (1)

    trex_client.start_trex(                                             (2)
            c = 7,
            m = 30000,
            f = 'cap2/cur_flow_single.yaml',
            d = 30,
            l = 1000,
            p = True,
            cfg = "cfg/trex_08_5mflows.yaml",
            active_flows = a_active_flows,
            nc = True
            )

    result = trex_client.sample_until_finish()                          (3)

    # Counters sampled while the test was running
    active_flows = result.get_value_list('trex-global.data.m_active_flows')
    cpu_utl      = result.get_value_list('trex-global.data.m_cpu_util')
    pps          = result.get_value_list('trex-global.data.m_tx_pps')
    queue_full   = result.get_value_list('trex-global.data.m_total_queue_full')
    if queue_full[-1] > 10000:
        print("WARNING QUEUE WAS FULL")

    # Take a sample from near the end of the run and append it to the CSV
    row = (active_flows[-5], cpu_utl[-5], pps[-5], queue_full[-1])       (4)
    file_writer = csv.writer(csv_file)
    file_writer.writerow(row)


if __name__ == '__main__':
    test_file = open('tw_2_layers.csv', 'w')
    parser = argparse.ArgumentParser(description="Example for TRex Stateful, assuming server daemon is running.")

    parser.add_argument('-s', '--server',
                        dest='server',
                        help='Remote trex address',
                        default='127.0.0.1',
                        type=str)
    args = parser.parse_args()

    # Sweep the number of active flows from 100 to 8M on a geometric scale
    max_flows = 8000000
    min_flows = 100
    active_flow = min_flows
    num_point = 10
    factor = math.exp(math.log(max_flows / float(min_flows)) / num_point)
    for i in range(num_point + 1):
        print("=====================", i, math.floor(active_flow))
        minimal_stateful_test(args.server, test_file, math.floor(active_flow))
        active_flow = active_flow * factor

    test_file.close()
  1. Connect to the TRex server daemon

  2. Start TRex with the requested number of active_flows

  3. Wait for the run to finish and collect the results

  4. Take the results and append them to the CSV file

This script iterates from 100 to 8M active flows and saves the results to a CSV file.
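The test points are spaced geometrically between the minimum and the maximum. The standalone sketch below just reproduces the arithmetic from the script so the sweep values can be seen at a glance.

import math

# Geometric spacing of the active-flow sweep: num_point + 1 values
# between min_flows and max_flows.
min_flows, max_flows, num_point = 100, 8000000, 10
factor = math.exp(math.log(max_flows / float(min_flows)) / num_point)

points = [int(math.floor(min_flows * factor ** i)) for i in range(num_point + 1)]
print(points)   # roughly [100, 309, 956, 2957, ..., 8000000]

With the default argparse settings the script talks to a TRex daemon on 127.0.0.1; pass -s <address> to point it at a remote server.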

7. The results v2.12 vs v2.14

MPPS/core chart (tw1_0)

MPPS/core chart (tw0_0)

  • TW0 - v2.14 default configuration

  • PQ - v2.12 default configuration

  • To run the same script on v2.12 (which does not support the active_flows directive), a patch was introduced.

Observation:

  • TW works better (up to 250%) in the case of 25K-100K active flows

  • TW scales better with active-flows

8. Tuning

Let's add another mode, called TW1, in which the scheduler is tuned to have more buckets (more memory); a short span calculation follows the profile below.

TW1 cap2/cur_flow_single_tw_8.yaml
- duration : 0.1
  generator :
          distribution : "seq"
          clients_start : "16.0.0.1"
          clients_end   : "16.0.0.255"
          servers_start : "48.0.0.1"
          servers_end   : "48.0.255.255"
          clients_per_gb : 201
          min_clients    : 101
          dual_port_mask : "1.0.0.0"
  tw :
     buckets : 16384                    (1)
     levels  : 2                        (2)
     bucket_time_usec : 20.0
  cap_info :
     - name: cap2/udp_10_pkts.pcap
       cps : 100
       ipg : 200
       rtt : 200
       w   : 1
  1. More buckets

  2. Fewer levels
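As a rough way to reason about this trade-off (a sketch only, assuming a classic hierarchical timer wheel; the exact TRex internals may differ), the first wheel level covers about buckets x bucket_time_usec, so more buckets let flows be scheduled further ahead without falling into a higher, coarser level.

# Rough first-level span of the scheduler timer wheel (assumption: level 0
# covers buckets * bucket_time_usec, as in a classic hierarchical timer wheel)
buckets = 16384
bucket_time_usec = 20.0

level0_span_msec = buckets * bucket_time_usec / 1000.0
print(level0_span_msec)       # ~327.7 msec covered by the first level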

In TW2 mode we have the same template duplicated: one with a short IPG and another with a long IPG. 10% of the new flows will have the long IPG (a short check of this figure follows the profile below).

TW2 cap2/cur_flow.yaml
- duration : 0.1
  generator :
          distribution : "seq"
          clients_start : "16.0.0.1"
          clients_end   : "16.0.0.255"
          servers_start : "48.0.0.1"
          servers_end   : "48.0.255.255"
          clients_per_gb : 201
          min_clients    : 101
          dual_port_mask : "1.0.0.0"
          tcp_aging      : 0
          udp_aging      : 0
  mac        : [0x0,0x0,0x0,0x1,0x0,0x00]
  #cap_ipg    : true
  cap_info :
     - name: cap2/udp_10_pkts.pcap
       cps : 10
       ipg : 100000
       rtt : 100000
       w   : 1
     - name: cap2/udp_10_pkts.pcap
       cps : 90
       ipg : 2
       rtt : 2
       w   : 1
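To make the 10% figure explicit, here is a trivial check derived from the two cps values in the profile above.

# Share of new flows that use the long-IPG template in the TW2 profile
cps_long_ipg  = 10     # template with ipg/rtt = 100000 usec
cps_short_ipg = 90     # template with ipg/rtt = 2 usec

long_ipg_share = cps_long_ipg / float(cps_long_ipg + cps_short_ipg)
print(long_ipg_share)  # 0.1 -> 10% of the new flows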

9. Full results

  • PQ - v2.12 default configuration

  • TW0 - v2.14 default configuration

  • TW1 - v2.14 more buckets 16K

  • TW2 - v2.14 two templates

MPPS/core comparison chart (tw1)

MPPS/core table (tw1_tbl)

Factor relative to v2.12 results (tw2)

Extrapolation: total GbE per UCS with an average packet size of 600B (tw3)

Observation:

  • TW2 (two templates) has almost no performance impact

  • TW1 (more buckets) improves the performance up to a point

  • TW in general is better than PQ