AUDITING NETWORK ACTIVITY

Geolocation

There are many definitions of geolocation, but for argus data, geolocation is the use of argus object values for geo-relevant positioning. For pure argus data, the Layer 2 and Layer 3 network addresses contained in the flow data records provide the basis for geographically placing the data. For argus data derived from Netflow data, AS numbers can be used to provide a form of netlocation. Additional data that can provide relative geolocation are TTL (hops), Round Trip Times, and One-Way Delay metrics. Layer 2 and 3 network addresses do not, by themselves, indicate where an object is, but because Layer 2 and 3 addresses are supposed to be globally unique, at any given moment there should be a single physical location for each of these objects.
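TTL and RTT values only give relative position. As a rough illustration of how they can be read, the Python sketch below uses a common heuristic to estimate a hop count from an observed TTL, assuming the sender started from one of the usual initial TTL values (32, 64, 128, 255), and bounds the distance to the far end from an RTT, assuming signals travel through fibre at about 200 km per millisecond. The constants and function names are assumptions for illustration, not argus parameters.

```python
"""Relative geolocation hints from TTL and RTT (illustrative sketch)."""

# Assumption: most host stacks start packets with one of these TTL values.
COMMON_INITIAL_TTLS = (32, 64, 128, 255)

# Assumption: light in fibre covers roughly 200 km per millisecond (about 2/3 c).
FIBER_KM_PER_MS = 200.0


def estimate_hops(observed_ttl: int) -> int:
    """Guess hops: smallest common initial TTL >= observed TTL, minus the observed TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be in the range 1..255")
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on distance: half the round trip at fibre propagation speed."""
    if rtt_ms < 0:
        raise ValueError("RTT must be non-negative")
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


if __name__ == "__main__":
    print(estimate_hops(57))       # 64 - 57 = 7 hops
    print(max_distance_km(20.0))   # at most 2000.0 km away
```

A packet arriving with TTL 57 most likely started at 64 and crossed about 7 routers, and a 20 ms RTT puts the far end within roughly 2000 km. Neither value says where a host is, only how far away it can be.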

To provide geolocation, such as country codes or latitude/longitude (lat/lon) information, argus clients use third-party databases to map Layer 3 addresses to geo-relevant information. Argus clients support two free Internet information systems: the InterNIC databases, which provide country codes, and MaxMind's Open Source GeoIP database, which can provide geolocation for the registered administrator of the domain.

Country codes are fairly reliable, and some IP address locations from GeoIP are well mapped, so these free systems are very useful.

Getting the databases

InterNIC Databases

All argus clients can use the InterNIC databases to provide country codes. This support is triggered simply by printing either of the country code fields in the argus record. The database itself is specified in your .rarc file, and is usually stored in /usr/local/argus/delegated-ipv4-latest. The name is unusual, but it follows the InterNIC's file naming convention.
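To see what a country code lookup involves, here is a minimal Python sketch. It assumes the database keeps the RIR statistics exchange format, where each record line reads registry|cc|type|start|value|date|status and, for ipv4 records, value is the number of addresses in the block. The class name and the sample records (which use documentation address ranges) are hypothetical; this is not the argus clients' own implementation.

```python
import bisect
import ipaddress


class CountryCodeTable:
    """Map IPv4 addresses to country codes using delegated-style records (sketch)."""

    def __init__(self, lines):
        ranges = []
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.split("|")
            # Record lines: registry|cc|type|start|value|date|status[|...]
            # Header and summary lines fail one of these checks and are skipped.
            if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
                continue
            start = int(ipaddress.IPv4Address(fields[3]))
            count = int(fields[4])  # number of addresses in the block
            ranges.append((start, start + count - 1, fields[1].upper()))
        ranges.sort()
        self._ranges = ranges
        self._starts = [r[0] for r in ranges]

    def lookup(self, addr):
        """Return the country code for addr, or None if no block contains it."""
        ip = int(ipaddress.IPv4Address(addr))
        # Index of the last block whose start address is <= ip.
        i = bisect.bisect_right(self._starts, ip) - 1
        if i >= 0 and ip <= self._ranges[i][1]:
            return self._ranges[i][2]
        return None
```

Sorting the blocks once and using a binary search keeps each lookup logarithmic in the number of registry records, which matters when labeling every address in a day of flow data.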

But where does this database come from? A starter file is provided in the argus-clients distribution tarfile, in the ./support/Config directory, but it's probably stale by the time you get it. The ./support/Config directory also contains a shell script, ragetcountrycodes.sh, which uses wget() to retrieve the databases from the various international Internet registries and merges them together to form our database.
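The merge step can be sketched in Python as well. This does not reproduce ragetcountrycodes.sh; the function below is a hypothetical equivalent that assumes each registry's file has already been downloaded and uses the RIR statistics exchange format. It keeps only the IPv4 record lines, drops duplicate blocks, and orders the result by starting address.

```python
import ipaddress


def merge_delegated(*registry_files):
    """Merge IPv4 record lines from several delegated-style files (sketch).

    Each argument is an iterable of lines from one registry's file.
    Returns the merged record lines sorted by start address.
    """
    merged = {}
    for lines in registry_files:
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            fields = line.split("|")
            # Keep only ipv4 record lines; headers and summaries are dropped.
            if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
                continue
            key = (int(ipaddress.IPv4Address(fields[3])), int(fields[4]))
            # First registry to report a given block wins; later duplicates are ignored.
            merged.setdefault(key, "|".join(fields[:7]))
    return [merged[key] for key in sorted(merged)]
```

Writing the returned lines, one per line, to the path named in your .rarc yields a single file in the same record format as the inputs.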

Once the database file is at the path specified in your .rarc, country codes can be printed.

MaxMind GeoIP

Argus clients can be configured and compiled to use MaxMind's Open Source C API libraries, which support both their free and "pay as you go" databases. To enable this support, obtain the GeoIP C API and install it on your system using the instructions provided. After installation, configure all the argus clients to use these databases:

% ./configure --with-GeoIP=yes
% make clean; make

Currently, two programs, ralabel() and radium(), provide support for argus record labeling. The MaxMind libraries are configured with the location of the databases, so argus client programs built with these libraries need no additional configuration for the information store; they just need to know what kind of geolocation information you want to work with.

Ralabel() - Inserting Geolocation Data into Argus Records

Geolocation data may be relevant for only a short time, so getting the data into the argus data records is an important support feature. Geolocation data, such as country codes, lat/lon values and AS numbers, have structured storage support, so you can filter on it, use it as aggregation keys, and anonymize it. Use ralabel() and its extensive configuration support to specify which geolocation data will be inserted into each argus record it encounters. The amount of data added is not huge, but it will have an impact. Most uses of this information involve some form of pipeline processing: the geolocation data is added at one point in the pipeline, a downstream process works with the records that carry the location data, and the data is then stripped before the records are stored on disk. This form of "semantic pumping" is a common practice in near real-time flow data processing.
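The "label upstream, strip downstream" pattern can be sketched with Python generators. The records here are plain dicts standing in for argus records, and the lookup callable stands in for whatever database ralabel() would consult; both are hypothetical. The "sco" key mirrors the source country code field used later on this page.

```python
def label_stage(records, lookup):
    """Upstream stage: attach a source country code ("sco") label to each record."""
    for record in records:
        labeled = dict(record)  # copy so the caller's records are untouched
        labeled["sco"] = lookup(record["saddr"])
        yield labeled


def consume_and_strip(records, packet_counts):
    """Downstream stage: use the label, then strip it before the record is stored."""
    for record in records:
        code = record.get("sco")
        packet_counts[code] = packet_counts.get(code, 0) + record["pkts"]
        # Drop the geolocation label so only the original fields reach disk.
        yield {key: value for key, value in record.items() if key != "sco"}
```

Chaining the stages, as in consume_and_strip(label_stage(flows, lookup), counts), keeps the location data in flight only between the two stages, so it never inflates the stored records.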

Geolocation data for country codes and AS numbers are structured. We have specific DSRs for this information: the data is stored as packed binary data, and because it's in specific DSRs, you can filter and aggregate on these values. Other geolocation data, such as lat/lon, postal address, state/region, zip and area codes, are unstructured. Unstructured data is stored as argus label metadata: ASCII text strings with a very simple syntax. These can be printed, merged, grep'ed, and stripped.
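Because unstructured labels are just text, merging and stripping them is simple string handling. The sketch below assumes, purely for illustration, a colon-separated key=value syntax; consult the ralabel() documentation for the actual label format before relying on it. All function names are hypothetical.

```python
def parse_label(label):
    """Split a 'key=value:key=value' label into an insertion-ordered dict (assumed syntax)."""
    pairs = {}
    for item in label.split(":"):
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def format_label(pairs):
    """Join a dict back into the assumed 'key=value:key=value' form."""
    return ":".join(f"{key}={value}" for key, value in pairs.items())


def merge_labels(first, second):
    """Merge two labels; on a duplicate key the value from `second` wins."""
    merged = parse_label(first)
    merged.update(parse_label(second))
    return format_label(merged)


def strip_labels(label, keys):
    """Remove the given keys from a label, e.g. before writing records to disk."""
    return format_label({k: v for k, v in parse_label(label).items() if k not in keys})
```

Letting the later label win on duplicate keys is one reasonable merge policy; a pipeline that must keep both values would need a richer representation.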

To insert geolocation data into argus data, whether it's in a file or a stream, use either ralabel() or radium(). For specific information regarding radium() and record classification/labeling, please see the radium() documentation. ralabel() has its own ralabel.conf configuration file that turns on the various labeling features, and all the geolocation support is configured using this file. To get a feel for all the features, grab the sample ralabel.conf that came with the most recent argus-clients distribution and give it a test drive.

In general, you will either process "primitive" argus data or process the data as part of an informal or formal pipeline. Below is a standard strategy for taking an argus file and labeling both the source and destination IPv4 addresses with geolocation data:

% ralabel -r /tmp/ipaddrs.out -f /tmp/ralabel.conf -w /tmp/ralabel.out

Working with Geolocation Labels

Country Codes

Country code databases are maintained by the Internet Corporation for Assigned Names and Numbers (ICANN). The InterNIC, which maintains the global Domain Name System registration functions, is supported by a collection of Regional Internet Registries (RIRs) that manage the allocation of IP addresses for their regions. These organizations maintain databases of the responsible parties for the IP addresses that have been allocated, and this data store provides the information for IP address/country code databases.

RIR-based country code information is a reasonably good source of geolocation information, at least for locating the person or company that claims responsibility for a particular address. The actual physical location of a specific address, however, is outside the scope of what the RIRs do. Even so, the information is very useful for many geolocation applications. If you want better information, commercial databases can be more accurate, but many of them simply repackage the RIR databases, so take such claims with a grain of salt.

As an example, to generate aggregated statistics for country codes, first insert country codes into the records themselves using ralabel(), and then aggregate the resulting argus data stream using racluster(). This requires a ralabel.conf configuration file that turns on RALABEL_ARIN_COUNTRY_CODES labeling. Use the sample ./support/Config/ralabel.conf file as a starting point, and uncomment the two lines that reference "ARIN". Assuming you created the file as /tmp/ralabel.conf, run:

% racluster -M rmon -m srcid smac saddr -r daily.argus.data -w /tmp/ipaddrs.out - ipv4
% ralabel -r /tmp/ipaddrs.out -f /tmp/ralabel.conf -w /tmp/ralabel.out
% racluster -m sco -r /tmp/ralabel.out -w - | rasort -m pkts -w /tmp/country.stats.out

The first command generates the list of individual IPv4 addresses from your daily.argus.data file. The "-M rmon" option is important here, as it tells racluster() to generate stats for singular objects. The second command labels each argus record with the country code for its IP address. The third command clusters again, using the "sco" (source country code) label as the key, and sorts the output by packet count.
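The three steps can be mirrored in a few lines of Python to show what each stage computes. This is a simplified model, not how racluster() or ralabel() are implemented: flows are hypothetical dicts with saddr, daddr, pkts and bytes keys, "-M rmon" is approximated by crediting each flow to both of its endpoint addresses, and lookup stands in for the country code database.

```python
from collections import defaultdict


def per_address(flows):
    """Roughly what "-M rmon" does: credit each flow to both of its endpoint addresses."""
    stats = defaultdict(lambda: [0, 0])  # addr -> [pkts, bytes]
    for flow in flows:
        for addr in (flow["saddr"], flow["daddr"]):
            stats[addr][0] += flow["pkts"]
            stats[addr][1] += flow["bytes"]
    return stats


def per_country(addr_stats, lookup):
    """Label each address with a country code, aggregate by code, sort by packets."""
    totals = defaultdict(lambda: [0, 0])  # country code -> [pkts, bytes]
    for addr, (pkts, nbytes) in addr_stats.items():
        code = lookup(addr) or "--"  # "--" marks addresses with no known country
        totals[code][0] += pkts
        totals[code][1] += nbytes
    rows = [(code, p, b) for code, (p, b) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)  # busiest first
```

Splitting flows into per-address objects before labeling is what lets a single country code field describe each record; a bidirectional flow between two countries would otherwise need two labels.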

The resulting /tmp/country.stats.out file will have argus records representing aggregated statistics for each distinct country code found in the IPv4 addresses in your data. We limit the effort to IPv4 because the labeling currently works only for IPv4 addresses. This should be corrected soon.

The resulting argus record file can be printed, sorted, filtered, and so on. So let's generate a report that shows the percentage of traffic per country. Using ra(), we specify the columns we want in the report; the "-%" option prints the columns as a percentage of the total (the listing below shows the raw counts). We only need three decimal places of precision here, so:

% ra -r /tmp/country.stats.out -s stime dur sco:10 pkts:10 bytes -p3
              StartTime        Dur        sCo    TotPkts   TotBytes 
2009/09/15.00:00:00.901  86402.672         US    9092847 8180857733
2009/09/15.15:26:35.781   6932.988         UA      37323   34939933
2009/09/15.00:01:52.826  85437.820         EU      34853   27606569
2009/09/15.12:08:29.805  41747.223         NO       5338    3510110
2009/09/15.00:01:52.388  85318.320         DE       4374    1960894
2009/09/15.00:01:52.733  85038.320         GB       2063     961983
2009/09/15.00:51:19.951  82470.445         JP       1635     646413
2009/09/15.00:19:22.518  84389.109         SE       1336     500372
2009/09/15.00:00:40.821  85499.008         CA       1233     235801
2009/09/15.00:06:17.104  86020.656         FR       1223     154310
2009/09/15.01:18:37.078  80444.398         KR       1067      74638
2009/09/15.16:35:30.421      7.551         PL        900     336890
2009/09/15.10:44:19.486  41702.855         SI        834     470894
2009/09/15.00:19:22.174  84388.914         NL        596      81189
2009/09/15.09:52:10.142  50289.852         IT        545     360437
2009/09/15.00:19:21.677  85236.164         CH        412      46950
2009/09/15.09:51:57.418  43514.777         AU        396     197180
2009/09/15.09:05:00.989  51842.965         AP        216      35912
2009/09/15.00:06:39.473  81546.859         CN        138      17668
2009/09/15.21:49:08.497      1.763         ZA         80      12868
2009/09/15.01:05:13.283  80195.633         TW         64       9190
2009/09/15.01:22:16.781  57680.957         IE         64       8344
2009/09/15.09:41:22.225  42981.516         IN         48       7456
2009/09/15.00:01:53.272  63859.348         RU         44       5401
2009/09/15.00:01:54.930      0.237         NZ         16       1800
2009/09/15.12:52:03.644      0.196         DK         16       1916
2009/09/15.23:37:05.528      0.577         LU         16       2244
2009/09/15.09:52:27.636      0.141         CS          8        864
2009/09/15.11:39:26.752      0.166         FI          8       1202
2009/09/15.09:43:33.302      0.166         BR          4        458
2009/09/15.09:43:15.856      0.105         BE          4        412
2009/09/15.07:59:15.435  12581.292         MX          4        260
2009/09/15.09:43:39.674      2.363         VE          2        150
2009/09/15.15:08:30.196      2.404         HU          2        150
2009/09/15.11:29:36.073      0.000         CO          1         63
2009/09/15.02:59:00.745      0.000         RO          1         79
2009/09/15.08:33:38.455      0.000         AR          1        418
2009/09/15.08:22:13.605      0.000         ES          1         62
2009/09/15.20:27:30.993      0.000         IL          1         92
2009/09/15.08:30:49.794      0.000         HK          1         62
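The "-%" option reports each value as a share of its column total. The arithmetic is easy to reproduce, as in this Python sketch, which uses only the first three TotPkts values from the listing above, so its percentages are relative to those three rows rather than to the full table. The function name is hypothetical.

```python
def percent_of_total(rows, precision=3):
    """Return {label: value as a percentage of the column total}, rounded to `precision` places."""
    total = sum(value for _, value in rows)
    if total == 0:
        return {label: 0.0 for label, _ in rows}
    return {label: round(100.0 * value / total, precision) for label, value in rows}


# TotPkts for the three busiest countries in the listing above.
top_three = [("US", 9092847), ("UA", 37323), ("EU", 34853)]

for code, pct in percent_of_total(top_three).items():
    print(f"{code} {pct:8.3f}")
```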

Autonomous System Numbers (ASN)

AS numbers are not strictly geographic location information, but rather network location information. Each IP address resides in a single origin Autonomous System, and, as with country codes, many public and commercial databases provide geolocation information for the management entity of each AS. Cisco's Netflow records can provide AS numbers for IP addresses: either the origin ASNs, or the peer AS, which is an Autonomous System that claims to be a good route for traffic headed to a particular IP address. Peer ASNs are your next-hop AS for routing. While this information is important for traffic engineering and routing, it is not useful for geolocating the IP address itself.

We use the MaxMind GeoIP Lite database to provide origin AS number values for IP addresses. These numbers can be filtered and aggregated, so you can generate views of argus data specific to origin ASNs. The methods above can be used to generate data views for "sas", the source AS number.
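Mapping an address to its origin AS is a longest-prefix match against a table of announced prefixes: the most specific prefix containing the address decides the ASN. The sketch below shows the idea with a linear scan over a hypothetical table built from documentation prefixes and documentation AS numbers (64496 to 64511, reserved by RFC 5398). It does not reflect the GeoIP library's internal format, and a production lookup would typically use a radix tree or similar structure.

```python
import ipaddress


def build_asn_table(entries):
    """entries: iterable of (prefix, origin_asn). Sort so longer (more specific) prefixes come first."""
    table = [(ipaddress.ip_network(prefix), asn) for prefix, asn in entries]
    table.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return table


def origin_asn(table, addr):
    """Longest-prefix match: return the origin ASN of the most specific prefix containing addr."""
    ip = ipaddress.ip_address(addr)
    for network, asn in table:
        if network.version == ip.version and ip in network:
            return asn
    return None  # address not covered by any prefix in the table
```

With 192.0.2.0/24 mapped to AS64496 and 192.0.2.128/25 mapped to AS64497, the address 192.0.2.200 resolves to AS64497 because the /25 is more specific.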

Latitude/Longitude (Lat/Lon)

There are many commercial and public sources of IP address geolocation information that can provide lat/lon data. We provide programmatic support using MaxMind's GeoIP Open Source APIs (see above), which provide lat/lon for IP addresses. MaxMind's commercial database is reported to have excellent quality for this information.