
PCS Harvesting at Scale

Feb. 19, 2015

TOP SECRET

Reference: UPC-TDSDITECHIZI
Date: 27th April 2010

PCS Harvesting at Scale

Summary
This report explores the introduction of an automated approach to Ki harvesting in OPC-TDSD, with the aim of increasing the volume of keys that can be collected. Methods are also explored to use data from the automated system to assess the effectiveness of current techniques and improve knowledge of mobile network operations.

Work was carried out between January and April 2010 in OPC-TDSD and OPC-CAP.

Distribution (all softcopies, via email): OPC-TDSD, OPC-HQ, OPC-EDP, OPC-MGR, ICTR, OPC-CAP, OPD-GTAC, NSA TDB TEA
PCS Harvesting at Scale
Introducing Automation to Ki Harvesting Efforts in TDSD
OPC-TDSD
April 2010
Contributions from [...] and [...]

Summary
Individual Subscriber Authentication Keys, or Ki values, are required to [...] GSM communications. They are stored both on the mobile user's SIM card and at a Home Location Register operated by the provider. TDSD has developed a methodology for intercepting these keys as they are transferred between various network operators and SIM card providers. This is now a core part of the business carried out by analysts in the team.

This report explores the introduction of an automated technique with the aim of increasing the volume of keys that can be harvested. Methods are also explored to use data from the automated system to assess the effectiveness of current techniques and improve knowledge of mobile network operations.

This information is exempt from disclosure under the Freedom of Information Act 2000 and may be subject to exemption under other UK information legislation. Refer disclosure requests to GCHQ on [...] (non-sec) or email [...].
TOP SECRET STRAP1
Table of Contents
1 INTRODUCTION
2 APPROACH
2.1 Automated Technique
2.1.1 Bulk Data Retrieval
2.1.2 Identifying Content
2.1.3 Processing / storing
2.2 Possible improvements
3 RUNNING TRIALS
3.1 Activity of Networks
3.2 Target Discovery
3.3 Measuring Targeting Effectiveness
3.4 Comparison with present efforts
3.4.1 Manually collected Kis
3.4.2 Overall harvesting efforts
4 CONCLUSIONS
4.1 Future Work
REFERENCES
APPENDIX
1 Introduction
TDSD's key harvesting methodology centres around collecting Ki values in transit between mobile network operators and SIM card personalisation centres. Provisioning information is often sent between these organisations by email or FTP with simple encryption methods that can be broken by OPC-CAP, or occasionally with no encryption at all (note 1).

With targeting in place, a large volume of IMSIs and associated Ki values can be harvested from UDAQ, the corporate C2C data repository. With known individuals and operators targeted, items of interest can often be returned from bulk C2C data using a simple search for the terms "Ki" and [...] in close proximity. Results will often contain a large number of unrelated items; however, an analyst with good knowledge of the operators involved can perform this trawl regularly and spot the transfer of large batches of Kis.

Work has already been carried out to automate this sifting of bulk data; reference 1 describes techniques successfully trialled so far. This work builds upon these techniques by introducing a system that bulk-queries UDAQ itself, performs the sifting operation to identify items of interest, and packages these up in a form that researchers in OPC-CAP can usefully interpret. Summary information is also produced for the use of [...] in TDSD.

The main desired outcomes from this work are to:
- Improve effectiveness at finding Kis in C2C content repositories. By automating the approach it should be possible to perform a more thorough search than TDSD has had the manpower to do at present. This is likely to bring higher volumes of Kis and IMSIs to light, in addition to spotting interesting items that would not previously have come to the attention of [...].
- Improve target knowledge. A more complete picture of data in C2C repositories will allow TDSD to view the effectiveness of current targeting, spot trends as target behaviour changes, and spot any obvious gaps in coverage, for example providers for whom this type of harvesting is ineffective.
- Develop and enhance TDSD's harvesting methodology. This methodology is based on knowledge of how network operators, SIM suppliers and hardware providers co-operate to share data. By looking at the types of organisations associated with traffic seen in the wild, we can test assumptions about the communication patterns we expect to take place, improving our knowledge of the relationships between these companies.

1 It should also be noted that TDSD have observed strong encryption products (e.g. PGP products) being used. These have become increasingly common, and are used as standard by large SIM suppliers/personalisation centres to exchange SIM output and input data with mobile network operators.
Additionally, it is likely that similar opportunities exist to introduce this type of automation to other analyst tasks. This work will help develop requirements for such services and bring more automation opportunities to light.
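The "close proximity" search used to pull candidate items from bulk C2C data can be illustrated with a short Python sketch. It is not the UDAQ query itself: the function names, the word-based definition of proximity and the default gap of 10 words are assumptions for illustration, and the search terms are simply passed in as parameters.

```python
import re


def within_proximity(text, term_a, term_b, max_gap=10):
    """True if term_a and term_b occur within max_gap words of each other."""
    # Split on word characters so punctuation does not block a match.
    words = [w.lower() for w in re.findall(r"\w+", text)]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Pairwise comparison of positions is adequate for message-sized texts.
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)


def sift(messages, term_a, term_b, max_gap=10):
    """Keep only the messages in which both terms appear close together."""
    return [m for m in messages if within_proximity(m, term_a, term_b, max_gap)]
```

For example, `sift(messages, "Ki", "IMSI")` keeps a message such as "The IMSI and Ki list is attached" (the second term here is an assumed example). As the text notes, a match only marks an item as a candidate; many matches will still be unrelated and need review.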
2 Approach
Figure 1 shows a high-level overview of the current manual harvesting methodology.

Figure 1 - Manual Ki Harvesting Process (Perform bulk intercept queries → Harvest results manually → Perform further data manipulation → Forward on to system owners)

[...] in the team regularly perform queries on targeted C2C intercept using UDAQ. A number of queries exist, designed to return results likely to contain IMSI and Ki values. Queries often return very noisy results: of several thousand results, perhaps a few hundred will contain items of value. The next stage is to trawl these results for items of value. If a list of IMSI and Ki values is found, it can be copied from the tool and sent on to OPC-CAP for further processing. In the best case, lists of several hundred thousand Kis associated with IMSI values can be found. However, a large number of messages each contain only a few associated Ki values. The responsibility for converting lists into a storable form lies with [...]; TDSD can only spend limited time manipulating the layout of data before forwarding.
2.1 Automated Technique
Figure 2 describes the three stages of the automated method developed.

Figure 2 - Automated Ki Harvesting Process (Automated pulling of data from bulk repositories → Analytics identify interesting content → Data made available to other systems)

Details of each stage are provided below.

2.1.1 Bulk Data Retrieval
ICTR provide a bulk data download capability using the research server LLANDARCYPARK. This was used to automate the querying of C2C content in UDAQ. Given a standard SQL query wrapped in an XML form, it returns a package containing all matching C2C intercept. A base query, a proximity search for the strings [...] and [...], was used for this experiment; this can be seen in Appendix 1. Date fields are marked with placeholders so that they can be filled in automatically using regular expressions at run time. Results are returned as a compressed file containing a CCDF mesh (Common Data Format; details are described in reference 4). A routine was then written to unpack this mesh, allowing results to be treated from then on as a set of plain text files.

Scripts were developed to perform all steps of the operation automatically, retrieving packaged data to be interpreted by the user (reference 6). This operates as follows:
The script ./runRemoteQuery.sh is used to launch the process. This:
- Requests a date range to query
- Rewrites the query XML file with the required dates
- Transfers all required files onto the LLANDARCYPARK server, including pulludaq.sh
pulludaq.sh is then executed on LLANDARCYPARK. This:
- Executes the bulk IIB query (can take 5-10 mins)
- Retrieves query results as compressed CCDF files
- Unpacks the CCDF contents into a directory as plain text for processing
The next stage is to identify content of interest in the processed files.

2.1.2 Identifying Content
Once plain text is retrieved from IIB, it is parsed to identify items containing IMSI and Ki values. A previously proven rule-based approach is used to identify content of interest. The routine scrapes the plain text, identifying lines containing IMSI and Ki values, which may appear in intercept in any conceivable format. The technique also attempts to identify header information describing the contents, and associates results with a UDAQ identifier that can later be researched. Further technical discussion of this technique is available in reference 1, TDSD Technical Note 11, "What Makes a Good PCS Key Harvester?".

A final stage, developed in consultation with TDSD, generates statistics and additional information linked to the results. This includes:
- A list of unique UDAQ item identifiers resulting in valid Ki/IMSI data. This allows [...] to conduct further research into these traffic sources. These are ranked according to the number of sections of IMSI data seen in each UDAQ item.
- A list of network and country codes identified. These are derived from the first 6 characters of an IMSI and used to provide an overview of the countries and networks identified.
- A list of associated email addresses. This is generated by scraping all email addresses from results found to contain valid Ki data. These are then ranked by the number of occurrences of each address.
Care should be taken when interpreting ranking positions.
In the case of email addresses, a higher score does not necessarily indicate association with more Kis; however, it can provide an indication of how active an address is. An example set of statistics produced is shown in Appendix 2.

2.1.3 Processing / storing
Output files generated by the previous step typically take the form shown in Appendix 3: section markers separate the UDAQ item reference, potential header information and content. This format was developed alongside [...]. It should be noted that although the content will contain IMSI and Ki data, it could take any conceivable form, since it is presented as found in raw intercept. It is the task of OPC-CAP to interpret any additional data in any recognised header section, decoding as necessary. Ki values may still be encrypted at this stage.
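The rule-based identification step of 2.1.2 and its network and country code summary can be illustrated with a minimal Python sketch. It assumes, as a deliberate simplification, that each relevant line holds one 15-digit IMSI and one Ki written as 32 hexadecimal characters (a 128-bit key); the report stresses that values may appear in any conceivable format, which this sketch does not attempt to handle. The names find_pairs and prefix_summary are hypothetical. Following the report, the first six digits of each IMSI serve as the country/network key; because the mobile country code is 3 digits and the mobile network code is 2 or 3 digits, this key includes one subscriber digit for networks with 2-digit codes.

```python
import re
from collections import Counter

# Assumed formats (not stated in the report): a 15-digit IMSI and a
# 128-bit Ki written as 32 hexadecimal characters.
KI_RE = re.compile(r"(?<![0-9A-Fa-f])[0-9A-Fa-f]{32}(?![0-9A-Fa-f])")
IMSI_RE = re.compile(r"(?<!\d)\d{15}(?!\d)")


def find_pairs(lines):
    """Return (imsi, ki) tuples from lines holding exactly one of each.

    The one-pair-per-line rule is a hypothetical simplification.
    """
    pairs = []
    for line in lines:
        kis = KI_RE.findall(line)
        # Blank out Ki matches so digit runs inside a Ki are not read as an IMSI.
        rest = KI_RE.sub(" ", line)
        imsis = IMSI_RE.findall(rest)
        if len(kis) == 1 and len(imsis) == 1:
            pairs.append((imsis[0], kis[0].upper()))
    return pairs


def prefix_summary(pairs):
    """Count unique IMSIs by their first six digits, most common first."""
    unique_imsis = {imsi for imsi, _ in pairs}
    return Counter(imsi[:6] for imsi in unique_imsis).most_common()
```

Applied to synthetic lines such as "001010000000001 00112233445566778899AABBCCDDEEFF", `find_pairs` returns the IMSI and the upper-cased Ki, while lines carrying an IMSI with no Ki are skipped.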
OPC-CAP have developed and successfully trialled techniques to speed up the task of importing these scripts, identifying expected column header names, mapping these to data fields, and even automating the final stage. Once properly interpreted, these Ki values can be stored, encrypted or in the clear, in relevant databases and shared with partners as necessary.

2.2 Possible improvements
A number of improvements have been identified for the above technique. These are described below:
- Improved access rights for bulk data retrieval. The bulk access capability runs on research prototype hardware and is supported only on a best-endeavours basis. Because a processing user is used to obtain data, the maximum classification that can be returned is TOP SECRET STRAP2 UK Eyes Only. This means that some data currently retrieved using the manual method, such as password-recovered items, is not available to the automated system. An improved system would allow bulk access to more intercept data.
- Processing performance. Performance of queries on LLANDARCYPARK is comparable to that of UDAQ; however, when large numbers of items are retrieved, the generation of statistics can take some time (sometimes hours for large sets). Some simple code optimisations could significantly improve this performance.
- Improvements to summary information scores and ranking. The value of using ranks to assess the usefulness of an identified email address or UDAQ item is limited, since the score used relates to the number of sections of Ki data in a given file. This means that where a very large number of IMSIs are identified but appear in a single block, a low score is awarded. A value relating to the number of IMSI items would be more useful for identifying the most important results.
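The scoring limitation described in the last point can be made concrete with a minimal Python sketch. It assumes that a "section" is a maximal run of consecutive lines containing IMSI/Ki data; the names ItemScore, score_item and rank, and the item identifiers, are hypothetical. A file with one block of 1,000 IMSIs scores 1 under the section count, while a file with three small blocks totalling 5 IMSIs scores 3, so ranking by IMSI count reverses their order.

```python
from dataclasses import dataclass


@dataclass
class ItemScore:
    item_id: str
    sections: int    # current score: separate blocks of IMSI/Ki lines
    imsi_count: int  # proposed score: total IMSI/Ki lines


def score_item(item_id, line_flags):
    """Score one item; line_flags[i] is True if line i holds an IMSI/Ki pair.

    Assumption: a "section" is a maximal run of consecutive flagged lines.
    """
    sections = 0
    previous = False
    for flag in line_flags:
        if flag and not previous:
            sections += 1  # a new block of IMSI data starts here
        previous = flag
    return ItemScore(item_id, sections, sum(line_flags))


def rank(scores, key):
    """Return item identifiers ordered from highest to lowest score."""
    return [s.item_id for s in sorted(scores, key=key, reverse=True)]
```

Ranking with `key=lambda s: s.sections` reproduces the current behaviour; `key=lambda s: s.imsi_count` gives the proposed one.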
3 Running Trials
The automated harvesting technique was used to extract IMSI and Ki values from bulk data over a 3-month period, performed as six 2-week intervals. The resulting numbers of IMSIs, Kis and associated statistics are shown in Table 1.

Query Start   Query End   Email addresses / UDAQ unique items / Country codes identified / IMSIs paired with Ki
30-Dec-09     14-Jan-10   130 10 7,802
13-Jan-10     28-Jan-10   4 11 8,960
27-Jan-10     11-Feb-10   18 12 1,809
10-Feb-10     25-Feb-10   4 50 18 2,848
24-Feb-10     11-Mar-10   1 6 3 84,93?
10-Mar-10     25-Mar-10   8 15 473
Table 1 - Details of Trial Queries

The technique can be seen to identify a steady stream of IMSI and Ki data over time. UDAQ item identifiers that contain the IMSI and Ki data can additionally be provided to [...], allowing sources to be investigated further. These results are analysed further in the following sections.
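The query windows in Table 1 follow a regular pattern: each starts 14 days after the previous one and runs for 15 days, so consecutive windows overlap slightly (for example, 13-Jan-10 to 14-Jan-10). The Python sketch below regenerates these windows and fills date placeholders in a query template with a regular expression, in the spirit of the run-time substitution described in 2.1.1. The ${START_DATE}/${END_DATE} placeholder syntax, the ISO date format and the function names are assumptions; the real template is the one in Appendix 1.

```python
import re
from datetime import date, timedelta

# Hypothetical placeholder syntax; the real template is in Appendix 1.
PLACEHOLDER_RE = re.compile(r"\$\{(START_DATE|END_DATE)\}")


def query_windows(first_start, count, step_days=14, length_days=15):
    """Return (start, end) date pairs for regularly spaced query windows."""
    windows = []
    for i in range(count):
        start = first_start + timedelta(days=step_days * i)
        windows.append((start, start + timedelta(days=length_days)))
    return windows


def fill_template(template, start, end):
    """Replace date placeholders in a query template with ISO dates."""
    values = {"START_DATE": start.isoformat(), "END_DATE": end.isoformat()}
    return PLACEHOLDER_RE.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    template = '<query start="${START_DATE}" end="${END_DATE}"/>'
    for start, end in query_windows(date(2009, 12, 30), 6):
        print(fill_template(template, start, end))
```

Starting from 30-Dec-09, the six generated windows match the Query Start and Query End columns of Table 1, ending with 10-Mar-10 to 25-Mar-10.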
3.1 Activity of Networks
Unique country codes identified in each of the time periods were correlated to produce the chart shown in Figure 3. Only networks with significant results are shown; raw data can be seen in Appendix 4.

Figure 3 - IMSIs identified with Ki data for Network Providers (log-scale count of IMSIs, 1 to 100,000, per date range from 14-Jan-10 to 25-Mar-10; labelled series include AWCC Afghanistan, Irancell Iran and Babilon)

This shows the number of IMSIs found with Ki data in each period for the providers shown, portraying a steady rate of activity from several networks of interest. New Ki and IMSI pairs are regularly seen for AWCC, TDCA and MTN. A large batch of Somali Kis was recovered in mid-March using this automated process. Somali providers are not on [...] list of interest, so it is likely this item would have been missed by manual collection; it was nonetheless usefully shared with NSA. A number of other unexpected providers were brought to light, including Babilon-Mobile in Tajikistan and Icelandic provider Nova 3G. This has demonstrated that an automated Ki recovery method can effectively identify IMSI and Ki pairs from bulk C2C sources for key targets, with the added benefit of identifying content that would not normally come to analyst attention. The chart presented provides an overview of the networks accessible in C2C repositories.

3.2 Target Discovery
TOPSECRETSTRAPI 3.1 Activity of Networks Unique country codes identified in each of the time periods were correlated to produce the chart shown in Figure 3. Only networks with significant results are shown raw data can be seen in Appendix 4. IMSIs Identified with Ki data for ?atwork Providers 100000 . 10000 - 1000 XI AWCCAFGHANIS AN - 5 . - -- 1?00 IRNCEL IRAN - BABLN. . 10 1 14?Jan?10 23?Jan? 10 11?Feb?10 25?Feb? 10 11?Mar?10 25?Mar? 10 Date range Figure 3 - IMSIs identified with Ki data for Network Providers This shows the number of IMSIs found with Ki data in each period for the providers shown, portraying a steady rate of activity from several networks of interest. New Ki and IMSI pairs are regularly seen for AWCC, TDCA and MTN. A large batch of Somali Kis was recovered in mid-March using this automated process. Somali providers are not on list of interest, hence it is likely this item would have been missed by manual collection, however this was usefully shared with NSA. A number of other unexpected providers were brought to light including Babilon?Mobile in Tajikistan and Icelandic provider Nova 3G. This has demonstrated that an automated Ki recovery method can effectively identify IMSI and Ki pairs from bulk CZC sources for key targets, with the added benefit of identifying content that would not normally come to analyst attention. The chart presented provides an overview of networks accessible in CZC repositories. 3.2 Target Discovery 11 of 24 this information is oaonlpt from disciosIn-c under 1ch Freedom of information Act 2000 and may be subject to exculption under other LK information legislation. Refer disciowrc rcquesLs Lo GCHQ on (non-set) cznail TOPSECRETSTRAPI
An experiment was carried out to use results from this technique for target discovery. The statistics produced alongside the results include email addresses appearing in communications alongside this content, each scored by the number of times it is seen. It was proposed that analysis of these addresses should bring to light common communication patterns between operators, as well as help identify the actors most involved in the sharing of Ki data.

UDAQ C2C collection is targeted; hence any traffic found will originate from an identifier in corporate systems. However, it was surmised that additional useful contact addresses could be found associated with the traffic.

All email addresses associated with traffic in each of the 6 periods were compiled together. This resulted in a list of 154 unique email addresses, each associated with a score. From this it was possible to identify a number of high-scoring candidate targets for further research:
- [redacted]: the target's email handle suggests an Ericsson employee using a webmail account.
- [redacted]@huawei.com: the highest-scoring address overall, a previously unknown target on the Huawei network.
- [redacted]: the highest-scoring webmail address, indicating a lot of activity associated with IMSIs and Kis; a previously unknown target.
- ics.mc: a number of users are associated with this previously unknown domain. EDI research shows it to be an international gateway for the South African provider MTN.
- [redacted]: an MSN address found to be associated with IMSIs and Kis.

This has demonstrated a number of opportunities to apply this harvesting technique to target discovery efforts.

3.3 Measuring Targeting Effectiveness

An experiment was carried out to assess the effectiveness of current targeting methods. The email addresses identified in the previous section were converted into a list of domains, again scored by the number of associations with data. The complete list can be seen in Appendix 5.
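The scoring of email addresses and the roll-up of scored addresses into scored domains can be illustrated with a minimal sketch. It assumes the per-period statistics are available as plain lists of addresses, with one entry per sighting. It also assumes that a domain's score is the sum of its addresses' scores. The function names and sample addresses are hypothetical and are not taken from the harvesting scripts.

```python
from collections import Counter


def score_addresses(periods):
    """Combine per-period sightings into a single score per unique address.

    `periods` is a list with one iterable per time period; each element is an
    email address seen alongside matching content. The score is the total
    number of sightings across all periods (an assumed scoring scheme).
    """
    scores = Counter()
    for sightings in periods:
        scores.update(addr.strip().lower() for addr in sightings)
    return scores


def score_domains(address_scores):
    """Roll address scores up to domain level by summing over each domain."""
    domains = Counter()
    for addr, score in address_scores.items():
        _, sep, domain = addr.rpartition("@")
        if sep:  # skip malformed entries that contain no '@'
            domains[domain] += score
    return domains


# Usage with hypothetical sightings from two periods
periods = [
    ["alice@example.com", "bob@example.org", "alice@example.com"],
    ["Alice@example.com", "carol@example.org"],
]
address_scores = score_addresses(periods)
print(address_scores.most_common(1))                # [('alice@example.com', 3)]
print(score_domains(address_scores).most_common())  # [('example.com', 3), ('example.org', 2)]
```

Normalising case and whitespace before counting stops trivially different spellings of one address from splitting its score.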
It was then possible to group domains into 5 categories:
- Hardware Companies: organisations such as Huawei and Ericsson, which manufacture PCS hardware.
- Network Operators: operators of mobile networks such as MTN, Irancell and Belgacom.
- SIM Suppliers: SIM suppliers or SIM personalisation centres, for example Bluefish.
- Mail Providers: users of general email providers (Gmail, Yahoo, etc.). These may be in use by employees of any of the above.
- Other/Unknown.

Most of the TD targeting effort is focussed on SIM suppliers and network operators, so it was expected that most associated addresses would fall into these categories.

Category              Associations
Hardware Companies             743
Mail Providers                 298
SIM Suppliers                   38
Network Operators              603
Other/Unknown                   3?

Table 2 - Types of organisations associated with traffic

Table 2 shows how often each type of organisation was associated with Ki traffic. Contrary to expectation, the vast majority of addresses seen belonged either to network operators or to hardware companies. This could indicate increased use of strong products amongst SIM suppliers, leaving only the other groups open to this method of exploitation. TDSD may wish to ensure that targeting of SIM suppliers is up to date, as well as investigating the possibility of targeting hardware companies and network operators to improve results.

3.4 Comparison with present efforts

3.4.1 Manually collected Kis

A manual trawl of UDAQ data was performed against AWCC for the period between 28th March and 10th April 2010. This was compared directly against results from an automated run over the same period, not targeted against any particular provider.
In the manual trawl 14 UDAQ items were identified, all containing one or more IMSI/Ki pairs for AWCC. The automated run found 12 UDAQ items, 3 of which had also been identified in the manual trawl. A summary of results is shown in Table 3.

[Table 3 - Results of Ki IMSI trawl. Columns: Result, Date, Found in search (Manual / Automated), Details, Comments. The table lists 23 results dated 29-Mar-10 to 23-Apr-10. Most concern AWCC; others include a Huawei HLR inconsistency item and a Roshan new SIM vendor query. Comments include "No occurrence of IMSI", "multi-line", "new activation", "SIM replacement" and, for result 23, "HLR update, same as item 3".]

The manual search resulted in a total of 27? IMSI values for AWCC. The automated search resulted in 320 values, 26 of which were from the AWCC network; it also identified 10 unique IMSIs from Roshan and 83 from MTN Yemen (results 3 and 23).

It can be seen that the automated search missed the majority of manually recovered items. The reasons are noted in the comments column: in all cases the string "IMSI" did not appear in the results file, so these items were not returned by the initial bulk query. Most of these items also had IMSI and Ki data split across multiple lines, meaning they would not have been identified by the detection techniques employed in this work in any case.

Both techniques found comparable quantities of IMSIs for AWCC, with the result sets being mostly complementary. This has demonstrated that although the automated method is able to return a representative set of items from bulk data, and often larger volumes of Kis, it tended to miss items found manually. More work is required both at the initial bulk query stage and with processing and detection techniques.
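The comparison between the two trawls reduces to set arithmetic on UDAQ item identifiers. The minimal sketch below assumes that each trawl yields a collection of item identifiers. The identifiers are hypothetical placeholders, chosen only to reproduce the counts quoted above: 14 manual items, 12 automated items and 3 in common. Together these give the 23 results listed in Table 3.

```python
def compare_trawls(manual, automated):
    """Summarise the overlap between two collections of item identifiers."""
    manual, automated = set(manual), set(automated)
    return {
        "manual_only": len(manual - automated),
        "automated_only": len(automated - manual),
        "both": len(manual & automated),
        "total_distinct": len(manual | automated),
    }


# Hypothetical identifiers chosen only to reproduce the counts in the text
shared = {"S0", "S1", "S2"}                          # found by both methods
manual = {f"M{i}" for i in range(11)} | shared       # 14 items in total
automated = {f"A{i}" for i in range(9)} | shared     # 12 items in total
print(compare_trawls(manual, automated))
# {'manual_only': 11, 'automated_only': 9, 'both': 3, 'total_distinct': 23}
```

A small overlap relative to the union, as here, is what "mostly complementary" result sets look like numerically.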
3.4.2 Overall harvesting efforts

TDSD and OPE-CAP collect overall statistics for Kis harvested from networks of interest (reference 5). Overall rates of Kis received over the 3-month period January to March 2010 were compared against those from the automated technique. Figure 4 shows this comparison for a range of networks.

[Figure 4 - comparing data from the trial to historical data (priority targets marked *). Log-scale bar chart of new Kis identified (1 to 10,000,000), comparing the 3-month total with automated collection for MTN (Yemen), Nova (Iceland), Idea (India), Teles (Somalia), Irancell (Iran), Mobtel (Serbia), Babilon (Tajikistan), AWCC (Afghanistan), Roshan (Afghanistan), Sabafon (Yemen), Mobilink (Pakistan) and Telenor (Pakistan).]

The overall data set contains values gained from a range of sources, including Ki generation techniques and information sharing with partners. For the first three providers (AWCC, Irancell and Roshan), the number of keys collected by automated harvesting is comparatively small. Many of the larger batches of Kis received in this period were provided by partners on request, and it is difficult to estimate the real time period over which they were collected. Additionally, the value of a small number of Kis should not be underestimated, as these can often be used as seeds to generate much larger batches.

The automated technique is clearly able to identify Kis for a greater range of networks, successfully identifying a large batch of Kis for a particular Somali provider.

This comparison did bring to light a number of networks where the C2C harvesting method is not bringing results, notably the Pakistani networks Mobilink and Telenor, for which we do have a store of Kis. There could be a number of explanations: it is possible that these networks now use more secure methods to transfer Kis, or targeting for those networks might be ineffective.

In summary, the automated technique is unlikely to bring in very large batches of Ki data of the size produced with Ki generation schemes or received from partner repositories. However, it can bring in a steady stream of data over a period of time. These smaller volumes can fill gaps where no other data is available, and also provide essential seed points from which Ki generation can be applied.
4 Conclusions

This work has demonstrated that an automated method of Ki recovery, once in place, can deliver significant results with little manual effort compared to current harvesting methods.

In addition to Ki harvesting, a number of further applications have been demonstrated:
- the monitoring of mobile network activity, where views have been provided over a 3-month period;
- the discovery of new target identifiers associated with detected traffic;
- methods of measuring the effectiveness of current techniques.

A picture of the types of organisations associated with Ki traffic has been constructed, providing a new view of mobile network operations to TDSD.

It has also been shown that although the automated method is able to return a representative set of items from bulk data, it often fails to detect all items that would be found manually. More work is required at the initial bulk query stage and also with detection techniques to ensure accurate and full coverage of Ki data.

Whilst problems have been identified, such as limits on coverage due to access restrictions, this work makes a strong case that such harvesting efforts will continue to deliver results in TDSD and in areas such as the CP SD team. It is the author's view that increased levels of corporate support for such bulk data processing activities would allow TDSD, as well as many other business areas, to benefit from more applications of these techniques.

4.1 Future Work

A number of items of follow-up work have been identified:
- Improving initial query effectiveness: it has been shown that the initial base 'proximity' query is not effective enough to return all results currently found using manual harvesting. Work should be carried out to identify more effective queries on which to base processing. An alternative option is to run the technique repeatedly against a number of result sets.
- Improved detection techniques: detection techniques are unable to identify Ki and IMSI data where the fields of interest appear on separate lines (see section 3.4.1). An improved technique would ensure these results are also detected and included.
- Improved summary information: summary information currently consists of a list of email addresses, UDAQ item identifiers and network codes associated with simple scores. [redacted] would like to be able to find the UDAQ item associated with a particular IMSI or email address more easily. An improved scoring system would also help prioritise the items found more accurately. Additionally, the accuracy of results could be improved by detecting only IMSIs with valid country and network codes.
- Bulk access limitations: the maximum classification that can be returned from LLANDARCY PARK is TOP SECRET STRAP2 UK Eyes Only. This limits access to some data likely to contain IMSI and Ki values, such as password-recovered items. An improved system would allow bulk access to the full range of data.
- Adapting the technique for other key types: this technique currently identifies only IMSI and Ki values. In time it should be extended to support efforts against OTA keys, UMTS and more.
- Data mining opportunities: opportunities exist to mine the bulk data produced during this process, potentially detecting further items of interest and developing knowledge of the targets involved. Proposed ideas include detecting requests for batches of data by identifying messages containing maximum and minimum SIM values.
- Corporate support for bulk C2C processing: access to the bulk access capability is restricted to a small number of users, although a number of business units have expressed an interest. This work should continue to be used to develop requirements for a corporate solution allowing more business units to benefit from these types of techniques.
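The proposed accuracy check, accepting only IMSIs with valid country and network codes, rests on the standard IMSI layout defined in ITU-T E.212 and 3GPP TS 23.003. That layout is a 3-digit Mobile Country Code (MCC), then a 2- or 3-digit Mobile Network Code (MNC), then the subscriber number (MSIN), normally 15 digits in total. The minimal sketch below makes two assumptions: the usual 15-digit form, and a small, hypothetical and deliberately incomplete MCC table. It checks only the MCC, because MNC length varies by country and validating it would need a full MNC list.

```python
# Hypothetical and deliberately incomplete table of Mobile Country Codes
# (ITU-T E.212). A real filter would need the full MCC/MNC list.
KNOWN_MCCS = {
    "220": "Serbia",
    "274": "Iceland",
    "404": "India",
    "410": "Pakistan",
    "412": "Afghanistan",
    "421": "Yemen",
    "432": "Iran",
    "436": "Tajikistan",
    "637": "Somalia",
}


def imsi_country(imsi):
    """Return the country for a plausibly formed 15-digit IMSI, else None.

    Only the 3-digit MCC is checked: MNC length (2 or 3 digits) varies by
    country, so validating the network code would need a full MNC table.
    """
    if len(imsi) != 15 or not (imsi.isascii() and imsi.isdigit()):
        return None
    return KNOWN_MCCS.get(imsi[:3])


print(imsi_country("412201234567890"))  # Afghanistan
print(imsi_country("000000000000000"))  # None (MCC 000 not in table)
print(imsi_country("41220123"))         # None (wrong length)
```

Placeholder values such as the all-zero or repeated-digit codes marked INVALID in Appendix 4 fall out naturally, because their leading digits do not match any listed MCC.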
References

1. TDSD Technical Note 11: What Makes a Good PCS Key Harvester? TDSD, 12th January 2010. Available on request from TDSD.
2. DRAFT METHODOLOGY for investigating SIM card supplier relationships with Target Mobile phone operators. TDSD, 2010. Available from [redacted].
3. ICTR Bulk MB Download Capability.
4. Common Data Model FAQ.
5. TDSD Non EPR Statistics.
6. PCS Harvesting Scripts are stored under ClearCase and can be accessed and run from the following location: [illegible].
Appendix

1 Example proximity query used by LLANDARCY PARK

  <?xml …?>
  <cib:query exportQuery="true" …>
    <cib:query-text>
      SELECT Item_ID FROM CIB.CIB
      WHERE Date_Of_Intercept … {d &apos;…
      AND Date_Of_Intercept … {d …
      AND Content … &apos;( imsi AND Ki WITHIN 63 )&apos;
      AND Item_Type IN …
    </cib:query-text>
    <cib:queryMetadata>
      <cib:property …>…intercept</cib:property>
      <cib:property …>…</cib:property>
      <cib:property …>…SECRET STRAP1</cib:property>
      <cib:property …>…</cib:property>
      <cib:property …>…</cib:property>
      <cib:property …>…Theme RESEARCH INTO SIM CARD SUPPLY GSM OPERATORS UPI-MENA AND …</cib:property>
    </cib:queryMetadata>
  </cib:query>

2 Example stats.txt produced by script

  IMSI results:
  Emails: 9 items
    [redacted]@idea.adityabirla.com
    [redacted]@bluefish.com
    [redacted]@idea.adityabirla.com
    [redacted]@grameenphone.com
    [redacted]@grameenphone.com
    [redacted]@bluefish.com
    [redacted]@bluefish.com
    12 [redacted]@bluefish.com
    18 [redacted]@grameenphone.com
  UDAQ Item Identifiers used: 8 items
  Country Codes: 16 items
    4 421020
    8 340041
    8 012000
    9 404040
    10 410011
    12 220018
    16 412012
    10 404120
    26 048032
    40 452048
    40 510880
    42 4?0010
    56 220020
    00 404041
    108 220012
    809 412200
  IMSIs: 423 items
3 Example PCS Ki output file

  [redacted]
4 IMSI results broken down by network code Network oode Location Period 1 Period 2 Period 0 Period 4 Period 5 Period 0 000000 INVALID 4 000021 INVALID 0' 0' 012400 INVALID '111111 INVALID 4' 123454 .INVALID -. 201002 INVALID 4 210231 INVALID 220012 SERBIA 100' :222013 ITALV 2 2 1' 220010 ITALY 12 220020 PROMNT. MONTENEGRO 50 224113 NOVA. ICELAND 2 02 40 I 340041 . FRENCH GUADELOUPE AND SAINT MARTIN 0 345120 INVALID 22 045012 INVALID 0' 052040 31 002001 INVALID 110' 3000400 SAINT LUCIA 22: 404040 IDEA-3L, INDIA 0. 404041 IDEA-3L, INDIA 00 404120 INDIA 10' 404002 VOPNOG. INDIA 00' 410011 PAKISTAN 10_ 2412012 achc. 5004; 1004; 11400; 5022; 12: 10%, 412200 TDCA. AFGHANISTAN 000 0023 320 000. 0121010 YEMEN 10: 0e 14; 421020 MTN, YEMEN 22_ 100 542 140 4_ 402300 IRNCEL. IRAN 1100 40 432352 IRAN 00 435020 INVALID 2 5430040 TAJIKISTAN 12E 52% 433320 444440 0; 444441 INVALID 120 452040 VI ETEL. VIETNAM 40 452010 LAOS 12 12_ 5400022 EINVALID 420010 BANGLAOESH 42 E510000 INOONESIA 40; 012000 . OOTE DIVOIRE 20 0 020040 GABON 4: I 032010 SOMALIA 04024 032002 NLINK. SOMALIA 2 I 040002 TELOEL. ZIMBABWE 20. 040011 MOSTEL, NAMIBIA 0 004510 INVALID 00 002010 2' 23 of 24 TOPSECRETSTRAPI This information is exeran from disciowre under 00: Freedom of information Act 2000 and may be Subject to exemoiion under oLher information legislation. Refer disciowre requesLs Io GCHQ on (non-sec] or email TOPSECRETSTRAPI
Domains connected to traffic

[Chart: Most fruitful domains]