data ethics – R-posts.com

Join Data Natives for a discussion on how to curb Climate Change and better protect our environment for the next generation. Get inspired by innovative solutions which use data, machine learning and AI technologies for GOOD. Lubomila Jordanova, Founder of Plan A, and featured speaker, explains that “the IT sector will use up to 51% of the global energy output in 2030. Let’s adjust the digital industry and use Data for Climate Action, because carbon reduction is key to making companies future-proof.” When used carefully, AI can help us solve some of the most serious challenges. However, key to that success is measuring impact with the right methods, mindsets, and metrics.

The founders of startups that developed innovative solutions to combat humanity’s biggest challenge, will share their experiences and thoughts: Brittany Salas (Co-Founder at Active Giving) | Peter Sänger (Co-Founder/Executive Managing Director at Green City Solutions GmbH) | Shaheer Hussam (CEO & Co-Founder at Aetlan) | Lubomila Jordanova (Founder at Plan A) | Oliver Arafat (Alibaba Cloud’s Senior Solution Architect)

Details
What? Climate Change & AI for GOOD | DN Unlimited Open Forum powered by Alibaba Cloud
When? October 15th at 6 PM CET
Where? Online, worldwide
Register for FREE here: https://datanatives.io/climate-change-ai-for-good-open-forum/

We are excited to announce the latest major release of rIP (v1.2.0), which is an R package that detects fraud in online surveys by tracing, scoring, and visualizing IP addresses. Essentially, rIP takes an array of IP addresses, which are always captured in online surveys (e.g., MTurk), and the keys for the services the user wishes to use (IP Hub, IP Intel, and Proxycheck), and passes these to all respective APIs. The output is a dataframe with the IP addresses, country, internet service provider (ISP), labels for non-US IP Addresses, whether a virtual private server (VPS) was used, and then recommendations for blocking the IP address. Users also have the option to visualize the distributions, as discussed below in the updates to v1.2.0.

Especially important in this is the variable “block”, which gives a score indicating whether the IP address is likely from a “server farm” and should be excluded from the data. It is coded 0 if the IP is residential/unclassified (i.e. safe IP), 1 if the IP is non-residential IP (hostping provider, proxy, etc. – should likely be excluded, though the decision to do so is left to the researcher), and 2 for non-residential and residential IPs (more stringent, may flag innocent respondents).

Including some great contributions from Bob Rudis, some of the key feature updates included in v1.2.0 of rIP are:

Added discrete API endpoints for the three IP services so users can use this as a general purpose utility package as well as for the task-specific functionality currently provided. Each endpoint is tied to an environment variable for the secret info (API key or contact info). This is documented in each function.
On-load computed package global .RIP_UA which is an httr user_agent object, given the best practice to use an identifiable user agent when making API calls so the service provider can track usage and also follow up with any issues they see.
A new plotting option that, when set to “TRUE”, produces a barplot of the IP addresses checked with color generated via the amerika package.
Users can now supply any number of IP service keys they wish to use (1, 2, or all 3), and the function will ping only the preferred IP check services (formerly, the package required all three keys or none to be entered).
For those interested in reading more and citing the package in published work, check out our recently published software paper in the Journal of Open Source Software.

Here is a quick demo of the package with some fake (auto-generated) IP addresses:

# Install and load rIP, v1.2.0
install.packages("rIP")
library(rIP)

# Store personal keys (only "IP Hub" used here)
ip_hub_key = "MzI2MTpkOVpld3pZTVg1VmdTV3ZPenpzMmhodkJmdEpIMkRMZQ=="

ipsample = data.frame(rbind(c(1, "15.79.157.16"), c(2, "1.223.176.227"), c(3, "72.167.36.25"), c(4, "5.195.165.176"),
                             c(5, "27.28.25.206"), c(6, "106.118.241.121"), c(7, "231.86.14.33"), c(8, "42.56.9.80"), c(9, "41.42.62.229"),
                             c(10, "183.124.243.176")))
names(ipsample) = c("number", "IPAddress")

# Call the getIPinfo function to check the IPs
getIPinfo(ipsample, "IPAddress", iphub_key = ip_hub_key, plots = TRUE)

Running the code above will generate the following plot, as well as the dataframe mentioned above.

Note that to use the package, users must have valid personal keys for any IP service they wish to call via the function. These can be obtained for free at the corresponding IP check services.

Finally, we welcome contributions and reports of bugs either by opening an issue ticket or a pull request at the corresponding Github repository. Thanks and enjoy the package!

Tag: data ethics

Climate Change & AI for GOOD | Online Open Forum Oct 15th

A New Release of rIP (v1.2.0) for Detecting Fraud in Online Surveys