People who defend this under the heading of "it's their service, don't use it if you don't like it", or "they're doing this for your convenience" completely miss the point.
There is a reason why we had strict regulations (a dirty word on HN, I know) for "old fashioned" mail and telephone. To eavesdrop on people's private communication was considered a disgusting practice that belonged in totalitarian regimes, and an unacceptable violation of people's rights.
Modern online services have circumvented such regulations, but that doesn't make what Microsoft, or Facebook, or Google are doing any more ethical or socially desirable.
All of this casual disregard for basic ethics can't continue without a serious backlash. And such a backlash won't just hit Microsoft et al., but our entire industry.
It's time we stopped considering ourselves to be untouchable just because the law hasn't caught up yet, or because the majority of the people haven't figured out what the fuck we're doing.
Some changes through technology are unstoppable. This however, isn't one of them. It's a choice.
Disclaimer: I work for MS. My opinions are my own, but they are biased.
I think we need to make a distinction between automated services and humans eavesdropping. I'd feel weird if someone were snooping on my conversations and clicking my links, but on the other hand I very much appreciate the little bot that sits in my IRC channel and displays the title of any page linked. Both monitor the chat and access links, yet I value one and feel weird about the other. I don't think there's a way to truly distinguish the two, though, and saying "nothing is ever allowed to access your communications" removes the possibility of a lot of added functionality (the link bot being just the base camp of the mountain of things that are possible).
I think the better choice here is to ensure that the public has a way to communicate securely, and that our mental model of "trust usually, distrust as the exception" needs to move to "distrust usually, trust as the exception". This is similar to how sudo works, in a way: we maintain a lower level of security most of the time for convenience, then escalate only when needed.
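For what it's worth, the core of such a link bot is tiny. Here's a sketch of just the title-extraction step (a real bot would also watch the channel for URLs and fetch each page; the class and function names are my own):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Pull the contents of the first <title> element out of an HTML
    page -- the core of the link-title bot described above."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Start capturing when we enter the first <title> element.
        if tag == "title" and self.title is None:
            self.in_title = True

    def handle_data(self, data):
        # Grab the first text chunk inside <title> and stop.
        if self.in_title:
            self.title = data.strip()
            self.in_title = False

def extract_title(html):
    """Return the page title, or None if the page has no <title>."""
    parser = TitleExtractor()
    parser.feed(html)
    return parser.title
```

A bot would fetch the page body, run it through `extract_title`, and echo the result into the channel.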
2. Joining/forming another IRC channel is cheaper than creating a competitor to Skype.
3. MS looking at your messages implies that they have the ability to do so. If the conversation were entirely secure, only the other end of the conversation would be able to.
4. MS eavesdropping at all means that when law enforcement comes knocking, they can do so too. Law enforcement is not always your friend. More so in some countries than others.
5. The 'link bot' can be implemented in the client, rather than by a middle-man.
6. Such 'beneficial processing' could be opt-in at the client level. The message is encrypted between clients. Once a client has it, the message can be processed locally or sent off for processing. Sure this adds delay, but it's a trade-off for privacy.
The distinction between eavesdropping and an IRC bot is intent, context, and result.
Intent: An IRC bot that provides titles does so to translate machine-readable URLs into human-readable titles, to the benefit of both the writer and the reader.
Intent: An eavesdropper tries to extract meaning from being in a privileged position between two speakers.
Context: An IRC channel is a public forum, or at least a forum for invited participants. The expectation of privacy is low.
Context: A private communication between two parties is private. The expectation of privacy is high.
Result: An IRC bot that does what everyone in the channel is aware of is good. People are happy with the added value.
Result: An eavesdropper makes people feel scared, insecure and angry. It makes society worse, causes damage to people, and is destructive.
It really doesn't matter whether it's a machine or a human that does the eavesdropping. Intent, context, and result do.
If that "automated eavesdropping" can be accessed by humans later, then no, I don't think there should be a difference at all. It's the same thing; the only difference is that it doesn't happen in real time. But the damage can be just as bad, so from that point of view there's absolutely no difference.
The bot is "in plain sight", and can be kicked and banned by ops. That's a major difference. Further, the bot is only given the ability to eavesdrop when enabled by authorized participants (e.g. it was either invited or not kickbanned by people who could see when it joined the channel).
The visibility and lack of expectation of privacy creates a very different situation. If this "MS bot" showed up in your conversation and said "hi, I'm going to visit every URL you post unless you kick me off this conversation", it'd be a very different situation.
What about when the automated service makes a summary of an email conversation?
Honestly, I don't know what to think of this matter. All I know is that, frankly, the capabilities of a malevolent, intelligent individual are skyrocketing.
As a dictator once said: Wish for sunshine, but build dykes.
IRC bots generally operate in public or group channels. I think there's a distinction between this and a bot operating in a private communications channel.
>I think the better choice here is to ensure that the public has a way to communicate securely and that our mental model of "trust usually, distrust as the exception" needs to move to "distrust usually, trust as the exception".
I can't easily square this with your previous stance. To me it would imply that private communications should have end-to-end encryption, and that another party couldn't intercept URLs unless they were explicitly transmitted out of band by the client (which I would guess isn't the case here).
> There is a reason why we had strict regulations (a dirty word on HN, I know) for "old fashioned" mail and telephone.
I would argue that the reason had to do with the fact that you couldn't encrypt, and you couldn't choose your provider. In a time when people could encrypt everything if they just cared, and when there are a ton of mostly independent ways to contact people and it's even easy to host your own, I wouldn't say that such regulations would be a good idea.
What were the regulations that prevented private landline providers from snooping on conversations?
Personally, I find this disgusting as well, and I agree that the lack of ethics in our profession is a huge problem (though fueled by user ignorance and apathy). But as much as I'd welcome a healthy backlash, I shudder to think of the lobbies that would "inform" the regulators when drafting such laws.
The story of ultra-wealthy shipping magnate Aristotle Onassis is very interesting to read about. He was brazen in his profiteering and I've read stories that his rise to fortune started by eavesdropping on wealthy citizens making deals and buying stock etc when he worked as a telephone switchboard operator.
There was no expectation of privacy with early phone systems. I'm only 31, but when I was very young my parents (in small-town Washington) had a party line [1] shared with some neighbors, such that you could listen in on each other's calls. See also the archetype of the gossipy telephone operator.
There was a tiny anecdote in a TED talk about the speaker's grandmother working at a telephone switchboard in India. She regularly heard conversations she shouldn't have, including conversations with Nehru (then Prime Minister of India).
When MS bought Skype they changed supernodes from peers to company owned Linux boxes.[0] This change gives them the ability to eavesdrop on any conversation.
My friend "Alice" (a Chinese national studying in the US), recently sent a present to her friend "Bob" in the Chinese army and talked about it on Skype.
The Chinese Army found out that Bob was receiving a gift from the US and tracked down the relevant Skype conversation. Bob was interrogated about Alice and what the gift was for.
Microsoft complies with all governments' legal requests, as it should. I have no doubt the US government has made similar requests of MS.
Skype's original protocol made eavesdropping harder, but that is no longer true after the changes Microsoft made.
Regardless of their supposed intentions (which they could easily lie about--this IS a business we're talking about), the supernodes do allow very easy eavesdropping given a warrant.
> Microsoft complies with all governments' legal requests, as it should.
Microsoft can make such requests pointless if they choose to do so, by not having unencrypted data of people's private conversations in the first place.
> Microsoft complies with all governments' legal requests, as it should.
For governments from the set of countries that Microsoft has a presence in, or just from governments in general? I can't imagine they respond to DPRK requests, nor should they.
And because they have a presence in China, sure, they cooperate with China. As far as I know they have no presence in the DPRK, though, as they are not even allowed to sell there.
Even in our "western" democracies, the right to vote is limited (whether by age, residency, prisoner status, etc.). In the USSR, only members of the Soviets (workers' councils) could vote, as the Soviets represented the people. In ancient Athenian democracy, only male citizens could vote. According to Plato, even that was too widespread, and only philosophers should vote, as they were the only people wise enough to make serious decisions.
But all of those forms fall under the name "democracy" - it just depends who you define as "the people".
Are you sure Alice or Bob weren't using the TOM version of Skype (TOM-Skype)? From what I understand, that version is specially modified to comply with the regulations of the PRC.
Disclaimer: work for Microsoft in China but clueless about how Skype works here.
This is wrong, and I'm so tired of seeing it repeated. The change took individual people's machines out of the pool of supernodes. The result is that the same amount of traffic gets relayed (TURNed) as before, and everything else remains purely peer-to-peer. Flatly, you're wrong; please stop spreading that info.
I tested this with Facebook a while back. I put two videos up of various lengths and linked to them directly by IP in Facebook chat. I also included a restrictive robots.txt. In both cases Facebook downloaded the entire videos from my server. I repeated the experiment with several other providers and the results were varied. Skype, for example, does not download the entire video and seems to respect the robots.txt...
Not sure if this is still the case for Skype, but I just tested on FB again and they pulled the whole video...
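If you want to reproduce this kind of experiment, the server side can be a stock Python file server with a request-logging hook. A sketch (the log format and function names are my own invention):

```python
import http.server

def format_hit(client_ip, method, path, headers):
    """Render one log line showing who fetched what, and whether the
    fetch was partial (Range header) -- enough to tell a thumbnail
    probe from a full download."""
    ua = headers.get("User-Agent", "-")
    rng = headers.get("Range", "-")
    return f"{client_ip} {method} {path} UA={ua} Range={rng}"

class LoggingHandler(http.server.SimpleHTTPRequestHandler):
    """Serves files from the current directory while logging every hit,
    so you can watch what happens after pasting a URL into a chat."""
    def log_message(self, fmt, *args):
        print(format_hit(self.client_address[0], self.command,
                         self.path, self.headers))

if __name__ == "__main__":
    # Paste http://<your-public-ip>:8000/video.mp4 into a chat
    # and watch which services fetch it, and how much of it.
    http.server.HTTPServer(("", 8000), LoggingHandler).serve_forever()
```

Whether the fetcher downloads the whole file, issues a HEAD, or uses Range requests then shows up directly in the log.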
That is certainly one possible explanation. For fun if you share a Google Drive video you can watch the FB user popup on the active user list and then disappear after a certain amount of time that varies based on video length and quality.
Facebook generally will present the OpenGraph data from pasted links; for sites lacking it they might present the first image on the page.
It would be vast overkill to download a whole video just to present a thumbnail, and the user experience in the delay while the whole file is downloaded and processed would render the feature fairly useless.
Which leaves the question of why Facebook would do this open... It might just be really bad practice and an under-/non-optimized crawler.
Or they download the whole video to pull it through an analyzer to look for stuff prohibited to be republished via FB, such as child pornography. If I were FB, I'd screen everything published via my site.
I have to imagine they contract that work out. Unless doing that was my business, I wouldn't want to touch that training set with a thousand-foot pole. Well, I wouldn't regardless... but you see what I'm saying about encapsulating and isolating that.
I've always wondered about that. Surely possessing the 'training data' (basically child porn) is illegal; only the police can possess it for investigations, etc. So how can you make something that can recognise it?
There are companies who have special licenses, etc to do that type of work.
I was supposed to meet with a Scandinavian vendor a few years ago that provides a service where you can match SHA hashes of files against known illegal content. I recall the pitch deck said that they had millions of records to match against, which I found incredibly depressing.
I ended up bailing on the meeting... it's a subject I don't want to learn more about.
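That kind of hash-matching service reduces to set membership against a digest database. A sketch with a purely hypothetical blocklist:

```python
import hashlib

# Hypothetical blocklist of SHA-256 digests of known-bad files.
# (This entry is sha256(b"test"), used here only as a stand-in.)
BLOCKLIST = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def is_known_bad(data: bytes) -> bool:
    """Exact-match screening: flags only byte-identical copies.
    Any re-encoding or cropping defeats it, which is why perceptual
    hashing schemes (e.g. PhotoDNA) exist for this problem."""
    return hashlib.sha256(data).hexdigest() in BLOCKLIST
```

In a real deployment the blocklist would be the vendor's database of millions of digests, queried over an API rather than held locally.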
It's humans. There was a story about the 'dark side of Google' within the last few months about how Google contracts this out, and they burn out, but I'm not searching for that story directly myself.
Not only do these exist, there are significant institutional practices that exist to facilitate the sharing of detection signatures between companies and between companies and various governments.
It's good to prevent the re-sharing of child pornography, but a lot of companies don't like to talk about the systems because they're very much the same systems that the same companies are busy telling the content industry that they can't build to enforce copyrights. So they're generally kept pretty hush-hush by a kind of gentleman's agreement.
I think general copyright is actually a thornier problem and this secrecy is unnecessary. CP is just something you can block if you see it. Tracking down exactly which uses of a copyrighted work are intended to be allowed and which are not is really much harder. And I think that's a legitimate argument, especially because it recasts the copyright debate in terms of compromises that make sense: if clearing a bunch of rights is confusing, maybe certain types of compulsory licenses are more attractive. But that debate is for another thread...
> I think general copyright is actually a thornier problem and this secrecy is unnecessary
1. When we talk about 'detecting copyright infringement' there is an implicit assumption that we're talking about the IP of major corporations. No one is going to bend over backwards to prevent copyright infringement of 'the small guy'[1].
2. Even if you detect that a video contains copyrighted material, there is no central registry for you to determine if the person (or entity) has permission to distribute/use that material. Even if there was such a registry, how do you link it to an online id on some random website?
3. There is no middle-ground for fair use. Fair Use is something that is usually only definitively determined after a court battle. If restrictions are too strict, then the only way for one to exercise their Fair Use rights is a court battle. This effectively means that Fair Use would be dead for all but some rich, motivated people (at least on large content networks, e.g. YouTube).
[1] Unless it wins brownie points with the public for getting a system in place to protect the 'big guys,' who are obviously the only ones that matter.
It wouldn't be surprising. YouTube looks for video fingerprints of copyrighted data. Repurposing such technology to find copies of known CP videos would be pretty straightforward.
If you start filtering for child pornography, and people know about it, they will come at you hard if you miss anything. Even for a technology that catches all known child-porn videos, people would be beating down the door to sue/bad-mouth Facebook for missing the new ones. Mostly due to people being people, but also because the technology is 'magical' to far too many people in society, so they feel like there are no limitations (or it's hard for them to grasp them).
I don't think he's necessarily talking about private content, but they detect nudity on the "publicly" available images in your albums which your friends can see...
So, you post a comment in a private Skype chat: "please don't visit this link; it's copyrighted, and reproduction of a single copy requires a license at a cost of $10 million USD due to the sensitive nature of the content". You make sure the link is to a brand-new, unshared domain with a robots.txt denying access.
MS downloads it, and you've got them on copyright infringement, for which there is no apparent excuse outside of wilful negligence.
What's the multiplying factor the MPAA uses for copyright infringement? Something like 1,000 times the regular licensing fee.
"Bob, if you should ever need to delete 100% of the content that I've created for you, you would follow this URL. Let me be clear, it will destroy USD $1,000,000 worth of goods."
Well per the HTTP standard, GET requests should be idempotent. I kid, but imagine seeing pompous development advice stapled to official justification for some internet sleuthing!
Do you mean that GET requests shouldn't modify data? A URL that deletes everything would be idempotent; if it's hit once, it's the same effect as if it's hit many times.
Yes. Idempotency is just one property of the GET request; what the parent described would be a DELETE request (those, too, are idempotent). So I could have been more precise and funny with that joke, I guess.
"In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested."
The document section also makes a distinction between "safe" and "idempotent" methods. That's not the same thing.
Yes, and you regularly see notes about how [some bot] has caused some developer to learn why this is bad practice whenever a link like that gets accidentally exposed in a way that allows crawling.
Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property.
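For anyone fuzzy on the safe-vs-idempotent distinction the spec is drawing, a toy resource store makes it concrete (names are mine):

```python
# In-memory stand-in for a server's resources.
store = {"doc1": "hello"}

def handle_get(key):
    """Safe AND idempotent: never changes state, and N requests
    have the same effect as one (namely, none)."""
    return store.get(key)

def handle_delete(key):
    """Idempotent but NOT safe: it changes state, yet N deletes
    leave the store in exactly the same state as one delete."""
    store.pop(key, None)
```

The URL-that-deletes-everything in the joke above is idempotent in this sense, but it badly violates the "safe" convention for GET, which is why crawlers and preview bots following it cause damage.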
Maybe I'm sick, but sometimes I dream of a world where there are people who actually go out and do stuff like this, to bring attention to the stupidity of some laws.
I imagine some people do go out and do it, then realize that it's ridiculous and pointless, perhaps after every lawyer they contact tells them that there is no way they would take such an insane case.
IANAL, but "licences" are basically contracts (to buy a good). Their clicking on a link isn't capable of entering a contract, since it's a machine that does the clicking and there's no real meeting of the minds. Contracts also can't be unconscionable and extreme. So you won't really be able to sue them.
Copyright infringement doesn't require establishment of a license, in fact just the opposite. The license is there to underpin the value of the good that is infringed. The court might not buy it, but that's a separate issue to that of infringement, it's only there to establish the level of damages.
The machine excuse is inoperable: the agreed mode of operation of such machines is to refuse to follow links disallowed in a robots.txt file. If the machine has been programmed to ignore this de facto regulation, then there's mens rea to go with the actus reus performed by the machine for, and at the behest of, those who programmed it.
I can't program a tank to drive me on to your land and then claim that I haven't committed the tort of trespass, same goes here.
IANA(IP)L either, but sometimes I play one on the internet.
Robots.txt is an industry-standard but non-legally-binding method of denying access. If you wish to actually deny access, the proper, legally recognized means of doing so is to actually deny access (i.e., by using .htaccess or similar means).
Also, mens rea (i.e., "guilty mind" or colloquially, intent) is a criminal law concept that has no relevance in the civil/tort law realm. It doesn't matter whether the machine has been programmed to ignore robots.txt because no law requires a software program to obey robots.txt.
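This is easy to see in code: robots.txt is purely advisory, and a crawler has to opt in to consulting it, for example via Python's stdlib parser (the helper function here is my own wrapper):

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, agent, url):
    """True iff robots.txt permits `agent` to fetch `url`.
    Purely advisory: nothing enforces this check -- the crawler
    itself decides whether to call it at all."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(agent, url)
```

A crawler that never calls anything like this simply fetches whatever it likes, which is exactly the point being made about legal enforceability.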
Indeed your assertions about robots.txt are the expected logical response. However when the issue of copyright infringement for Google (and other search engines) came up in the past [eg storing entire page text, thumbnailing and such] the response, that was apparently accepted, was that as a protocol exists (namely robots.txt) that enabled a website owner to refuse access to works there was no actual infringement. The onus being placed on the works owner to label the work as off-limits using robots.txt.
If that line holds, then the corollary appears to be that programming a robot to duplicate linked content denied by robots.txt is certainly copyright infringement.
Copyright law is a weird one; it sits in a sort of limbo between tort and crime, hence it seems relevant to consider not only the balance of probabilities and the test for tortious infringement but also the criminal considerations. Moreover, in various jurisdictions it's a crime to access part of a computer system without authorisation, which would appear to apply to MS's conduct in this case; such laws might "require" obedience to established protocols for determining whether access is authorised [I'd be interested if anyone reading can cite caselaw to support/discredit this line of thinking].
"A spokesman for the company confirmed that it scans messages to filter out spam and phishing websites. This explanation does not appear to fit the facts, however. Spam and phishing sites are not usually found on HTTPS pages"
From the very next story down from the same publication:
"The company has reported that, since 2009, some malware has been concealing its data traffic by mimicking known instant messaging protocols or, to avoid detection, trying to camouflage its data traffic as HTTP or HTTPS. To achieve this, these trojans copy at least the header of the instant messaging protocol, leaving the remainder of the packets to carry the trojan's encrypted communications."
Looks like they're contradicting themselves here to score some click-bait headlines.
What is the contradiction? A program might be able to 'camouflage' its data as HTTPS to a casual observer but that has nothing to do with HTTPS urls. An HTTPS url won't work unless it's real. Therefore such a thing provides zero justification towards checking HTTPS urls.
The only thing that is required to make a https url "real" is that it is hosted on a server that serves up a certificate that is valid for that domain. It's trivial enough to obtain a valid cert anonymously (shell company with bearer shares somewhere suitable) or find places to upload the malware that makes it available on https urls.
It requires actually implementing SSL, which is a lot more work than using port 443 and faking a couple headers.
Am I misreading the word 'camouflage'? I read it as 'pretends to be' not 'actually uses'.
Even if the trojan is using HTTPS, that is still not a reason to scan HTTPS URLs. The command and control network is completely orthogonal to the links given to users to try to infect them.
Are any of the big IM companies going to offer OTR encryption by default, or what?
It's not like they can make a ton of money by monitoring the chats, and even if they could, they shouldn't be doing that anyway. At least with e-mail they have an excuse for not using local encryption (it gets too complicated for the end-user), but they can't really use that excuse for chat.
So why isn't OTR enabled, like, yesterday in GTalk, Skype and Yahoo Messenger? (By default, of course; otherwise 99% of users won't use it.)
It's not a big company, but Silent Circle offers end-to-end encrypted voice, video, and text chat. Also secure email, which can use PGP. Here is a summary of their text chat protocol: https://silentcircle.com/web/scimp-protocol/
Is this an official thing, like are companies legally forced to not offer it? Or is it more of an "unwritten" policy, that in order to "make law enforcement happy" they don't offer that type of encryption?
The govt comes by with a subpoena (secret, classified, or public) and requires Microsoft or the customer company to produce communication records that exist in a form that may be used as evidence. Failure to do so is at best contempt of court and at worst obstruction of justice. There is no 5th-amendment privilege for other people's crimes. So everyone who chooses to store or process messages makes sure the encryption is reversible so they can honor court requests. Nothing is private as a result.
EDIT: I should make it clear I don't agree with the current status quo. Let me answer two very good questions.
> Would they also do that if logging failed during the period requested?
If it can be shown that there was willful neglect in collecting logs, then the government has in the past gone after companies for some form of conspiracy (most famously MegaUpload, though Microsoft customers have had their fair share for accounting and securities fraud) or criminal negligence. There is a prevailing theory that companies are responsible for employee actions, and failing to log is seen as unacceptable.
It's only recently (within the last year) that the courts have ruled on whether compelling passwords is protected by the 5th amendment, and most systems in place were designed and built using assumptions from 10 years ago.
> If this is true, why is OTR offered at all?
It's a checkbox feature required for HIPAA, PCI, and similar. "Must have encryption" -- the standards and IT departments don't say how the keys are managed.
They would get them for contempt of court or obstruction of justice for being not unwilling but rather unable to fulfill the request?
That seems wrong. Would they also do that if logging failed during the period requested? Or if they simply neglected to log in the first place?
Edit to your edit: I don't see how the 5th is involved in the slightest. If party A has a secret and the courts subpoena party B, then party B's inability to comply has nothing to do with pleading the 5th. They simply do not have the information requested, and in fact never possessed it in the first place. Furthermore, party B cannot possibly be said to have been negligent; the courts subpoenaed the wrong party.
The point about the 5th amendment is that prior to last year's ruling, it was assumed that passwords (or private keys) could be compelled for any reason in the US, and systems were built with that assumption. I recommend you look into legal disclosure or e-discovery products. Not right or wrong, but that's the design assumptions backed by a bunch of lawyers in govt and private sector.
The example is incomplete. In messaging systems, Party A uses Party B to send messages, which may be useful for a court case. The government may reasonably subpoena Party B to produce Party A's records since they may not want to alert Party A that it is under investigation.
If Party B (the service provider) has any feature that makes use of the content of Party A's secrets (read: url-checking, auto-loading thumbnails, indexing for search, some types of routing, etc), then there is little ground for Party B to argue it can't decrypt Party A's records for the subpoena. Further, Party B may become a co-conspirator if it keeps incomplete records or destroys records too quickly. Even if the co-conspirator charge is remote and hard to convict, most service providers would prefer to avoid the publicity of such a court case and add decryption anyway.
The entire point of this discussion is that Party B should be constructing their service in such a way that Party A never gives them their secret. Party B would then be able to keep complete and perfect records, turn over all of those records at the drop of a hat, and nevertheless be unable to reveal Party A's secret.
Think PGP + Gmail, except that unlike usual, Google provides you with download of PGP.
The court could subpoena Google, and Google could give the courts my PGP-encrypted communications. However, they would be unable to give them my private key, and declaring them in contempt of court for that would be a massive miscarriage of justice.
PGP is a pain in the ass to use. However, we have more streamlined technologies currently available that provide the same properties in this scenario. What is upsetting to people in this discussion is that companies like Skype are not employing such systems. We know Skype is not, because it employs one such "feature" that you mention (URL-checking). It is therefore running a system that leaves private information open to subpoena.
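The architecture being argued for (Party B stores and forwards only ciphertext) can be illustrated with a toy scheme. This is a XOR one-time pad, purely illustrative and not real cryptography; a real system would use an authenticated cipher and a proper key exchange:

```python
import secrets

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy XOR 'encryption': the key must be at least as long as the
    message and never reused. Stands in for real end-to-end crypto."""
    return bytes(k ^ p for k, p in zip(key, plaintext))

decrypt = encrypt  # XOR is its own inverse

# The sender's client encrypts; the provider (Party B) stores and
# forwards only the ciphertext.
key = secrets.token_bytes(32)   # shared out of band between A's endpoints
message = b"meet at noon"
ciphertext = encrypt(key, message)

# Party B can hand `ciphertext` to a court and be fully compliant,
# yet without `key` (which it never sees) the content stays private.
```

The structural point is that Party B's records are complete and surrenderable, while the plaintext remains recoverable only at the endpoints.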
You're absolutely right in describing the theory behind PGP (or GPG for the purists :), but unfortunately there is not yet a way to build a messaging service that has both features AND privacy. The "should" in your statement "Party B should be constructing their service" implies and expects a capability that is not (yet) possible to build. The point of my posts was to illuminate why companies make decryption easy for the US government: their exposure from in-demand content-aware features and their fear of legal action.
Like you said, PGP+Gmail sucks for every party on a chain. Clients stop working. Non-users can't read the emails. Gmail spam filtering, ads, search indexing, and labeling all break. The same is true for PGP+Exchange, and most corporate customers much prefer Exchange features to the privacy PGP offers individuals.
I'm also not aware of 'more streamlined services' that offer true privacy -- please illuminate them if they exist. Services like Voltage suffer from the same root cause of reversible encryption.
So, customers have to choose: use a service with content-aware features OR use a dumb service that (currently) does not have the features. Most people choose features, and I would venture in this case Microsoft opted for features over pure privacy.
I would be among the first to welcome a way to accommodate both pure privacy and features in a service, and I encourage all to find a way. Please build it!
But if you're referring to Google's "Off the Record" feature, that's not OTR encryption. They just don't save that conversation in your logs. But they still have access to it themselves.
They are legally forced to do so in multiple locations, Blackberry is a prominent example - IIRC Saudi Arabia and India were among the places with legal requirements. In practice, of course, China as well (see articles about China-specific skype versions).
It proves that Microsoft is able to decrypt chats, and that's unfortunate. Switch to an application that has end-to-end encryption if that's important (e.g. Jabber with the OTR protocol on top: http://en.wikipedia.org/wiki/Off-the-Record_Messaging).
But making HEAD or GET requests, whether it's HTTPS or not, shouldn't be a problem — those HTTP methods are not allowed to have any significant side effects.
Scanning of URLs is useful for a service like this, which is often abused to send spam, phishing and exploits.
If you want to chat privately, use OTR https://en.wikipedia.org/wiki/Off-the-Record_Messaging, authenticate your key fingerprints, ensure that neither party's chat program is logging, and that both computers are free of malware.
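"Authenticate your key fingerprints" means comparing a short digest of each party's public key over a separate, trusted channel (in person, over the phone). A sketch of just the formatting step (the 4-character grouping is simply a convention I picked for readability):

```python
import hashlib

def fingerprint(pubkey_bytes: bytes, groups: int = 8) -> str:
    """Format a SHA-256 digest of a public key as space-separated
    4-hex-char groups, easy to read aloud over a trusted channel."""
    digest = hashlib.sha256(pubkey_bytes).hexdigest()
    chunks = [digest[i:i + 4] for i in range(0, groups * 4, 4)]
    return " ".join(chunks)
```

If the fingerprint you compute locally matches the one your contact reads to you, no man-in-the-middle has substituted a key; OTR clients expose exactly this kind of check.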
Well, if you're passing session data in a URL (logins or passwords as parameters, not the typical one-click-to-activate one-time links), you're doing something wrong. Second, Microsoft has no way of distinguishing a normal URL from one that has a &password parameter at the end. And finally, perhaps one of their bots is simply crawling the link to display a link summary or thumbnail, as Facebook does. Or, as the article says, looking for spam.
I think it's pretty well documented at this point that Skype is not a secure video/chat product. But for the 99.9% of users outside of the "never read my data" echo chamber, it seems to be working fine for them. Use what works for you.
If you paste a "one-click to activate one time" link into a Skype session, the bot will click the link and then it won't be valid anymore for the legitimate recipient. That's a terrible idea.
The "one-click to activate one time" links should always lead to a page where the user has to perform a POST to prevent "smart" applications from using the token when trying to show a thumbnail / preview or offer other services.
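A minimal sketch of that pattern, with illustrative names (not any real service's API): a GET on the activation link only shows a confirmation page, and only an explicit POST consumes the token, so preview bots issuing GET/HEAD can't burn it:

```python
import secrets

tokens = {}  # token -> account awaiting activation (toy in-memory store)

def create_activation():
    t = secrets.token_urlsafe(16)
    tokens[t] = "pending"
    return t

def handle_get(token):
    """What a bot's preview fetch or the user's first click sees: no state change."""
    return "confirm-page" if token in tokens else "invalid"

def handle_post(token):
    """Only an explicit form submission consumes the token, exactly once."""
    if tokens.pop(token, None) is not None:
        return "activated"
    return "invalid"

t = create_activation()
assert handle_get(t) == "confirm-page"   # bot previews do nothing
assert handle_get(t) == "confirm-page"   # still valid afterwards
assert handle_post(t) == "activated"     # real user activates
assert handle_post(t) == "invalid"       # token can't be reused
```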
There is a similar problem with SmartScreen, also courtesy of Microsoft.
Basically, you send an email with a link to someone in Europe, only to see it being accessed from some random US IP that doesn't even have a PTR record. With some effort this IP can be traced back to SmartScreen, but what's strange is that it sometimes takes hours for the URL to get hit from such an IP. This doesn't make any sense whatsoever, because SmartScreen is supposed to be a proactive defense against phishing and malware, so it should really be scanning new links in real time, upon reception. This scenario is arguably even more troublesome than Skype's snooping, because it's not possible to predict beforehand whether the mail will end up getting SmartScreen'd.
I am confused; I thought SmartScreen was for downloaded files and websites visited through Internet Explorer. What does it have to do with emailed links? Also, as far as I understand, SmartScreen only sends the hashes (or parts of the hashes) to get a match.
Think about the combination of this monitoring with the U.S. CISPA data-sharing provisions:
Microsoft says they are logging and pulling the content of links shared via Skype for spam and malware prevention. This certainly falls under the umbrella of "cybersecurity". Under CISPA, this "cybersecurity" information can be freely shared with the U.S. government without fear of liability, and can be further shared among all government agencies.
This sharing is probably happening already. But CISPA would allow it to be brought out into the open and, particularly, for evidence so acquired to be used in court proceedings and as supporting evidence for search warrants.
I suppose no one is using gmail, gtalk, g+, g hangout, yahoo mail, etc etc here?
The web is open, putting credentials in urls is stupid, and complaining about spider hits is too.
If you think your URLs are safe because no one KNOWS about them, you are simply doing it wrong. Hopefully your URLs are not changing any server side resource, otherwise you have a bigger problem than a spider.
Have you considered those spiders might be verifying that you are not spamming your friends, e.g. your computer could be infected and MS is trying to help your friends?
> If you think your URLs are safe because no one KNOWS about them, you are simply doing it wrong.
Case in point: I regularly bring up test web servers on my home server on weird ports, and Googlebot regularly finds them. I don't particularly care - anything I want to be private I password-protect - but these are sites that are linked from nowhere. Still, there are any number of acceptable ways it could find them: some of the tools I use have default setups of Google Analytics that would leak any URLs I visit on them; I often click through links to other public sites from them, and any of those sites might leak the referrer.
So I agree with you that you shouldn't expect URLs to stay private.
But you can still have privacy expectations about the exchange of a public URL. E.g. someone might not like the thought that there is potentially a record that they're visiting certain embarrassing sites.
The reason there is a shitstorm over this is not that Skype is necessarily or even likely doing anything nefarious with the URLs, but that people had an expectation of privacy about their conversation that they then found out clearly is not being met.
If Skype/MS wants to scan every link, or retrieve them and put the contents on billboards in public spaces, there's nothing inherently wrong with either option. What is wrong is not realising that the actions they are taking are at odds with people's privacy expectations, and failing to communicate to users what their actual expectations should be.
It's funny to watch documentaries about the collapse of the soviet union and then read about US companies doing this.
As long as a government is not doing it, I don't really see any problem. But if American companies sell this tech to other countries, maybe there's a problem. Aren't there laws that prevent US companies from selling spying tools to some countries?
I really think that as time passes, the world will want more and more p2p or pseudonymous/anonymous technologies to evade such problems.
I can confirm a case of an http (not httpS) link sent in Skype being accessed from a Microsoft IP.
Access happened from 65.52.100.214 about 6 hours 40 minutes after I shared it in Skype. There were 4 http requests, while I shared the link with 2 people.
Unfortunately the server logs are not detailed enough to understand what exactly was requested, given that the page was under basic HTTP authorization (with credentials NOT in the URL).
This suddenly escalated from "thread on HN" to personal encounter and realization that probably all Skype messages worldwide are stored by MS FOREVER...
Within 15 minutes of setting up an https CI environment, complete with robots.txt, Googlebot was hitting the DNS name, which wasn't public, previously used, or easily guessed.
Google gets a lot of leeway from people. If you have done SEO, you will learn that the Googlebot doesn't always respect robots.txt. Requesting to de-index a page may take weeks or even months. The quickest way is to file a DMCA complaint for the link to your own site.
Recently, they started tracking all downloads made in Chrome (for malware); this includes the filename, the URL, IP and the timestamp. Sucks hard, since I love Chrome and the only way to disable it is to disable the website malware checker (which only uses part of the hashes anyway).
Another possibility was that the hostnames were leaked via the SSL certificate. I've seen evidence of spiders using this for discovery, including Google. Your best protection in that case is to use a wildcard certificate, if you want it to validate.
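For illustration, here's a small sketch (sample data, not a live connection) of what the subjectAltName entries in a certificate disclose - the same structure Python's `ssl.SSLSocket.getpeercert()` returns. With a wildcard cert, an observer of the handshake only learns `*.example.com`, not each individual subdomain:

```python
# Sample dict shaped like ssl.SSLSocket.getpeercert() output (illustrative).
SAMPLE_CERT = {
    "subject": ((("commonName", "*.example.com"),),),
    "subjectAltName": (("DNS", "*.example.com"), ("DNS", "example.com")),
}

def disclosed_names(cert):
    """DNS names an eavesdropper learns from the presented certificate."""
    return [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"]

print(disclosed_names(SAMPLE_CERT))  # ['*.example.com', 'example.com']
```

A per-host certificate would instead list the specific hostname, which is one plausible way spiders discover "secret" servers.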
If MS doesn't make a GET request, is this still a privacy issue? The writer undermines his/her own theory by saying "MS cannot tell whether the site is phishing-related or not by just sending a HEAD request".
This happens when you install toolbars as well: you think a URL is private, then all of a sudden private URLs are getting hits from googlebot/etc. It's not a big leap to assume that this happens in anything else...
That seems like something of a bizarre argument - there's a difference between a computer reading it and a person doing so. I'm not convinced privacy is infringed if it goes through, perhaps detects particular phrases (which are presumably related to illegal activity in any case) and checks links for malware. I don't have a problem with that.
You have no evidence for any of that. You literally just made that up. We don't know what people or what computers have this information, how long they are keeping it, or what they are doing with it. See, if the communications network was really secure, we would have an answer for all those questions.
couldn't they be using this to prevent/detect spam?
edit: the article claims this can't be so because the page only does a HEAD request, though a HEAD request could be useful if you wanted to detect an HTTPS domain with ephemeral pages (which perhaps, could be a good feature in detecting spam domains)
FTA: No. It is not a fitting explanation for spam or phishing prevention. The author claims that these sites rarely use https.
Furthermore they used test URLs containing hypothetical login credentials and showed how skype would get access to these.
The last troubling bit the author points out is that there seemed to be anomalous traffic to URLs shared in a skype conversation, where a microsoft IP seemed to attempt what they call a "replay attack".
> It is not a fitting explanation for spam or phishing prevention.
It is very common for spammers to break into other websites (using simple well known exploit) and create links redirecting to the site hosting the malware. So a site should not be excluded just because it has SSL.
Because "rarely" means "they should just ignore them- no malware hoster would ever use HTTPS, especially not if they ever figured out that Microsoft was only checking HTTP URLs". Also, anyone sending log-in information in a GET request is doing it wrong.
I don't know about Skype, but I know other parts of MSFT would frequently try to classify links by whether they contained malware, whether they were phishing attempts and so on. Doing this on IM networks was important, as they could cause so much distribution of such malware through automated sending of messages.
I would be surprised if Microsoft wasn't doing this, as it would leave users at risk.
Also, sending login credentials over HTTP GETs? That's a pretty contrived scenario. The HTTP HEAD might be a red herring - it might just be checking to see whether it redirects somewhere else/whether the URL has already been seen. Perhaps this URL didn't set off the spam/malware machine learning models to initiate a full crawl/human review.
With HEAD you should get the Content-Type header, which could be used for detecting malware (e.g. if the content type points towards an executable). Frankly, I'd be more worried about web services that carry authentication in the URL than about Skype checking them out.
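As a sketch of that heuristic - with an illustrative type list, not Microsoft's actual rules - classifying the Content-Type from a HEAD response without ever fetching the body might look like:

```python
# Illustrative list of MIME types a scanner might treat as suspicious.
SUSPICIOUS_TYPES = {
    "application/x-msdownload",    # Windows executables
    "application/x-msdos-program",
    "application/octet-stream",    # opaque binary blobs
}

def looks_like_executable(content_type):
    """True if a Content-Type header (as returned by HEAD) suggests an executable."""
    base = content_type.split(";")[0].strip().lower()
    return base in SUSPICIOUS_TYPES

print(looks_like_executable("application/octet-stream"))  # True
print(looks_like_executable("text/html; charset=utf-8"))  # False
```

This is cheap and body-free, which is consistent with a HEAD-only scan; of course a malicious server can simply lie in its headers, so it's only a first-pass filter.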
I can confirm that HTTP links, not just HTTPS, are visited by a Microsoft IP if you paste them in to Skype. I noticed it happening a couple of weeks ago.
Alternate headline: Microsoft protects hundreds of millions of Skype users by going to the effort of checking even https URLs in chat for malware and spam.
Or "Skype's secure peer-to-peer traffic is being centrally decrypted and inspected at Microsoft." Obviously no one sane should ever have trusted this, but I've seen people try to explain to me that Skype's architecture is inherently private and thus secure even to snooping by Microsoft. This is proof to the contrary.
The headline would be wrong (but I do not blame you, the translation is more than bad): Microsoft only scans https urls, not http urls. This doesn't match with their explanation that it is a malware scan.
Alternative headline: If Microsoft can read your Skype chats, so can any government that has a serious relationship with Microsoft (read: all NATO nations + others).
"Why does Skype even have any clickable links in it at all if Microsoft can't be bothered to keep the obvious malware out?"
Another comment:
"Re the conclusion: to protect yourself, don't run an OS that will silently install software just because you clicked on a blue link in a program published by the OS vendor.
Steve Ballmer should be jailed as an accessory for allowing this."
The browser follows the link. The browser can absolutely be checking for malware links, and warning you before you follow them. And you should be able to configure your browser to not do that, if you don't want it to.
So, it's not "damned if they do, damned if they don't," from my point of view, they have to do it in the right place. Within Skype itself is absolutely not the right place.
When the viral infection is spreading at 4000 messages a minute, by the time the Skype team informs the IE, Chrome, Firefox, Opera anti-spam teams and waits for them to add it to their blacklist, they might as well not do anything.
What's wrong with scanning links in chat for malware on the Skype servers so that they can immediately stop such messages from spreading?
I think the problem is that we are getting the worst of all worlds right now: successful malware, messages can only be sent when both sides are online (why, if there is a middle-man??), and messages are being snooped on. Oh and clients still display them out of order.
Unlike, say, FB messages, Skype has always felt so brittle and unreliable that I hoped it was peer-to-peer, and this news came as a bigger shock.
Okay, first, I'll just say it: I'll bet Microsoft has a team of developer-relations/pr people commenting on HN. HN is so important to the startup world, and Microsoft sees tech evangelism as war ...
It might be a callback to 1998, but the Halloween documents are real. You are really naive or just stupid if you think these kinds of operations aren't being executed to this day.
You mean I could get paid for what I do for free?! I didn't know that, damn!
If there is such a team on HN, they're doing quite a shitty job by the looks of it. Even a review of the Surface is routinely flagged off the front page for daring to be on the same page as a new Chromebook announcement.
> They could distribute a list of malicious URLs to the clients and do the checking locally.
A list of essentially every known malicious link on the entire Internet? I speculate that would be quite a few gigabytes in size, and would only get larger if they wanted to store the links in some data structure that could be scanned in a practical amount of time. And said list wouldn't be complete, either- it would only cover known links that Microsoft had seen before, and would only record their malicious state at the time of the last scan, not now.
> They could ensure that clicking a link doesn't compromise your system.
These sorts of vulnerabilities often come from obscure and surprising places (e.g., their TrueType font parsing code), from blocks of code that have been around for a decade or two without the vulnerability being noticed. Identifying security vulnerabilities is notoriously hard, even when you're not contending with the complexity and scale of Windows and all its associated applications.
There's an argument to be had about the acceptability of the privacy/security tradeoff Microsoft could provide by eavesdropping on your conversations, but your implication that such a tradeoff is mostly or entirely avoidable is untrue.
I don't think either removing clickable links or fixing an "OS that will silently install software just because you clicked on a blue link" should require allowing themselves to eavesdrop.
Apparently it says so on their privacy page:
http://www.skype.com/en/legal/privacy/
"Skype may use automated scanning within Instant Messages and SMS to (a) identify suspected spam and/or (b) identify URLs that have been previously flagged as spam, fraud, or phishing links. In limited instances, Skype may capture and manually review instant messages or SMS in connection with Spam prevention efforts. Skype may, in its sole discretion, block or prevent delivery of suspected Spam, and remove suspicious links from messages."
If phishing and malware were spread via Skype, what would people say?