
tv   Key Capitol Hill Hearings  CSPAN  September 24, 2015 11:00pm-12:01am EDT

11:00 pm
were in. we want to know what the relationships are between all these things and what veterans programs you received. we don't know how efficient these are -- we don't know the outcomes of all these veterans programs. the budget was $163 billion last year, and we just don't know very much about the efficacy of these programs. we just don't know. but by merging these data sets together, we can figure this out. and then, as i mentioned, more competitive businesses. if we look at the u.s. economy, it has been growing 2%, 2.5% the past couple of years. one of the really big questions we have out there is productivity growth. most of you probably don't think about productivity growth that much, but how fast the economy grows is how fast our labor
11:01 pm
force is growing plus how efficient our economy is becoming. if you add those two up, you get gdp growth. so what we're seeing the last four years is productivity growth averaging less than 1%. historically that's low in the united states, and it's really retarding u.s. growth. why aren't we growing faster in this kind of data revolution, where we hear all these great things about data? there's this big conundrum there. maybe it's that we have all these businesses gathering all this information and they really haven't yet materialized the benefits from all this data. but from a macro perspective that's a huge, huge question. so how much can all this data stuff we're talking about improve the u.s. economy, okay? i'm just going to throw out a couple of real rough numbers here. there are lots of studies out there that always talk about trillions of dollars and billions of dollars and so on and so forth. i always find those numbers really hard to understand. so i'm an economist.
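the growth identity the speaker describes -- gdp growth as labor-force growth plus productivity growth -- can be sketched in a couple of lines. the specific percentages below are illustrative assumptions, not the speaker's official figures:

```python
# GDP growth decomposed as labor-force growth plus productivity growth,
# per the identity described above. Numbers are illustrative assumptions.

def gdp_growth(labor_force_growth: float, productivity_growth: float) -> float:
    """Approximate GDP growth as the sum of its two components."""
    return labor_force_growth + productivity_growth

# With productivity averaging under 1% and labor-force growth around 0.5%,
# the economy grows only slowly:
growth = gdp_growth(labor_force_growth=0.005, productivity_growth=0.009)
print(f"{growth:.1%}")  # 1.4%
```

with productivity stuck below 1%, even healthy labor-force growth leaves overall gdp growth well under its historical norm, which is the conundrum the speaker raises.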
11:02 pm
i've been studying the economy for most of my professional career, and i can't understand $1 trillion. maybe if i were warren buffett i could understand $1 trillion. he has yet to adopt me. you have all these numbers, so let me put them into context. if this could improve the economy by just 1%, think about the improvements in government that we could get from this. think about the improvements in the private sector. 1%. that's not very much, is it? okay. so 1% of gdp is $175 billion. that's hard to relate to; it's such a big number. but that's $543 a person, or about $1,300 for a typical american household. that's a lot. the median household income is about $52,000, so that would be a nice, big bump. and that's if we could improve the economy just one percentage point from this data revolution. let's be a little more optimistic.
11:03 pm
say that over the next couple of years this data revolution can improve our economy by 5%, which i still think is actually a conservative estimate. improve it by 5% and you get close to $1 trillion -- a number hard to grasp, but that's over $2,700 per person. and, again, there are about 2.4 people in the typical american household, so now all of a sudden you're talking an increase of over $6,000 per household. so that's why everything that you're doing, everything that everybody out there is doing in this data space, is just so important: the better information we have and the better we can analyze it, the better decisions we can make and the better outcomes we get, because we get better government, because our businesses become more productive, because we have businesses that actually thrive in the data space. and then, more importantly, what's not quantified here is that our citizens become more
11:04 pm
informed. i'm not sure what dollar value to put on that, but that's important as well. so let me repeat the main takeaway here, which is that i think we really are at this tipping point. we have more and more data. we have more and more groups like this who are advocating to make data accessible, to make it usable, and to make it more actionable. the question is how quickly we reap these benefits, and these benefits are actually huge. the numbers i gave you just a moment ago, i think those are actually somewhat conservative. it's not much of a stretch to get there, and they could make a huge improvement in the quality of life of the hundreds of millions of people who live here. we have to make our data more and more accessible, and then we also have to address one of the biggest constraints i think we're facing -- again, this comes up when i talk to people in the private sector -- we have to invest as a country in the skills so we can really take advantage of, really leverage, this kind of data revolution.
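the per-person and per-household figures quoted above can be checked with back-of-the-envelope arithmetic. the gdp, population, and household-size constants below are the speaker's rough numbers (roughly $17.5 trillion gdp and 322 million people, implied by his $175 billion and $543 figures), not precise statistics:

```python
# Back-of-the-envelope check of the per-person and per-household gains
# quoted in the speech. Constants are the speaker's rough assumptions.

GDP = 17.5e12          # ~$17.5 trillion U.S. GDP
POPULATION = 322e6     # ~322 million people
HOUSEHOLD_SIZE = 2.4   # ~2.4 people per typical household

def gain_per_household(pct_improvement: float) -> float:
    """Dollar gain per household from a given fractional GDP improvement."""
    per_person = GDP * pct_improvement / POPULATION
    return per_person * HOUSEHOLD_SIZE

print(round(gain_per_household(0.01)))  # ~$1,300 per household at 1%
print(round(gain_per_household(0.05)))  # ~$6,500 per household at 5%
```

the 1% case reproduces the roughly $1,300-per-household figure, and the 5% case reproduces the "over $6,000" figure, against a median household income of about $52,000.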
11:05 pm
so with that, thank you very much. [ applause ] all right, thank you, mark. we have time for about ten minutes of questions. and i think i'll start out with the first one. so you just finished six years in government. you've been advocating for fact-based decisions. you've been advocating for the release of high-quality data. what's next? >> so i'm single, so i'm looking to marry an heiress. this is on tv, that's unfortunate. i don't work for government anymore, so i don't care. but more seriously, one thing i'm thinking about doing is writing a book. and what i'd like to do with this book is talk about all the different areas where data can really improve the quality of our lives and really improve our country.
11:06 pm
what i notice across all these different areas -- whether you're talking about accessibility of federal government spending data, health care data, or the veterans data i've talked about -- is that there are all these common challenges: how do you get all this data together while maintaining privacy? so on the one hand, we have this ability to take all this data from many different aspects of our lives and combine it so we can answer these important questions -- again, just think about the veterans example -- but how can we do that while maintaining privacy and also the perception of privacy? the american public is getting very concerned about the information that the government has on them and what the private sector has on them. and we want to use this data on people for good, okay. so if i had data on veterans, for instance, i could combine
11:07 pm
that with the labor market outcomes. if i could look at their credit scores, i could look at how much debt they have, are they making their mortgage payments, for instance? i could better design veterans policies. i could better design programs while they're within the department of defense so that they have better outcomes when they leave the defense department. but to do that, i would have to combine data from lots of different sources, and our society, i think, is grappling with this big question about how do you do that while maintaining privacy and also just kind of this perception of privacy? people give a lot of their private data, their personal data to private companies, so facebook knows a lot about me. facebook provides me a service in return for that. but when it comes to the government doing this or even
11:08 pm
the private sector getting more and more data on us, there's this real fear. i think it's a balance. the more data we have, the better decisions we can make; but the more data we have, the more heightened people's anxiety. so how do we have this conversation with folks in order to do this? this is something that i'd like to really work on in some capacity at some point, because, again, i really do think that if we can really leverage all the information out there, we can really move the social needle quite a bit. >> so if you have a question, raise your hand. we have a couple of mics. right there is the first one that i saw. >> hi, i'm from the organization of leading excellence. going back to your d.o.m.s model, one of the things i noticed absent from that model was the very beginning -- visioning or hypothesis making before the data is examined or collected. so to the extent that data analysis and data-driven decision making is a science, to what extent does there have to be active manipulation, in terms of experimentation, to get the type of data you need to make the decisions you want and make sure
11:09 pm
those decisions are the right decisions? >> yeah, it's an excellent question. when it comes to hypothesis testing, that's what's really hard. and so when i was talking about having experts in the field, that's where you really need that expertise, because, again, you get these big data sets, and statistically speaking you'll find lots of correlations, spurious patterns. there's this phrase we've all heard, which is that correlation is not causation, and there are lots of examples of that. and i think what's going to happen -- you've seen this before play out in slow motion -- is an iterative process that goes back and forth. we have data. you look at the data and say, what hypotheses can i test with that data? what data do i want? and you collect that data. and as anyone who has done big data science before knows, you think, wow, that was really cool, but then the number of questions begins to multiply even faster than the data sets themselves. and so i think as a country we
11:10 pm
have to be more adept at saying, okay, this is the information we have, this is what we can glean from it, and based on the hypotheses we can't answer, what data should we be gathering? and so we have to make sure it goes both ways, from hypothesis to data and from data back to hypotheses. >> thank you. >> this side of the room. way over there. run, hudson, run. >> hi. i think you had a lot of very good points. you said there are 91,000 local governments and the potential to impact every citizen in the country. usually when we talk about open data, we talk at the national level, at very broad scales. in your opinion, what are ways to make this part of the vernacular of how every citizen and business thinks and acts, and to get even the smallest local government to consider changing their processes and policies so that data and open data analysis improve local outcomes?
11:11 pm
>> so when it comes to local governments, this is fascinating. i think we have some groups here that represent local governments. we traveled around and spoke to lots of different local government organizations, and as one of the previous speakers said, some of the big cities, for instance, release data in a pretty good way, right? and there's a company or two here -- one of your sponsors -- that work a lot with local governments in making their data accessible, and i think where they are still in the early stages is figuring out what data people want. my assistant and i went to chicago. they're at the vanguard at the local government level. and they have this active community of -- what do you call them, warren? hackathons. they would have them once a month, where they would bring in these people -- the city of chicago would say, here's our
11:12 pm
data, do something with it. i think what they found is that some of the data sets they were releasing -- yeah, there's not much use for those -- and some of the data sets they found really interesting. so, for instance, one example of local governments doing good local government stuff in chicago: you take a picture of a pothole, you send it in to the public works department of chicago, and it is then posted on the city of chicago's website, so they are accountable for filling in that pothole in a timely manner, because everybody knows when that pothole was posted to the website. i just thought that was a great example of making information available that then made the government accountable for addressing these concerns of the citizenry. the challenge we face, as i mentioned, is that we have so many local governments, and a lot of these local governments don't have the resources to do this type of stuff. when you talk about data science, making data open, analyzing it, making sure it's out
11:13 pm
and what not, a lot of the local governments are just really hamstrung. if you look at the most recent recession, one of the biggest drags we had on our economy coming out of the recession was the state and local government sector. employment just plummeted in the state and local sector because they were really hurt, in part because of the housing crisis: property values went down, so their tax revenues went down. and so i think what's happening at the state and local
11:14 pm
government level is that you have some that are really out at the vanguard and some that aren't. and the ones that aren't often aren't because they're hamstrung. so maybe what we really need here are standards, suggested best practices, across all these local governments. now that some of these larger entities have been doing this pretty well for a couple of years, they're beginning to learn what those best practices are. but we need more -- i've spoken to many of these local government agencies, and they get this. at the state and local government level, again, i think it's that human capital constraint they're really facing. >> helena? in front. >> thank you so much. i'm helena sims, director of intergovernmental relations for the association of government accountants. one of my questions -- the human capital aspect that came up in the last question is relevant to this. all this data stuff, as you mentioned, is taking place at the same time that people are getting frustrated with the cost of higher education, and at the same time we need people trained
11:15 pm
in this data stuff. what implications do you think that has for the educational system -- what's the best way to educate people so they're knowledgeable on the data stuff? >> one thing -- so i worked in the administration for six years, and one thing the administration really pushed on, though at the federal level it's not a strong lever at all, is the community college system. if you look at community colleges across the country, i think they're doing a better and better job of working with local businesses to match the skills of workers with those businesses. when you survey businesses, what you often see is this skills mismatch issue: businesses say, we're not hiring because we can't find people with the right skills, right? and how many millions of people are in the mismatched category? those estimates vary quite a bit, but it's literally in the millions. when we're thinking about getting these skills to do the data stuff, we have to think, i think, outside the traditional four-year college degree. and also there's the big concept
11:16 pm
society always pushes, which is lifetime learning. so how can you learn these skills? literally, i'm teaching myself a data processing language right now, and what surprises me is that there are any number of online courses that cost basically nothing for me to learn this. now, that requires a certain amount of dedication, and i also happen to know a lot of people who know a lot about this stuff, so i'm pointed in the right direction. but fundamentally we know that the cost of higher education has, for the past couple of decades, risen far faster than inflation in the rest of the economy. so i think one of the big answers to your question is that the community college system is just really huge. and then you also see colleges -- carnegie mellon is a case in point -- where in your first year now you take a computer programming course, okay? they just view it as a way to think, right? this is something that everyone should be familiar with. when i was in high school a long
11:17 pm
time ago, i took fortran, for instance, but that was the exception, not the rule back then. so what can we do to teach kids today about coding? and, again, when i was undersecretary, you travel around quite a bit -- it was a lot of fun -- and there are a lot of these camps where kids learn java and then make an app, right? and it makes it really fun for kids, because i think the way computer science used to be taught was done in a really nerdy way. it wasn't done in a very inclusive way. and you hear the same thing about math education as well: it's not only where you get your education but also how the stuff is taught, which i think is really fascinating as well. different people respond differently to different types of education, to how things are taught. >> thank you. so that was a great question and will wrap up our keynote. thank you very much, dr. doms. [ applause ]
11:18 pm
pope francis's visit to the u.s. continues with his speech to the u.n. general assembly. his speech is at 10:45. and later he holds a service at the 9/11 memorial. >> on the next "washington journal," pope francis's visit to new york, his second city on his u.s. trip. and then an interview with tom roberts of the national catholic reporter. washington journal is live every morning at 7:00 a.m. eastern on c-span. we welcome your calls and comments on facebook and
11:19 pm
twitter. the pope's visit to the united states continues saturday as he travels from new york to philadelphia. live coverage starts at 4:30 p.m. eastern as pope francis speaks at independence hall. then presidential candidate lawrence lessig talks about his run for president. and on c-span 2's book tv saturday night at 10 p.m., bill o'reilly talks about his book "killing reagan."
11:20 pm
and on sunday, doug casey discusses his latest book on economics. and on c-span3, we're live from gettysburg college to mark the 125th anniversary of president dwight d. eisenhower's birth, discussing his military and political career with his grandchildren, susan and mary eisenhower, plus a documentary film on the king and queen of afghanistan's visit to the united states. get our complete weekend schedule at c-span.org. >> up next, congressman darrell issa answers questions on how congress will handle the issues of data and transparency going forward. his remarks are about 50
11:21 pm
minutes. hello, everyone. thank you all for being here and braving the traffic in support of open and structured data. my name is jonathan elliott. you really don't want to hear me talk as much as you want to hear this man to my left talk, so i will be quick. research data group provides compliance services and software tools to public companies to help them communicate with investors and comply with regulations with greater ease. and we've been in this industry for nearly 30 years. we're excited to see all the changes that have taken place recently. our country is pushing forward with real changes to help our system be more effective and efficient. those words do not normally correspond with government but we are making the move in the right direction. the single most important change in the past ten years has been the passing of the data act, and we are very, very enthusiastic
11:22 pm
and an executive member, because we understand that opening up government data can help everyone in the country -- not just idealistically, but in a very practical sense. the organization and linking of information will help individual citizens, politicians, investors, institutions, and municipalities make better decisions in almost every aspect of what they do. the data act is the nation's first open data law, and many people have a hard time understanding just how large an undertaking the transformation from static documents to structured and searchable data is. our speaker for this panel is representative darrell issa. he is the champion of the data act and understands what lies ahead. he knows that successful implementation of the data act cannot be achieved by one team, one person, or one government agency. it requires a concerted effort from many agencies and individuals. he also knows that there's more
11:23 pm
to be done and more leadership required to truly transform the way our government reports its information, and he's going to touch on the next steps here today. without further ado, representative darrell issa. >> thank you. [ applause ] >> with that kind of an introduction, i should just take the applause and leave. first of all, thank you very much. there's one name that you didn't mention -- without partners on the hill, things don't happen, and my partner in the data act on the senate side was senator warner. and i think it's extremely important to understand he was the one who went to harry reid, then the majority leader, and demanded that the bill be moved. and although senator reid insisted that it be a senate bill -- ultimately it's the senate, you have to expect that -- ultimately we made law together. the data act is, just as it was said, a major piece of legislation, but it's just the
11:24 pm
start. you can write legislation, but unless you oversee it and implement it and you're just diligent day after day, it will be meaningless. the fact is, today, there are many cios who still do not, in fact, have competence over their projects, nor the financial controls, the budget controls, of those projects. that's another major stumbling block. it doesn't mean no law has been passed; it means that we, in fact, have to stay on top of it. and we have partners in that effort. i think most congressmen have one thing that they can do very, very well, and that is, they can talk about their next piece of
11:25 pm
legislation. so i want to get that out of the way so there's no mystery. the financial transparency act builds, obviously, on that law, but the next steps are to insist that we make all data in government just as good and just as available. and some of it is hard. just before i came up, they started saying, well, what happens if you're taking a picture of a pothole -- how do you make a picture of a pothole machine searchable? well, if you're using a modern camera, you are going to have the gps location. you are going to have the time and date. you are going to have rich metadata that, if not lost, makes that unique picture at that unique location and that unique time extremely valuable and searchable. it may not tell you why it was taken, it may not tell you whether it's been fixed, but at least it's a start. i want to mention one thing that i have a passion for and that's
11:26 pm
modernizing foia. the data act is a standard that helps; the freedom of information act is today, in my opinion, a great success that is still only a fraction of what it was intended to be and could be. every day, countless individuals, companies, news organizations, and law firms try to receive information. the first thing that happens is it goes to a human being, who begins a search process and then begins looking through the data in order to redact information that's not going to be released -- literally a nightmare for a human to try to do. under the data act, we envision that metadata will be so easily searched that when you're looking for something, you won't even have to ask, because the vast majority of information being asked for will already be available on-line, with
11:27 pm
personally identifiable information and other fields that have been predetermined as not openly available already removed. so foia will be limited to: i looked at the data, the data indicates something more, and i believe i have a right to some portion of what was redacted. knowing what you're asking for, and cutting down the number of foia requests because the majority of what you want is available on-line and searchable, is a good start to making government open and transparent. i think one of the most important things i can do at my age is tell the young people in the room how we got here and why we shouldn't be here, but why it was somehow logical to get here. nearly 40 years ago -- actually 40-plus years ago -- i ran my first computer program. well, actually, i ran part of it
11:28 pm
until the card popped up showing me i had a flaw in my program. yeah, yeah, giggle in the back -- you haven't actually held a stack of cards with three failures in it, only one of which you get shown, because you have to run it again with that one corrected before you find the next mistake, line by line by line. in those days, we all understood that each card was simply, more or less, 0s and 1s, that everything was purely data and we were turning it into something. over the next few years, we turned computer programs into tools that could be run to find all the errors -- you could bypass an error and find the next one. we also began printing out massive amounts of ascii
11:29 pm
characters on printers -- absolutely useless information unless you read it. but behind that, if you had an index, you could find out anything you wanted to know about the data you were building. at that moment, whether it was a dec -- digital equipment corporation, for those who came after it -- or an ibm or an hp, or a myriad of other companies, many of whom, ncr and so on, are not here today, had we said, oh, we have the beginnings of metadata, we have what we index, let's store the index in those characters, we would have been fine. but we didn't do it. what we did was go along with proprietary indexing, proprietary calls, little characters that were embedded with no standard.
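the pothole-photo example earlier shows what standard embedded metadata buys: the pixels aren't searchable, but gps coordinates and a timestamp stored alongside them are. a minimal sketch of that idea -- the record shape and field names here are invented for illustration, not any camera's or agency's actual schema:

```python
# Sketch: standard embedded metadata (GPS, timestamp) makes photos
# searchable even though the image content itself is not.
# Record shape and data are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PhotoRecord:
    filename: str
    lat: float          # GPS latitude from the camera's metadata
    lon: float          # GPS longitude
    taken: datetime     # capture timestamp

def photos_in_area(photos, lat_min, lat_max, lon_min, lon_max):
    """Return photos whose GPS coordinates fall inside a bounding box."""
    return [p for p in photos
            if lat_min <= p.lat <= lat_max and lon_min <= p.lon <= lon_max]

reports = [
    PhotoRecord("pothole1.jpg", 41.88, -87.63, datetime(2015, 9, 1)),
    PhotoRecord("pothole2.jpg", 40.71, -74.01, datetime(2015, 9, 2)),
]
# Find reports taken inside a rough Chicago bounding box:
chicago = photos_in_area(reports, 41.6, 42.1, -87.9, -87.5)
print([p.filename for p in chicago])  # ['pothole1.jpg']
```

the point of the punch-card story is that this only works when everyone stores the index in the same standard fields; with proprietary formats, every such query has to be rebuilt per vendor.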
11:30 pm
and many organizations over the next decades built standard after standard after standard -- well, you could have as many standards as you wanted, and everyone picked a different one. today we know that we can build standards that everyone can use, export to, or make available while still maintaining their proprietary calls. that future is now with us. so one of the questions is, how do we get from a law to implementation? and there are really three components to it. one component, very clearly, is public demand. the public has to look at the benefits they get from open data. all of us who know, and can know, when our airplane flight is coming in -- or who, even on the airplane, can find out where we are -- are benefitting from data that's been made open to the application industry. all of us who have an app -- we all have an app; come on, everyone in the room has a weather app somewhere, okay, and you only use it when you worry, but it is there all the time. again, data made open.
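the proactive-disclosure idea raised in the foia discussion a moment ago -- publish records with predetermined sensitive fields already removed, so most requests never need to be filed -- reduces to a simple filter once data is structured. the field names below are illustrative assumptions, not any agency's actual schema:

```python
# Sketch: strip predetermined sensitive fields before publishing, so
# records can go online by default. Field names are assumptions.

SENSITIVE_FIELDS = {"ssn", "home_address", "date_of_birth"}

def redact(record: dict) -> dict:
    """Drop fields flagged as personally identifiable before publishing."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

record = {
    "contract_id": "C-1234",
    "vendor": "Acme Corp",
    "amount": 250000,
    "ssn": "123-45-6789",
}
print(redact(record))
# {'contract_id': 'C-1234', 'vendor': 'Acme Corp', 'amount': 250000}
```

the contrast with today's process is the point: a human reading documents to redact them one request at a time versus a predetermined field list applied mechanically to every record before release.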
11:31 pm
but imagine if all government spending to all vendors was made open and available, as appropriate, for nonclassified work. imagine how quickly we could find out that the government, through no fault of its own, paid ten different prices for the same product -- and, in fact, may buy once from the company that manufactures it, once from a distributor, and several times from retailers, and not even be aware that, in the way they went out for contracts, they did that. imagine how much savings we could have. let's also imagine a world -- and i said there were three parts -- in which government stops making that progress and goes back. what do we do about it? is it natural for vendors to say, yeah, the data act is great, but it might hurt my particular revenue stream downstream, so i'm
11:32 pm
not going to do it? unless the executive branch says, no, we really mean it -- we're not looking for open software, but we are looking for open data, and we insist on it. and imagine, as government goes from one program, one person, one time to another, that congress simply closes its eyes and says, we passed that law, we're good. i think you can quickly imagine that if congress takes its eyes off the oversight, then the weeks, months, years, and decades will go by, and we can still have legacy programs and post-legacy programs and post-post-legacy programs, such as those programs from the '60s that the irs claims they're still using, with computers that are pretty well from the '60s. we can still have that, and we can pay a huge price for it. under the data act, the office of management and budget has a huge responsibility.
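the duplicate-pricing discovery imagined above -- ten different prices for the same product -- becomes a simple group-by once spending records are open and standardized. the product codes and prices below are invented for illustration:

```python
# Sketch: with standardized open spending data, flagging products bought
# at multiple prices is a one-pass group-by. Data is invented.
from collections import defaultdict

def price_spread(purchases):
    """Map each product to its distinct prices, keeping only products
    bought at more than one price."""
    prices = defaultdict(set)
    for product, price in purchases:
        prices[product].add(price)
    return {prod: ps for prod, ps in prices.items() if len(ps) > 1}

purchases = [
    ("laptop-x200", 950.00),   # bought from the manufacturer
    ("laptop-x200", 1210.00),  # same model via a distributor
    ("laptop-x200", 1399.00),  # same model at retail
    ("chair-a1", 120.00),
]
print(price_spread(purchases))  # flags laptop-x200, bought at three prices
```

without a shared standard for identifying products across agencies and vendors, the same check requires reconciling formats by hand, which is why the standardization step comes first.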
11:33 pm
treasury has a huge opportunity. but when i said that it's the public, the executive branch, and congress that have the primary responsibility, i should have said: it's the public that must demand, it's the public that must continue to demand, it's the public that must ask why not. because the only way to get the executive branch to stay on it is for it to be important in a political sense. the only way for congress to stay on it is for it to be meaningful -- with organizations like this that are dedicated to it. and so i charge all of you: we passed a law, and i intend for the rest of my career to stay on top of it, to the best of my ability, every day, and to work with others. but you have an opportunity -- and many members of the coalition are doing it right now -- every time you build an app or try to build an app, to take
11:34 pm
advantage of data that's being made available and to market to the public the benefit of rich data sets. and your frustrations need to be communicated in three ways: to the executive branch -- to the major representatives here today -- to your congress -- one might say i'm here today; that was a good line, i'm glad somebody liked that; i'm it, i'm the congress for today -- and lastly, to the public. do not go quietly into it's going to happen next week, next month, next year, it's not in the budget this year. if you have a success, market it to the public and tell them it's because of open data. and if you're being thwarted or delayed, make sure you go just as public with it. because ultimately, somewhere, there's some bureaucrat -- bless their hearts; i always say bless their hearts when i don't mean it -- and they are just a matter of weeks or months or maybe
11:35 pm
years from retirement, and they just don't want to have that challenge. well, the people who work for them and who come afterwards want it. and so for all the young, energetic government workers who want to be thought of as the leading edge of good technology: make sure you go public if there's somebody there who is looking and saying, that will happen on the next person's watch, because i only have three years left until retirement and that's too much hassle. now, you notice i didn't mention government contractors. i didn't do so because my assumption is that contractors do what is important and what is put into the bids. so one of the areas i'm working on with other members of congress is to ensure that congress begins pushing the executive branch to make sure that it's in the bid and that there's a benefit in the bid. no subcontractor or contractor for the government, or government agency directly, should ever be working with data, modernizing a
11:36 pm
program without part of their incentive being to take us from where we've been to where we know we have to be. and that's going to be a monumental change. on time, on budget, of course, is important -- but on time, on budget, and saving the american people countless billions of dollars over the next decades by opening up data: that's got to be in the bid. you won't see it the day the software is delivered, but you will see it for a generation to come. i just want to make sure i didn't miss anything. in closing, we're just starting. in closing, this is the opening round for open data. there are companies that will take advantage of it and make fortunes.
11:37 pm
there are non-profits who will take advantage of it and embarrass people in the administration -- not just this one, but the next one and the one after. and if i have my way, the same level of open data will be both great for the public and questionable for members of congress, as data gets opened in all the branches. so i want to close by saying: this is a start. i'm delighted that you're here, and that there is, in fact, a coalition that dedicates itself to the same thing senator warner and i were honored to be able to start. the conference is a delight to attend. i look forward to next year having a list of accomplishments, because i believe that this presidency, which was promised to be the most open and transparent, does have an opportunity to show that it can open up government in the areas that are least understood and least transparent, and do it before the lights go off for this administration. and i think they will.
11:38 pm
i think they've set a course. i think they've appointed good people. now the question is, will we hold them to a timeline that's the same as the timeline of the president? because if any timeline is offered to you, this year, and it's one day after january 20th, 2017, then it's not a timeline, it's a dream. we don't need dreams. we don't need promises. we need what will you deliver before january 20th, 2017. thank you very much for all being here. [ applause ] >> we're going to take some questions from the audience. i have a couple questions here as well. i'll just start. we'll take yours last. who else has a question. >> no, go ahead. >> one of the requirements of the data act is that they have to get all of their not only
11:39 pm
what they're reporting, but grant recipients have to report this information as well. what do you see as the biggest obstacle in achieving that and getting the grant recipients to report? >> that's a great question. the reason it's such a great question is that grant recipients were the obstacle to this bill passing when it sat for two years. there are two reasons it's an obstacle. one i can understand: you're a university professor getting a couple million dollars every so many years, and you're used to loosely living up to the grant, but perhaps, you know, hiring an administrative assistant here or there who only loosely works on the program. the data act is intended to really follow the money and see whether it is auditable as being
11:40 pm
spent appropriately for whatever the grant was for. and we think that's important, and we think that those who shy away from it often do so because it's nice to get a pot of money, and i don't want to say laws were broken or anything else, but the commingling and the moving around of grant money has gone on at universities since i was -- way -- until the ncr 500 and me in university. so that's one challenge. the other challenge is for us in government, we that write grants, we that take applications: the administration has to realize that no one should have to enter data twice. that every entity, every unique entity, should have a number, and once it has a number, its personally identifiable information, its single database entry, should be there so that, just like most of us when we log in, we expect to
11:41 pm
log in and it doesn't matter whether it's the cloud with google or our local device, we want to log in, we want it to say hi, darrell, and we want it to have all kinds of information already there so we don't have to enter it twice. and if you assume you're the university of california and information is automatically populated -- it's not asking you endlessly to give it essentially the same information but, at most, asking you to fact-check what comes up -- then you see a reason for information delivered this way to be valuable. that is government's responsibility: live up to the dream that you shouldn't have to enter the exact same information again and again and again, even if it's a different agency, and allow that information to be valuable to the universities overseeing, in the case of the university of california, thousands of grants. because we think that's a value to the grant recipient that they don't have today, which is to oversee and organize their grants in an easy way from a federal database. so one, we can't do anything about except insist on compliance. two, we can be part of making it
11:42 pm
better for the grant recipient so they're more incentivized to support this. >> you also talked -- hudson is waving at me. he's got an important question. please, go ahead. >> hi. my name is mary anne. i'm the cto of a company called x version and, full disclosure, i'm a software developer, so -- >> is that a confession or what? >> yes, it is. >> if you told me you were a lawyer, that would be a confession. >> well, one of the problems that we're seeing developing on the edge of the open data movement is that often the people who are releasing the data don't really understand what a personally identifying piece of data actually looks like. so they pick the obvious things like names, social security numbers, phone numbers, addresses, but there are a lot of nonobvious things, and especially as more and more data is released from more and more sources, people like me can take multiple data sets, run them together, and figure out
11:43 pm
who's who in the data set. you have things like data being hashed incorrectly, which was a problem the city of new york had earlier this year. so my question for you -- my biggest fear as an open data advocate is that this will create a political backlash at some point -- is what policies are being put in place by the law to help these agencies educate themselves technically so that they're not releasing data that will bite them later on? >> that's a technical term, right? no, you have hit on one of the great challenges: metadata that's not properly defined. the federal government has an endless amount of history in how we define what a name is, what a social security number is, and so on. programs have been written without, if you will, compliant
11:44 pm
metadata identifiers, and that has to happen. and that is a matter of going into it and saying, here is the federal standard for all of this; can our data be searched based on it? the last thing you want to do is deal with it as though you had five spreadsheets written by five different people who named the top of every column a different name, with a different width, with a different whatever. you don't want that, don't need that, and shouldn't have it. the fact is that government agencies need to put their data in a format where there's a standard comparison. having said that, your challenge as a software developer in the current world is, yes, you need to be able to do some of this yourself -- and the post office happens to have a great program that almost works. now come on. if you take your data sets, because you've entered, you know, name, address, zip code and so on, you've named it as
11:45 pm
well as you can, and you give the post office the data, they have a wonderful program that actually corrects almost everything. it will change your abbreviation for avenue or street to make it compliant. it will, of course, add the zip plus four. the fuzzy logic it takes to take really bad typos in data entry for names, addresses, and the like and make them right for postal purposes is pretty amazing. software, for the interim, is going to have to do a lot of that with government data. you're going to, for the short term, be getting a lot of data where, you're exactly right, somebody embedded the name a second or a third time without
11:46 pm
a field indicator, and it's going to take some cleanup logic. that's one of the opportunities, if you will, for software companies, particularly if they're working with the federal government on data act compliance: to scrub the existing data and apply appropriate identifiable metadata so it doesn't have to be further scrubbed in the future. and i believe, you know, although there will have to be funding from congress for those earmarks, if you will, those actions to get data cleaned up -- you sounded a little like a lawyer when you said in fear of litigation; you shouldn't be in fear of litigation. the government does need to scrub and clean up its data so that doesn't happen. if the post office can be part of the solution in the case of the data that i had in outlook, the fact is that you and companies like yours shouldn't have to worry about that data being reasonably scrubbed. hudson, who else have you picked? >> hudson, we need a microphone down front. >> i think we've got -- there we go.
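the interim cleanup described above -- changing a free-text abbreviation for avenue or street into one compliant form -- can be sketched roughly as below. the suffix table and the function are illustrative assumptions only, not the post office's actual rules, which also handle typo correction and zip-plus-four lookup.

```python
# A rough sketch of interim address cleanup: standardize a trailing
# street-suffix word ("avenue", "ave.") to one compliant abbreviation.
# The SUFFIXES table is a small illustrative subset, not USPS's rule set.

SUFFIXES = {
    "st": "ST", "st.": "ST", "street": "ST",
    "ave": "AVE", "ave.": "AVE", "avenue": "AVE",
    "blvd": "BLVD", "blvd.": "BLVD", "boulevard": "BLVD",
}

def normalize_street(address: str) -> str:
    """Uppercase the address and standardize a trailing suffix word."""
    words = address.strip().split()
    if words and words[-1].lower() in SUFFIXES:
        words[-1] = SUFFIXES[words[-1].lower()]
    return " ".join(w.upper() for w in words)

print(normalize_street("1600 pennsylvania avenue"))  # 1600 PENNSYLVANIA AVE
```

real scrubbing would also need the embedded-field problem the answer mentions -- a name repeated inside a record without a field indicator -- which no lookup table alone can solve.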
11:47 pm
>> thank you. >> hi. i'm jeff myers with rei systems. i first want to say i think the data act is fantastic. it provides a huge amount of valuable data. but i think of a particular use case. i would like to be able to look across the federal government and say, where is all the spending on the same program, where is all the spending on the same activity, even if it happens in different agencies. where is the spending on the same mission -- and not just because i care about that spending, but because i want to figure out where there's duplication or a need to coordinate. my question is, will you, or is there an interest or commitment to, taking the data act further and saying, for example -- right now, agencies are required to identify the program, but one agency might say, you know, it's water quality audits and another might say water quality safety and another might say it's water quality research -- will there be an opportunity to take the data act further to support use cases like the ones i've described? >> the answer is, yes, and if we get cooperation from the administration, we shouldn't need a new law.
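the questioner's example -- "water quality audits" versus "water quality safety" versus "water quality research" -- is essentially a record-linkage problem. a naive sketch of grouping such names by token overlap (jaccard similarity) is below; the program names and the 0.5 threshold are illustrative assumptions, and real cross-agency matching would rest on the common identifiers the answer goes on to describe.

```python
# The cross-agency matching problem in miniature: one mission, three
# agency names for it. Jaccard similarity over words is a naive grouper;
# names and threshold are illustrative, not anything the DATA Act sets.

def jaccard(a: str, b: str) -> float:
    """Share of unique words the two program names have in common."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

programs = [
    "Water Quality Audits",       # agency A's name for the program
    "Water Quality Safety",       # agency B's name
    "Water Quality Research",     # agency C's name
    "Aviation Safety Oversight",  # an unrelated program
]

# Keep every name at least 50% similar to the first one.
related = [p for p in programs if jaccard(programs[0], p) >= 0.5]
print(related)  # the three water-quality variants
```

this kind of fuzzy grouping finds candidates but cannot certify identity, which is why the answer pushes for a standard identifier when things are the same and a distinct one when they differ.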
11:48 pm
the office of management and budget in setting, if you will, or requiring that the -- all the agencies set a common standard, can do a lot of this. years ago, i had a simple task. everything is simple until you get the bill for it. all i wanted to know was, how many jet aircraft and prop, but mostly jet, does the government own? what models are they, how long have they been around, and who controls them? and this happened to be shortly after 9/11 so i was a junior member. they kind of laughed at me. as the years went on i kept asking it. it's amazing, even the department of defense has a whole joint group with multiple officers trying to figure out where all their aircraft are and who's controlling them and what they do. that's a pretty simple thing. i mean, these -- you spend a few million dollars each, there's only so many let's call it a thousand noncombat -- nonfighter
11:49 pm
aircraft, it shouldn't be that hard to figure it out. i can tell you, if you wanted that information today, it would cost the government a fortune to get it to you, because the coast guard has their aircraft, and this group has theirs, and so on. you're exactly right. interoperable standards mean that if it's the same thing, it's named the same way, and if it's a similar thing, it has a number that is identifiable as similar, so you can not only find exact matches but, when there's a characteristic difference, there's a unique metadata identifier. that's what we're getting to: a standard for what you call something if it's identical and what you call something when there's a difference. omb has a responsibility to build, if you will, interagency cooperation to get that. candidly, d.o.d. hasn't gotten
11:50 pm
there yet, and we're hoping that they will be among the first, because if i've got a caterpillar d9 tractor in the army, navy, air force, or marines, and i need a part, and that part is somewhere in the world under some agency, the last thing in the world i want is to have that asset down while somebody is waiting to buy something we already own and the other one is heading to property disposal. but that happens today. it costs us billions. hudson, who else? if you run out of questions, i have a second speech. >> my name is tony. i'm a consultant with deloitte. i've heard you talk a little bit about the data act supporting this concept of establishing a common language. i think it's a very powerful concept and i wanted to get your perspective on how that might
11:51 pm
impact congress's ability to support its function in its capacity to represent the people? >> okay. i'm going to answer the question as i interpret it. i think i heard you say that, you know, what will happen if the data act is fully implemented to where all data can be made meaningful by a common program searching multiple data bases, if you will, a little like google going out to every newspaper and seeing what they all wrote about their congressman. the questions that the american people have on whether their money is well spent, whether they're properly represented, whether the waiting time at a particular veterans administration center is based on an actual shortfall of doctors or, in fact, an inefficiency within the hospital, questions like that,
11:52 pm
our democracy, our republic, is a representative democracy, and the ability of every individual to know more before they ask a question of their congressman, or for their congressman or woman to be able to get the information directly rather than months or years later, can have a dramatic effect. i would love nothing more than for my constituents who are waiting for va service to know what the wait times are at every hospital, what the ratios are between the number of doctors and the amount of care, and be able to say, you know, my hospital in brunswick, ohio, is underperforming and as a result i'm waiting longer and not getting care. what's wrong? that kind of a question is so much more powerful than, i'm waiting, what can you do for me, and we try to get them into the
11:53 pm
hospital faster. so i see it as empowering to members. i do see it as producing a lot more constituent requests, but those will be very targeted requests, because a lot of the information will already be either gleaned by the constituent or easily gotten by a case worker at a computer
11:54 pm
rather than weeks or months from an answer, so i see that as part of it. much of this will only happen if the software industry supports it with tools. i have no illusions about, you know, leonard wright, one of my workers -- there's no way that she's going to be querying those data bases directly, but if tools are developed and made available, if it's searchable and open, google or whatever, i don't care, in a way, right, i just want it available to my workers. >> hi. my name is darla and i'm with -- >> with what? >> terra data. >> yes. >> and one of the things is that when you're implementing a law, a lot of the heavy lifting happens at the agency level, and as we talk with agencies, a lot of them have different sentiments about the implementation of the data act itself, particularly being able to get value via analysis and analytics. i've worked with data from usa spending and it currently stands as a testament to the need for standards in the data act. my question for you is, do agencies have the capacity to fully implement the act? and the reason that i ask is because from some of them in casual conversation i hear, not
11:55 pm
that -- not conversations i've had myself, that some of them don't believe they have access to some of the data to be compliant with the act and so what would you say about that capacity issue? >> okay. i apologize, the agency you work with, i missed hearing that. >> i've worked for agencies before, but i don't currently work for one. >> okay. >> i work for terra data. >> okay. >> no, but i mean you said some agencies didn't think they had the ability. i wondered if you wanted to name one. you know, it's old habit from my days. the answer to a question like that quite frankly is a leadership question from the white house through omb. i agree that people at a given level may think they don't have the ability or may validly not have the ability and that's the
11:56 pm
reason that if the office of management and budget leads on behalf of the president, what they're going to very clearly do is say, we need a plan from every agency. how will you fully comply? what are your roadblocks? what is your funding estimate? what are your short-term steps -- and they should be short-term, easy, low-hanging fruit. what portion of your data is already in a format that can be easily made available? what portion of your data isn't? what guidance do you need for setting metadata standards? all those questions -- some of them have been asked by omb, and i don't want to discount the administration's willingness to do some of this, but you're right about one thing: it is typical for an agency to say, oh, another mandate from congress, it's unfunded, because in their mind whatever money they got wasn't for that.
11:57 pm
and i don't expect movement unless this process goes forward where agencies are directed to produce plans, they're given guidance, and in my estimation, when you look at the securities and exchange commission, who in many ways is ahead of it, but in many ways -- i can't say reconstitute -- i can never pronounce the word that says they don't want to do it -- but, you know, quite frankly some of the agencies, the fdic and others, are actually very far along toward having their data in the right format and very far along toward not providing it in some cases. so this is where the president's leadership is important. you need to both shed light on the agencies that are ready to
11:58 pm
go, have a lot of it, and allow them to lead in the best-practices determination, in helping other agencies understand what they need to do. and at the same time, you need to say the law as written has to be implemented, and if you think we need to change something, let me know, because that's why we have thousands of people who are called, you know, presidential legislative appointees. there's an army of people appointed by the president who are supposed to be working on legislative challenges, and we will meet with them at any time if they say they need a follow-on to the data act. chairman chaffetz will meet with them any time if they need a follow-on. until they tell us what they don't have and what they need, my assumption is it's a lack of leadership, and when you hear that from an agency, the only question i would ask you to pose is: what have you done to find out what your capabilities are and what they're not, and how are you going to get from here to there? you've been told where you have
11:59 pm
to end up. do you already know where you are? have you done an assessment of where you are, or do you just say we can't do it? i would propose to all of you that if an agency can type a few keys -- and most can, at some level -- and get a piece of information in a proper format where they know what the personally identifiable information is, what the
12:00 am
available information is, and when they get done with their system building that index and delivering the results, either on a screen or on a piece of paper -- the 25 entities that match, all in nice rows -- and they export it to excel, if you can do that in an agency, then you already have everything except metadata
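the closing point -- an agency that can already query its matching entities and export them to excel has everything except the metadata -- can be illustrated by pairing the exported rows with a data dictionary that describes each field. the field names, rows, and descriptions below are hypothetical, not any agency's actual schema.

```python
# A sketch of "adding the metadata": export rows as CSV alongside a
# data dictionary describing each field. All names here are hypothetical.
import csv
import io
import json

rows = [
    {"entity_id": "001", "name": "Example Grantee", "award_usd": 250000},
    {"entity_id": "002", "name": "Sample University", "award_usd": 1200000},
]

# The metadata: one entry per field, stating its type and meaning.
data_dictionary = {
    "entity_id": {"type": "string", "description": "unique entity identifier"},
    "name": {"type": "string", "description": "registered entity name"},
    "award_usd": {"type": "integer", "description": "award amount in u.s. dollars"},
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(data_dictionary))
writer.writeheader()
writer.writerows(rows)

print(buf.getvalue())                         # the exported rows
print(json.dumps(data_dictionary, indent=2))  # the accompanying metadata
```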
